Science topic
Longitudinal Data Analysis - Science topic
Explore the latest questions and answers in Longitudinal Data Analysis, and find Longitudinal Data Analysis experts.
Questions related to Longitudinal Data Analysis
I am currently doing a study where I try to model the theory of planned behavior via structural equation modeling. I have longitudinal data (4 waves) with items assessing behavior, so I want to include a cross-lagged effect from intention in one wave on behavior in the following wave. My problem is that I don't know whether I should model the effects of attitude, subjective norm, and perceived behavioral control (predictors) on intention within the same wave or whether these predictors should have an effect on intention in the following wave. I'm leaning towards the first option since it makes sense to me that the effects of the predictors on intention would be immediate (e.g. if I already have a positive attitude towards a behavior, it wouldn't take until the next wave to have the intention to do it). Unfortunately, I couldn't find any information on whether the effects of the predictors on intention are immediate or not. The original papers by Ajzen didn't give me a concrete answer, and almost all of the longitudinal studies I looked at model the predictors and intention within the same wave without explicitly justifying an immediate effect.
Does anyone have an idea how an immediate effect could be justified or can cite useful papers on this topic?
Any help is greatly appreciated!
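In case it helps to make the two specifications concrete, here is a minimal lavaan (R) sketch with hypothetical wave-specific variable names; which of the two sets of paths to keep is exactly the theoretical question raised above, and the two models can also be compared empirically (e.g. via fit indices or information criteria).
library(lavaan)
# Option A: contemporaneous (within-wave) effects of the TPB predictors on intention,
# plus the cross-lagged effect of intention on behavior at the next wave
model_within <- '
  int_w2 ~ att_w2 + sn_w2 + pbc_w2 + int_w1
  beh_w2 ~ int_w1 + beh_w1
'
# Option B: lagged effects of the predictors on intention at the next wave
model_lagged <- '
  int_w2 ~ att_w1 + sn_w1 + pbc_w1 + int_w1
  beh_w2 ~ int_w1 + beh_w1
'
fit_within <- sem(model_within, data = dat)
fit_lagged <- sem(model_lagged, data = dat)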
Hello, I am trying to evaluate the group differences in the change from baseline of an EEG index (fMMN_amplitude), using the Linear Mixed Models procedure. I entered Cluster (three groups), time (two visits), and the Cluster × time interaction as fixed effects, the baseline MMN_amplitude value as a covariate, and participant as a random effect for the intercept. I used EMMEANS to obtain the change from baseline of MMN_amplitude in each Cluster. Picture 1 shows the command lines I used.
However, I could not find a way to compare the change values between groups. Can someone please let me know if there is a way to obtain such group differences and their effect sizes, just like the mean difference versus placebo in Picture 2 (from DOI: 10.1016/S2215-0366(20)30513-7). It's like computing [(g2t2 - g2t1) - (g1t2 - g1t1)]. SPSS command lines would be best, but GUI operations are also fine. Any help will be appreciated!!
Sincerely,
Greatson Wu
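In case it helps to see the contrast logic, here is a minimal R sketch (lme4 + emmeans) with hypothetical variable names; the quantity [(g2t2 - g2t1) - (g1t2 - g1t1)] is an interaction contrast, and the same difference-in-differences can be requested in SPSS MIXED via a custom contrast on the Cluster × time interaction.
library(lme4)
library(emmeans)
fit <- lmer(MMN ~ Cluster * time + (1 | participant), data = dat)
emm <- emmeans(fit, ~ Cluster * time)
# pairwise differences of the within-cluster (visit 2 - visit 1) changes,
# i.e. differences in change from baseline between clusters
contrast(emm, interaction = c("pairwise", "pairwise"))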


I have a longitudinal model, and the stability coefficients for one construct change dramatically between the first and second time points (.04) and between the second and third time points (.89). I have offered a theoretical explanation for why this occurs, but have been asked about potential model bias.
Why would this indicate model bias? (A link to research would be helpful).
How can I determine whether the model is biased or not? (A link to research would be helpful).
Thanks!
Hi all,
I am looking for some statistic methods to analyse our data. We asked participants an open-ended question about their present top three worries and we collected this data every month for five times. We also collected their demographic information and some other continuous variables, such as loneliness or mental health.
We manually categorised the worries into five types (so it can be treated as a categorical variable) and the potential research questions we have for now include:
-whether these worries change over time
-whether there are subgroups of the trajectory and what could predict the memberships
-what are the dynamic relationships between other continuous variables and the worries
I was wondering what statistical analyses we could do to answer any of the above questions? Thank you!
I have 15 treatments. My main interest is to find the best treatment. The response is measured every day for up to 30 days. My model has an interaction effect between time and treatment. I will use a suitable effect size, but I need 80% power with a type I error of 5%. How can I calculate the sample size by simulation?
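One way to approach this is a simulation loop: generate data under assumed effect sizes, fit the model, test the time × treatment interaction, and take the proportion of significant results as the power. The sketch below (R, lme4) uses entirely made-up effect sizes, variances and data structure; all of these must be replaced by values that are plausible for the actual study.
library(lme4)
simulate_once <- function(n_per_arm) {
  n_trt <- 15; n_days <- 30
  dat <- expand.grid(day = 1:n_days, subj = 1:(n_per_arm * n_trt))
  dat$trt <- factor((dat$subj - 1) %/% n_per_arm + 1)
  slopes <- seq(0, 0.05, length.out = n_trt)            # assumed treatment-specific time slopes
  subj_int <- rnorm(n_per_arm * n_trt, 0, 1)            # assumed random-intercept SD = 1
  dat$y <- 10 + slopes[as.integer(dat$trt)] * dat$day +
           subj_int[dat$subj] + rnorm(nrow(dat), 0, 1)  # assumed residual SD = 1
  fit1 <- lmer(y ~ trt * day + (1 | subj), data = dat)
  fit0 <- lmer(y ~ trt + day + (1 | subj), data = dat)
  anova(fit0, fit1)[2, "Pr(>Chisq)"] < 0.05             # likelihood-ratio test of the interaction
}
power <- mean(replicate(500, simulate_once(n_per_arm = 10)))  # increase n_per_arm until power >= 0.80
power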
Dear all,
I have a questionnaire with 20 questions. The average scores of questions 1, 6, 7 and 9 (say FACTOR-1) and of questions 2, 3, 5, 8 and 10 (say FACTOR-2) are taken. The two factors are not normally distributed.
What non-parametric techniques or mixed-effects models are possible in this situation?
Waiting for your response
Thank you
I am in the process of doing a meta-analysis on longitudinal data and I'm running into an issue when trying to calculate the effect sizes for paired samples. The data I have available are the means, SDs and sample sizes, with no access to raw data. I am wondering if anyone knows the best way to go about this without the use of raw data.
Any help would be greatly appreciated!
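A common approach with only summary statistics is to compute a standardized mean change, which requires an assumed (or borrowed) pre-post correlation r. Below is a minimal R sketch with illustrative numbers; the variance formula for the change-score metric is an approximation, and packages such as metafor implement these measures directly.
smd_paired <- function(m_pre, m_post, sd_pre, sd_post, n, r = 0.5) {
  sd_diff <- sqrt(sd_pre^2 + sd_post^2 - 2 * r * sd_pre * sd_post)
  d_z  <- (m_post - m_pre) / sd_diff                    # change-score standardization
  d_av <- (m_post - m_pre) / ((sd_pre + sd_post) / 2)   # raw-score standardization
  var_dz <- 1 / n + d_z^2 / (2 * n)                     # approximate sampling variance of d_z
  c(d_z = d_z, d_av = d_av, var_dz = var_dz)
}
smd_paired(m_pre = 10, m_post = 12, sd_pre = 4, sd_post = 5, n = 30, r = 0.5)
# sensitivity analysis: rerun with several plausible values of r (e.g. 0.3, 0.5, 0.7)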
I have a SEM model (with 9 psychological and/or physical activity latent variables) with cross-sectional data in which, guided by theory, different predictor and mediator variables are related to each other to explain a final outcome variable. After verifying the good fit of the model (and after being published), I would like to replicate such a model on the same sample, but with observations for those variables already taken after 2 and after 5 years. My interest is in the quasi-causal relationships between variables (also in directionality), rather than in the stability/change of the constructs. Would it be appropriate to test an identical model in which only the predictor exogenous variables are included at T1, the mediator variables at T2 and the outcome variable at T3? I have found few articles with this approach. Or, is it preferable to use another model, such as an autoregressive cross-lagged (ACL) model despite the high number of latent variables? The overall sample is 600 participants, but only 300 have complete data for each time point, so perhaps this ACL model is too complex for this sample size (especially if I include indicator-specific factors, second-order autoregressive effects, etc.).
Thank you very very much in advance!!
Hello everyone,
I am trying to run rmcorr (repeated-measures correlations) and linear mixed models on intervention (Pre-Post) data.
However, I have come across several variables for which the residuals severely violate the assumption of normality due to outliers.
I was wondering if it is a valid approach to examine the distribution across both Pre and Post data together as one to transform and/or trim for a "more" normal distribution. Or would I have to examine the Pre data and Post data separately?
Thank you!
I am interested in examining whether my protein of interest is associated with cognitive decline over a period of time. In our study, participants were followed longitudinally for 8 years. At each visit (baseline, 3 months, 12m, 24m, 36m, 48m, 60m, 72m, 84m) participants underwent cognitive testing. Test scores are treated as continuous, repeated measures. However, protein levels were only measured once, at baseline. Therefore my independent variable would be my protein, whilst cognitive test scores would be my dependent variable. However, I would like to control for covariates/confounders such as age, gender and years of education. Finally, some of the participants missed cognitive testing at certain months – I hope this won't affect the analyses. I spoke with a statistician and he recommended a linear mixed-effects model; however, I am new to this type of modelling. I will use SPSS (V23) to run my analyses, and I have a few questions:
- Overall, is this plan feasible?
- In SPSS I have to select subjects and repeated variables in the first screen (attached picture), I assume this would just be each participant’s ID and the visit variable (time in months)?
- What would I consider for “fixed” and “random” effects?
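For orientation only, here is a minimal R (lme4/lmerTest) sketch of the kind of model described above, with hypothetical variable names; in SPSS MIXED the equivalent choices are the protein × time interaction as a fixed effect and the participant ID as the subject for the random effects. The random slope can be dropped (leaving only a random intercept) if the model does not converge.
library(lme4)
library(lmerTest)   # adds p-values for the fixed effects
# long format: one row per participant per visit
# cog = cognitive score, months = time since baseline, protein = baseline protein level
fit <- lmer(cog ~ protein * months + age + gender + education + (1 + months | id),
            data = dat)
summary(fit)   # the protein:months term tests whether the rate of decline depends on protein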


In the frame of a longitudinal data analysis I need to write R code for mixed models. I know that there are many different names (mixed models, mixed-effect models, random effects, ...) and would first need an overview to help me choose the right approach (in collaboration with a statistician) and finally to write the code.
Does anyone have a recommendation?
Kind regards
Anne-Marie
What statistical test should I use to investigate how changes in variable A (continuous; defined as A at Time 2 - A at Time 1) affect changes in variable B (continuous; defined as B at Time 2 - B at Time 1) using longitudinal data? I really appreciate your help.
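If the data are in wide format, a simple starting point is a change-score regression; an alternative that avoids some known pitfalls of difference scores is to regress B at Time 2 on its baseline plus A at both occasions. A minimal R sketch with hypothetical column names:
dat$dA <- dat$A_t2 - dat$A_t1
dat$dB <- dat$B_t2 - dat$B_t1
summary(lm(dB ~ dA, data = dat))                      # change in B regressed on change in A
summary(lm(B_t2 ~ B_t1 + A_t1 + A_t2, data = dat))    # residualized-change alternative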
I want to learn how to handle and analyse panel data and I am looking for references (handbooks, articles or any other material, but I prefer a good handbook), preferably using Stata and focusing on social science applications. I look forward to your suggestions. Thanks in advance.
Hello!
We have a question about implementing the ‘Mundlak’ approach in a multilevel (3-levels) nested hierarchical model. We have employees (level 1), nested within year-cohorts (level 2), nested within firms (level 3).
In terms of data structure, the dependent variable is employee satisfaction (ordinal measure) at employee level (i) over time (t) and across firms (j) (let's call this Y_itj), noting that we have repeated cross sections with different individuals observed in every period, while as regressors, we are mainly interested in the impact of a firm-level time-variant, but employee-invariant, variable (let's call it X_tj). We apply a 3-level ordered probit model (meoprobit in Stata).
We are concerned with endogeneity issues of the X_tj variable, which we hope to (at least partially) resolve by using some form of a Mundlak approach, by including firm specific averages for X_tj, as well as for all other time-varying explanatory variables. The idea is that if the firm-specific averages are added as additional control variables, then the coefficients of the original variables represent the ‘within effect’, i.e. how changing X_tj affects Y_itj (employee satisfaction).
However, we are not sure whether approach 1 or 2 below is more appropriate, because X_tj is a level 2 (firm level) variable.
1. The firm specific averages of X_tj (as well as other explanatory variables measured at level 2) need to be calculated by averaging over individuals, even though the variable itself is a Level 2 variable (varies only over time for each firm). That is, in Stata: bysort firm_id: egen mean_X= mean(X). As our data set is unbalanced (so the number of observations for each firm varies over time), these means are driven by the time periods with more observations. For example, in a 2-period model, if a company has a lot of employee reviews in t=1 but very few in t=2, the observations in t=1 will dominate this mean.
2. Alternatively, as the X_tj variable is a level 2 variable, the firm specific averages need to be calculated by averaging over time periods. That is: we first create a tag that is one only for the first observation in the sample per firm/year, and then do: bysort firm_id: egen mean_X= mean(X) if tag==1. This gives equal weight to each time period, irrespective of how many employee-level observations we have in that period. For example, although a company has a lot of employee reviews in t=1 and very few in t=2, the firm specific mean will treat the two periods as equally important.
The two means are different, and we are unsure which approach is the correct one (and which mean is the ‘true’ contextual effect of X_tj on Y_itj). We have been unable to locate in the literature a detailed treatment of the issue for 3-level models (as opposed to 2-level models where the situation is straightforward). Any advice/suggestions on the above would be very much appreciated.
I have questions regarding measurement invariance for longitudinal data when analyzing latent variables. I would like to analyze 4 cohorts (grades 3,4,5,6 at baseline) longitudinally over 2 years with four assessments.
1. If I consider the four cohorts as one sample, should I then show measurement invariance (configural invariance, metric invariance, scalar invariance and strict invariance) between the cohorts for each of the four assessments? Or would it be more appropriate to evaluate each cohort separately? As an analysis method I would like to calculate random-intercept cross-lagged panel models.
2. Is it necessary to test for configural invariance, metric invariance, scalar invariance as well as strict invariance between the four assessments for the latent variables? Or is it also sufficient under certain conditions if only configural invariance is given?
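For the between-cohort part of the question, a minimal lavaan (R) sketch of the usual sequence of nested invariance models is shown below (item and grouping names are placeholders). Invariance across the four assessments works analogously but is specified within a single model, with the repeated indicators given equality-constrained labels across time and residual covariances between the same item at different occasions.
library(lavaan)
model <- 'F =~ item1 + item2 + item3 + item4'   # hypothetical indicators
fit_config <- cfa(model, data = dat, group = "cohort")
fit_metric <- cfa(model, data = dat, group = "cohort", group.equal = "loadings")
fit_scalar <- cfa(model, data = dat, group = "cohort",
                  group.equal = c("loadings", "intercepts"))
fit_strict <- cfa(model, data = dat, group = "cohort",
                  group.equal = c("loadings", "intercepts", "residuals"))
anova(fit_config, fit_metric, fit_scalar, fit_strict)   # compare successive constraints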
Hello! I am trying to figure out how to run an analysis for a longitudinal moderated mediation. I will be collecting data twice from the same individuals in a mediation study while employing a first stage moderator. I have found a few papers that have used path modeling in R or SPSS, but I am wondering if this is the best way to go about this. Thank you!
Hello,
I plan to perform a linear mixed model analysis to look at change in cognitive function over 5 follow up waves. I am interested in looking at how this change is influenced by dietary and urinary sodium as well as other covariates like age, gender, blood pressure etc. Dietary and urinary sodium have only been collected at baseline, whilst other covariates have been collected at baseline and at each follow up wave.
I am aware of how to restructure the data set to the "long" format, creating an index variable for Time. However, as mentioned, certain variables have only been collected at baseline and are important, as I need to include them in the data set in order to exclude cases. For example, I would like to exclude anyone who has a diagnosis of dementia at baseline, and the corresponding variable has only 1 timepoint (data collected at baseline).
I am looking for advice on the following:
1) How to restructure the data set so these kind of variables fit & make sense within the long format?
2) Then, how can I select cases based only on certain baseline variables and then perform the analysis on the remaining cases, across all waves?
3) Are LMMs the best approach here, or would you suggest an alternative method?
Thanks.
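On points 1) and 2), the logic (shown here in R with made-up variable names, though the same idea applies to SPSS restructuring and SELECT IF) is that baseline-only variables simply repeat on every row of the long file, and case selection on a baseline variable can be applied before reshaping.
library(dplyr)
library(tidyr)
# wide data: cog_w1 ... cog_w5 are the repeated scores; sodium and dementia_bl were measured once
long <- dat %>%
  filter(dementia_bl == 0) %>%                       # exclusion based on a baseline-only variable
  pivot_longer(cols = starts_with("cog_w"),
               names_to = "wave", names_prefix = "cog_w",
               values_to = "cog") %>%
  mutate(wave = as.numeric(wave))
# sodium and dementia_bl now repeat across each participant's rows,
# which is exactly how time-invariant covariates enter a long-format mixed model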
Hello,
When I read multilevel model papers, I notice that when they discuss time-varying covariates, they often only use binary time-varying variables.
If I were to include a cross-product between the time variable and an ordinal/categorical time-varying covariate with several categories, say 5, might this overcomplicate the model? For instance, when I add a time-varying covariate with multiple categories, I see that the BIC goes up while the AIC goes down; does this mean that the model is 'worse'?
Is it better to try to collapse a larger categorical time-varying covariate into just a binary category, to allow for easier interpretation? If using a time-varying variable with multiple groups is accepted practice (because I don't see it often), how should I determine whether it is a good predictor? Are there any papers which might help? I can't see much in the literature relating to this.
Let me know your thoughts
Hi everyone!
I'm doing a PhD in Clinical Psychology and I have some treatment data to analyse. The design is a 2 (condition: self-compassion, cognitive restructuring) x 5 (time: baseline, mid-treatment, post-treatment, 1-week follow-up, and 5-week follow-up) design. I had 119 participants randomized, who engaged in their respective interventions for a 2-week period, with follow-up assessments. The aim was to reduce social anxiety.
One analysis I'm trying to do is mediation, and preferably I would use a simpler strategy such as Hayes' PROCESS macro in SPSS. However, my understanding is that I won't be able to use all five waves of my data if I use PROCESS. Does anyone know whether that is an appropriate strategy for multiple waves? Should I be using all the information? And if so, how?
I'm trying to assess severity of symptoms among GI patients pre and post intervention over time (post 12+ months). The dataset provided includes 1000 patients collected at 7 time points (pre-surgery, 0-1 mo., 2-3 mo., 4-5 mo., 6-8 mo., 9-11 mo., 12+ mo.), with 767 unique patients (who responded only once). The total N for each time point varies between 80 and 185. Data were collected using a survey whose total score ranges between 0 and 50; scores were then categorized into 3 groups: none, moderate, and severe. I was asked to test a group x time interaction, but I am unsure how to do it with unequal time intervals, different participants at each time point, and data that are not normally distributed. When assessing proportions per group, the results for moderate and severe show a "U" shape. The only additional data given were gender and age.
Currently, I've only been able to do chi-square or Fisher's exact tests as needed, but I would like to do additional statistical tests and would like to ask which would be best.
I have limited stats experience and have access to GraphPad Prism. Any recommendations on how to best review the data would be extremely helpful.
Thank you.
I have collected data using questionnaires from athletes at two time points. I have an independent variable (leadership) and a few dependent variables, but also some mediating or indirect variables such as trust, etc. All of the variables were measured using the same questionnaire at the two time points.
What data analysis method would be best to analyse the data? So far I have used the PROCESS macro in SPSS, which uses OLS regressions, but I am unsure if this is the best method.
I essentially want to see how the IV relates to/increases the dependent variables over time, and whether this change occurred directly through the IV or indirectly through the mediators.
Would these be appropriate research questions for the type of data I have and for the appropriate analysis technique?
Hello to all of you,
I want to make an intervention at work, and I wish to measure the effects of such intervention.
I intend to introduce a new goal management process: I would like to analyze whether the goal monitoring frequency affects the goal attainment.
My intention is to make a survey to collect data before and after the intervention. The data that I would collect would consist of:
- Individual data points (to avoid repetition and to pair with second survey).
- Independent variable: the goal monitoring frequency.
- Dependent variables: knowledge of goals, performance indicators.
Although the idea seems pretty straightforward, I've been researching without much success regarding the methodology of the analysis.
As I understand it, this can be considered a longitudinal study, since I will ask the same questions to the same individuals at two different points in time.
I would like some guidance on how to perform the analysis, regarding the methodology and the requirements thereof.
Thank you for your guidance!
I have a dataset comprising clinical characteristics and glycomics profiles of disease samples (n=100+) and matched healthy controls. So far, I have been able to do classification modeling to discriminate between disease vs healthy using the biomolecular profiles.
On top of that, I have glycomics profiles collected at 2 more time points (3 time points in total: timepoint 1 = time of discharge, timepoint 2 = 1-month follow-up, timepoint 3 = 6-month follow-up) from the disease patients. However, the healthy controls only have their glycan profiles measured at 1 time point (because no follow-up assessments or blood draws were done for the healthy controls).
My question is this: What kind of statistical analysis can I perform to draw meaningful insights on how the glycan profiles change across the 3 timepoints? I was originally thinking of survival analysis, but only a handful of patients out of the 100+ samples had adverse outcomes. So I question the applicability of that. Other than that, are univariate or multivariate tests to determine significant differences in the biomolecular profiles between each time point the only thing I can do?
I apologise for the lengthy question and appreciate any advice given!
In a longitudinal study on the consequences of electronic health record implementation on healthcare professionals work motivation, we aim to test a first stage moderated mediation model, using GEE in SPSS. We want to follow the method described by Hayes (2015; index of moderated mediation). Although there is literature on using GEE for mediated effects (Schluchter, 2008), I was not able to find any (research) papers on using GEE for this specific purpose. Does anyone have experience with moderated mediation using GEE or know whether this is an appropriate method?
We have the following variables: (X = Time: before versus after EHR implementation), two mediators (M = job autonomy, task interdependence), one outcome variable (autonomous motivation) and one moderator (W = profession, with four subcategories). We anticipate that profession has a moderating effect on the relationship between time and the mediators.
Most of the recent books on longitudinal data analysis that I have come across mention the issue of unbalanced data but do not actually present a solution for it. Take, for example:
- Hoffman, L. (2015). Longitudinal analysis: modeling within-person fluctuation and change (1 Edition). New York, NY: Routledge.
- Liu, X. (2015). Methods and applications of longitudinal data analysis. Elsevier.
Unbalanced measurements in longitudinal data occur when participants of a study are not measured at exactly the same points in time. We gathered big, complex and unbalanced data. The data come from arousal levels measured automatically every minute for a group of students while they engaged in learning activities. Students were asked to report on what they felt during the activities. Considering that not all students were participating in similar activities at the same time, and not all of them were active in reporting their feelings, we ended up with unstructured and uncontrolled data that do not reflect a systematic, regular longitudinal design. Add to this issue the complexity of the arousal level itself: most longitudinal data analyses assume linearity (the outcome variable changes positively/negatively with the predictors). Clearly that does not apply in our case, since the arousal level fluctuates over time.
My questions:
Can you please specify a useful resource (e.g., book, article, forum of experts) for analysing unbalanced panel data?
Do you have yourself any idea on how one can handle unbalanced data analysis?
Hi All
My dataset is longitudinal in the long format, and each individual has 5 rows of data with each row representing one wave (i.e. total 5 waves).
I just want to create a variable called year which can take on the same 5 values: 1982, 1989, 1999, 2009 & 2015 for subjects and is understood by Stata to be in calendar years. I will use this year variable to help create other variables like duration and age etc.
I would like the year variable to look this:
id year
1 1982
1 1989
1 1999
1 2009
1 2015
2 1982
2 1989
2 1999
2 2009
2 2015
3 1982
3 1989
3 1999
3 2009
3 2015
Any ideas how to generate the above year variable in Stata date language?
Many Thanks
/Amal
Hi everyone,
I would like to know how you conduct EDA in a longitudinal study. For example, suppose I want to study the impact of pred1 (categorical, 3 levels) and pred2 (continuous) on the response (continuous) variable, which is collected at 4 different time points (equally spaced intervals). Also, consider that I have covariates (age, BMI).
All covariates, pred1, and pred2 are time-invariant variables.
Primary hypotheses would be something like-
- Pred2 positively affects the response over the 4 time points.
- Pred1 levels show statistically different trajectories for the response.
- The level-1 response trajectory is better than the others (better meaning a higher response trajectory).
*There are missing values in the repeated responses.
Thank you.
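For EDA with this kind of data, a common starting point is a spaghetti plot of individual trajectories with group means overlaid, plus a simple check of missingness per time point. A minimal R (ggplot2) sketch, assuming columns id, time, response and pred1:
library(ggplot2)
ggplot(dat, aes(x = time, y = response, group = id)) +
  geom_line(alpha = 0.3) +                                           # individual trajectories
  stat_summary(aes(group = pred1, colour = pred1),
               fun = mean, geom = "line", linewidth = 1.2) +         # group mean trajectories
  facet_wrap(~ pred1)
table(dat$time, is.na(dat$response))   # pattern of missing responses by time point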
Hi All
I'm about to impute missing data in a longitudinal dataset that follows a cohort of subjects over 50 years with data collected at 5 time points (5 waves).
I use Stata for data analysis and I've previously analysed longitudinal data in the 'long format'. This is relatively easy and straightforward in Stata.
However, it is recommended to impute longitudinal data in the 'wide format' and then reshape the data back to long format for analysis. I find this rather cumbersome, as it requires a fair bit of data prep. Imputing in Stata adds prefixes to all imputed variables, for example, which can make reshaping the data difficult.
1. Is it possible to impute longitudinal data in the long format? pros & cons?
2. Is there an alternative or easier strategy when imputing longitudinal data using Stata?
Thoughts, ideas, recommendations welcome!
Thanks
/Amal
I've been reading up on grade of membership models and they may be promising for some analysis I would like to do, but seem to be mostly for categorical data.
I've heard of mixed and partial membership models as well but generally haven't found a resource that explains many of these methods in a way that I can use (i.e. massive formulas that I don't know how to apply).
If anyone could direct me to a useful resource I would appreciate it!
Hello!
I have a working hypothesis that I am not sure is possible to accurately measure.
Currently, the EDI (Early Development Index) measures whether or not children ages 4 to 6 are considered "developmentally at risk" in specific domains. My hypothesis is that a low score (i.e., a score which indicates being developmentally at risk) on the COMMUNICATION SKILLS AND GENERAL KNOWLEDGE or EMOTIONAL MATURITY subscales could predict later-in-life (i.e., age 6 to 12) difficulties with executive functioning (which are typically measured with a WISC test or the Behavior Rating Inventory of Executive Function (BRIEF), the Child Behavior Checklist (CBCL), and the Behavior Assessment System for Children (BASC)).
All of these measures are reliable, valid, and research-based for measuring their designed constructs... the EDI has not yet been linked to executive functioning (predictively or otherwise) because children ages 4 to 6 are simply too young for that to be measured reliably due to brain development.
Given that the age 4 to 6 metric is entirely different from the age 6 to 12 metrics, what statistical tests do I need to run in order to say with statistically significant confidence that low performance on the EDI may be predictive of later-childhood difficulties with executive functioning?
I use the Stata command traj to find group-based trajectories. Usually, the comparison of BIC values from models with 1 to x groups leads to the decision to select the model with the optimal number of groups. This command gives the optimal number of groups. I have read the help file many times but I cannot find a way to get the BIC values for different numbers of groups.
Does anyone use the Stata command traj to conduct group-based trajectory modeling? And how can I get the BIC values from models with different numbers of groups?
I have checked the SAS code; it includes an option with which I can set the number of groups. However, I have not installed SAS on my computer.
Thank you!
Yimin
I collected data from 21 participants daily over 14 weeks (98 measurement points of physiological data per participant). Each participant is part of an intervention group. All interventions have the potential to improve the dependent variable. I assume that there was no improvement per day but perhaps per week (I know that this is not perfect, because all interventions had the potential to improve the dependent variable). My questions are:
- should I combine the daily measurements into weekly measurements?
- should I use the time variable (measurement or week) as continuous or as a factor?
- Is the following code a possible approach (here using the lmer function)?
model1_fit <- lmer(dv ~ time * intervention + (1 | id),   # dv = dependent variable; time = daily measurement index or week
                   data = data,
                   na.action = na.exclude)
summary(model1_fit)
- how should I interpret the interaction when using the time variable as a factor (assuming that's the better choice)?
Thanks for your help.
I have read in the literature that LMM/GLMM is preferred to repeated-measures methodology. I would like to know the specific advantages of LMM/GLMM over repeated-measures methodology, with citations. TIA
I have longitudinal data with 2 follow-ups, at years 1 and 3. At baseline we have a measure of endothelial function, and we want to examine the relationship between endothelial function and type II diabetes. At follow-up, the incidence of diabetes was recorded via self-reported diagnosis or use of medication, as the follow-up was only done through interviews and questionnaires. We don't have the exact date of incidence of diabetes. Which analysis method would be best for this type of data: survival analysis or a mixed model? I understand that for survival analysis I would need the exact date of diagnosis of diabetes, but are there other assumptions that could make it possible to use it?
I have assessed a trait measure of attribution (for positive and negative situations) for 85 sportspersons. They all went on to play a competitive match. After the match (T1), they filled in a state attribution questionnaire based on the result of the match (win/lose). The total number of winners was 51, and 34 lost their match. They all went on to play their next game (T2) and filled in the state attribution scale, based on the result of the game (win/lose), again. Of the 51 who won their match at T1, only 35 won and 16 lost the second game. Of the 34 who lost their game at T1, 16 won their second game and 18 lost again.
I want to see whether attributions remain the same (trait-state-state), and
whether one behaves in line with their trait attributions.
Any help regarding statistical approach would be appreciated.
Question edited:for clarity:
My study is an observational two-wave panel study involving a one-group sample with different levels of baseline pre-outcome measures.
There are three outcome measurements that will be measured two times (pre-rest and post-rest):
1. Subjective fatigue level (measured by a visual analog score - continuous numerical data)
2. Work engagement level (measured by a Likert scale - ordinal data)
3. Objective fatigue level (mean reaction time in milliseconds - continuous numerical data)
The independent variables consist of different types of data, i.e. continuous numerical (age, hours, etc.), categorical (yes/no, role, etc.) and ordinal (Likert scale).
To represent the concept of recovery, i.e. the unwinding of the initial fatigue level, I decided to measure recovery as the difference between the pre-measure and the post-measure for each outcome, and the score differences are operationally defined as recovery levels (subjective recovery level, objective recovery level and engagement recovery level).
I would like to determine whether the independent variables significantly predict each outcome (subjective fatigue, work engagement and objective fatigue).
Currently I am thinking of the following statistical strategies. Kindly comment on whether they are appropriate.
1. Multiple linear regression; however, one outcome measure, i.e. work engagement, is ordinal data.
2. Hierarchical regression, hierarchical linear modelling or multilevel modelling, but I am not quite familiar with the concepts, assumptions or other aspects of these methods.
3. I would consider reading up on beta regression (sorry, this is my first time reading about this method).
4. Structural Equation Modelling.
- Can the 3 different types of fatigue measurement act as indicators of a latent outcome construct of Fatigue?
- Can the independent variables consist of a mix of continuous, categorical and ordinal data?
Thanks for your kind assistance.
Regards,
Fadhli
Hello readers
I am studying attributions in sports. I have a trait-like measure which assesses attributions on six dimensions (Internal-External, Stable-Unstable, Global-Specific, Controllability, Intentionality). I also have a state measure of attributions, which I assessed after two performances, t1 and t2.
I want to know/test whether people who are optimistic (trait) remain optimistic (state) irrespective of the results of t1 and t2, and whether people who are pessimistic remain pessimistic.
Can anyone help me with a suitable analysis?
Any help would be appreciated
For example, I calculate a firm's cash ratio by dividing cash by total assets, but both variables are nominal; is the ratio comparable over years?
I currently have a micro (5-year) panel dataset of house price transactions. There are a total of 800 different high-rise properties (names of apartments/condos) and 30000 transactions across the 5 years. However, the data are really unbalanced, as residential properties have different numbers of transactions in a year. Also, at times a particular year has no transactions for a particular property.
I am unsure how I should proceed with the analysis. Should I code the property names before running the analysis, or should I aggregate the values and independent variables by district and recode the districts instead?
I am running a longitudinal study, where I collect data at 8 time points using many different questionnaires. Some questionnaires are measured at T1, T3, T5, and T7; others at different time points (T2, T4, T6, T8).
I tried a mixed model in SPSS, but each time I did not get any results because of the missing data.
Many thanks
I have genome-wide methylation data (Illumina EPIC methylation array) across multiple time points, from individuals classified into two distinct clinical outcome groups. I want to see the variation of a specific probe across the different time points of a single individual (intraclass) and also between individuals (interclass).
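One simple way to quantify this is a variance-components / intraclass correlation approach: fit a mixed model with a random intercept per individual and compare the between-individual variance to the total. A minimal R (lme4) sketch with hypothetical variable names:
library(lme4)
# long format: one row per individual per time point; probe_beta = methylation value of the probe of interest
fit <- lmer(probe_beta ~ timepoint + (1 | id), data = dat)
vc  <- as.data.frame(VarCorr(fit))
icc <- vc$vcov[vc$grp == "id"] / sum(vc$vcov)   # between-individual variance / total variance
icc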
I need assistance in determining lagged effects in a time series of four weeks using R. I need multilevel estimates of models predicting a lagged effect. What syntax would do this?
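A common way to set this up is to create within-person lagged versions of the predictors (and, if desired, the outcome) and fit a multilevel model. A minimal R sketch under assumed variable names (id, day, x, y):
library(dplyr)
library(lme4)
dat <- dat %>%
  arrange(id, day) %>%
  group_by(id) %>%
  mutate(x_lag1 = lag(x),          # predictor from the previous occasion
         y_lag1 = lag(y)) %>%      # optional: autoregressive term for the outcome
  ungroup()
fit <- lmer(y ~ x_lag1 + y_lag1 + (1 | id), data = dat)
summary(fit)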
In the analysis of shear lag in prestressed concrete box girders based on the energy variational method, there are two viewpoints on the assumption for the flange longitudinal displacement:
1)U(x,y)=hi[w'(x)+f(x)u(x)];
2)U(x,y)=hi[-w'(x)+f(x)u(x)];
My question is: which one is right?
My exposure is a continuous variable that has been measured at 9 follow-up examinations in each participant.
My outcome is also a continuous variable that has been measured only once: at the end of the study.
I would like to test whether changes in my exposure over time are related to the outcome.
What are the available statistical tests to evaluate such relationship?
I have previously used a mixed model to test the association between an exposure at time x and change in the outcome over time. In that case, the outcome was measured multiple times while the exposure was measured only at the beginning of the study. I wonder whether a mixed model would work for modeling changes in exposure over time as well, or are there better statistical approaches to answer my question?
Thank you
I'm looking at the impact of a policy change on crime rates, but there are two policy interventions over the time span I'm studying. Would it be appropriate to do a time series analysis with two interventions, or would an ANOVA be better?
I would like to conduct multiple imputation of missing values in a 3-wave dataset; however, the percentage of cases with missing values is high - approximately 70%. Despite the high proportion, a lot of cases have missing values only for one wave. Is there a solution to this situation, or are these data suitable only for complete-case analysis?
I have a medical longitudinal retrospective dataset, with records covering the observation period from 2000 to the end of 2016. For many reasons, not every medical record spans that entire time frame; e.g. the patient may have died, or they may have transferred into the study halfway through or transferred out at some stage.
A particular event (or exposure) is seen as a clinical event e.g., going to the doctor and saying or being told that you have a particular disease, e.g., a chest infection. That patient will also have a categorical variable to indicate whether they are a smoker or not.
I wish to count the frequency of chest infections per patient and compare the distribution by whether they smoke or not. I imagine this would be a box plot with the upper and lower quartiles defined, frequency of disease on the Y axis, and smoker YES/NO on the X axis. This would be very easy to do. The problem I have, though, is that I am not sure how to deal with medical records of varying length. Surely there is bias if a smoker and a non-smoker both have twenty chest infections, but there is a four-year difference in the length of their medical records?
Thanks
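One standard way to handle records of different lengths is to model rates rather than counts, using the (log) observation time as an offset in a Poisson or negative binomial regression. A minimal R sketch with hypothetical variable names:
# one row per patient: n_infections, years_observed (length of the medical record), smoker
fit <- glm(n_infections ~ smoker + offset(log(years_observed)),
           family = poisson, data = patients)
summary(fit)
exp(coef(fit))   # the smoker coefficient becomes an infection *rate* ratio, adjusted for follow-up time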
Brief background: I’m examining mediation rates in China. I have a panel dataset with N=24 provinces and T=30 years (1985-2014). For each province-year, I observe mediation rates and host of economic/demographic information.
Anecdotal reports suggest that in 2006 or soon thereafter, the Chinese government began bolstering its mediation system and encouraging its use. My goal is to test this assertion. Unfortunately, I’m unable to quantify the “effort” the government exerts in promoting mediation.
What is the most convincing way to test this assertion? My ideas are listed below. Please evaluate. For all ideas, assume that I am regressing mediation rates on variables thought to influence mediation rates.
- Include a time polynomial (such as year and year^2) in the regression. If year is negative and year^2 is positive, this is evidence of a parabolic trend, even after controlling for other factors. Proceed by determining whether the minimum of the parabola is around year 2006.
- Include a lagged dependent variable in the regression, that is, a lag of the mediation rate. Perhaps the best measure of “effort” is the previous year’s mediation rate. After controlling for other factors, I can determine whether this effort proxy is positive and significant.
- Include year fixed effects. After these effects are estimated, graph their magnitudes against time. Progressively increasing year fixed effects after 2006 would indicate more effort.
- Include a linear time trend with a kink at 2006. Determine whether the post 2006 trend is significantly larger than the pre 2006 trend. Unlike the other ideas, this one assumes that we know where the change occurs.
Is there any reason to use a combination of these ideas? Do you have other ideas? Please suggest. Thanks for your help!
dear all,
I have a dataset with almost 2 million observations nested within (European) countries. My DV is the probability of weekly religious practice.
I want to disentangle Age, Period and Cohort effects, and there is the well-known identification problem.
Given that I have so many observations and a quite wide time span (years from 1970 to 2015, cohorts from 1900 to 2000, ages from 15 to 100), what is the best strategy to apply?
I know this is a very broad question and that there is a huge debate behind but I really need to collect some opinions about this.
Thanks in advance!
Francesco Molteni
I have longitudinal data (repeated measures on each subject over 't' time periods) with a binary response variable and various continuous/categorical covariates. I wish to build a forecasting model that predicts the outcome at future times t+1, t+2, etc., while simultaneously regressing on the predictors observed up to time t.
I want my model to use the information from the covariates at present time t, to forecast the response for the time ahead.
I believe that my model will predict the outcome with a probability associated with it, something like a Markov model + regression, that gives the state transition probability, also taking into consideration the covariates that affect the state.
Any help on how to structure the problem and/or implement it in R/SAS will be helpful.
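One way to structure this is as a transition (Markov-type regression) model: the outcome at each occasion is regressed on the previous state plus the current covariates, and forecasts are obtained by feeding in the last observed state and covariates. A minimal R sketch with placeholder names (id, time, y, x1, x2):
library(dplyr)
dat <- dat %>%
  arrange(id, time) %>%
  group_by(id) %>%
  mutate(y_prev = lag(y)) %>%   # state at the previous occasion
  ungroup()
fit <- glm(y ~ y_prev + x1 + x2, family = binomial, data = dat)   # first-order transition model
# one-step-ahead forecast for a subject, given their state and covariates at time t
predict(fit, newdata = data.frame(y_prev = 1, x1 = 0.5, x2 = 2), type = "response")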
For example, would 2 bursts with 5 measurements in each suffice?
Respected Researchers
I have panel data with 1252 firm-year observations on 182 firms, and the time period is 2010-2016. I have 14 independent variables including 3 control variables, 1 mediator and 1 dependent variable. I want to use Stata for testing the direct and mediation models. I have 11 direct hypotheses from independent variables to the dependent variable and 11 mediation hypotheses.
Data: Unbalanced panel
QUESTIONS:
1) Which tests should be implemented, other than the selection between fixed and random effects?
2) How can the mediation analysis be performed?
I am trying to model the longitudinal APIM for categorical dyadic data where the dyad members (members of 2-Person-Households) are not empirically distinguishable regarding the outcome variable. Extensive research for literature on such analysis was unsuccessful but I am hoping that someone has written something about it and I have just not been able to find it. Any suggestion or advice is well appreciated!
I am trying to estimate a 2-2-1 mediation model where the IV is a level-2 variable, the mediator is a level-2 variable and the DV is a level-1 variable.
There is some variation in practices and it seems that Preacher et al (2010) suggest using a MSEM approach that is available in MPLUS. Any idea whether this is doable in STATA 14? And any suggestion in terms of the code (ml_mediation or xtmixed, mle)?
Much appreciated.
Amedeo
I'm currently preparing for initial data analysis and I am having problems merging my data files for longitudinal data analysis. Please how can I go about this using SPSS?
Dear all,
I am looking for a way to use (Bayesian) model averaging in a multi-level context?
With which software could this be achieved? MLWin? MPlus? Do you have suggestions for references?
Thanks a lot in advance!
Best
I am struggling with structural equation modeling/confirmatory factor analysis in SAS using the PROC CALIS procedure, which I'm new to using. My goal is to create different factors of neighborhood conditions based on variables such as SES measures from the census, crime indices, green space, physical conditions (cleanliness, noise), and social environment. The problem is the model fit is not good and I'm not sure how to figure out covariance structures among the factors or variables. If anyone can give me some advice or direct me to a good resource, I would appreciate it. Thanks!
Hi,
Imagine you have measured two variables X and Y at two points in time. You want to predict Y2 from X1 and also control for the autoregressive effect (= temporal stability) of Y1 and control for the correlation of X1 and Y1. My question: is there any statistical reason that makes it necessary and/or advantageous to implement a full CLPM, that is, to include the paths X1 -> X2, Y1 -> X2, and the correlation X2 <-> Y2?
I am not interested in reciprocal relations between X and Y. I just wonder whether including these additional paths has an impact on my path of interest (X1 -> Y2) and, if so, why? I'd also be glad if you could provide a reference.
I want to estimate a dynamic factor model in Stata. The problem is that my dependent variable has quarterly frequency and the independent variables (which I want to merge into the latent variable - the factor) have monthly frequency. I set the dependent variable to be known every third month of the given quarter, but Stata gives me an error that there are missing observations in the dependent variable. Is there any solution for estimating a DFM with mixed-frequency data in Stata? Thank you.
I have daily data from Jan/1/2008 to Jan/1/2012. I would like to create a dummy variable for the whole period after a specific date, that is, after March 2011; in addition, I would like to create another dummy variable for the period from March 2011 to June 2011.
How can I do that using Stata 13?
Thanks in advance
Non-stationarity is sometimes considered, in the econometricians' sense, as a unit-root problem (e.g. P.C.B. Phillips et al.).
A question would be to unify "reasonable" notions of non-stationarity, e.g. local stationarity (e.g. Dahlhaus and followers), periodic or seasonal behaviour (e.g. Dickey, Mohamedou Ould Haye, Viano, Leskow), and unit roots.
For periodic behaviours, an additional attractive question is to fit the periods (the continuous-time case is even more relevant, and a problem is then to fit the periods).
Many other non-stationarity notions, such as random walks in random environments (e.g. Snitzmann), are extremely attractive; anyway, they are not necessarily features of real data.
The idea of the present project is rather to develop (non-parametric or parametric) models featuring the main properties of the real data sets to be fitted.
Certainly the validity of the techniques should be supported by real data analysis, as was emblematically done in a contradictory paper by Mikosch and Starica opposing long-range dependent models to models with a linear trend.
I am doing longitudinal research.
There are two variables.
One was measured three times, but the other one was observed five times.
In other words, one variable is missing at two of the occasions.
Do you think that I can develop the AR model with these variables?
I am conducting research on the institutional and community trajectories of people who have been housed in forensic mental health units at some point between 2005 and 2015. I have indicators for several domains assessed at several times (usually upon admission and every 6 months after that). There can be several admissions over time. The domains are psychiatric symptoms, past and recent negative life events, behaviour problems in the institution, treatment use and control interventions. The latter two are outcomes, the first three domains are predictors.
Do you have any recommendation as to what analysis to use, software, etc.? Since all my indicators are categorical and there are many per domain, I figure I have to do some kind of data reduction procedure (Latent class models?) to get at the domains. Any ideas are welcome.
Hello,
I am interested in testing a model using a cross-lagged panel design; both X and Y are measured at T1 and T2. I think the XY and YX associations have the potential to be nonlinear, both within and across time points. Is it possible to integrate quadratic X and quadratic Y into a cross-lagged panel design (for example, something like the attached figure?). If so, does anyone have any published examples where this approach has been used?
Thank you!
Melissa
I am analyzing some longitudinal data using mixed models (lmer in lme4). Sampling date is a within-subject covariate. I also have a two-level factor, where each subject belongs to one of the two levels, and a continuous between-subjects covariate. I get a significant interaction between the between-subjects covariate and the two-level factor and am not quite sure how to interpret this. Does it mean that the slope of the relationship between the response and the between-subjects covariate is significantly different for each level of the factor? Or does it simply mean that the effect of the covariate evaluated at 0 is significantly different for each level of the factor (implying nothing about the slope relationship)?
I have a dataset of 140 patients equally divided into 3 groups. The dependent variable is "moca" and can take integers between 0 and 30. It is a longitudinal study with a total of 4 time-points (variable time is "timepoint"). The group defining variable is "status". The independent variables are "age at recruitment", "education", "sex". I am interested in differences in the behaviour of moca across time in the 3 groups. The variable defining different patients is "recordID"
This is the command that I used:
mixed moca i.status i.sex education ageatrecruitment timepoint || RecordID: timepoint
So my questions are:
1- does the command I am using make sense?
2- with this model the lines I get for each group are parallel (i.e. they have the same slope and differ only in the intercept). I suspect they might have different slopes, though; how could I test for that? My guess would be to run the same command separately for each group (using the option by), but then how could I compare them with a test (the suest command doesn't work with the mixed command)?
3- my dependent variable is limited to integers between 0 and 30, and it might be that my regression line tends to an asymptote at 30; how could I test for that / implement this in my model?
I apologise if my question is unclear/too long.
I have longitudinal data on registered arthropathy diagnoses for my study subjects, including a main group and a comparison group. As the data include 22 years of follow-up, I want to see the pattern of incidence across age groups. I am looking for a Stata command/module that does this automatically for me. I know how to do it manually, but that takes some time and is not the best choice for me. Any guidance from my experienced colleagues is very much appreciated.
I have found the only relation between moment, thickness, and connectivity but I guess it is only for longitudinal method.
I am working on a longitudinal study with 140 participants divided into 3 groups. The participants were assessed every 2 years from 2010 (4 time-points in total), so the time-points are equally spaced, but there are some dropouts, so some patients are missing some time-points.
The assessment consisted of some tests, the results of which are discrete numerical variables (e.g. one of these is the MoCA test, which is a cognitive test with different tasks and for each task the participant is given a score; the final score is the sum of the partial scores).
My goal would be to show any difference between groups in the progression of the scores through time.
After some readings I am thinking to use a mixed effect model with the random part on the single individual level and the fixed part on the group level, would that make sense? What other statistical model could I use?
I am trying to analyze data (self-concept and test scores) of students before and after the transition from primary to secondary education. The aim is to show the impact of individual achievement and class achievement on self-concept both before and after transition: I hypothesize (1) that individual achievement has a positive impact on self-concept and class achievement a negative one (controlling for individual achievement), and, more importantly, (2) that after the transition to secondary school, the class achievement of the "old" class before transition no longer has its negative impact on self-concept measured after transition.
Now I do not know how to set up a model, since students change classes at transition and are therefore nested in two different groups - their classes before and after transition.
Does anyone have an idea how to set up a model that allows to analyse these questions or has anyone done some similar analysis?
Thank you very much for your answers!
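One way to handle membership in two different classes is a cross-classified mixed model, with separate random effects for the pre-transition and post-transition class. A minimal R (lme4) sketch with hypothetical variable names (repeated rows per student, one per measurement occasion):
library(lme4)
fit <- lmer(self_concept ~ occasion * (ind_achievement + class_achievement_old) +
              (1 | student) + (1 | class_primary) + (1 | class_secondary),
            data = dat)
summary(fit)   # the occasion interaction tests whether the old class-achievement effect fades after transition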
I have panel data on 200 companies for 10 years, i.e. 2000 observations. However, if I collect macro-economic data such as GDP (measured per year) for the same 10 years, that is only 10 observations. How do I regress my panel data on the macro-economic data? Which econometric/statistical model should be used, and how?
I have received a comment from a reviewer about my random intercept model. He recommended that I fit a random intercept and trend model rather than a random intercept model, based on the recommendation in R.D. Gibbons et al. (2010) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2971698/pdf/nihms236714.pdf. But I am a bit confused about these terminologies. Does he mean that I should consider a "random slope" when he suggests a "random intercept and trend model"? If a random slope model and a random intercept and trend model are the same thing, I have already run the lrtest and opted for the random intercept model, based on that result, while writing the manuscript.
If you believe that a random intercept and trend model and a random slope model are different, what is the Stata command that takes the random intercept and trend model into account? Is it different from a random slope?
I appreciate your help.
Longitudinal data elements need to be embedded in the usual snapshot mode of a survey. What techniques can help to achieve this?
Although it is interesting to run complex interactions, such as categorical-by-categorical-by-categorical interactions, little information is available on how to interpret them. Can anyone recommend a good book or other useful resources?
I would like to flexibly model the development of some continuous outcome of interest as a nonlinear function of age, using longitudinal data with rather strong imbalance (i.e., most individuals cover only a small fraction of the age range of interest with their measurements, although the whole set of measurements is covering the entire range), including an estimate of uncertainty (e.g. confidence bands around the estimated function). Which method would you recommend for this purpose?
I used LOESS to get an idea of the overall trend but I don't know whether the error estimates can account for the correlation present in longitudinal data. Also, I would be interested in both "population average" estimates (such as in marginal models / GEE models, but I don't know if there are any extensions which allow flexible modelling of nonlinear relations) and in "individual-specific" estimates (maybe using Generalized Additive Mixed Models, but since I do not really understand them, I cannot gauge their appropriateness for this situation)
The attached text file contains R code for the creation of a fictitious dataset which illustrates the kind of data I am interested in (long format, variable "id" indicates person).
Thanks for any help / suggestions!
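One option that matches this description is a penalized-spline mixed model, e.g. via mgcv: a smooth of age for the population-average trend plus person-level random effects, with approximate confidence bands for the smooth. A minimal sketch using the "id" variable from the attached example and an assumed outcome name y; gamm4::gamm4 or mgcv::gamm are alternatives when more complex random-effects or correlation structures are needed.
library(mgcv)
dat$id <- factor(dat$id)
fit <- gam(y ~ s(age) + s(id, bs = "re"),   # smooth age trend + random intercept per person
           data = dat, method = "REML")
summary(fit)
plot(fit, select = 1, shade = TRUE)         # estimated age trend with a confidence band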
Case: Longitudinal research on children over a period of two years, from 10th to 12th standard, to analyse their behaviour toward education (sample size around 1200 in 10th standard, reduced to 1150 by 12th standard).
Condition:
- I need to test a model for behaviour and see the difference between the two time points.
- The total number of students in the study is 1200, and the study was conducted over a duration of 2 years.
Approach 1:
- Step 1: Run AMOS to test the SEM model on the 12th standard data and validate the model.
- Step 2: Use ANOVA to compare 10th and 12th standard students' behaviour toward studies.
Approach 2:
- Step 1: Run AMOS to test the SEM model on the 10th and 12th standard datasets separately and validate the model.
- Step 2: Use ANOVA to compare 10th and 12th standard students' behaviour toward studies.
Kindly guide me on how to proceed in this situation, where the data are longitudinal in nature.
Dear All,
I would like to use multilevel modelling to estimate the concentration index (CI) in Stata. I will exploit claims data for health care use. In my analysis, data for health care use are at the individual level (about 5 million observations) and socioeconomic status is at the postcode level (about 2000 postcodes). These postcodes are nested within 31 higher health regions for which I have information about health care need. I wish to estimate CIs for these 31 areas using a multilevel approach to explain the regional variation in equity.
Does anyone know how to implement multilevel modelling in STATA to derive area level concentration indices?
Thank in advance.
Good day all!
I am currently working on my dissertation proposal and after considering my longitudinal data, the methodologist on my committee and I decided that latent change score modeling was the best analysis to answer two of my questions.
However, I have only found some information on latent change score modeling. The resources found are:
McArdle et al. (2009) Modeling life-span growth curves of cognition using longitudinal data with multiple samples and changing scales of measurement.
McArdle et al. (2010) Five steps in latent curve and latent change score modeling with longitudinal data
van Montfort et al. (2010) Longitudinal research with latent variables (chapter on topic)
Geiser (2012) Data analyses with Mplus (chapter)
Coman et al. (2013) The paired t-test as a simple latent change score model
If anyone could suggest any more resources, I would really appreciate it. I am especially looking for articles about examining relations between latent change scores along with Coman et al. (2013)
Thank you!
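In case it is useful alongside those references, here is a minimal univariate latent change score specification in lavaan (R), in the spirit of the McArdle papers listed above; y1 and y2 are placeholder names for the same variable at two occasions, and bivariate/extended versions add a second variable and regressions among the change factors.
library(lavaan)
ulcs <- '
  y2 ~ 1*y1        # autoregression fixed to 1
  dy =~ 1*y2       # latent change score defined by the time-2 measure
  y2 ~ 0*1         # intercept of y2 fixed to 0
  y2 ~~ 0*y2       # residual variance of y2 fixed to 0
  dy ~ 1           # mean of the change factor
  dy ~~ dy         # variance of the change factor
  dy ~~ y1         # change covaries with the baseline score
  y1 ~ 1
  y1 ~~ y1
'
fit <- lavaan(ulcs, data = dat)
summary(fit)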
I have the solution concentration (mg/l) at different times (s) in a rectangular channel, and I want to estimate the longitudinal dispersion coefficient experimentally with high accuracy.
In my study, there are eleven response variables and one independent variable.
Is there a way (using longitudinal ordinal-response GEE models)?
Is there a way to easily check a "geeglm" model (with an ar1 correlation structure, in R), for example its residuals or something else?
Thanks a lot
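geepack does not offer formal goodness-of-fit tests, but the fitted values and residuals of a geeglm object can be inspected graphically. A minimal sketch with placeholder variable names (y, time, x, subject):
library(geepack)
fit <- geeglm(y ~ time + x, id = subject, data = dat,
              family = gaussian, corstr = "ar1")
plot(fitted(fit), residuals(fit)); abline(h = 0, lty = 2)   # residuals vs fitted values
qqnorm(residuals(fit)); qqline(residuals(fit))              # rough check of the residual distribution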
This is an observational trial looking to see whether there are significant changes in biomarker data during a 6-month course of daily treatment with 6 months of follow-up. The trial is studying prostate cancer in patients who elected to participate in an "Active Surveillance" program. I am writing a section in the protocol describing how the data will be analyzed and am looking for the correct terminology one uses in evaluating longitudinal changes over time. As an example, there will be quarterly PSA tests run, and I want to use a statistical method to measure upward or downward trends in PSA levels in individuals and in a group of 20 patients where the data are combined. Can someone refer me to a resource that defines this approach in longitudinal studies?
What is the best type of study to grasp the changes in well-being within a city's population in a time period of 3 months? Cross-section or longitudinal?
The population will be surveyed twice: once before a major event, and a second time after the event. The time period between the two surveys will be 3 months.
Dear All,
I would like to ask you to show me how to work with longitudinal data in R. Sharing any relevant resources (books, articles, tutorials, YouTube links, etc.) would be highly appreciated.
Sincerely
Sadik
In a two-wave longitudinal study, normally we choose the time 2 mediator and time 2 outcomes in the model. My question is: do we really need to control for them by using the time 1 mediator and time 1 outcomes, respectively? Thanks a lot.
What is the best method to examine the dynamics of cattle colonization by antimicrobial-resistant microorganisms? The data consist of 188 cattle measured at four equally spaced time points over the course of the year. The same cows were followed, and a binary outcome (AMR present/absent) as well as the number of bacteria present (log colony-forming units) are available. I am interested in exploring either dynamic or longitudinal models to understand the underlying process of colonization in the herd over time. Please provide references explaining any methodologies or other examples.
I want to use only 1 independent variable - GDP per capita - and to control for time effects (6 dummy variables for each wave).
I am planning to model the fertility rate in Malaysia using longitudinal data analysis. This is the first time I am doing longitudinal data analysis, so I hope someone can help me understand it. Thank you.
Hello, I'm looking for a test or an index to verify a decrease or increase in the temperature of sea water.
I have two time series of temperatures from two decades (the 80s and the 00s). I analyzed them with R and deseasonalized the two time series; now I have the two trends, which aren't very steep, but I would like to know whether there is an increase or decrease or whether the temperatures are equal.
Thank you very much.
The response variable is measured at common observation times. Can these observation times represent "Level 1"??
I'm working on a dataset and would like to apply multiple imputation to a subset of variables of great interest. I'm working in Stata. I also know the principles of the MI method: checking the MAR assumption, the percentage of missing data, etc. But my main question is: what about the sample size? I have read many articles (e.g. by Ula Nur) which describe the method perfectly on big datasets with around 15,000 observations. In my study, I have just 300 patients but more than 20% missing data for many variables (which is why I am thinking about imputation). So, can I apply this method to my small dataset?
I intend to do longitudinal data analysis in a Bayesian framework. My model has both a random slope and a random intercept. However, when checking convergence, I am not entirely satisfied with the output. I want to apply hierarchical centering to see whether it improves convergence, but I am really struggling with how to implement it in my case.
Any help would be very useful for me. Thank you a lot.
Is there anyone working on the SHARE data, who could share some experience on dealing with the different scales of life satisfaction used in the survey?
In WAVE 1 there is a 4-point scale (1= Very satisfied; 2 = Somewhat satisfied; 3 = Somewhat dissatisfied; 4 = Very dissatisfied).
In WAVES 2,4 and 5 they switched to the commonly known 11-point scale ("On a scale from 0 to 10, where 0 means completely dissatisfied and 10 means completely satisfied, how satisfied are you with your life?")
When looking at the development of life satisfaction over time - how can I compare the results? How can I transform them to get comparable data? Any literature or concrete advice would be very helpful!
Hi all,
I am doing a longitudinal study with two waves of data, and I have a sample size of around 100. With these data, I hope to examine:
1) the direction of causality of IV and DV
2) how IVs and DVs change over a short period of time.
I am aware the sample size is quite small for a longitudinal study, and SEM cannot be used, so I am wondering if anyone can recommend a suitable way to analyse my data. Thanks a lot!
I would appreciate it if you could kindly send me multi-response longitudinal data, many thanks!
Can we use the one that is used for cross-sectional study?
If I would like to create a regression model including time-invariant explanatory variables (e.g. gender) as well as time-variant variables (e.g. income, health) for a longitudinal investigation of panel data - which technique would you recommend?
Is there any technique that allows for the integration of both sorts of explanatory variables?
I've read about fixed-effects regression, first differences and random effects, but have not found a promising answer yet...
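One option that accommodates both kinds of predictors is the within-between ("hybrid") random-effects model: person-mean-center the time-varying predictors so their within- and between-person effects are separated, and include time-invariant variables such as gender directly. A minimal R sketch with hypothetical variable names (id, wave, y, income, health, gender); the same model can be fitted in econometric packages as a correlated random-effects/Mundlak specification.
library(dplyr)
library(lme4)
dat <- dat %>%
  group_by(id) %>%
  mutate(income_mean = mean(income, na.rm = TRUE),
         income_dev  = income - income_mean,
         health_mean = mean(health, na.rm = TRUE),
         health_dev  = health - health_mean) %>%
  ungroup()
fit <- lmer(y ~ income_dev + income_mean + health_dev + health_mean + gender + (1 | id),
            data = dat)
summary(fit)   # *_dev terms give within-person (fixed-effects-like) estimates, *_mean terms between-person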