Longitudinal Data Analysis - Science topic

Questions related to Longitudinal Data Analysis
  • asked a question related to Longitudinal Data Analysis
Question
4 answers
I have a SEM model (with 9 psychological and/or physical activity latent variables) with cross-sectional data in which, guided by theory, different predictor and mediator variables are related to each other to explain a final outcome variable. After verifying the good fit of the model (and after being published), I would like to replicate such a model on the same sample, but with observations for those variables already taken after 2 and after 5 years. My interest is in the quasi-causal relationships between variables (also in directionality), rather than in the stability/change of the constructs. Would it be appropriate to test an identical model in which only the predictor exogenous variables are included at T1, the mediator variables at T2 and the outcome variable at T3? I have found few articles with this approach. Or, is it preferable to use another model, such as an autoregressive cross-lagged (ACL) model despite the high number of latent variables? The overall sample is 600 participants, but only 300 have complete data for each time point, so perhaps this ACL model is too complex for this sample size (especially if I include indicator-specific factors, second-order autoregressive effects, etc.).
Thank you very very much in advance!!
Relevant answer
Answer
Hi there,
a completely different approach could be to run a TETRAD model. This allows you to set restrictions where you can be sure about the direction (e.g., the autoregressive effects, as well as forbidding reverse effects from t_n to t_n-1) and to freely explore the rest. The model will print a path diagram that shows you three things:
1) clearly supported causal effects (happens only rarely)
2) effects where one end (the arrowhead) is clearly "dependent" but the other end is ambiguous (may be a cause or a consequence)
3) completely ambiguous relationships
TETRAD has existed since the 1980s and is remarkably invisible to our field.
Eberhardt, F. (2009). Introduction to the epistemology of causation. Philosophy Compass, 4(6), 913-925.
Malinsky, D., & Danks, D. (2018). Causal discovery algorithms: A practical guide. Philosophy Compass, 13(1), 1-11. https://doi.org/10.1111/phc3.12470
A few final comments
1) 90% of confirmatory SEMs are much too complex and involve dozens and sometimes hundreds of testable implications. Because of that, the models never fit, which (in an almost funny manner) is then used as support for the model ("well, models never fit, so why should mine?"). I would always focus on ONE or TWO essential effects or chains of effects, and then try to a) think hard about confounders and b) find potential instruments.
2) Yes, causation needs time to evolve, but most often the time lag is embedded in the measurement; otherwise you would not have any cross-sectional correlations. That is, if the causal lag is similar to the lag embedded in the measure (e.g., "how satisfied are you with your job?" will prompt an answer derived from memory) and/or the IV is stable, then cross-sectional data will generally allow you to identify causal effects. The key issue is, and remains, "causal identification", that is, removing confounding biases and potential reverse effects. The latter can be solved in a cross-lagged design but not the former. That is, you have to think hard about confounding no matter what the temporal design is.
I had a long discussion in the following thread, in case your opinion differs (which is fine with me):
Best,
Holger
  • asked a question related to Longitudinal Data Analysis
Question
4 answers
Hello everyone,
I am trying to run rmcorr (repeated-measures correlations) and linear mixed models on intervention (Pre-Post) data.
However, I have come across several variables for which the residuals severely violate the assumption of normality due to outliers.
I was wondering whether it is valid to examine the distribution across both the Pre and Post data together when transforming and/or trimming toward a "more" normal distribution, or whether I would have to examine the Pre data and Post data separately.
Thank you!
Relevant answer
Answer
Hello Jeongwoon,
Much of the decision about "best" approach will depend on the specific research question(s) you seek to address, as well as the nature of the data (and how the data were collected). As we don't yet know this information, I'm having to give a general observation, based on what your query does contain.
I would suggest you consider using pre scores as a covariate and compare post scores by intervention method. If model residuals are troublesome here, you can easily apply bootstrap error estimates which do not require presumptions about distribution shape. Tests of the covariate in the model address whether pre- and post-scores are linearly associated.
If this is way off base for your research aims, perhaps you could elaborate your query.
Good luck with your work.
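The bootstrap idea above can be sketched as follows. This is an illustrative Python/numpy stand-in (simulated data with made-up effect sizes and deliberately non-normal errors), not SPSS output:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated example: pre scores, a two-arm intervention, post scores
# with deliberately heavy-tailed (non-normal) errors.
n = 120
pre = rng.normal(50, 10, n)
group = np.repeat([0, 1], n // 2)          # 0 = control, 1 = intervention
post = 5 + 0.8 * pre + 4.0 * group + rng.standard_t(df=3, size=n)

def treatment_coef(pre, group, post):
    """OLS of post on pre (the covariate) and group; returns the group coefficient."""
    X = np.column_stack([np.ones_like(pre), pre, group])
    beta, *_ = np.linalg.lstsq(X, post, rcond=None)
    return beta[2]

# Case-resampling bootstrap: resample participants, refit, collect coefficients.
boot = np.empty(2000)
for b in range(boot.size):
    idx = rng.integers(0, n, n)
    boot[b] = treatment_coef(pre[idx], group[idx], post[idx])

lo, hi = np.percentile(boot, [2.5, 97.5])
est = treatment_coef(pre, group, post)
print(f"treatment effect: {est:.2f}, bootstrap 95% CI [{lo:.2f}, {hi:.2f}]")
```

Because the interval comes from resampling cases rather than a distributional formula, no normality assumption is placed on the residuals; heavier-tailed errors simply widen the interval.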
  • asked a question related to Longitudinal Data Analysis
Question
5 answers
I am interested in examining whether my protein of interest is associated with cognitive decline over a period of time. In our study, participants were followed longitudinally for 8 years. At each visit (Baseline, 3 months, 12m, 24m, 36m,48m,60m,72m, 84m) participants underwent cognitive testing. Test scores are treated as continuous, repeated-measures. However, protein levels were only measured once at baseline. Therefore my independent variable would be my protein whilst cognitive test scores would be my dependent variable. However, I would like to control for covariates/ confounders such as age, gender and years of education. Finally, some of the participants missed cognitive testing at certain months – I hope this won’t affect the analyses. I spoke with a statistician and he recommended a linear mixed-effects model, however I am new to this type of modelling. I will use SPSS (V23) to run my analyses, however I have a few questions:
  1. Overall, is this plan feasible?
  2. In SPSS I have to select subjects and repeated variables in the first screen (attached picture), I assume this would just be each participant’s ID and the visit variable (time in months)?
  3. What would I consider for “fixed” and “random” effects?
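As an aside on question 2, the "long" layout the mixed model expects can be sketched outside SPSS; here is an illustrative pandas version (all column names hypothetical), showing how the baseline-only protein value is simply repeated on every visit row:

```python
import pandas as pd

# Hypothetical wide data: one row per participant, cognition measured
# repeatedly, protein measured once at baseline.
wide = pd.DataFrame({
    "id": [1, 2],
    "protein": [0.8, 1.4],
    "age": [70, 65],
    "cog_0": [28, 30], "cog_12": [27, 30], "cog_24": [25, 29],
})

# Reshape to long: one row per participant per visit; the baseline-only
# protein (and any other baseline covariate) is carried down to each visit row.
long = wide.melt(
    id_vars=["id", "protein", "age"],
    value_vars=["cog_0", "cog_12", "cog_24"],
    var_name="visit", value_name="cog",
)
long["months"] = long["visit"].str.replace("cog_", "", regex=False).astype(int)
print(long.sort_values(["id", "months"]))
```

A common starting specification to discuss with your statistician: fixed effects for time, protein, their interaction, and the covariates, plus a random intercept (and possibly a random slope for time) per participant; the protein x time interaction is then the test of whether baseline protein predicts the rate of decline.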
  • asked a question related to Longitudinal Data Analysis
Question
3 answers
In the frame of a longitudinal data analysis I need to write an R-code about mixed models. I know that there are many different names (mixed models, mixed-effect models, random effects,...) and would need to get first an overview helping me to choose the right one (in collaboration with a statistician) and finally to write the code.
Does anyone have a recommendation?
Kind regards
Anne-Marie
Relevant answer
Answer
Look up MLwiN resources online.
  • asked a question related to Longitudinal Data Analysis
Question
5 answers
What statistical test should I use to investigate how changes in variable A (continuous; defined as A at Time 2 minus A at Time 1) affect changes in variable B (continuous; defined as B at Time 2 minus B at Time 1) using longitudinal data? I really appreciate your help.
Relevant answer
Answer
Hi,
Repeated measures analysis. Here are a few references:
Schober P, Vetter TR. Repeated Measures Designs and Analysis of Longitudinal Data: If at First You Do Not Succeed-Try, Try Again. Anesth Analg. 2018;127(2):569-575. doi:10.1213/ANE.0000000000003511
For a more theoretical treatment, try:
Islam, M. A., & Chowdhury, R. I. (2017). Analysis of Repeated Measures Data. Springer Singapore.
ISBN: 978-981-10-3793-1, 978-981-10-3794-8
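For the two-wave case described in the question, the simplest option is a change-score regression of ΔB on ΔA; a minimal sketch with simulated data (numpy, made-up effect size of 0.6):

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated two-wave data on the same subjects; the change in B is
# driven by the change in A with a (made-up) true slope of 0.6.
n = 200
a1 = rng.normal(0, 1, n)
a2 = a1 + rng.normal(0.3, 0.5, n)
delta_a = a2 - a1
b1 = rng.normal(0, 1, n)
b2 = b1 + 0.6 * delta_a + rng.normal(0, 0.5, n)
delta_b = b2 - b1

# Change-score regression: regress delta_B on delta_A.
X = np.column_stack([np.ones(n), delta_a])
beta, *_ = np.linalg.lstsq(X, delta_b, rcond=None)
print(f"estimated effect of a change in A on the change in B: {beta[1]:.2f}")
```

With more than two waves, the repeated-measures approaches in the references above are preferable, and residualized-change models are a common alternative to raw difference scores.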
  • asked a question related to Longitudinal Data Analysis
Question
4 answers
I want to learn how to handle and analyse panel data, and I am looking for references (handbooks, articles, or any other material, though I prefer a good handbook), preferably using Stata and focusing on social science applications. I look forward to your suggestions. Thanks beforehand.
Relevant answer
Answer
This is an excellent resource, with code and videos for panel models and many others:
  • asked a question related to Longitudinal Data Analysis
Question
1 answer
Hello!
We have a question about implementing the ‘Mundlak’ approach in a multilevel (3-levels) nested hierarchical model. We have employees (level 1), nested within year-cohorts (level 2), nested within firms (level 3).
In terms of data structure, the dependent variable is employee satisfaction (an ordinal measure) at the employee level (i) over time (t) and across firms (j) (let's call this Y_itj), noting that we have repeated cross-sections with different individuals observed in every period. As regressors, we are mainly interested in the impact of a firm-level variable that is time-variant but employee-invariant (let's call it X_tj). We apply a 3-level ordered probit model (meoprobit in Stata).
We are concerned with endogeneity issues of the X_tj variable, which we hope to (at least partially) resolve by using some form of a Mundlak approach, by including firm specific averages for X_tj, as well as for all other time-varying explanatory variables. The idea is that if the firm-specific averages are added as additional control variables, then the coefficients of the original variables represent the ‘within effect’, i.e. how changing X_tj affects Y_itj (employee satisfaction).
However, we are not sure whether approach 1 or 2 below is more appropriate, because X_tj is a level 2 (firm level) variable.
1. The firm specific averages of X_tj (as well as other explanatory variables measured at level 2) need to be calculated by averaging over individuals, even though the variable itself is a Level 2 variable (varies only over time for each firm). That is, in Stata: bysort firm_id: egen mean_X= mean(X). As our data set is unbalanced (so the number of observations for each firm varies over time), these means are driven by the time periods with more observations. For example, in a 2-period model, if a company has a lot of employee reviews in t=1 but very few in t=2, the observations in t=1 will dominate this mean.
2. Alternatively, as the X_tj variable is a level 2 variable, the firm specific averages need to be calculated by averaging over time periods. That is: we first create a tag that is one only for the first observation in the sample per firm/year, and then do: bysort firm_id: egen mean_X= mean(X) if tag==1. This gives equal weight to each time period, irrespective of how many employee-level observations we have in that period. For example, although a company has a lot of employee reviews in t=1 and very few in t=2, the firm specific mean will treat the two periods as equally important.
The two means are different, and we are unsure which approach is the correct one (and which mean is the ‘true’ contextual effect of X_tj on Y_itj). We have been unable to locate in the literature a detailed treatment of the issue for 3-level models (as opposed to 2-level models where the situation is straightforward). Any advice/suggestions on the above would be very much appreciated.
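To make the two candidate means concrete, here is a small sketch (pandas rather than Stata, made-up numbers) of how the employee-weighted and year-weighted firm means diverge in an unbalanced panel:

```python
import pandas as pd

# One firm, unbalanced: 3 employee reviews in year 1, only 1 in year 2.
# X is a firm-year variable, so it is constant within each firm-year.
df = pd.DataFrame({
    "firm": [1, 1, 1, 1],
    "year": [1, 1, 1, 2],
    "X":    [10.0, 10.0, 10.0, 20.0],
})

# Approach 1: average over employee rows (periods with more employees dominate).
mean1 = df.groupby("firm")["X"].transform("mean")            # (10+10+10+20)/4 = 12.5

# Approach 2: average over distinct firm-years (each period weighted equally).
firm_year_means = df.drop_duplicates(["firm", "year"]).groupby("firm")["X"].mean()
mean2 = df["firm"].map(firm_year_means)                      # (10+20)/2 = 15.0

print(float(mean1.iloc[0]), float(mean2.iloc[0]))
```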
Relevant answer
Answer
For general orientation you may be interested in
In this manual, see Chapter 8, where we use another multilevel model to get precision-weighted estimates of the group means.
A fuller discussion of the multilevel model (as a measurement model) and a more convincing example of using precision-weighted estimates of the group mean can be found at
Finally, as in all research involving a judgement call, do it both ways and see if it makes a substantive difference.
  • asked a question related to Longitudinal Data Analysis
Question
15 answers
I have questions regarding measurement invariance for longitudinal data when analyzing latent variables. I would like to analyze 4 cohorts (grades 3,4,5,6 at baseline) longitudinally over 2 years with four assessments.
1. If I consider the four cohorts as one sample, should I then show measurement invariance (configural, metric, scalar, and strict invariance) between the cohorts for each of the four assessments? Or would it be more appropriate to evaluate each cohort separately? As an analysis method, I would like to estimate random-intercept cross-lagged panel models.
2. Is it necessary to test for configural invariance, metric invariance, scalar invariance as well as strict invariance between the four assessments for the latent variables? Or is it also sufficient under certain conditions if only configural invariance is given?
  • asked a question related to Longitudinal Data Analysis
Question
3 answers
Hello! I am trying to figure out how to run an analysis for a longitudinal moderated mediation. I will be collecting data twice from the same individuals in a mediation study while employing a first stage moderator. I have found a few papers that have used path modeling in R or SPSS, but I am wondering if this is the best way to go about this. Thank you!
Relevant answer
Answer
@Lindsey, I guess multilevel (longitudinal) moderated mediation analysis can also be done in SPSS using the MLMed macro.
  • asked a question related to Longitudinal Data Analysis
Question
4 answers
Hello,
I plan to perform a linear mixed model analysis to look at change in cognitive function over 5 follow up waves. I am interested in looking at how this change is influenced by dietary and urinary sodium as well as other covariates like age, gender, blood pressure etc. Dietary and urinary sodium have only been collected at baseline, whilst other covariates have been collected at baseline and at each follow up wave.
I am aware of how to restructure the data set to the "long" format, creating an index variable for Time. However, as mentioned, certain variables have only been collected at baseline and are important, as I need them in the data set in order to exclude cases. For example, I would like to exclude anyone who has a diagnosis of dementia at baseline, and the corresponding variable has only 1 time point (data collected at baseline).
I am looking for advice on the following:
1) How to restructure the data set so these kind of variables fit & make sense within the long format?
2) Then, how can I select cases based only on certain baseline variables and then perform the analysis on the remaining cases, across all waves?
3) Are LMMs the best approach here? Or would you suggest an alternative method?
Thanks.
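On points 1) and 2), one workable pattern is to apply the baseline exclusion while the data are still wide (one row per person) and only then reshape, letting the baseline-only variables be carried down to every wave row. An illustrative pandas sketch with hypothetical variable names (not SPSS syntax):

```python
import pandas as pd

# Hypothetical wide data: cognition at 3 waves; sodium and a dementia
# diagnosis measured at baseline only.
wide = pd.DataFrame({
    "id": [1, 2, 3],
    "sodium": [2.1, 3.4, 2.8],
    "dementia_base": [0, 1, 0],
    "cog_w1": [30, 22, 28], "cog_w2": [29, 21, 27], "cog_w3": [29, 20, 26],
})

# 1) Apply the baseline exclusion while the data are wide (one row per person).
kept = wide[wide["dementia_base"] == 0]

# 2) Reshape to long; baseline-only variables are repeated on every wave row.
long = kept.melt(
    id_vars=["id", "sodium", "dementia_base"],
    value_vars=["cog_w1", "cog_w2", "cog_w3"],
    var_name="wave", value_name="cog",
)
print(long.sort_values("id"))
```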
  • asked a question related to Longitudinal Data Analysis
Question
4 answers
Hello,
When I read multilevel-model papers that discuss time-varying covariates, they often only use binary time-varying variables.
If I compute a cross-product between the time variable and an ordinal/categorical time-varying covariate with several categories, say 5, might this overcomplicate the model? When I add such a covariate, the BIC goes up while the AIC goes down; does this mean the model is "worse"?
Is it better to collapse a larger categorical time-varying covariate into a binary one, to allow for easier interpretation? If using a time-varying variable with multiple groups is accepted practice (I don't see it often), how should I determine whether it is a good predictor? Are there any papers that might help? I can't see much on this in the literature.
Let me know your thoughts
Relevant answer
Answer
Regardless of the time-dependence of covariates (as long as Z is predictable), the partial likelihood method preserves the same properties as the classical maximum likelihood method.
  • asked a question related to Longitudinal Data Analysis
Question
5 answers
Hi everyone!
I'm doing a PhD in Clinical Psychology and I have some treatment data to analyse. The design is a 2 (condition: self-compassion, cognitive restructuring) x 5 (time: baseline, mid-treatment, post-treatment, 1-week follow-up, and 5-week follow-up) design. I had 119 participants randomized, who engaged in their respective interventions for a 2-week period, with follow-up assessments. The aim was to reduce social anxiety.
One analysis I'm trying to do is mediation, and preferably I would use a simpler strategy such as Hayes' PROCESS macro in SPSS. However, my understanding is that I won't be able to use all five waves of my data if I use PROCESS. Does anyone know whether that is an appropriate strategy for multiple waves? Should I be using all the information? And if so, how?
Relevant answer
Answer
Hi Ying, please see our paper to read about what we eventually did:
  • asked a question related to Longitudinal Data Analysis
Question
9 answers
I'm trying to assess the severity of symptoms among GI patients pre and post intervention over time (12+ months post). The dataset provided contains 1000 patient records across 7 time points (pre-surgery, 0-1 mo., 2-3 mo., 4-5 mo., 6-8 mo., 9-11 mo., 12+ mo.), 767 of them from unique patients who responded only once. The total N per time point varies between 80 and 185. Data were collected with a survey whose total score ranges from 0 to 50; scores were then categorized into 3 groups: none, moderate, and severe. I was asked to conduct a group x time interaction analysis, but I am unsure how to do this with unequal time intervals, different participants at each time point, and non-normally distributed data. When assessing proportions per group, the results for moderate and severe show a "U" shape. The only additional data given were gender and age.
Currently, I've only been able to run chi-square or Fisher's exact tests as needed, but I would like to run additional statistical tests and would like to ask what the best statistics would be.
I have limited stats experience and have access to GraphPad Prism. Any recommendations on how best to review the data would be extremely helpful.
Thank you.
Relevant answer
Answer
I apologize for the delayed response. Yes, I would encourage you to follow your PI's advice. My input is based on my training in statistical science, but by no means do I claim to be an expert in your field. Please take the information below as a mere suggestion of an alternative approach to answering your research question.
Having said that, here is an elaborated version of my previous response: it is my understanding that your interest is in evaluating 'slopes' at each time point by group. For instance, your 'none' group shows a positive slope between 'pre-treatment' and '0-1 month', whereas the other two groups show negative slopes.
You can run a linear or generalized linear mixed-effects model to quantify time-specific, age- and sex-adjusted slopes at the individual level, then extract those values and save them as data. You would then have individual-level slope information at multiple time points, to which multiple comparison tests can be applied to evaluate slopes at each time point by group. For instance, based on the graph you shared, your 'none' group's slope would be statistically different from the other two groups for obvious reasons (positive slope vs. negative slope).
As you take this longitudinal 'piece-wise' regression approach, the benefits are: 1) you can control for the autocorrelation of repeated observations; 2) you have a secondary dataset that can be manipulated further (for tables and graphs); 3) you can also treat your time variable as continuous and predict an individual's outcome from the fitted formula (a possible solution for your unequal-time-interval problem).
In my training, to do any kind of meaningful trend analysis you need more intensive time points. You might consider computing a simple change score for the two periods and running a simple binomial hypothesis test, but the problem with this approach would be centering the sample mean, which can be tricky. I hope this information was not overwhelming. Once again, take it as a potential alternative approach. Good luck.
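As a simplified stand-in for the extract-the-slopes step described above, here is a two-stage sketch in Python/numpy (simulated data, made-up slope values): fit a least-squares slope per patient, then summarize. A mixed model does this jointly, shrinking noisy individual slopes toward the group mean and adjusting for covariates, but the two-stage version shows the idea:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated repeated measures: 30 patients scored at months 0, 1, 3, 6,
# each with their own true rate of change (mean -0.5 points/month).
months = np.array([0.0, 1.0, 3.0, 6.0])
true_slopes = rng.normal(-0.5, 0.3, 30)
scores = 25 + true_slopes[:, None] * months + rng.normal(0, 1.0, (30, months.size))

def ols_slope(t, y):
    """Least-squares slope of one patient's scores on time."""
    t_c = t - t.mean()
    return t_c @ (y - y.mean()) / (t_c @ t_c)

# Stage 1: one slope per patient. Stage 2: summarize / compare across groups.
slopes = np.array([ols_slope(months, y) for y in scores])
print(f"mean individual slope: {slopes.mean():.2f} points per month")
```

Note that unequal spacing of the time points is handled naturally, because time enters as a continuous variable rather than as equally spaced categories.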
  • asked a question related to Longitudinal Data Analysis
Question
1 answer
I have collected data using questionnaires from athletes at two time points. I have an independent variable (leadership) and a few dependent variables, but also some mediating or indirect variables such as trust, etc. All of the variables were measured using the same questionnaire at both time points.
What data analysis method would be best for these data? So far I have used the PROCESS macro in SPSS, which uses OLS regressions, but I am unsure whether this is the best method.
I essentially want to see how the IV relates to / increases the dependent variables over time, and whether this change occurred directly from the IV or indirectly through the mediators.
Would these be appropriate research questions for the type of data I have and for the appropriate analysis technique?
Relevant answer
Answer
Hello Ella,
The "best" analysis will depend on: (a) your specific research question/s; (b) the nature of the variables that you have collected and how they are quantified; and (c) your sampling procedures. You may find that an SEM framework would be easier for expressing your hypothesized model of how the variables do or don't fit together (even though Hayes' PROCESS add-in is pretty darn versatile). But, your query isn't fully elaborated, so (as an example) I have no way to judge whether you'd be better off with a univariate vs. a multivariate approach.
It sounds as if it might be worth your while to chat with someone from your institution to help arrive at an analytic approach that would answer your specific questions (a, above), while being defensible (given b and c, above).
Good luck with your work!
  • asked a question related to Longitudinal Data Analysis
Question
7 answers
Hello to all of you,
I want to make an intervention at work, and I wish to measure the effects of such intervention.
I intend to introduce a new goal management process: I would like to analyze whether the goal monitoring frequency affects the goal attainment.
My intention is to make a survey to collect data before and after the intervention. The data that I would collect would consist of:
- Individual data points (to avoid repetition and to pair with second survey).
- Independent variable: the goal monitoring frequency.
- Dependent variables: knowledge of goals, performance indicators.
Although the idea seems pretty straightforward, I've been researching the methodology of the analysis without much success.
As I understand it, this can be considered a longitudinal study, since I will ask the same questions to the same individuals at two different points in time.
I would like some guidance on how to perform the analysis, regarding the methodology and its requirements.
Thank you for your guidance!
Relevant answer
Answer
Thank you all for providing me with your perspectives on the topic.
Daniel Wright: Adding a control group is feasible due to the size of the company. But then again, according to the rationale, I would still be dealing with an n=1 study.
How can such an experiment be undertaken within a company, if communication among employees is expected?
Oluwasola Ajewole: The time frame will be 2 months between the two surveys. The intervention will break the goals down to the individual employee (in contrast with goals being given to the coaches and distributed down to the employees without any system).
  • asked a question related to Longitudinal Data Analysis
Question
3 answers
I have collected data using questionnaires from athletes at two time points. I have an independent variable (leadership) and a few dependent variables, but also some mediating or indirect variables such as trust, etc. All of the variables were measured using the same questionnaire at both time points.
What data analysis method would be best for these data? So far I have used the PROCESS macro in SPSS, which uses OLS regressions, but I am unsure whether this is the best method.
I essentially want to see how the IV relates to / increases the dependent variables over time, and whether this change occurred directly from the IV or indirectly through the mediators. Would these be appropriate research questions for the type of data I have and for the appropriate analysis technique?
Relevant answer
Answer
Hello Ella,
a) learn SEM--for instance with the lavaan-package (lavaan.org)
b) For your research question, autoregressive models would be a possibility; in these you can model lagged or synchronous direct and indirect effects.
Here is an applied example. If this fits what you want, I'll point you to further papers.
Frese, M., Garst, H., & Fay, D. (2007). Making things happen: Reciprocal relationships between work characteristics and personal initiative in a four-wave longitudinal structural equation model. Journal of Applied Psychology, 92(4), 1084-1102.
(don't be distracted by the fact that they used 4 waves)
Best,
Holger
  • asked a question related to Longitudinal Data Analysis
Question
3 answers
I have a dataset comprising clinical characteristics and glycomics profiles of disease samples (n=100+) and matched healthy controls. So far, I have been able to do classification modeling to discriminate between disease vs healthy using the biomolecular profiles.
On top of that, I have glycomics profiles collected across 2 more time points (total 3 time points - > timepoint 1 = time of discharge, timepoint 2 = 1 month follow-up, timepoint 3 = 6 months follow-up) from the disease patients. However, the healthy controls only have their glycan profiles measured at 1 time point (this is because no follow-up assessments and blood taking were done for the healthy controls).
My question is this: what kind of statistical analysis can I perform to draw meaningful insights into how the glycan profiles change across the 3 timepoints? I was originally thinking of survival analysis, but only a handful of the 100+ patients had adverse outcomes, so I question its applicability. Other than that, are univariate or multivariate tests for significant differences in the biomolecular profiles between each pair of time points the only thing I can do?
I apologise for the lengthy question and appreciate any advice given!
Relevant answer
Answer
As the samples are acquired from the same patients across time, the ideal approach would be a paired analysis (repeated-measures ANOVA or a non-parametric equivalent). However, every level of the time factor (the timepoints) is required for each sample group, and these are absent in the controls. Therefore, I'd move to a combination of pairwise comparisons (t-test or Mann-Whitney, depending on data normality), at the expense of sacrificing the p-value of the interactions. However, post-hoc corrections such as FDR may penalize false positives less, because the variables in your glycomics data are considered in individual pairwise comparisons, while the correction should initially account for all variables tested in the study.
Hope this helps!!!
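A self-contained sketch of that pairwise-plus-FDR workflow (Python; the Mann-Whitney normal approximation and the Benjamini-Hochberg step are implemented inline for illustration; in practice scipy.stats.mannwhitneyu and a packaged FDR routine would be used; the glycan data here are simulated):

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(0)

def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney p-value via the normal approximation
    (reasonable for n per group > ~20; assumes essentially no ties)."""
    n1, n2 = len(x), len(y)
    combined = np.concatenate([x, y])
    ranks = np.empty(n1 + n2)
    ranks[np.argsort(combined)] = np.arange(1, n1 + n2 + 1)
    u = ranks[:n1].sum() - n1 * (n1 + 1) / 2
    mu, sd = n1 * n2 / 2, sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return erfc(abs((u - mu) / sd) / sqrt(2))

def bh_adjust(pvals):
    """Benjamini-Hochberg FDR-adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    stepped = (p[order] * m / np.arange(1, m + 1))[::-1]
    adj = np.minimum.accumulate(stepped)[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adj, 1.0)
    return out

# 20 hypothetical glycans, timepoint 1 vs timepoint 2, 50 patients each;
# only the first two glycans truly shift between the timepoints.
pvals = []
for g in range(20):
    shift = 1.0 if g < 2 else 0.0
    pvals.append(mann_whitney_p(rng.normal(0, 1, 50), rng.normal(shift, 1, 50)))

adj = bh_adjust(pvals)
print("glycans significant after FDR:", np.flatnonzero(adj < 0.05))
```

The same loop would be repeated for each pair of timepoints (and against the controls), with the correction applied across all tests of one comparison.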
  • asked a question related to Longitudinal Data Analysis
Question
6 answers
In a longitudinal study on the consequences of electronic health record implementation on healthcare professionals work motivation, we aim to test a first stage moderated mediation model, using GEE in SPSS. We want to follow the method described by Hayes (2015; index of moderated mediation). Although there is literature on using GEE for mediated effects (Schluchter, 2008), I was not able to find any (research) papers on using GEE for this specific purpose. Does anyone have experience with moderated mediation using GEE or know whether this is an appropriate method?
We have the following variables: (X = Time: before versus after EHR implementation), two mediators (M = job autonomy, task interdependence), one outcome variable (autonomous motivation) and one moderator (W = profession, with four subcategories). We anticipate that profession has a moderating effect on the relationship between time and the mediators.
Relevant answer
Answer
I am afraid this is not a direct answer to your question, but it considers an alternative to GEE, random-effects modelling for longitudinal data, and why you may want to consider it:
  • asked a question related to Longitudinal Data Analysis
Question
9 answers
Most recent books on longitudinal data analysis that I have come across mention the issue of unbalanced data but do not actually present a solution for it. Take, for example:
  • Hoffman, L. (2015). Longitudinal analysis: modeling within-person fluctuation and change (1 Edition). New York, NY: Routledge.
  • Liu, X. (2015). Methods and applications of longitudinal data analysis. Elsevier.
Unbalanced measurements in longitudinal data occur when the participants of a study are not measured at exactly the same points in time. We gathered big, complex, unbalanced data. The data come from arousal levels measured every minute (automatically) for a group of students while they engaged in learning activities. Students were asked to report what they felt during the activities. Considering that not all students participated in similar activities at the same time, and not all of them were active in reporting their feelings, we end up with unstructured and uncontrolled data that do not reflect systematic, regular longitudinal data. Add to this the complexity of the arousal level itself. Most longitudinal data analyses assume linearity (the outcome variable changes positively/negatively with the predictors). Clearly that does not apply to our case, since the arousal level fluctuates over time.
My questions:
Can you please point to a useful resource (e.g., book, article, forum of experts) for analysing unbalanced panel data?
Do you yourself have any ideas on how to handle unbalanced data?
Relevant answer
Answer
I do not have experience with this level of complexity in repeated measures, but it seems to me that you actually have multiple episodes, what you call events. So I am suggesting a three-level structure: an identifier at level 3 for individuals; at level 2 for episodes within individuals; and at level 1 for repeated occasions within a specific episode, i.e. the minute-by-minute repeated measurements. You could have variables measured at all three levels. The general approach is considered here:
Steele, F. (2008). Multilevel models for longitudinal data. Journal of the Royal Statistical Society: Series A (Statistics in Society), 171(1), 5-19.
  • asked a question related to Longitudinal Data Analysis
Question
8 answers
Hi All
My dataset is longitudinal in the long format, and each individual has 5 rows of data with each row representing one wave (i.e. total 5 waves).
I just want to create a variable called year which takes the same 5 values for all subjects: 1982, 1989, 1999, 2009 & 2015, and is understood by Stata to be in calendar years. I will use this year variable to help create other variables like duration, age, etc.
I would like the year variable to look this:
id  year
1   1982
1   1989
1   1999
1   2009
1   2015
2   1982
2   1989
2   1999
2   2009
2   2015
3   1982
3   1989
3   1999
3   2009
3   2015
Any ideas how to generate the above year variable in Stata date language?
Many Thanks
/Amal
Relevant answer
Answer
Instead of storing the actual values of year in a local macro, you could simply use a labelled variable for year, if the actual numeric value is less important than the order.
lab def year 1 "1982" 2 "1989" 3 "1999" 4 "2009" 5 "2015",modify
sort id
by id: gen byte year = _n
lab val year year
The trick is that when you use 'by', Stata interprets _n as the ordinal position of the row within the rows defined by the by prefix, so _n resets to 1 every time we reach a new id number. As an aside, _N behaves similarly: it is the last row in the data, or the last row within the group defined by a by prefix.
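For readers doing the same thing outside Stata, the `by id: gen wave = _n` idiom corresponds to a grouped cumulative count; an illustrative pandas equivalent that also maps the wave numbers to the actual calendar years:

```python
import pandas as pd

# Long data: 5 rows per id, already sorted in wave order.
df = pd.DataFrame({"id": [1] * 5 + [2] * 5})

# Equivalent of `sort id` followed by `by id: gen wave = _n`.
df["wave"] = df.groupby("id").cumcount() + 1

# Then map the wave numbers to the actual calendar years.
waves = {1: 1982, 2: 1989, 3: 1999, 4: 2009, 5: 2015}
df["year"] = df["wave"].map(waves)
print(df)
```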
  • asked a question related to Longitudinal Data Analysis
Question
4 answers
Hi everyone,
I would like to know how you conduct EDA on a longitudinal study. For example, suppose I want to study the impact of pred1 (categorical - 3 levels) and pred2 (continuous) on the response (continuous) variable, which are collected at 4 different time points (equally spaced intervals). Also, consider that I have covariates (age, BMI).
All covariates, pred1, and pred2 are time-invariant variables.
Primary hypotheses would be something like-
- Pred2 positively affects the response over the 4 time points.
- Pred1 levels show statistically different response trajectories.
- The level-1 response trajectory is better than the others (better meaning a higher response trajectory).
*There are missing values in the repeated responses.
Thank you.
Relevant answer
Answer
Thank you David and Muhammad Ali for sharing these useful resources.
  • asked a question related to Longitudinal Data Analysis
Question
3 answers
Hi All
I'm about to impute missing data in a longitudinal dataset that follows a cohort of subjects over 50 years with data collected at 5 time points (5 waves).
I use Stata for data analysis and I've previously analysed longitudinal data in the 'long format'. This is relatively easy & straight forward in Stata.
However, it is recommended to impute longitudinal data in the 'wide format' and then reshape the data back to long format for analysis. I find this rather cumbersome, as it requires a fair bit of data prep. Imputing in Stata adds prefixes to all imputed variables, for example, which can make reshaping the data difficult.
1. Is it possible to impute longitudinal data in the long format? pros & cons?
2. Is there an alternative or easier strategy when imputing longitudinal data using Stata?
Thoughts, ideas, recommendations welcome!
Thanks
/Amal
Relevant answer
Answer
what is the percentage of missing data on important variables?
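As an illustration of the wide-format workflow described in the question above, here is a hedged pandas sketch of the long-to-wide-to-long round trip; the variable names are hypothetical and the imputation step itself is omitted:

```python
# Long -> wide -> (impute) -> long round trip for a score measured at
# several waves (hypothetical data; imputation step omitted).
import pandas as pd

long_df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 2],
    "wave":  [1, 2, 3, 1, 2, 3],
    "score": [10.0, None, 14.0, 9.0, 11.0, None],
})

# Long -> wide: one row per subject, one column per wave.
wide = long_df.pivot(index="id", columns="wave", values="score")
wide.columns = [f"score{w}" for w in wide.columns]

# ... multiple imputation would operate on the wide columns here ...

# Wide -> long again for analysis.
back = (wide.reset_index()
            .pipe(pd.wide_to_long, stubnames="score", i="id", j="wave")
            .reset_index())
print(back.sort_values(["id", "wave"]))
```

`pd.wide_to_long` undoes the pivot as long as the wave columns share a common stub name, which is the step that Stata's imputation prefixes make awkward.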
  • asked a question related to Longitudinal Data Analysis
Question
4 answers
I've been reading up on grade-of-membership models and they may be promising for some analysis I would like to do, but they seem to be mostly for categorical data.
I've heard of mixed and partial membership models as well, but generally haven't found a resource that explains many of these methods in a way that I can use (the resources I've found contain massive formulas that I don't know how to apply).
If anyone could direct me to a useful resource I would appreciate it!
Relevant answer
Answer
For an application of such a model in the context of macroeconomics see:
  • asked a question related to Longitudinal Data Analysis
Question
10 answers
Hello!
I have a working hypothesis that I am not sure is possible to accurately measure.
Currently, the EDI (Early Development Index) measures whether or not children ages 4 to 6 are considered "developmentally at risk" in specific domains. My hypothesis is that a low score (i.e., a score which indicates being developmentally at risk) on the COMMUNICATION SKILLS AND GENERAL KNOWLEDGE or EMOTIONAL MATURITY subscales could predict later-in-life (i.e., age 6 to 12) difficulties with executive functioning (which is typically measured with a WISC test, the Behavior Rating Inventory of Executive Function (BRIEF), the Child Behavior Checklist (CBCL), and the Behavior Assessment System for Children (BASC)).
All of these measures are reliable, valid, and research-based for their designed constructs... the EDI has not yet been linked to executive functioning (predictively or otherwise) because children ages 4 to 6 are simply too young for that to be measured reliably, due to brain development.
Given that the age 4 to 6 metric is entirely different from the age 6 to 12 metrics, what statistical tests do I need to run in order to say with statistically significant confidence that low performance on the EDI may be predictive of later-childhood difficulties with executive functioning?
Relevant answer
Answer
Julia,
So, this all sounds good.
I am not sure whether the factor analytic evidence that you propose is to be created by you, or it comes from the literature. If from you, note that factor analysis needs a respectable N. I see you say "PCA," thus this may mean a planned Principal Components Analysis. If so, I would strongly encourage you to consider a common factor model; this is more aligned with what would be expected in top tier journals these days. While I am at it, I would suggest use of parallel analysis to determine dimensionality, and allowing the possibility of an oblique rotation.
As I understand it, the "kicker" you mention makes this a longitudinal study taking much more time.
In regard to the predictive relationship, I am not sure what "holds for covariates such as age, gender, and SES" means. Is it the case that you wish to determine the influence of these contextual variables on the relationship between the EDI scales and executive functioning? If so, this suggests a moderation model.
Finally, "with the ultimate criterion outcome being academic performance" may suggest a mediation question: EDI scales impact executive functioning scores which, in turn, affect academic performance.
My Best, and again, sorry for taking the conversations so far from your question.
John
  • asked a question related to Longitudinal Data Analysis
Question
28 answers
I use your Stata command traj to find group-based trajectories. Usually, comparing BIC values for models with 1 to x groups leads to selecting the model with the optimal number of groups. This command gives the optimal number of groups, but I have read the help file many times and I cannot find a way to get the BIC values for different numbers of groups.
Does anyone use the Stata command traj to conduct group-based trajectory modeling? How can I get the BIC values from models with different numbers of groups?
I have checked the SAS code. It includes an option that lets me set the number of groups. However, I do not have SAS installed on my computer.
Thank you!
Yimin
Relevant answer
Answer
Yimin Mao: There is a solution to that particular problem; it requires a bit of programming and applying loops. It may not be the most elegant of all solutions, but it does the trick. Be aware that applying permutations of polynomials in the trajectory framework consumes lots of time, depending on the size of the dataset you're using. To circumvent this problem, I always use test datasets (approx. 10 percent of original size) for bigger datasets. Also, feel free to recommend my answers if they helped you.
*Permutations of polynomials and group numbers
local polynom 4
local maxnum: display `polynom'^1+`polynom'^2+`polynom'^3+`polynom'^4
matrix a = J(`maxnum',4,.)
matrix colnames a = "Groups" "Polynomials" "BIC(N)" "BIC(panels)"
local i 1
local bicn: display -10^99
local bicp: display -10^99
forvalues a = 1/`polynom' {
quietly traj, var(qcp*op) indep(age*) model(cnorm) min(0) max(10) order(`a')
matrix a[`i',1]= e(numGroups1)
matrix a[`i',2]= `a'
matrix a[`i',3]= e(BIC_N_data)
matrix a[`i',4]= e(BIC_n_subjects)
if `e(BIC_N_data)' > `bicn' {
local bicn `e(BIC_N_data)'
local solutionn `a'
}
if `e(BIC_n_subjects)' > `bicp' {
local bicp `e(BIC_n_subjects)'
local solutionp `a'
}
local i `++i'
forvalues b = 1/`polynom' {
quietly traj, var(qcp*op) indep(age*) model(cnorm) min(0) max(10) order(`a' `b')
matrix a[`i',1]= e(numGroups1)
matrix a[`i',2]= `a'`b'
matrix a[`i',3]= e(BIC_N_data)
matrix a[`i',4]= e(BIC_n_subjects)
if `e(BIC_N_data)' > `bicn' {
local bicn `e(BIC_N_data)'
local solutionn `a'`b'
}
if `e(BIC_n_subjects)' > `bicp' {
local bicp `e(BIC_n_subjects)'
local solutionp `a'`b'
}
local i `++i'
forvalues c = 1/`polynom' {
quietly traj, var(qcp*op) indep(age*) model(cnorm) min(0) max(10) order(`a' `b' `c')
matrix a[`i',1]= e(numGroups1)
matrix a[`i',2]= `a'`b'`c'
matrix a[`i',3]= e(BIC_N_data)
matrix a[`i',4]= e(BIC_n_subjects)
if `e(BIC_N_data)' > `bicn' {
local bicn `e(BIC_N_data)'
local solutionn `a'`b'`c'
}
if `e(BIC_n_subjects)' > `bicp' {
local bicp `e(BIC_n_subjects)'
local solutionp `a'`b'`c'
}
local i `++i'
forvalues d = 1/`polynom' {
quietly traj, var(qcp*op) indep(age*) model(cnorm) min(0) max(10) order(`a' `b' `c' `d')
matrix a[`i',1]= e(numGroups1)
matrix a[`i',2]= `a'`b'`c'`d'
matrix a[`i',3]= e(BIC_N_data)
matrix a[`i',4]= e(BIC_n_subjects)
if `e(BIC_N_data)' > `bicn' {
local bicn `e(BIC_N_data)'
local solutionn `a'`b'`c'`d'
}
if `e(BIC_n_subjects)' > `bicp' {
local bicp `e(BIC_n_subjects)'
local solutionp `a'`b'`c'`d'
}
local i `++i'
}
}
}
}
matrix list a
display "Best solution BIC(n): " `solutionn'
display "Best solution BIC(p): " `solutionp'
  • asked a question related to Longitudinal Data Analysis
Question
8 answers
I collected data from 21 participants daily over 14 weeks (98 measurement points per participant; physiological data). Each participant is part of an intervention group. All interventions have the potential to improve the dependent variable. I assume that there was no improvement per day, but perhaps per week (I know this is not perfect, because all interventions had the potential to improve the dependent variable). My questions are:
  • should I combine the daily measurements into weekly measurements?
  • should I use the time variable (measurement or week) as continuous or as a factor?
  • Is this a possible code (here using the lmer function)?
model1_fit <- lmer(dependent_variable ~ time + intervention + time:intervention + (1 | id),
                   data = data,       # time = daily measurement or week
                   na.action = na.exclude)
summary(model1_fit)
  • how should I interpret the interaction when using the time variable as a factor (assuming that's the better choice)?
Thanks for your help.
Relevant answer
Answer
Should I combine the daily measurements into weekly measurements?
No, you are losing valuable information.
Should I use the time variable (measurement or week) as continuous or as a factor?
A bit of both.
In a repeated-measures random-effects multilevel model you have occasions at level 1 nested within individuals. The residual at the individual level will give 21 differences, one for each individual, around an intercept, and the 98-by-21 occasion residuals will give differences from the individual overall mean (intercept plus individual residual) for each and every occasion.
You can then put variables into the fixed part to try to account for these differences; they may be a function of continuous time (linear or quadratic etc. to get the overall trends), categorical time (a dummy for weekend or morning) and dummies distinguishing the three groups. You may also need interactions, e.g. a linear trend by group interaction, but 21 individuals is not many for this type of modelling.
It is also usual to fit a random-slope model whereby the differences between individuals may increase over time. A potential sequence of models is given in my answer to this question.
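A minimal sketch of the kind of model described above, using Python's statsmodels MixedLM on simulated data; the variable names and effect sizes are invented for illustration only:

```python
# Sketch: 98 daily occasions nested in 21 individuals, a linear time
# trend, three intervention groups, a time-by-group interaction, and
# a random slope on time (hypothetical simulated data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
rows = []
for i in range(21):
    group = i % 3                              # three intervention groups
    u0, u1 = rng.normal(0, [1.0, 0.05])        # random intercept and slope
    for day in range(98):
        y = 5 + 0.02 * day + 0.5 * group + u0 + u1 * day + rng.normal(0, 1)
        rows.append({"id": i, "day": day, "group": group, "y": y})
df = pd.DataFrame(rows)

# Random intercept and random slope for day, varying over individuals.
model = smf.mixedlm("y ~ day * C(group)", df, groups="id",
                    re_formula="~day")
result = model.fit()
print(result.params["day"])
```

The `day:C(group)` terms are the trend-by-group interactions mentioned above; with only 21 individuals the random-slope variance will be estimated imprecisely, which is the caveat in the answer.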
  • asked a question related to Longitudinal Data Analysis
Question
5 answers
I have read in the literature LMM/GLMM is preferred to repeated measures methodology. I would like to know the specific advantages of LMM/GLMM over repeated measures with citations. TIA
Relevant answer
Answer
I would like to add that a key feature of the difference is also the nature of the data, as mixed models can handle missingness (imbalance) under MAR assumptions and non-normal, including discrete, responses.
  • asked a question related to Longitudinal Data Analysis
Question
7 answers
I have longitudinal data with 2 follow-ups, at years 1 and 3. At baseline we have a measure of endothelial function, and we want to calculate the relationship between endothelial function and type II diabetes. At follow-up, the incidence of diabetes is recorded via self-reported diagnosis or use of medication, as follow-up was only done through interviews and questionnaires. We don't have the exact date of diabetes incidence. Which analysis method is best for this type of data: survival analysis or a mixed model? I understand that for survival analysis I would need the exact date of diagnosis, but are there other assumptions that can make it possible to use?
Relevant answer
Answer
The reflex here is to put in age as a covariate. The alternative, which makes better use of your age variable, is to use age as the time variable in a survival analysis. Participants enter at the age at which they were first surveyed and exit at the last survey age. This allows you to calculate the effect of your risk factor using a hazard function that starts at the earliest observed entry and ends at the last exit.
Ed Korn has a very good paper on this: Korn EL, Graubard BI, Midthune D. Time-to-event analysis of longitudinal follow-up of a survey: choice of the time-scale. Am J Epidemiol. 1997 Jan 1;145(1):72–80.
  • asked a question related to Longitudinal Data Analysis
Question
3 answers
I have assessed (Trait measure of attribution - for positive and negative situations ) for 85 sports persons. They all went on to play a competitive match. After the match (T1), they filled a state attribution questionnaire based on the result of the match (win/lose). Total number of winners were 51 and 34 lost their match. They all went on to play their next game (T2) and filled the state attribution scale, based on the result of the game (Win/lose) again. From the 51, who won their match at (T1), only 35 won and 16 lost the second game. From the 34, who lost their game at (T1), only 16 won their second game and 18 lost again.
I want to see whether attributions remain the same (trait-state-state).
Does one behave in line with one's trait attributions?
Any help regarding statistical approach would be appreciated.
  • asked a question related to Longitudinal Data Analysis
Question
16 answers
Question edited for clarity:
My study is an observational two-wave panel study involving a single-group sample with different levels of baseline pre-outcome measures.
There are three outcome measurements that will be measured two times (pre-rest and post-rest):
1. Subjective fatigue level (measured by visual analog score - continuous numerical data)
2. Work engagement level (measured by Likert scale - ordinal data)
3. Objective fatigue level (mean reaction time in milliseconds - continuous numerical data)
The independent variables consist of different types of data, i.e. continuous numerical (age, hours, etc.), categorical (yes/no, role, etc.) and ordinal (Likert scale).
To represent the concept of recovery, i.e. the unwinding of the initial fatigue level, I decided to measure recovery as the difference between the pre- and post-measures for each outcome; these score differences are operationally defined as recovery levels (subjective recovery level, objective recovery level and engagement recovery level).
I would like to determine whether the independent variables significantly predict each outcome (subjective fatigue, work engagement and objective fatigue).
Currently I am thinking of these statistical strategies. Kindly comment on whether they are appropriate.
1. Multiple linear regression; however, one outcome measure, i.e. work engagement, is ordinal data.
2. Hierarchical regression or hierarchical linear modelling or multilevel modelling, but I am not quite familiar with the concepts, assumptions or other aspects of these methods.
3. I would consider reading on beta regression (sorry, this is my first time reading about this method).
4. Structural Equation Modelling.
- Can the 3 different types of fatigue measurement act as indicators measuring a latent outcome construct of fatigue?
- Can the independent variables consist of a mix of continuous, categorical and ordinal data?
Thanks for your kind assistance.
Regards,
Fadhli
Relevant answer
Answer
Hello Fadhli,
Don't worry about your grammar. I should have been more careful, too.
I think you have enough people for your analyses.
I am attaching an article from a highly respected methodologist / statistician that should reassure you about your work engagement variable being able to be regarded as at the equal-interval level for purposes of analysis. It has some highlighting that I placed in it. I hope that's OK.
Robt.
  • asked a question related to Longitudinal Data Analysis
Question
3 answers
Hello readers
I am studying attributions in sports. I have a trait like measure which assesses attributions from six dimensions (Internal-External, Stable-Unstable, Global-Specific, Controllability, Intentionality). I also have state measure for attributions, which i assessed after two performances t1 and t2.
I want to know/test, people who are optimistic (trait) remain optimistic (state) irrespective of the result of t1 and t2 and people who are pessimistic remain pessimistic.
Can any one help me with the suitable analysis.
Any help would be appreciated
Relevant answer
Answer
Parametric tests.
  • asked a question related to Longitudinal Data Analysis
Question
5 answers
For example, I calculate a firm's cash ratio by dividing cash by total assets, but both variables are nominal; is the ratio comparable over the years?
Relevant answer
Answer
Dear Wang
It depends on the accounting standards employed, that is, market-value accounting or historical cost accounting. That and the inflation rate for different assets, over time, may vary. For example the land prices may be appreciating at a faster inflation rate versus plant and equipment values. The greater disparity in asset inflation rates will introduce a higher degree of distortion when analyzing and interpreting the financial ratios.
  • asked a question related to Longitudinal Data Analysis
Question
6 answers
I currently have a micro (5-year) panel dataset of house price transactions. There are a total of 800 different high-rise properties (names of apartments/condos) and 30000 transactions across the 5 years. However, the data are really unbalanced, as residential properties have different numbers of transactions in a year, and at times a particular year has no transactions for a particular property.
I am unsure how I should proceed with the analysis. Should I code the property names before running the analysis, or should I aggregate the values and independent variables by district and recode the districts instead?
Relevant answer
Answer
The chapter on unbalanced panels in Badi Baltagi's book is good; you will also need to use indicator variables to avoid losing information. Moreover, you first need to be sure that the missing data are not due to self-selection, i.e. that the selection variable is not correlated with the idiosyncratic term of your model.
  • asked a question related to Longitudinal Data Analysis
Question
6 answers
I am running a longitudinal study, where I collect data at 8 time points using many different questionnaires. Some questionnaires are measured at T1, T3, T5, and T7; others at different time points (T2, T4, T6, T8).
I tried a mixed model in SPSS, but each time I did not get any results because of the missing data.
Many thanks
Relevant answer
Answer
Hi Maria,
No, not yet; I could not find a suitable solution.
  • asked a question related to Longitudinal Data Analysis
Question
5 answers
I have genome-wide methylation data (Illumina EPIC methylation array) across multiple time points, from individuals classified into two distinct clinical-outcome groups. I want to see the variation of a specific probe across different time points within a single individual (intraclass) and also between individuals (interclass).
Relevant answer
Answer
Thank you all for your answers. I am from a non-statistical background, and all your suggestions have been extremely helpful. Thanks to Kelvyn Jones particularly for especially suggesting the book on longitudinal analysis. I needed to know the relevant terms first!
  • asked a question related to Longitudinal Data Analysis
Question
4 answers
I need assistance in determining lagged effects in a four-week time series using R. I need to determine multilevel estimates of the models predicting a lagged effect. What syntax would do this?
Relevant answer
Answer
Have sent you a PM :-)
  • asked a question related to Longitudinal Data Analysis
Question
21 answers
In the analysis of shear lag in prestressed concrete box girders based on the energy variational method, there exist two viewpoints on the assumed flange longitudinal displacement:
1)U(x,y)=hi[w'(x)+f(x)u(x)];
2)U(x,y)=hi[-w'(x)+f(x)u(x)];
My question is: which is right?
Relevant answer
Answer
Good question. Please share the best answer you trust.
Regards…
  • asked a question related to Longitudinal Data Analysis
Question
4 answers
My exposure is a continuous variable that has been measured at 9 follow-up examinations for each participant.
My outcome is also a continuous variable that has been measured only once: at the end of the study.
I would like to test whether changes in my exposure over time are related to the outcome.
What are the available statistical tests to evaluate such relationship?
I have used mixed model previously to test the association between an exposure at time x and change in outcome over time. In this case, the outcome was measured multiple times over time while the exposure was measured only at the beginning of study. I wonder if mixed model would work for modeling the changes in exposure over time as well? or are there better statistical tests to answer my question?
Thank you
Relevant answer
Answer
Dear Faye Anderson,
Thank you for your suggestions.
I am aware of the tests that you suggested, however, I was wondering if there are any specific statistical tests that would allow modeling the changes in exposure over time.
For example, I have used mixed model previously to test the association between an exposure at time x and change in outcome over time. In this case, the outcome was measured multiple times over time while the exposure was measured only at the beginning of study. I wonder if mixed model would work for modeling the changes in exposure over time as well?
Thank you
  • asked a question related to Longitudinal Data Analysis
Question
3 answers
I'm looking at the impact of a policy change on crime rates, but there are two policy interventions over the time span I'm studying. Would it be appropriate to do a time series analysis with two interventions, or would an ANOVA be better?
Relevant answer
Answer
Yes. You can set your DVs to match your time-frame periods and apply an ANOVA.
  • asked a question related to Longitudinal Data Analysis
Question
5 answers
I would like to conduct multiple imputation of missing values in a 3-wave dataset; however, the percentage of cases with missing values is high - approximately 70%. Despite the high proportion, a lot of cases have missing values for only one wave. Is there a solution to this situation, or are these data suitable only for complete-case analysis?
Relevant answer
Answer
I found this paper useful
Lee JH, Huber J Jr. Multiple imputation with large proportions of missing data: How much is too much? United Kingdom Stata Users' Group Meetings 2011, No. 23. Stata Users Group; 2011 Sep 26.
  • asked a question related to Longitudinal Data Analysis
Question
8 answers
I have a medical longitudinal retrospective dataset, with records spanning the observation period from 2000 to the end of 2016. For many reasons not every medical record spans that entire time-frame; e.g. the patient may have died, or they may have transferred into the study half way through or transferred out at some stage.
A particular event (or exposure) is seen as a clinical event e.g., going to the doctor and saying or being told that you have a particular disease, e.g., a chest infection. That patient will also have a categorical variable to indicate whether they are a smoker or not.
I wish to count the frequency of chest infections per patient and distribute them over whether they smoke or not. I can imagine this would be a box plot with UQ and LQ being defined, frequency of disease on the Y, and a Smoke YES and NO on the X. This would be very easy to do. The problem I have though is that I am not sure how I deal with medical records of varying length. Surely there is bias if a smoker vs. non-smoker both have twenty chest infections, but there is a four year medical record difference?
Thanks
Relevant answer
Answer
You are about to discover the concept of incidence density! This is the rate of events per unit time. In your case, the rate of events per 100 or 1000 person-years.
You can tackle the problem using Poisson regression, with length of observation set as the exposure time variable.
Alternatively, you can treat the data as time-to-event data with repeated events. This has the advantage that the probability of chest infections probably rises with age. You can use age as the time variable in the analysis, with subjects entering at the age they were first seen and exiting at the age of last follow up. In order to avoid immortal time bias, you need to declare them to be at risk from some point, say from age 18.
This is pretty straightforward in Stata, if you have it.
  • asked a question related to Longitudinal Data Analysis
Question
4 answers
Brief background: I’m examining mediation rates in China. I have a panel dataset with N=24 provinces and T=30 years (1985-2014). For each province-year, I observe mediation rates and host of economic/demographic information.
Anecdotal reports suggest that in 2006 or soon thereafter, the Chinese government began bolstering its mediation system and encouraging its use. My goal is to test this assertion. Unfortunately, I’m unable to quantify the “effort” the government exerts in promoting mediation.
What is the most convincing way to test this assertion? My ideas are listed below. Please evaluate. For all ideas, assume that I am regressing mediation rates on variables thought to influence mediation rates.
  1. Include a time polynomial (such as year and year^2) in the regression. If year is negative and year^2 is positive, this is evidence of a parabolic trend, even after controlling for other factors. Proceed by determining whether the minimum of the parabola is around year 2006.
  2. Include a lagged dependent variable in the regression, that is, a lag of the mediation rate. Perhaps the best measure of “effort” is the previous year’s mediation rate. After controlling for other factors, I can determine whether this effort proxy is positive and significant.
  3. Include year fixed effects. After these effects are estimated, graph their magnitudes against time. Progressively increasing year fixed effects after 2006 would indicate more effort.
  4. Include a linear time trend with a kink at 2006. Determine whether the post 2006 trend is significantly larger than the pre 2006 trend. Unlike the other ideas, this one assumes that we know where the change occurs.
Is there any reason to use a combination of these ideas? Do you have other ideas? Please suggest. Thanks for your help!
Relevant answer
Answer
Hi Vladislav,
Your response is incredibly helpful. Thank you!
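Idea 4 in the question above (a linear trend with a kink at 2006) can be sketched with a hinge term whose coefficient is the post-2006 change in slope. This is an illustration on simulated data with invented effect sizes; a real analysis would add province fixed effects and the other covariates:

```python
# Kinked-trend sketch: mediation_rate ~ year + max(0, year - 2006).
# The hinge coefficient is the change in slope after 2006, and its
# t-test asks whether the kink is statistically detectable.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
years = np.arange(1985, 2015)
df = pd.DataFrame({"year": np.tile(years, 24)})      # 24 provinces
pre_slope, extra_slope = -0.5, 1.5                   # simulated truth
df["hinge"] = np.clip(df["year"] - 2006, 0, None)
df["mediation_rate"] = (30 + pre_slope * (df["year"] - 1985)
                        + extra_slope * df["hinge"]
                        + rng.normal(0, 2, len(df)))

result = smf.ols("mediation_rate ~ year + hinge", df).fit()
print(result.params["hinge"], result.pvalues["hinge"])
```

As the question notes, this design assumes the break point is known; combining it with the year-fixed-effects plot (idea 3) is a useful check that 2006 is where the trend actually changes.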
  • asked a question related to Longitudinal Data Analysis
Question
3 answers
dear all,
I have a dataset with almost 2 millions observation nested within (European) countries. My DV is probability of weekly religious practice.
I want to disentangle Age, Period and Cohort effects, and there is the well-known identification problem.
Given that I have so many observations and a quite wide time span (years from 1970 to 2015, cohorts from 1900 to 2000, ages from 15 to 100), what is the best strategy to apply?
I know this is a very broad question and that there is a huge debate behind but I really need to collect some opinions about this.
Thanks in advance!
Francesco Molteni
Relevant answer
Answer
You may want to look at this project on ResearchGate.
You will see that we are very sceptical about automatic fail-safe procedures, but this shows what can be done.
  • asked a question related to Longitudinal Data Analysis
Question
2 answers
I have longitudinal data (repeated measures on each subject over 't' time periods) with a binary response variable and various continuous/categorical covariates. I wish to build a forecasting model that tells the outcome for the time ahead (t+1, t+2, etc.) while simultaneously regressing on the predictors up to time t.
I want my model to use the information from the covariates at present time t, to forecast the response for the time ahead.
I believe that my model will predict the outcome with a probability associated with it, something like a Markov model + regression, that gives the state transition probability, also taking into consideration the covariates that affect the state.
Any help on how to structure the problem and/or implement it in R/SAS will be helpful.
Relevant answer
Answer
You can use the cquad R package to analyse data with a binary response variable.
See the details in
or just install the cquad package in R, seek help, and analyse your data.
Forecasting can be done, if you have values available on the covariates, using the "predict" function.
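One simple version of the "Markov model + regression" idea described in the question is a pooled logistic regression of the response at t+1 on the lagged response and covariates; the lagged-response coefficient plays the role of a state-transition effect. The sketch below uses simulated data and hypothetical names, and its standard errors ignore within-subject clustering, which cquad's fixed-effects estimators handle properly:

```python
# Lagged logistic regression: P(y_{t+1} = 1 | y_t, x).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_subj, n_t = 200, 6
rows = []
for s in range(n_subj):
    x = rng.normal()                       # time-invariant covariate
    y = int(rng.integers(0, 2))            # initial state
    for t in range(n_t):
        p = 1 / (1 + np.exp(-(-0.5 + 1.0 * y + 0.8 * x)))
        y_next = int(rng.binomial(1, p))
        rows.append({"id": s, "t": t, "x": x, "y_lag": y, "y": y_next})
        y = y_next
df = pd.DataFrame(rows)

result = smf.logit("y ~ y_lag + x", df).fit(disp=0)
# Forecast P(y = 1 at t+1) from the latest observed state and covariates:
newest = pd.DataFrame({"y_lag": [1, 0], "x": [0.0, 0.0]})
print(result.predict(newest))
```

Multi-step forecasts (t+2, t+3, ...) follow by iterating the fitted one-step transition probabilities, exactly as in a Markov chain.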
  • asked a question related to Longitudinal Data Analysis
Question
2 answers
For example, would 2 bursts with 5 measurements in each suffice?
Relevant answer
Answer
For general orientation on this, see:
You can use the following software to do the power calculations by simulation:
http://www.bristol.ac.uk/cmm/software/mlpowsim/
The manual is a very good primer,
as is Optimal Design for Longitudinal and Multilevel Research: Documentation for the "Optimal Design" Software.
Whether 2 bursts by 5 will work depends on the process and on what you want to know, and you will know more about that than we will; if the process is volatile, your plan seems quite limited. Moreover, two bursts seems quite few to characterise longer-term change. But you can use MLPowSim to simulate plausible values and see what you can reasonably detect.
There is not much out there on this design, but see
and
it only used two bursts.
  • asked a question related to Longitudinal Data Analysis
Question
7 answers
Respected Researchers
I have panel data of 1252 firm-year observations with 182 firms over the period 2010-2016. I have 14 independent variables, including 3 control variables, 1 mediator and 1 dependent variable. I want to use Stata for testing direct and mediation models. I have 11 direct hypotheses from independent variables to the dependent variable and 11 mediation hypotheses.
Data: unbalanced panel
QUESTIONS:
1) Which tests should be implemented other than the selection between fixed and random effects?
2) How can mediation analysis be performed?
Relevant answer
Answer
The 'MEDIATION' package written by Hicks & Tingley may be more up-to-date than 'medeff'. Type "findit mediation" or search for it under the help or use the link below.
  • asked a question related to Longitudinal Data Analysis
Question
8 answers
I am trying to model the longitudinal APIM for categorical dyadic data where the dyad members (members of 2-person households) are not empirically distinguishable with regard to the outcome variable. An extensive literature search on such analyses was unsuccessful, but I am hoping that someone has written about this and I have just not been able to find it. Any suggestion or advice is appreciated!
Relevant answer
Answer
Marie:
I think I have figured out how to do this fairly "simply."  See the attachment which shows how to do so in R and SPSS for a growth curve model, but it can be adapted to other sorts of models.  Email me at david.kenny@uconn.edu if you have questions.
Take care,
Dave
  • asked a question related to Longitudinal Data Analysis
Question
2 answers
I am trying to estimate a 2-2-1 mediation model where the IV is a level-2 variable, the moderator is level 2 and the DV is a level-1 variable.
There is some variation in practice, and it seems that Preacher et al. (2010) suggest using an MSEM approach that is available in Mplus. Any idea whether this is doable in Stata 14? And any suggestions in terms of the code (ml_mediation or xtmixed, mle)?
Much appreciated.
Amedeo
Relevant answer
Answer
  • asked a question related to Longitudinal Data Analysis
Question
1 answer
I'm currently preparing for initial data analysis and am having problems merging my data files for longitudinal data analysis. How can I go about this using SPSS?
Relevant answer
Answer
There are several ways to do it. Under the Data menu you can merge files by adding cases or by adding variables. However, you do not provide enough information for more specific help. When you add cases, the names and types of the variables to be joined must be identical: for instance, "sex" and "sexe" will not merge, as the names differ (names can be renamed to match before merging). The same holds for types and formats: if an item is F1.0 in one file, it should be F1.0 in all files to be merged, and a string of size 4 must be a string of size 4 in every file. If you want to add variables, the identifying variables (person identifiers) must be identical across files. Matters differ again if the files contain cases that cannot be matched; it then depends on whether you need what a database would call an inner or an outer join. Does this cover what you need to do?
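The same "add cases" vs "add variables" distinction exists outside SPSS. A hypothetical sketch of the two merges in pandas: merge matches waves on the person identifier (wide format), while concat stacks waves as extra rows (long format):

```python
import pandas as pd

# Hypothetical wave-1 and wave-2 files sharing an identifier variable
w1 = pd.DataFrame({"id": [1, 2, 3], "score_t1": [10, 12, 9]})
w2 = pd.DataFrame({"id": [1, 2, 4], "score_t2": [11, 14, 8]})

# "Add variables": wide format, matched on the person identifier.
# how="outer" keeps people who are missing from one wave (an outer join).
wide = w1.merge(w2, on="id", how="outer")

# "Add cases": long format, stacking the waves as extra rows.
long = pd.concat(
    [w1.rename(columns={"score_t1": "score"}).assign(wave=1),
     w2.rename(columns={"score_t2": "score"}).assign(wave=2)],
    ignore_index=True,
)
print(wide.shape, long.shape)
```

Most multilevel/growth-model software expects the long format; SEM-style analyses usually expect the wide one.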
  • asked a question related to Longitudinal Data Analysis
Question
3 answers
Dear all,
I am looking for a way to use (Bayesian) model averaging in a multi-level context? 
With which software could this be achieved? MLWin? MPlus? Do you have suggestions for references?
Thanks a lot in advance!
Best
Relevant answer
Answer
By expressing the model in the form of a matrix where only some of the places are occupied by terms that are aggregates of the various money-flows of the diagrammatic model for the whole system, one can then expand on the combined row and column of interest so as to better investigate the part of the Big Picture of concern.
This model is shown in my book "Consequential Macroeconomics" which I saved as an e-file on LINKEDIN. In Appendix C of that document, I provide an example of this treatment of the matrix.
  • asked a question related to Longitudinal Data Analysis
Question
7 answers
I am struggling with structural equation modeling/confirmatory factor analysis in SAS using the PROC CALIS procedure, which I'm new to using. My goal is to create different factors of neighborhood conditions based on variables such as SES measures from the census, crime indices, green space, physical conditions (cleanliness, noise), and social environment. The problem is the model fit is not good and I'm not sure how to figure out covariance structures among the factors or variables. If anyone can give me some advice or direct me to a good resource, I would appreciate it. Thanks!
Relevant answer
Answer
To enhance the model fit in CFA, we first need to ensure that the correlations between variables are significant (moderate to high). In SEM, we enhance model fit by retaining the most significant factors. I am not familiar with SAS; I did my SEM in AMOS, where modification indices are used to enhance the fit. Your RMSEA is slightly high; I believe that if you can remove a factor or variable that contributes little to the model, the fit may improve. Your SRMR seems to be fine.
Reference: Hu and Bentler (1999) empirically examine various cut-offs for many of these measures, and their data suggest that, to minimize Type I and Type II errors under various conditions, one should use a combination of one of the relative fit indexes and the SRMR (good models < .08) or the RMSEA (good models < .06). An RMSEA of 0.05-0.08 is a reasonable fit (Browne and Cudeck, 1993).
Refer : Reporting Structural Equation Modeling and Confirmatory Factor Analysis Results: A Review by
James B. Schreiber, Amaury Nora, Frances K. Stage, Elizabeth A. Barlow, and Jamie King
  • asked a question related to Longitudinal Data Analysis
Question
2 answers
Hi,
Imagine you have measured two variables X and Y at two points in time. You want to predict Y2 from X1 while controlling for the autoregressive effect (temporal stability) of Y1 and for the correlation of X1 and Y1. My question: is there any statistical reason that makes it necessary and/or advantageous to implement a full CLP, that is, to include the paths X1 -> X2 and Y1 -> X2 and the correlation X2 <-> Y2?
I am not interested in reciprocal relations between X and Y. I just wonder whether including these additional paths has an impact on my path of interest (X1 -> Y2) and, if so, why? I'd also be glad if you could provide a reference.
Relevant answer
Answer
Dear Philipp,
thanks for correctly completing my question ;)
And even more for your answer. I totally agree, but it feels good to be more confident now.
All the best,
Johannes
  • asked a question related to Longitudinal Data Analysis
Question
1 answer
I want to estimate a dynamic factor model in STATA. The problem is that my dependent variable has quarterly frequency, while the independent variables (which I want to merge into the latent variable, i.e. the factor) have monthly frequency. I set the dependent variable to be observed every third month of the given quarter, but STATA gives me an error that there are missing observations in the dependent variable. Is there any solution for estimating a DFM with mixed-frequency data in STATA? Thank you.
Relevant answer
Answer
I think that the STATA error message lies in the lack of correspondence between a quarterly series (the dependent variable) and the monthly series (the independent variable). To avoid this situation the variables should be both quarterly or both monthly. STATA demands such a correspondence between the two variables.
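One common workaround (simpler than a true mixed-frequency DFM) is to collapse the monthly indicators to quarterly frequency before estimation, so both sides share one frequency. A hypothetical pandas sketch, assuming you are willing to lose the within-quarter information:

```python
import numpy as np
import pandas as pd

# Hypothetical monthly indicator: 24 months starting January 2010
monthly = pd.Series(np.arange(24, dtype=float),
                    index=pd.period_range("2010-01", periods=24, freq="M"))

# Collapse to quarterly by averaging the three months of each quarter,
# so the indicator lines up with a quarterly dependent variable.
quarterly = monthly.groupby(monthly.index.asfreq("Q")).mean()
print(quarterly.head())
```

Taking the last month of each quarter instead of the mean is the other common convention; which one is appropriate depends on whether the indicator is a flow or a stock.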
  • asked a question related to Longitudinal Data Analysis
Question
5 answers
I have daily data from Jan/1/2008 to Jan/1/2012. I would like to create a dummy variable for the whole period after a specific date (March 2011) and, in addition, another dummy variable for the period from March 2011 to June 2011.
How to do that using Stata 13
Thanks in advance
Relevant answer
Answer
There may be a more efficient way to do this, but here is one solution. (NOTE: I am assuming your date variable is in stata's date format).
You can generate a variable for month and year:
gen year=year(date_var)
gen month=month(date_var)
gen dummy=(year>=2011)
replace dummy=0 if dummy==1 & month<3 & year==2011
The same idea can be applied to the other period as well.
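For completeness, the same two dummies in pandas (hypothetical dates; the Stata logic above is unchanged):

```python
import pandas as pd

# Hypothetical daily dates spanning the sample period
dates = pd.to_datetime(["2010-06-15", "2011-02-28", "2011-03-01",
                        "2011-05-10", "2011-07-01", "2012-01-01"])
df = pd.DataFrame({"date": dates})

# Dummy 1: whole period on/after March 2011
df["post_mar2011"] = (df["date"] >= "2011-03-01").astype(int)

# Dummy 2: March 2011 through June 2011 only (between() is inclusive)
df["mar_jun2011"] = df["date"].between("2011-03-01", "2011-06-30").astype(int)
print(df)
```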
  • asked a question related to Longitudinal Data Analysis
Question
3 answers
Non-stationarity is sometimes considered, in the econometricians' sense, as a unit-root problem (P. C. B. Phillips et al.).
One question would be to unify the "reasonable" notions of non-stationarity, e.g. local stationarity (Dahlhaus and followers), periodic or seasonal behaviour (Dickey, Mohamedou Ould Haye, Viano, Leskow), and unit roots.
For periodic behaviours, an additional attractive question is fitting the periods (the continuous-time case is even more relevant, and there too the problem is to fit periods).
Many other notions of non-stationarity, such as random walks in random environments (e.g. Sznitman), are extremely attractive; however, they are not necessarily features of real data.
The idea of the present project is rather to develop (nonparametric or parametric) models featuring the main properties of the real data sets to be fitted.
Certainly, the validity of the techniques should be assessed on real data, as was emblematically done in a contradictory paper by Mikosch and Starica opposing long-range dependent models to models with a linear trend.
Relevant answer
Answer
Thanks for being more elaborate. As a teacher, I usually assign real data analysis to students. As and when I come across some atypical behaviour, I will communicate with you.
  • asked a question related to Longitudinal Data Analysis
Question
1 answer
I am doing longitudinal research with two variables. One was measured three times, while the other was observed five times; in other words, one variable is missing at two waves. Do you think I can develop an AR model with these variables?
Relevant answer
Try using the average of the three observed measurements to fill in the two missing waves, so that both variables have five time points.
  • asked a question related to Longitudinal Data Analysis
Question
3 answers
I am conducting research on the institutional and community trajectories of people who have been housed in forensic mental health units at some point between 2005 and 2015. I have indicators for several domains assessed at several times (usually upon admission and every 6 months after that). There can be several admissions over time. The domains are psychiatric symptoms, past and recent negative life events, behaviour problems in the institution, treatment use and control interventions. The latter two are outcomes, the first three domains are predictors.
Do you have any recommendation as to what analysis to use, software, etc.? Since all my indicators are categorical and there are many per domain, I figure I have to do some kind of data reduction procedure (Latent class models?) to get at the domains. Any ideas are welcome. 
Relevant answer
Answer
You may want to look at these slides for some orientation
Multilevel discrete-time event history analysis
  • asked a question related to Longitudinal Data Analysis
Question
4 answers
I am planning a longitudinal study with writing data taken from between 20-40 second language (L2) participants at at least 5 time points. These participants will vary according to age, gender, and L1 background. The writing data will vary according to prompt and writing context (home or in class). The goal of my study will be to examine:
1. Growth in several measures of vocabulary use
2. Effects of learner and task variables on growth
I know LCM is probably out of the question, so are there any statistical methods I can use to analyze my data? 
Thank you for your help!
Relevant answer
Answer
I do not think 40 is too small for some sensible analysis.
Have a look at this: "Optimal Design for Longitudinal and Multilevel Research: Documentation for the 'Optimal Design' Software", which will allow you to do the power calculations; and here is a relevant paper.
  • asked a question related to Longitudinal Data Analysis
Question
3 answers
Hello,
I am interested in testing a model using a cross-lagged panel design; both X and Y are measured at T1 and T2. I think the XY and YX associations have the potential to be nonlinear, both within and across time points. Is it possible to integrate quadratic X and quadratic Y into a cross-lagged panel design (for example, something like the attached figure?). If so, does anyone have any published examples where this approach has been used?
Thank you!
Melissa
Relevant answer
Answer
Melissa, in general this is possible. However, I would not square the dependent variables, but only add squared independent variables to your model.
Here is an article that may be helpful for you:
Selig JP, Preacher KJ, Little TD. Modeling Time-Dependent Association in Longitudinal Data: A Lag as Moderator Approach. Multivariate Behavioral Research. 47: 697-716. PMID 24771950 DOI: 10.1080/00273171.2012.715557
Regards, Karin
  • asked a question related to Longitudinal Data Analysis
Question
3 answers
I am analyzing some longitudinal data using mixed models (lmer in lme4). Sampling date is a within subject covariate. I also have a two level factor where each subject is one of the two levels and a continuous between subjects covariate. I get a significant interaction between the between subjects covariate and the two level factor and am not quite sure how to interpret this. Does it mean that the slope of the relationship between the response and the between subject covariate is significantly different for each level of the factor? Or does it simply mean that the effect of the covariate evaluated at 0 is significantly different for each level of the factor (implying nothing about the slope relationship)?
Relevant answer
Answer
Yes: a significant interaction between the covariate and the factor means that the slope of the covariate differs between the two factor levels; the covariate's main effect is then its slope for the reference level (i.e., evaluated at factor = 0). The easiest way to appreciate this is to make some predictions and draw a plot.
Create a dummy for your factor with the same codes as in lme4, create a set of typical values for your covariate (say, the quintiles), and compute the interaction as a new variable.
Then use the estimated multilevel equation to make predictions:
Yhat = Intercept + Coef1*FactorDummy + Coef2*Covariate + Coef3*Interaction.
Then plot Yhat against the covariate, with two lines distinguishing the factor levels. It is even better if you make the predictions with, say, 95% confidence intervals.
This MLwiN manual shows our customised predictions facility, which does this for you easily and lets you set other variables at, say, their mean values; it works by simulation, as there are extra complexities when the response variable is discrete.
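The prediction-and-plot recipe above can be sketched numerically (the coefficient values here are hypothetical, standing in for your lme4 estimates). The key point it illustrates: with an interaction, the covariate's slope equals its main-effect coefficient at the reference level of the factor and main effect + interaction at the other level, so a significant interaction does mean the slopes differ:

```python
import numpy as np

# Hypothetical fixed-effect estimates from the fitted model
intercept, b_factor, b_cov, b_inter = 2.0, 0.5, 0.3, -0.4

# Typical covariate values (a quintile-like grid)
cov = np.linspace(-2.0, 2.0, 5)

for level in (0, 1):  # the two factor levels
    yhat = intercept + b_factor * level + b_cov * cov + b_inter * level * cov
    print(f"factor={level}:", np.round(yhat, 2))

slope_ref = b_cov              # covariate slope at the reference level
slope_other = b_cov + b_inter  # covariate slope at the other level
```

Plotting the two yhat lines against cov reproduces the picture described in the answer: non-parallel lines whenever the interaction coefficient is non-zero.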
  • asked a question related to Longitudinal Data Analysis
Question
7 answers
I have a dataset of 140 patients equally divided into 3 groups. The dependent variable is "moca" and can take integers between 0 and 30. It is a longitudinal study with a total of 4 time-points (variable time is "timepoint"). The group defining variable is "status". The independent variables are "age at recruitment", "education", "sex". I am interested in differences in the behaviour of moca across time in the 3 groups. The variable defining different patients is "recordID"
This is the command that I used:
mixed moca i.status i.sex education ageatrecruitment timepoint || RecordID: timepoint
So my questions are:
1- does the command I am using make sense?
2- with this model the lines I get for each group are parallel (i.e. have the same slope and differ only for the intercept). I suspect they might have different slopes though, how could I test for that? My guess would be to run the same command separately for each group (using the option by), but then how could I compare them with a test (the suest command doesn't work with the mixed command).
3- my dependent variable is limited to integers between 0 and 30 and it might be that my regression line tends to an asymptote at 30, how could I test for that/implement this in my model?
I apologise if my question is unclear/too long.
Relevant answer
Answer
Thank you again, Kelvyn. I found the book in my hospital library and will dig into it!
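On question 2 (different slopes per group): rather than fitting the groups separately, the standard route is to add a group x time interaction to the fixed part of the mixed model and test it. The toy numpy sketch below (simulated data, plain OLS instead of a mixed model, so it ignores the within-patient correlation) shows that the interaction coefficient is exactly the between-group difference in time slopes:

```python
import numpy as np

rng = np.random.default_rng(2)
n_per, slopes = 200, {0: 1.0, 1: 1.5}   # hypothetical true time slopes per group
rows = []
for grp, s in slopes.items():
    for _ in range(n_per):
        t = rng.integers(0, 4)           # 4 time points, coded 0..3
        rows.append((grp, t, 20 + s * t + rng.normal()))
g, t, y = map(np.array, zip(*rows))

# Design matrix: intercept, group, time, group x time interaction
X = np.column_stack([np.ones_like(t), g, t, g * t]).astype(float)
beta = np.linalg.lstsq(X, y, rcond=None)[0]

slope_group0 = beta[2]   # time slope in the reference group
slope_diff = beta[3]     # how much steeper the other group's slope is
```

A non-zero interaction term is the formal version of "the lines are not parallel"; in the full mixed model the same term is added to the fixed part while the random part stays at the patient level.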
  • asked a question related to Longitudinal Data Analysis
Question
5 answers
I have longitudinal data on registered arthropathy diagnoses for my study subjects including main group and a comparison group. As the data include 22 years of follow up, I want to see the pattern of incidence across age groups. I am looking for a stata command/module that does it automatically for me. I know how to do it manually but that takes some time and is not the best choice for me. Any guidance from my experienced colleagues is very much appreciated. 
Relevant answer
Answer
Dear Mehdi,
I presume you have each patient followed from diagnosis until some event, and that the follow-up length varied for each patient, with right-censoring and late entry (ie, by longitudinal you mean the sample cohort being followed and not different samples being surveyed over time).
If yes, one way to do that is first:
1 - stset the data (by time or date, and other options)
2 - use the strate / stmh
For age-specific incidences, you split the data first
3 - Use the stsplit in agebands
4 - Use strate / stmh
Notes:
1 - when using stset, pay attention to each scale to you want, eg: scale(365.25) - year; scale(12) - months
2 - be careful when using stsplit; it is very powerful, but you can make a mess of things
3 - strate allows you to obtain rates per(1000) per(100000) and the unit depends on the scale in stset
I hope it helped
some tutorials in the links
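Outside Stata, the stsplit + strate logic amounts to splitting each person's follow-up into age bands, summing person-years per band, and dividing events by person-years. A hypothetical pandas sketch on toy data (events are attributed to the band containing the exit age):

```python
import pandas as pd

# Hypothetical follow-up records: age at entry/exit and event indicator
fu = pd.DataFrame({
    "age_in":  [42, 55, 61, 48],
    "age_out": [47, 63, 66, 70],
    "event":   [1, 0, 1, 1],
})

bands = [(40, 50), (50, 60), (60, 70)]
out = []
for lo, hi in bands:
    # Person-years each subject contributes inside this band
    py = (fu["age_out"].clip(upper=hi) - fu["age_in"].clip(lower=lo)).clip(lower=0)
    # An event counts in the band containing the exit age
    ev = ((fu["event"] == 1) & (fu["age_out"] > lo) & (fu["age_out"] <= hi)).sum()
    out.append({"band": f"{lo}-{hi}", "py": py.sum(),
                "rate_per_1000": 1000 * ev / py.sum()})
rates = pd.DataFrame(out)
print(rates)
```

This mirrors stsplit (the clipping step) followed by strate with per(1000); in real data the follow-up would come from stset-style entry/exit dates divided by 365.25.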
  • asked a question related to Longitudinal Data Analysis
Question
1 answer
I have found only one relation between moment, thickness, and connectivity, but I guess it applies only to the longitudinal method.
Relevant answer
  • asked a question related to Longitudinal Data Analysis
Question
8 answers
I am working on a longitudinal study with 140 participants divided into 3 groups. The participants were assessed every 2 years from 2010 (4 time points in total), so the time points are equally spaced, but there are some dropouts, so some patients are missing some time points.
The assessment consisted of some tests, the results of which are discrete numerical variables (e.g. one of these is the MoCA test, which is a cognitive test with different tasks and for each task the participant is given a score; the final score is the sum of the partial scores).
My goal would be to show any difference between groups in the progression of the scores through time.
After some readings I am thinking to use a mixed effect model with the random part on the single individual level and the fixed part on the group level, would that make sense? What other statistical model could I use?
Relevant answer
Answer
I agree with both Cauane and Georgio above.
You are dealing with a multi-level analysis of panel data with 4 repeated measures.
When it comes to the wonderful stats package, R, it's a fair bet that someone has faced a similar problem and shared their solutions. See the link attached: Multilevel analysis: panel data and multiple levels.
  • asked a question related to Longitudinal Data Analysis
Question
7 answers
I am trying to analyze data (self-concept and test scores) of students before and after the transition from primary to secondary education. The aim is to show the impact of individual achievement and class achievement on self-concept both before and after transition. I hypothesize (1) that individual achievement has a positive impact on self-concept and class achievement a negative one (controlling for individual achievement), and, more importantly, (2) that after the transition to secondary school, the class achievement of the "old" class before transition no longer has its negative impact on self-concept measured after transition.
Now I do not know how to set up a model, for students change classes with transition and therefore are nested in two different groups - their classes - before and after transition.
Does anyone have an idea how to set up a model that allows to analyse these questions or has anyone done some similar analysis?
Thank you very much for your answers!
Relevant answer
Answer
Thank you again for your answers. Unfortunately, I am still struggling with my data. I do not want to bother you any more with my questions, but let me tell you what I intend to do now.
Because we measured self-concept as a latent variable with several indicators, I think I need to run a two-level CFA, controlling for measurement invariance between the two levels.
If this works, I will probably then have to run a contextual-effects model. The reason is that I have the same predictor (test scores) on both the individual and the class level, and I want to see what impact class means of test scores have on self-concept beyond individual test scores (see Marsh et al., 2009).
At the moment I plan to show
(1) that there are contextual effects of class achievement at t1 on self-concept at t1 for both groups of students (those with and those without transition, in separate models);
(2) that there is a contextual effect of class achievement at t1 on self-concept at t2 only for those students without transition to secondary school after t1;
(3) that there are contextual effects of class achievement at t2 on self-concept at t2 for both groups of students (those with and those without transition, in separate models).
More sophisticated analyses seem to exceed my competencies at the moment.
Marsh, H. W., Lüdtke, O., Robitzsch, A., Trautwein, U., Asparouhov, T., & Muthén, B. (2009). Doubly-latent models of school contextual effects: Integrating multilevel and structural equation approaches to control measurement and sampling error. Multivariate Behavioral Research, 44, 764-802. 
  • asked a question related to Longitudinal Data Analysis
Question
8 answers
I have got panel data for 200 companies over 10 years, i.e. 2,000 observations. If I also collect macro-economic data such as GDP (one value per year) for those 10 years, that is only 10 observations. How do I regress my panel data on the macro-economic data? Which econometric/statistical model should be used, and how?
Relevant answer
Answer
The actual placing of the GDP observations in the spreadsheet of your panel depends on how it is arranged.
Where we have added macro-economic variables to a panel analysis, we have just added them as another column in the panel.
If the panel is arranged by company with several variables, just add GDP as the last variable for the matching year; this involves 10 entries per company, repeated for each company. It is a bit repetitious, but it won't take too long for 200 companies, and if you can write a little Visual Basic it is even quicker.
As a final comment, you might want to use some other measures, such as inflation, the unemployment rate, or the exchange rate. It is best to collect them all and add them at once.
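In practice you rarely need to type the 10 entries per company by hand: a one-to-many merge on year repeats each macro value for every firm automatically. A hypothetical pandas sketch:

```python
import pandas as pd

# Hypothetical firm-year panel and yearly macro series
panel = pd.DataFrame({
    "firm": ["A", "A", "B", "B"],
    "year": [2000, 2001, 2000, 2001],
    "roa":  [0.05, 0.06, 0.02, 0.03],
})
gdp = pd.DataFrame({"year": [2000, 2001], "gdp_growth": [3.1, 2.4]})

# One-to-many merge: each yearly GDP value is repeated for every firm
merged = panel.merge(gdp, on="year", how="left")
print(merged)
```

Note that because the macro variables vary only by year, not by firm, their standard errors should be clustered (or year effects modelled) in the subsequent regression.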
  • asked a question related to Longitudinal Data Analysis
Question
3 answers
I have received a comment from a reviewer about my random-intercept model. He recommended that I fit a "random intercept and trend" model rather than a random-intercept model, based on the recommendation in R. D. Gibbons et al. (2010), https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2971698/pdf/nihms236714.pdf . I am a bit confused about the terminology: does he mean that I should consider a "random slope" when he suggests a "random intercept and trend" model? If a random-slope model and a random intercept and trend model are the same thing, I have already run the lrtest and opted for the random-intercept model while writing the manuscript.
If you believe that a random intercept and trend model and a random-slope model are different, what is the STATA command for a random intercept and trend model? Is it different from the random-slope command?
I appreciate your help.
Relevant answer
Answer
Sorry, but I do not use Stata. On the terminology: a "random intercept and trend" model is simply a random-intercept model with an added random slope on time (each subject gets its own intercept and its own time trend), so the reviewer does appear to mean a random-slope model.
This link might help, as it has the syntax for several different packages.
  • asked a question related to Longitudinal Data Analysis
Question
6 answers
Although it is interesting to run complex interactions, such as categorical-by-categorical-by-categorical interactions, little information is available on how to interpret them. Can anyone recommend a good book or other useful resources?
Relevant answer
Answer
Allison,
Consider C. Ai and E. C. Norton, "Interaction Terms in Logit and Probit Models," Economics Letters 80 (1): 123-129.
With good wishes!
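As a complement to Ai and Norton: with three categorical (here binary) factors, the three-way interaction is a difference of difference-in-differences, i.e. how the A x B interaction changes across levels of C. A small numpy sketch with hypothetical cell means:

```python
import numpy as np

# Hypothetical cell means of a 2 x 2 x 2 (A x B x C) design,
# built with a 0.5 three-way interaction term
A, B, C = np.meshgrid([0, 1], [0, 1], [0, 1], indexing="ij")
mean = 1.0 + 0.4 * A + 0.3 * B + 0.2 * C + 0.5 * A * B * C

def ab_interaction(slice2d):
    """Difference-in-differences: the A x B interaction in one C slice."""
    return (slice2d[1, 1] - slice2d[1, 0]) - (slice2d[0, 1] - slice2d[0, 0])

ab_at_c0 = ab_interaction(mean[:, :, 0])   # A x B effect when C = 0
ab_at_c1 = ab_interaction(mean[:, :, 1])   # A x B effect when C = 1
three_way = ab_at_c1 - ab_at_c0            # how C changes the A x B effect
```

Reading the coefficient this way ("C moderates the A x B interaction") is usually clearer than interpreting it in isolation; on the linear scale the arithmetic above is exact, while in logit/probit models the Ai-Norton caveats apply.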
  • asked a question related to Longitudinal Data Analysis
Question
5 answers
I would like to flexibly model the development of some continuous outcome of interest as a nonlinear function of age, using longitudinal data with rather strong imbalance (i.e., most individuals cover only a small fraction of the age range of interest with their measurements, although the whole set of measurements is covering the entire range), including an estimate of uncertainty (e.g. confidence bands around the estimated function). Which method would you recommend for this purpose?
I used LOESS to get an idea of the overall trend but I don't know whether the error estimates can account for the correlation present in longitudinal data. Also, I would be interested in both "population average" estimates (such as in marginal models / GEE models, but I don't know if there are any extensions which allow flexible modelling of nonlinear relations) and in "individual-specific" estimates (maybe using Generalized Additive Mixed Models, but since I do not really understand them, I cannot gauge their appropriateness for this situation)
The attached text file contains R code for the creation of a fictitious dataset which illustrates the kind of data I am interested in (long format, variable "id" indicates person).
Thanks for any help / suggestions!
Relevant answer
Answer
Consider building your foundation on Singer and Willett's classic text: https://www.amazon.com/Applied-Longitudinal-Data-Analysis-Occurrence/dp/0195152964. The multi-level growth model is quite flexible. The good folks at UCLA have worked out the accompanying R code: http://www.ats.ucla.edu/stat/examples/alda/
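As a very rough stand-in for LOESS/GAMM-style fits, one can fit a flexible curve and bootstrap it for pointwise confidence bands. The numpy sketch below uses a polynomial on simulated (hypothetical) data and resamples observations; with real longitudinal data you would resample persons (ids), not rows, so that the bands respect the within-person correlation:

```python
import numpy as np

rng = np.random.default_rng(3)
n_obs = 400
age = rng.uniform(20, 80, n_obs)
y = np.sin(age / 15) + rng.normal(scale=0.3, size=n_obs)  # nonlinear trend

grid = np.linspace(20, 80, 13)   # ages at which to evaluate the curve

def fit_curve(a, yy, deg=4):
    """Fit a degree-`deg` polynomial and evaluate it on the age grid."""
    return np.polyval(np.polyfit(a, yy, deg), grid)

fit = fit_curve(age, y)

# Pointwise 95% bootstrap bands (resample persons, not rows, in real data)
boots = []
for _ in range(200):
    idx = rng.integers(0, n_obs, n_obs)   # resample with replacement
    boots.append(fit_curve(age[idx], y[idx]))
lo, hi = np.percentile(boots, [2.5, 97.5], axis=0)
```

A GAMM (e.g. mgcv::gamm in R) replaces the polynomial with a penalized spline and the ad hoc bootstrap with model-based random effects, which is the better tool once you want individual-specific curves as well.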