Science topic
Longitudinal Analysis - Science topic
Explore the latest questions and answers in Longitudinal Analysis, and find Longitudinal Analysis experts.
Questions related to Longitudinal Analysis
I have 15 treatments, and my main interest is to find the best one. The response is measured every day for up to 30 days, and my model includes a time-by-treatment interaction. I have a suitable effect size in mind, and I need 80% power at a 5% type I error rate. How can I calculate the sample size by simulation?
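The usual recipe is Monte Carlo: simulate data under the assumed effect size, fit the model, record whether the test is significant, repeat, and take the proportion of significant replicates as the estimated power; then increase n until that proportion reaches 80%. Below is a deliberately simplified Python sketch of that loop, using a two-arm z-test as a stand-in for the full 15-treatment x time mixed model; all numbers are illustrative assumptions, not recommendations.

```python
import math
import random

def simulate_power(n_per_arm, effect, sd=1.0, n_sims=2000, seed=1):
    """Estimate power for detecting a mean difference `effect` between two
    arms with a two-sided z-test at alpha = 0.05 (known variance).
    A simplified stand-in for simulating the full mixed model."""
    rng = random.Random(seed)
    z_crit = 1.96  # two-sided 5% critical value
    hits = 0
    for _ in range(n_sims):
        a = [rng.gauss(0.0, sd) for _ in range(n_per_arm)]
        b = [rng.gauss(effect, sd) for _ in range(n_per_arm)]
        diff = sum(b) / n_per_arm - sum(a) / n_per_arm
        se = math.sqrt(2 * sd ** 2 / n_per_arm)
        if abs(diff) / se > z_crit:
            hits += 1
    return hits / n_sims

# Increase the per-arm sample size until estimated power reaches 80%.
n = 5
while simulate_power(n, effect=0.8) < 0.80:
    n += 5
print("n per arm:", n)
```

In practice you would replace the z-test with a fit of your actual time x treatment mixed model inside the loop; the simr package in R automates exactly this simulation loop for lme4 models.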
I am searching for a step-by-step guide explaining all the possible commands and models when using the mixed models command in SPSS (I have unbalanced longitudinal data).
I am conducting an intensive longitudinal analysis with multilevel models.
I have some participants that don't have any variation on the predictor variables. Of course they don't really provide any useful information to the model that can be analyzed.
But I have also heard that keeping such cases in the dataset could introduce bias into the results. However, I can't find any literature on this topic.
Does anyone know a source that discusses this and recommends excluding these cases from the sample?
Hello,
I am interested in how mental health evolves over a 10-year period (N = 10,000). My analysis plan is to first partial out age, gender, and other demographics in each wave, and then enter these residuals into a linear model with year as the predictor. Does that sound sensible?
Best,
Are there any literature available on how much time should lapse to be considered a longitudinal study?
I have 2 separate data sets that I need to combine, time point 1 and time point 2.
Not all participants in time point 1 are in time point 2 (i.e., attrition, etc.). So I will need to know how to match participants, and keep the duplicates. Too many instructional videos tell you how to remove duplicates.
Next, I need the time points to be stacked on top of each other by rows: time 1 above time 2, not next to each other. Too many videos show how to add columns (i.e., dplyr left_join, right_join); it's very hard to find ones that show how to stack data sets vertically. I want the data in a structure suitable for longitudinal analysis, or at least some form of repeated measures, where joining data sets to the left or right will not work.
Please help! Either in R or excel!
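The stacking logic is simply: add a time column to each data set, then append the rows of time 2 under time 1 (in R, `bind_rows()` or `rbind()`, not a join). Here is a small Python sketch of the same idea, with made-up toy data:

```python
# Toy data: each wave is a list of records keyed by participant id.
wave1 = [{"id": 1, "score": 10}, {"id": 2, "score": 12}, {"id": 3, "score": 9}]
wave2 = [{"id": 1, "score": 11}, {"id": 3, "score": 14}]  # id 2 dropped out

def stack_long(wave1, wave2):
    """Stack two waves vertically (long format): one row per participant
    per wave, with a 'time' column saying which wave the row came from.
    Participants missing from a wave simply contribute no row for it."""
    long_rows = []
    for time, wave in ((1, wave1), (2, wave2)):
        for row in wave:
            long_rows.append({**row, "time": time})
    # Sort so each participant's waves sit together.
    long_rows.sort(key=lambda r: (r["id"], r["time"]))
    return long_rows

long_data = stack_long(wave1, wave2)
for r in long_data:
    print(r)
```

In R the equivalent is `bind_rows(mutate(wave1, time = 1), mutate(wave2, time = 2))`; `left_join`/`right_join` would instead place the waves side by side (wide format), which is exactly what you are trying to avoid.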
Meta-analysis is a good method for estimating the average correlation between two variables based on previous findings. When examining the association between two variables, we also want to figure out the direction of the relationship (in other words, which influences which). Is it possible to infer the causal direction by examining the moderating effect of the measurement order of the two variables in a meta-analysis (i.e., comparing the correlations when one variable was measured before the other, after it, or concurrently)? The cross-lagged panel model takes A to be the likely cause of B when the correlation between A measured at an earlier point and B measured at a later point is larger than the correlation between B measured at an earlier point and A measured at a later point (Kearney, 2017). Following this logic, can we say that, in a meta-analysis, if the correlation is more negative when A was measured after B than before or concurrently, A may be the cause of B and play a dominant role in the relationship? If so, are there published examples? Thank you very much!
What statistical test should I use to investigate how changes in variable A (continuous; defined as A at Time 2 minus A at Time 1) affect changes in variable B (continuous; defined as B at Time 2 minus B at Time 1) using longitudinal data? I really appreciate your help.
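One simple option is to regress the change in B on the change in A (with the usual caveats: change scores discard baseline information, and residualized-change or mixed-model approaches are common alternatives). A toy Python sketch with invented numbers:

```python
# Invented paired measurements for four participants.
A1 = [2.0, 3.0, 5.0, 4.0]; A2 = [3.0, 5.0, 6.0, 4.0]
B1 = [1.0, 2.0, 4.0, 3.0]; B2 = [2.0, 4.0, 5.0, 3.5]

# Change scores: Time 2 minus Time 1 for each participant.
dA = [a2 - a1 for a1, a2 in zip(A1, A2)]
dB = [b2 - b1 for b1, b2 in zip(B1, B2)]

# Least-squares slope of change-in-B on change-in-A.
mA = sum(dA) / len(dA)
mB = sum(dB) / len(dB)
slope = (sum((x - mA) * (y - mB) for x, y in zip(dA, dB))
         / sum((x - mA) ** 2 for x in dA))
print(slope)  # 0.75
```

In practice you would run the same regression in your statistics package (e.g., ordinary linear regression of the computed change score of B on the change score of A), optionally adding baseline values as covariates.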
I have collected data at 3 different time points. For the 1st time point, I analysed the data using SEM (SPSS + AMOS). I have since collected data at 3 different time points (one per year). What type of statistical analysis can be done to draw inferences?
Hi all,
I'm analyzing a set of longitudinal data obtained from subjects at regular time intervals following an intervention. I want to look for the effect of time, but I also want to adjust the observed time effect for age and sex of subjects. My initial thought was to run a linear model like this:
lm(Signal ~ SubjectID + Time + Age + Sex)
However, age and sex are both attributes of subjectID, and, therefore, are not independent covariates. Should I rather simplify the model like this:
lm(Signal ~ SubjectID + Time) ?
Any suggestions are much appreciated.
Thank you very much!
I have a dataset where the same survey was sent out to a community of individuals over the course of seven waves. I will note that not all participants started at the same wave, nor did all participants complete all waves.
One question in the dataset asked participants whether a particular event had occurred in their life, with a binary answer format (yes/no). I want to look at whether there are significant differences within participants on certain dependent outcome variables before vs. after this event occurred. Using Excel formulas, I identified ~90 participants who stated in their first wave that the event had not occurred, but stated in at least one subsequent wave that it had. This means the event did not take place at the same wave for all participants.
Therefore, I'm wondering what would be the best course of analysis? I was originally advised to run a repeated-measures ANOVA, but I cannot neatly define two within-subject levels when participants have different pre/post time points. It was also suggested that I dummy code each observation as occurring before or after the event for each participant, which makes sense, but I am unsure how to use this grouping variable in a repeated-measures ANOVA (coding this by hand is also tedious, but doable).
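The dummy-coding step need not be done by hand: with the data in long format, a person-specific pre/post indicator can be derived programmatically. A Python sketch with hypothetical data (the same logic is easy to reproduce in R or SPSS syntax); once this indicator exists, a mixed model such as y ~ post + (1|id) sidesteps the unequal-timing problem that breaks repeated-measures ANOVA:

```python
# Hypothetical long-format records: one row per participant per wave,
# with a 1/0 report of whether the event has occurred.
rows = [
    {"id": 1, "wave": 1, "event": 0, "y": 5},
    {"id": 1, "wave": 2, "event": 1, "y": 8},
    {"id": 1, "wave": 3, "event": 1, "y": 9},
    {"id": 2, "wave": 1, "event": 0, "y": 4},
    {"id": 2, "wave": 3, "event": 1, "y": 6},
]

def code_pre_post(rows):
    """Add a 'post' indicator: 0 before the first wave at which the
    participant reported the event, 1 from that wave onward."""
    first_event = {}
    for r in rows:
        if r["event"] == 1:
            w = first_event.get(r["id"])
            if w is None or r["wave"] < w:
                first_event[r["id"]] = r["wave"]
    for r in rows:
        fe = first_event.get(r["id"])
        r["post"] = int(fe is not None and r["wave"] >= fe)
    return rows

coded = code_pre_post(rows)
print([r["post"] for r in coded])  # [0, 1, 1, 0, 1]
```

Here participant 1 first reports the event at wave 2 and participant 2 at wave 3, so each gets post = 1 from their own event wave onward, regardless of when that wave falls.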
Hello!
We have a question about implementing the ‘Mundlak’ approach in a multilevel (3-levels) nested hierarchical model. We have employees (level 1), nested within year-cohorts (level 2), nested within firms (level 3).
In terms of data structure, the dependent variable is employee satisfaction (an ordinal measure) at employee level (i), over time (t), and across firms (j) (call it Y_itj). Note that we have repeated cross-sections, with different individuals observed in every period. As regressors, we are mainly interested in the impact of a firm-level, time-variant but employee-invariant variable (call it X_tj). We apply a 3-level ordered probit model (meoprobit in Stata).
We are concerned with endogeneity issues of the X_tj variable, which we hope to (at least partially) resolve by using some form of a Mundlak approach, by including firm specific averages for X_tj, as well as for all other time-varying explanatory variables. The idea is that if the firm-specific averages are added as additional control variables, then the coefficients of the original variables represent the ‘within effect’, i.e. how changing X_tj affects Y_itj (employee satisfaction).
However, we are not sure whether approach 1 or 2 below is more appropriate, because X_tj is a level 2 (firm level) variable.
1. The firm specific averages of X_tj (as well as other explanatory variables measured at level 2) need to be calculated by averaging over individuals, even though the variable itself is a Level 2 variable (varies only over time for each firm). That is, in Stata: bysort firm_id: egen mean_X= mean(X). As our data set is unbalanced (so the number of observations for each firm varies over time), these means are driven by the time periods with more observations. For example, in a 2-period model, if a company has a lot of employee reviews in t=1 but very few in t=2, the observations in t=1 will dominate this mean.
2. Alternatively, as X_tj is a level 2 variable, the firm-specific averages could be calculated by averaging over time periods. That is: first tag one observation per firm/year (egen tag = tag(firm_id year)), then average X over the tagged observations, e.g.: bysort firm_id: egen mean_X = mean(cond(tag==1, X, .)) (the cond() form fills the mean in for every row of the firm, whereas mean(X) if tag==1 would leave the untagged rows missing). This gives equal weight to each time period, irrespective of how many employee-level observations we have in that period. For example, although a company has many employee reviews in t=1 and very few in t=2, the firm-specific mean treats the two periods as equally important.
The two means are different, and we are unsure which approach is the correct one (and which mean is the ‘true’ contextual effect of X_tj on Y_itj). We have been unable to locate in the literature a detailed treatment of the issue for 3-level models (as opposed to 2-level models where the situation is straightforward). Any advice/suggestions on the above would be very much appreciated.
Hi,
I am testing this hypothesis:
The sustainability of IC from T1 to T2 is moderated by difficulties readjusting at work after an international assignment.
{IC = intercultural competences (continuous); difficulties to readjust (1 = Yes / 2 = No)}. The sample size is 72.
For the continuous variables, I used a 7-point Likert scale to measure intercultural competence (IC) at two time points (T1, and T2 one year later). I then computed the results at each time point into one continuous score.
I am using SPSS to test my hypothesis. I should add that my data are not normally distributed, so I tried multinomial logistic regression. To do so, I converted the IC variables (T1 and T2) into categories (High, Average, Low), entered IC T2 as the dependent variable, and entered IC T1 as a predictor along with the moderator.
So, the data I have converting the continuous variable into categories:
Dependent IC T2 (nominal)
predictors: Independent IC_T1 (nominal)
Moderator = difficulties to readjust (1 = Yes / 2 = No)
My question is, do you think what I have done so far is correct? If not, what are your suggestions?
And If you think it is correct, how could I interpret the results?
I am new to work with statistics, and this is part of my PhD research. I hope I can learn something from you about this issue.
Kind regards,
A few days ago a colleague of mine made me think about the impact the COVID-19 crisis will have on cohort studies. Those focused on causes of mortality and on the elderly, in particular, will be deeply affected by the number of deaths due to the pandemic.
Is this something manageable?
How big is this matter in your mind?
How can this be handled?
Hi everyone!
I'm doing a PhD in Clinical Psychology and have some treatment data to analyse. The design is 2 (condition: self-compassion, cognitive restructuring) x 5 (time: baseline, mid-treatment, post-treatment, 1-week follow-up, 5-week follow-up). 119 participants were randomized and engaged in their respective interventions for a 2-week period, with follow-up assessments. The aim was to reduce social anxiety.
One analysis I'm trying to do is mediation; preferably I would use a simpler strategy such as Hayes' PROCESS macro in SPSS. However, my understanding is that I won't be able to use all five waves of my data with PROCESS. Does anyone know whether that is an appropriate strategy for multiple waves? Should I be using all the information, and if so, how?
Hi Scholars, I am confused between theoretical & practical implications and also theoretical & practical contributions so, How to distinguish between them?
thanks a lot in advance.
Hi everyone, I need help deciding which analysis is best for this longitudinal study. I measured children's attachment security at 5 time points. My doubts are that: 1) the spacing between time points is not equal (T1 = beginning; T2 = after 1 month; T3 = after 2 months; T4 = after 6 months; T5 = after 15 months); 2) the total sample is 148 children, but not all of them have all 5 observations/scores (T1 = 148; T2 = 140; T3 = 112; T4 = 20; T5 = 50), so there are many missing values, especially at T4.
Aim: I would like to examine if attachment scores change significantly over time and if these are affected by other variables such as gender, age, etc.
My questions are:
- focusing on the first period, as a preliminary analysis of T1-T2-T3 I used repeated-measures ANOVA, because the spacing between these time points is equal (though with the missing data I lose some information). I then compared means using repeated-measures ANOVA with post-hoc tests (Bonferroni) and, e.g., gender as a between-subjects factor. Does that work?
- the study then continued at T4 and T5. Which analysis can I use now? Does it make sense to drop T4, with so few subjects?
- which analysis can take into account the role played by other variables? A growth curve model?
Thanks so much
Most recent books on longitudinal data analysis that I have come across mention the issue of unbalanced data but do not actually present a solution for it. Take for example:
- Hoffman, L. (2015). Longitudinal analysis: modeling within-person fluctuation and change (1 Edition). New York, NY: Routledge.
- Liu, X. (2015). Methods and applications of longitudinal data analysis. Elsevier.
Unbalanced measurements in longitudinal data occur when the participants of a study are not measured at exactly the same points in time. We gathered big, complex, unbalanced data: arousal level, measured automatically every minute for a group of students while they engaged in learning activities. Students were also asked to report what they felt during the activities. Since not all students participated in the same activities at the same time, and not all of them were active in reporting their feelings, we ended up with unstructured, uncontrolled data that do not form a systematic, regular longitudinal design. Add to this the complexity of the arousal signal itself: most longitudinal analyses assume linearity (the outcome changes monotonically with the predictors), which clearly does not apply in our case, since arousal fluctuates over time.
My questions:
Can you recommend a useful resource (e.g., book, article, forum of experts) for analysing unbalanced panel data?
Do you have any ideas yourself on how to handle unbalanced data analysis?
Hi Everyone,
I am currently looking at decline/change over time.
The data (unbalanced) are in long format, with each individual having 2 or more assessments; thus each individual occupies 2 or more rows, depending on the number of assessments he/she has had.
I am uncertain how to create a variable, based on the unique id, that gives the time difference between each individual's assessments, given that there are multiple assessments per individual. I am also uncertain how to look at decline over time for each individual, as well as the average decline of each group.
I would be really grateful if anyone could possibly help me understand how to go about this or point me towards a good reference.
Hello, panel data professionals!
I have a longitudinal dataset containing yearly data collected for 20 companies over 10 years.
The dataset contains the dependent variable, one independent variable (the task is to find if this variable Granger causes the dependent variable), and 3 additional variables that can be potentially correlated with the dependent variable.
In summary, we have the following conditions:
- Non-stationary series (for each variable)
- Additional control variables
- Longitudinal data, T=10, N=20.
Which test for Granger causality would be appropriate in this case?
Thanks a lot!
I have been using mixed-effects models to analyze neuroimaging datasets with multiple scanning sessions per participant. All my previous models included only a random intercept, and it was not until recently that I heard about random slopes.
I am still uncertain about when I should use a random intercept versus a random intercept plus slope in my mixed-effects models. Any practical or theoretical insight would be greatly appreciated.
This is a general question, but for the sake of example, let's say I am testing the hypothesis that there is a negative relationship between total brain volume and depression symptom severity (both variables numeric). In this case, I am trying to control for age at scan, since the intervals between scan sessions are inconsistent across participants. Should I therefore include a random intercept and a random slope for age?
gamm4(GreyMatterVol ~ s(AgeAtScan, k = 4) + DepressionSeverity, random = ~(1 | sub), data = alltimepoints, REML = TRUE)$gam
Hi all,
I'm doing research in a big corporation that has implemented a purpose program, which consists of a 6-hour workshop where employees reflect on their purpose at work and the purpose of the firm. We are designing a longitudinal study to measure the impact of this program on attendees' perceived productivity, collaboration, and environmental awareness.
We are going to run a multilevel longitudinal analysis, surveying both employees and their immediate supervisors. We have developed the scales for each variable, but we have doubts about the time lag between waves. Could you help me with that?
Thank you very much in advance for your help,
Alvaro
Over the past couple of years, I have been getting into methods for inferring causality (after a strong focus on correlational, survey-based research). Methods such as longitudinal analysis using fixed effects, as well as difference-in-differences (DiD) analysis, are contested. But are there other valid methods for inferring causality apart from actual experiments or the methods mentioned above? And what are the benefits and costs of such methods? Looking forward to hearing what you think.
I have read multiple articles that have used machine learning algorithms (convolutional neural networks, random forests, support vector regression, and Gaussian process regression) on cross-sectional MRI data. I am wondering whether it is possible to apply these same methods to longitudinal or clustered data with repeated measures? If so, is there an algorithm that might be better suited?
I would be interested in seeing how adding longitudinal data could improve the performance of these types of machine learning models. So far, I am only aware of using mixed-effects models or generalized estimating equations on longitudinal data, but I am reading books and papers to learn more. Any advice or resources would be greatly appreciated.
Hello,
assume I have data from two (and later three) time points that are in a three-level structure. I want to test mediation. The predictor is binary, the mediator is (usually) continuous (in some analyses also binary), the outcome is continuous (and in some analyses binary).
So, it's
- multilevel
- longitudinal
- mediation
analyses.
Can you recommend how to analyse these data? I was thinking about latent growth modeling with mediation, but I am not sure whether that is possible, or the best choice.
Thanks!
I collected data from 21 participants daily over 14 weeks (98 measurement points per participant; physiological data). Each participant is part of an intervention group, and all interventions could improve the dependent variable. I assume there was no improvement per day, but perhaps per week (I know this is not ideal, since all interventions could improve the dependent variable). My questions are:
- should I combine the daily measurements into weekly measurements?
- should I use the time variable (measurement or week) as continuous or as a factor?
- Is this a possible code (here using the lmer function)?
model1_fit <- lmer(dependent_variable ~ time * intervention + (1 | id),
data = data,
na.action = na.exclude)
(where time is either the daily measurement or the week, and time * intervention expands to both main effects plus their interaction)
summary(model1_fit)
- how should I interpret the interaction when using the time variable as a factor (assuming that's the better choice)?
Thanks for your help.
I need to perform a linear regression analysis in SPSS where:
-the predictor is a continuous variable representing the SD of changes over time.
-the outcome is a continuous variable measured at one time point.
-other co-variates are measured repeatedly over time.
What would be the best approach to test the association between predictor and outcome while co-varying for the other time-varying variables?
Thank you
I assessed a trait measure of attribution (for positive and negative situations) in 85 sportspeople. They all then played a competitive match. After the match (T1), they completed a state attribution questionnaire based on its result (win/lose): 51 won and 34 lost. They all then played their next game (T2) and again completed the state attribution scale based on the result. Of the 51 who won at T1, 35 won and 16 lost the second game; of the 34 who lost at T1, 16 won their second game and 18 lost again.
I want to see whether attributions remain the same (trait-state-state),
and whether people behave in line with their trait attributions.
Any help regarding statistical approach would be appreciated.
Hello readers
I am studying attributions in sport. I have a trait-like measure that assesses attributions on six dimensions (Internal-External, Stable-Unstable, Global-Specific, Controllability, Intentionality). I also have a state measure of attributions, assessed after two performances, t1 and t2.
I want to test whether people who are optimistic (trait) remain optimistic (state) irrespective of the results at t1 and t2, and whether people who are pessimistic remain pessimistic.
Can anyone suggest a suitable analysis?
Any help would be appreciated
In criminology, as in all social science, longitudinal analysis is NOT hurting causal science. If, owing to poor conceptualization and a lack of proper measurement and data, longitudinal analysis is premature, then so is cross-sectional analysis. Classic research survives not because of its findings, theory, or methods, but because of its scholarship. Advancing our understanding of any complex phenomenon requires much more than analysis over time or space: it requires our conceptualization to move beyond these two dimensions to include interactive, simultaneous, multilayered, dynamic processes contingent in time and space. So instead of waiting to model time until we have more knowledge or better data, I would contend that waiting would only worsen our understanding by continuing to isolate and truncate how the world actually works. Associations at time t are not just time-dependent; they may be transformed, or die out, by time t+1. Don't fear time; lean into it. Let's collaborate across disciplines to conceptualize, invent, and test new heuristics. It's messy. Let's get out there and model the heck out of it!
A government body performs an annual rating of schools; the rating is given a score (called the performance rating). In addition, the PISA and TIMSS rankings and other global competitiveness rankings show the education performance of the country. I want to test whether the school ratings have an impact on those indicators: does a change in the annual assessment scores (increase or decrease) affect PISA, TIMSS, and the GCI? This calls for longitudinal analysis, but I am struggling to set up the data in SPSS and to identify which test to use. I have 3 years of school assessment scores, plus the indicator results.
I am doing a meta-analysis of longitudinal data in which a few studies report an OR as the summary estimate while others report a rate ratio (incidence). Apart from the estimate used, there is no heterogeneity between the studies, so it does not seem sensible to conduct separate meta-analyses for this reason alone. I would be grateful for any advice. The years of follow-up in the exposed and unexposed groups are similar.
As I understand it:
Incidence rate ratio (IRR) = (events in exposed group / person-years in exposed group) / (events in unexposed group / person-years in unexposed group)
Relative risk (RR) = (events in exposed group / total persons in exposed group) / (events in unexposed group / total persons in unexposed group)
Given that the years of follow-up are similar in the exposed and unexposed groups, can't we consider the IRR equivalent to the relative risk?
Please correct me if I am wrong and let me know if there is some way to do this
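A toy numerical check (invented numbers) of why the two measures nearly coincide when average follow-up per person is similar across groups:

```python
# Invented 2x2 follow-up data; average follow-up per person is
# roughly 2 years in both groups.
cases_exp, n_exp, py_exp = 30, 200, 400.0
cases_unexp, n_unexp, py_unexp = 10, 200, 410.0

rr = (cases_exp / n_exp) / (cases_unexp / n_unexp)      # cumulative risk ratio
irr = (cases_exp / py_exp) / (cases_unexp / py_unexp)   # rate per person-year
print(round(rr, 3), round(irr, 3))  # 3.0 3.075
```

The two agree closely because total person-years ≈ persons x (similar average follow-up) in both groups, so the follow-up terms nearly cancel. Note the OR, by contrast, approximates the RR only when the outcome is rare, which is the additional condition to check before pooling ORs with rate ratios.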
I am trying to decide which longitudinal analysis I should choose for my study. I collected data at four different time points:
1. Before the intervention
2. After the intervention
3. Three months after the intervention
4. Six months after the intervention
Study design: 12 participants took an intervention made to foster a sense of purpose in life. I want to see whether the toolkit fostered purpose and if the effects held over a period of time.
I am carrying out research on longitudinal changes in migrant health. In one of my objectives I am investigating how BMI changes over time for migrants compared with the native-born. I have around 16,000 participants in wave 1, 8,700 in wave 2, and 3,700 in wave 3. As these numbers indicate, attrition is very high, and some participants had their BMI recorded in some waves only. When I attempted repeated-measures ANOVA, only around 900 participants with complete cases were included in the analysis. Are there other longitudinal methods that can include incomplete cases?
For studying the transversely isotropic elastic behaviour of cylindrical structures of a given length, subjected to radial and axial pressure simultaneously, which plane of isotropy will be considered for the analysis?
TIA
I'm looking at the impact of a policy change on crime rates, but there are two policy interventions over the time span I'm studying. Would it be appropriate to do a time series analysis with two interventions, or would an ANOVA be better?
I have a longitudinal retrospective medical dataset with records spanning the observation period 2000 to end-2016. For many reasons, not every medical record spans that entire time frame: the patient may have died, or may have transferred into the study half-way through, or transferred out at some stage.
A particular event (or exposure) is seen as a clinical event e.g., going to the doctor and saying or being told that you have a particular disease, e.g., a chest infection. That patient will also have a categorical variable to indicate whether they are a smoker or not.
I wish to count the frequency of chest infections per patient and break it down by whether they smoke. I imagine a box plot, with the upper and lower quartiles marked, frequency of disease on the Y axis, and Smoker YES/NO on the X axis. That would be very easy to do. The problem is that I am not sure how to deal with medical records of varying length: surely there is bias if a smoker and a non-smoker both have twenty chest infections, but their medical records differ in length by four years?
Thanks
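One standard way to remove the record-length bias is to analyse infections per person-year of follow-up rather than raw counts (and, for modelling, a Poisson or negative-binomial regression with log follow-up time as an offset generalizes this). A toy Python sketch with invented records:

```python
# Invented patient records: years of observed follow-up and counted infections.
patients = [
    {"id": 1, "smoker": True,  "years": 16.0, "infections": 20},
    {"id": 2, "smoker": False, "years": 12.0, "infections": 20},
    {"id": 3, "smoker": True,  "years": 8.0,  "infections": 6},
]

def rate_per_year(p):
    """Infections per year of follow-up, making records of different
    lengths comparable."""
    return p["infections"] / p["years"]

for p in patients:
    print(p["id"], "smoker" if p["smoker"] else "non-smoker",
          round(rate_per_year(p), 2))
```

Patients 1 and 2 both have 20 infections, but their per-year rates differ (1.25 vs about 1.67), which is exactly the four-year-difference problem; the box plot would then show these rates, not the raw counts, by smoking status.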
Mixed effect model and baseline dependent variable as covariate
I would like to determine which predictors are associated with the rate of change in a continuous dependent variable measured repeatedly/longitudinally over time in each patient. The analysis will use a mixed-effects model. The model will therefore include an interaction term between the predictor and time (c.time in Stata), and the coefficient on the interaction term is essentially a difference in slopes. So, if the predictor is gender and males (coded 0) have a c.time coefficient (slope) of 1.5, then an interaction coefficient of, say, 0.5 tells you that females (coded 1) have a slope of 2.0.
This will be tough since I have many variables which will involve many interaction terms (one for each variable) and the interpretation may get complicated. I recently came across an article that states the following where FEV1 (lung function) is the outcome that is being repeatedly measured and mixed effect model is also used:
" All models include baseline lung function as a covariate; therefore, the regression coefficients express the influence of predictor variables upon the annual rate of decline of lung function. " (PMC2078677)
However, is this correct? I tried this with sample data to see if the answers match and they do not. I am not sure if I am doing it right.
I did the following in STATA:
1) xtmixed y gender##c.time || id:, var
and compared this to
2) xtmixed y gender time baseline_y || id:, var
I guess I was expecting the gender coefficient from 2) to match the gender##c.time coefficient from 1) which seems very silly now.
My only other option would be to dichotomize the individual slopes of the dependent variable into high- and low-slope patients and use that in a logistic regression model, which doesn't account for the covariance structure. I am wondering what you all think of this as well.
dear all,
I have a dataset with almost 2 million observations nested within (European) countries. My DV is the probability of weekly religious practice.
I want to disentangle age, period, and cohort effects, and there is the well-known identification problem.
Given that I have so many observations and a quite wide time span (year from 1970 to 2015, cohort from 1900 to 2000, age from 15 to 100) what is the best strategy to apply?
I know this is a very broad question and that there is a huge debate behind but I really need to collect some opinions about this.
Thanks in advance!
Francesco Molteni
This data is from a 9-month intervention with children with Down syndrome (n=12). I know my sample size is very small, but we do have a lot of data points per child.
The dependent variable is essentially a measure of performance (ability to independently activate and drive a powered mobility device, biweekly assessment). The independent variable is practice time (from an activity log, in minutes per day).
First, I am stuck on how to categorize the dependent variable. I was thinking of simply picking the date at which independent driving emerges, but ideally I would like to capture more of the variability in driving patterns and abilities over time.
Another option is to use the percentage of time they were independently activating the car in each assessment OR categorizing each session as novice/intermediate/advanced behavior.
This is all fairly new research, so there is little to follow in the literature. The goal for these analyses would be to provide recommendations to clinicians on when learning is expected to occur based on usage patterns (and yes, recommend with caution of course, given that the scope of inference should likely be restricted to this sample).
Is there a way to analyze the association between two patterns over time for a sample of this size? Or should I treat them as single cases and report all children individually?
Any and all advice appreciated - I am a Master's student, and this level of statistics is a bit daunting to me!
I have a number of questions regarding a multinomial logistic mixed model analysis that I would like to ask. Essentially the questions can be reduced to: "Is a multinomial logistic mixed model analysis appropriate for only two time points?", and "Is it possible to measure multiple outcomes using this or an alternative, more appropriate analysis method?".
I am currently conducting an analysis on a subsample drawn from a larger longitudinal cohort study. So far, only two waves of the study have been conducted, with 2-3 years between them. I will be using both waves. My subsample includes only those individuals who were aged 15-25 years at baseline (Wave 1), following them on to Wave 2. I am investigating the relationship between ecstasy use and psychological variables, over time.
At each wave, there are separate questionnaires for those aged 15-17 and those aged 18+. This is not an issue for most variables I am interested in, with the exception of quality of life (QoL); I'll expand on this in a moment.
By Wave 2, the age range of my sample was 17-28 years of age. Therefore, at Wave 2, most of my sample completed the 18+ questionnaire except for those aged 17 years. This will become relevant when explaining the QoL measures I have.
From what I have read, I believe a multinomial logistic mixed model analysis is the best way to analyse my data. My exposure variable “ecstasy use” is categorical. The outcome variables I am interested in are related to depression and quality of life (QoL). I have one depression scale (PHQ9) and two QoL scales - none of these are normally distributed, so I will have to categorise them. That being the case, I have a categorical exposure variable and categorical outcome variables.
I also have several dichotomous and categorical covariates. Of my covariates, many are time-varying (marijuana use etc.), but some (sexuality, socioeconomic status, indigenous status and language spoken at home) were only measured at baseline and are assumed to be time-invariant.
My first question is whether a multinomial logistic mixed model analysis is appropriate with a dataset for which only 2 time points are available.
Secondly, whether or not the mixed modelling approach is appropriate, is it possible to measure two outcomes (depression and QoL) in the same model? If so, will the likely high correlation between depression score and QoL score be an issue?
Finally, QoL is measured using different, incompatible scales for 15-17 years and 18+ years. Is it possible to use both of these when including QoL in the model? I assume this is unlikely as Wave 2 data for the 15-17 years scale is obviously only available for those who are aged 17 years by wave 2. The alternative is only assessing QoL for adults, and ignoring the QoL scale I have for the younger group. If this is necessary, am I still able to measure two outcomes in the same model, or will this be an issue due to one outcome (depression) being only measured for those individuals aged 15-25 years at baseline and the other outcome (QoL) being only measured for those individuals aged 18+ at baseline?
Regards,
Rowan
I am running a multilevel model in Stata, but the data only contain replicate sampling weights. It was suggested that I use -meglm-, but -meglm- is incompatible with replicate weights. Are there any alternative ways or commands I can try to run the multilevel model with replicate sampling weights?
Also, the data doesn't have longitudinal weights (it only contains wave-specific weights). Which weights are recommended in this situation, given that the data structure is longitudinal?
Hello,
I have recently come across the following in an article I am assessing for my own data analysis regarding platelet refractory levels as a result of multiple variables at several points in time following transfusion: "Risk factors contributing to platelet count increments within 1 hour and between 18 and 24 hours after transfusion and to the interval between transfusions were analyzed by longitudinal linear regression using a random effects model derived by generalized estimating equations". I was under the impression that generalized estimating equations did not use random effects modelling? Could someone please clarify this for me?
Regards,
Derek
I conducted a multiple logistic regression to assess the effect of the parents' nationality, the mother's weight at conception, the mother's weight at delivery, and the father's weight on adverse pregnancy outcomes. Block 0 of the analysis, which includes only a constant and none of the explanatory variables, should show that without any independent variable the best prediction is that no participant shows the adverse pregnancy outcome. However, my Block 0 table did not show this: it shows that, without any independent variable, all participants show the adverse pregnancy outcome. I saw this result for adverse pregnancy outcomes 1 and 2, whereas for outcome 3 the model did predict that none of the participants show the adverse outcome. What is wrong with the analyses of my first two outcomes?
Thanks in advance,
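One hedged guess, without seeing the data: Block 0 does not always predict "no outcome" - it predicts whichever category is the majority in the sample. If more than half of the participants have the adverse outcome (plausible for outcomes 1 and 2 but not 3), the intercept-only model classifies everyone as having it. A tiny numpy sketch of that logic, with made-up proportions:

```python
import numpy as np

# Suppose 70% of participants have the adverse outcome (hypothetical figure)
y = np.array([1] * 70 + [0] * 30)

# An intercept-only ("Block 0") logistic model fits a single probability:
# the sample proportion of the outcome.
p_hat = y.mean()

# At the default 0.5 cutoff, every case is classified into the MAJORITY
# category -- here "outcome present", which mirrors the puzzling table.
predicted_class = int(p_hat >= 0.5)
print(p_hat, predicted_class)
```

So nothing need be "wrong" with the first two analyses; the Block 0 tables may simply reflect that those two outcomes are the majority category in your sample.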
Hello, I wish to calculate the relative risk of perinatal death (dead = 1; alive = 0), adjusted for other risk factors. Given that perinatal death is a common outcome, the adjusted odds ratio overestimates the relative risk. So how do I calculate an adjusted relative risk?
Respected Researchers
I have panel data with 1252 firm-year observations on 182 firms over the period 2010-2016. I have 14 independent variables including 3 control variables, 1 mediator, and 1 dependent variable. I want to use Stata to test the direct and mediation models. I have 11 direct hypotheses from the independent variables to the dependent variable and 11 mediation hypotheses.
Data: Unbalanced panel
QUESTIONS:
1) Which tests should be implemented, other than the choice between fixed and random effects?
2) How can the mediation analysis be performed?
I am running a difference-in-differences regression to assess the early impact of the minimum wage introduced in 2015 on worker satisfaction. I actually have data from 2010 to 2015, but the panel is unbalanced since some individuals appear to be missing in different years. Do you suggest I focus on just the 2014 and 2015 waves with the same individuals (balanced data) to run the regression, or should I also consider the other years before the introduction of the minimum wage?
Dear all,
I need some support with a statistics problem, explained by example please; otherwise I suspect that I won’t understand the answer, unfortunately. Attached you will find a screenshot (.jpg) of the example:
There is a sample of 5 taxis. I know the kilometres travelled and the number of accidents for each taxi. The goal is to calculate the variance of the average number of accidents per 100,000 km (and afterwards the confidence interval).
Importantly, I want to calculate the average number of accidents by dividing the summed accidents by the summed kilometres travelled across all taxis (and not by calculating the accidents per 100,000 km for each taxi and then taking the mean).
That means I won’t calculate the variance of the single values (accidents per 100,000 km); instead I want to estimate the resulting variance from the variance of the kilometres travelled and the variance of the accidents.
Thanks a lot for any support!
Andreas
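What is described here is the classical ratio estimator R = Σy / Σx, whose variance is usually obtained by Taylor linearisation (the delta method). A sketch with made-up per-taxi numbers, since the values in the attached screenshot are not available here:

```python
import math

# Hypothetical values for 5 taxis (NOT the numbers from the screenshot)
km        = [80_000, 120_000, 95_000, 150_000, 110_000]  # kilometres per taxi
accidents = [2, 3, 1, 4, 2]                               # accidents per taxi

n = len(km)
R = sum(accidents) / sum(km)          # accidents per km: summed y over summed x
x_bar = sum(km) / n

# Taylor-linearisation variance of the ratio estimator
# (simple random sampling, no finite-population correction):
# Var(R) ~ sum((y_i - R*x_i)^2) / ((n-1) * n * x_bar^2)
resid_ss = sum((y - R * x) ** 2 for x, y in zip(km, accidents))
var_R = resid_ss / ((n - 1) * n * x_bar ** 2)
se_R = math.sqrt(var_R)

rate_per_100k = R * 100_000
ci = (rate_per_100k - 1.96 * se_R * 100_000,
      rate_per_100k + 1.96 * se_R * 100_000)
print(rate_per_100k, ci)
```

The key term is the residual sum Σ(yᵢ − R·xᵢ)², which folds the variance of the accidents, the variance of the kilometres, and their covariance into a single quantity - exactly the "resulting variance" asked about.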
Hi all,
I'm currently trying to do an analysis of the correlates of change in the county-level mortality rate (M) between 2010 and 2014. Since I'm particularly interested in the within-county change in mortality, I thought the most appropriate thing would be either to regress my county-level predictors onto a difference score (M2 - M1) or regress predictors onto Time-2 mortality (M2) while including Time-1 mortality (M1) as a covariate. I've read that this latter method is equivalent to ANCOVA, and should be avoided when doing nonrandomized, observational studies (due to Lord's Paradox). So in my case, a multiple regression with difference scores as my DV seems appropriate (please correct me if I'm wrong).
However, I'm struggling with some theoretical questions regarding my county-level predictor variables. My predictors are mostly coming from Census data, which comes aggregated into 5-year periods. Therefore, my county-level predictors (e.g., median income, % population with college degrees) represent two 5-year time periods: 2005-2009 and 2010-2014. My goal is to analyze how the overall between-county differences (a) and within-county trends over time (b) relate to the change in mortality over time (M2 - M1). My questions are as follows:
1) Is it possible to disaggregate the between-county and within-county effects with only two time points (2005-2009 and 2010-2014) for each county? My inclination is to follow the advice of Curran & Bauer, 2011 (pg. 9), and compute county-level means (collapsed across time) of my predictor variables to represent between-county effects, and then compute the within-county trends by subtracting the county-level mean (of the predictor) from the T2 (2010-2014) predictor value. Is this an appropriate approach when I only have two time points for my predictor variables?
2) Is it a problem that my outcome variable (change in M from 2010 to 2014) does not match up temporally with my predictor variables (change in IV from 2005-2009 to 2010-2014)? I realize this would be a problem if I wanted to conclude a simultaneous change in M and IV, but I want to test something slightly different: whether between-county differences (at the 2010-2014 period) or within-county trends (between 2005-2009 and 2010-2014) account for the change in mortality in the later period (2010 to 2014).
Thanks very much,
Jake
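On question 1: with two time points, the Curran & Bauer-style decomposition reduces to a county mean (the between component) and the deviation from that mean (the within component). A pandas sketch with hypothetical column names and values:

```python
import pandas as pd

# Long-format data: one row per county per census period (values made up)
long = pd.DataFrame({
    "county": ["A", "A", "B", "B"],
    "period": [1, 2, 1, 2],
    "median_income": [50.0, 54.0, 40.0, 38.0],
})

# Between-county component: the county mean collapsed across the two periods
county_mean = long.groupby("county")["median_income"].transform("mean")
long["income_between"] = county_mean

# Within-county component: each period's deviation from the county mean
# (with two time points the T2 deviation is just half the change score)
long["income_within"] = long["median_income"] - county_mean
print(long)
```

Entering `income_between` and `income_within` as separate predictors then separates the overall between-county differences from the within-county trends, even with only two periods per county.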
I am working on a meta-analysis of RCTs, and I have to run -metabias- (several tests, including Egger's) for continuous data (means and standard deviations). What is the process? Which commands are used?
Thank you so much.
Good Evening Sir/Ma'am,
I have a Model with four constructs (A, B, C, D). The Constructs are related in the following manner:
A -> B -> C -> D.
Each construct has repeated measures, i.e., each construct is a state-level variable that must be measured repeatedly to obtain its trait level.
The respondents are employees.
I am planning to use hierarchical linear modelling to analyse the model.
So, could you please suggest the minimum sample requirements at:
1) Level-1 (i.e., How many times should I measure the state-level variable?) &
2) Level-2 (i.e., How many Employees are required?)
Is it true that there is a linear relationship between risk and return, i.e., that high risk is associated with high return and low risk with low return?
Hi,
Imagine you have measured two variables X and Y at two points in time. You want to predict Y2 from X1 while controlling for the autoregressive effect (= temporal stability) of Y1 and for the correlation of X1 and Y1. My question: Is there any statistical reason that makes it necessary and/or advantageous to implement a full CLP, that is, to include the paths X1 -> X2 and Y1 -> X2 and the correlation X2 <-> Y2?
I am not interested in reciprocal relations between X and Y. I just wonder whether including these additional paths has an impact on my path of interest (X1 -> Y2) and if so, why. I'd also be glad if you could provide a reference.
I have a problem with SPSS. I want to write syntax that repeats a linear regression ten times, where each time the dependent variable changes and everything else stays the same. How can I do this?
I use this code for the linear regression, and I want to repeat it ten times while changing the dependent variable:
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT mmsecorretto
/METHOD=BACKWARD CerebralWM_FD CortexVol Mean_GM_temporal_lobe_volume sex anniscolarita
  EstimatedTotalIntraCranialVol age.
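In SPSS itself this is typically done with the macro facility (DEFINE ... !ENDDEFINE), substituting the dependent-variable name into the REGRESSION block. As a language-neutral sketch of the same idea, here is the loop in Python with statsmodels, using made-up stand-ins for the SPSS columns:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data with hypothetical predictor names; in practice the SPSS
# file would be read into a DataFrame instead.
rng = np.random.default_rng(2)
n = 100
df = pd.DataFrame({"age": rng.normal(70, 8, n),
                   "sex": rng.integers(0, 2, n)})

# Ten dependent variables, each related to age (illustrative only)
dep_vars = [f"score{i}" for i in range(1, 11)]
for dv in dep_vars:
    df[dv] = 0.3 * df["age"] + rng.normal(0, 5, n)

# The loop: refit the same regression once per dependent variable
results = {dv: smf.ols(f"{dv} ~ age + sex", data=df).fit() for dv in dep_vars}
for dv, fit in results.items():
    print(dv, round(fit.params["age"], 3))
```

The SPSS macro version works the same way: the DV name is the only token that changes between the ten REGRESSION calls.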
I don't grasp the concept. For example, if I have a sample of n = 1000 (that is stratified and clustered) and form three groups from that sample to compare means, is using a regression model enough?
The IV is categorical and the DV is a continuous variable.
Does it affect how I choose the groups? Would logistic regression be better? And if I have two dependent variables, do I use a multiple regression model?
I am working with a complex survey (secondary analysis) in mental health and I want to know if having a history of ADHD symptoms (IV) can affect QoL (DV) in adults so I have three groups
1. adults without history of ADHD nor current symptoms---> QoL x
2. adults with history of adhd w/o current symptoms-----> QoL y
3. adults with history of adhd and current symptoms------> QoL z
and I want to test whether there is a statistical difference between x, y, and z.
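On the mechanics (setting the complex-survey design aside for a moment): with a 3-level categorical IV and a continuous DV, an ordinary regression with dummy coding is exactly the three-group mean comparison - each dummy coefficient is a group's mean difference from the reference group. A sketch with simulated data and hypothetical group labels (for the stratified, clustered design, design-adjusted standard errors would still be needed):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated QoL scores for the three hypothetical ADHD-history groups
rng = np.random.default_rng(3)
groups = np.repeat(["no_adhd", "history_only", "history_current"], 100)
means = {"no_adhd": 70.0, "history_only": 65.0, "history_current": 60.0}
qol = np.array([means[g] for g in groups]) + rng.normal(0, 8, 300)
df = pd.DataFrame({"group": groups, "qol": qol})

# C(group) dummy-codes the IV; the reference level is the alphabetically
# first group ("history_current"), so each coefficient is a mean difference
# relative to that group, and the overall F-test compares all three means.
fit = smf.ols("qol ~ C(group)", data=df).fit()
print(fit.params)
```

Logistic regression would only be appropriate if the DV were binary; with two continuous DVs, either two separate regressions or a multivariate model can be used.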
I have data modeled with categorical predictors in which a large proportion of cells have zero counts. Apart from adding a constant to all cells, we may collapse categories in a theoretically meaningful way.
How do we do such collapsing?
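Mechanically, collapsing is just a recode that maps several sparse levels onto one broader level; the substantive work is choosing a mapping that is defensible on theoretical grounds. A pandas sketch with hypothetical category names:

```python
import pandas as pd

# Hypothetical categorical predictor with sparse levels
s = pd.Series(["widowed", "divorced", "separated", "married", "single"])

# Theory-driven mapping: the three sparse "formerly married" levels are
# merged into one broader, substantively meaningful category.
collapse = {"widowed": "previously_married",
            "divorced": "previously_married",
            "separated": "previously_married",
            "married": "married",
            "single": "never_married"}
s_collapsed = s.map(collapse)
print(s_collapsed.value_counts())
```

After collapsing, the cross-tabulation with the other predictors should be re-checked to confirm that the zero-count cells have actually disappeared.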
I am doing longitudinal research.
There are two variables.
One was measured three times, while the other was observed five times; in other words, one variable is missing at two occasions.
Do you think that I can develop the AR model with these variables?
I have found only the relation between moment, thickness, and connectivity, but I guess it applies only to the longitudinal method.
I am working on a longitudinal study with 140 participants divided into 3 groups. The participants were assessed every 2 years from 2010 (4 time points in total), so the time points are equally spaced, but there are some dropouts, and some participants are missing some time points.
The assessment consisted of some tests, the results of which are discrete numerical variables (e.g. one of these is the MoCA test, which is a cognitive test with different tasks and for each task the participant is given a score; the final score is the sum of the partial scores).
My goal would be to show any difference between groups in the progression of the scores through time.
After some readings I am thinking to use a mixed effect model with the random part on the single individual level and the fixed part on the group level, would that make sense? What other statistical model could I use?
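That setup (random part at the individual level, fixed effects for group, time, and their interaction) is a common choice for exactly this question; the group-by-time interaction is the term that tests whether score progression differs between groups. A sketch of what it might look like with statsmodels MixedLM, on simulated data with hypothetical group labels:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate 60 participants in 3 groups, 4 biennial assessments each;
# the groups are given different true rates of decline.
rng = np.random.default_rng(4)
rows = []
for pid in range(60):
    group = pid % 3
    intercept = 25 + rng.normal(0, 2)               # per-person baseline
    slope = -0.5 * group + rng.normal(0, 0.3)       # per-person trajectory
    for t in range(4):
        rows.append({"id": pid, "group": f"g{group}", "time": t,
                     "score": intercept + slope * t + rng.normal(0, 1)})
df = pd.DataFrame(rows)

# Random intercept and slope per participant; fixed group x time interaction
m = smf.mixedlm("score ~ time * group", data=df,
                groups="id", re_formula="~time").fit()
print(m.summary())
```

If the scores behave poorly as continuous outcomes (floor/ceiling effects are common with instruments like the MoCA), an ordinal or generalized mixed model is the usual alternative.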
I am trying to analyze data (self-concept and test scores) of students before and after the transition from primary to secondary education. The aim is to show the impact of individual achievement and class achievement on self-concept both before and after transition. I hypothesize (1) that individual achievement has a positive impact on self-concept and class achievement a negative one (controlling for individual achievement), and, more importantly, (2) that after the transition to secondary school, the class achievement of the "old" class before transition no longer has its negative impact on self-concept measured after transition.
Now I do not know how to set up a model, for students change classes with transition and therefore are nested in two different groups - their classes - before and after transition.
Does anyone have an idea how to set up a model that allows to analyse these questions or has anyone done some similar analysis?
Thank you very much for your answers!
Longitudinal data elements need to be embedded in the usual snapshot mode of a survey. What techniques can help achieve this?
Hi
I'm running a series of multilevel regression models (mixed effects or random coefficient analysis) in Stata 13 to investigate associations between a set of predictors, time (here interpreted as duration in months from time of diagnosis) and my outcome of interest which is continuous (say cholesterol in mmol/L).
The main purpose is to investigate rate of change (i.e. this is a longitudinal analysis) - does cholesterol change with duration (time) given a set of certain predictors.
I know that the modeling results in two parts; the fixed effects part and the random effects part. I know how to interpret the fixed effects part, but could someone help me understand the estimates from the random effects part, when this is run for longitudinal analysis?
Below, we see that cholesterol falls by about 19 units per month (duration is in months). The coefficients for mixed and Black ethnicity are 3.4 and 3.3 at time 0 (i.e. at diagnosis here), and so on. But how do I interpret the estimates under 'Random-effects Parameters' in the bottom half of the output?
Example of output:
xtmixed hba1cifcc2 durationm durationm2 sex1 i.ethnicnew2 || id: durationm, cov(unstr) mle var, if diagyr>2004 & durationm<6.1 & imd4!=.
Mixed-effects ML regression Number of obs = 1028
Group variable: id Number of groups = 443
Obs per group: min = 1
avg = 2.3
max = 7
Wald chi2(9) = 729.89
Log likelihood = -4329.3449 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
cholesterol | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
durationm | -19.43518 .7736272 -25.12 0.000 -20.95146 -17.9189
durationm2 | 2.402219 .1226445 19.59 0.000 2.16184 2.642598
sex| -1.840137 1.352356 -1.36 0.174 -4.490706 .8104314
ethnicity |
mixed | 3.442956 2.547798 1.35 0.177 -1.550636 8.436549
Black | 3.286653 2.12706 1.55 0.122 -.8823077 7.455614
asian | 5.825651 1.820642 3.20 0.001 2.257258 9.394044
_cons | 93.19885 2.935992 31.74 0.000 87.44441 98.95329
-------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Unstructured |
var(durati~m) | 21.35454 2.9492331 16.03032 27.2855
var(_cons) | 352.785 38.02899 284.87269 434.91405
corr(durati~m,_cons) | -69.80694 9.286172 -88.562195 -50.74312
-----------------------------+------------------------------------------------
sd(Residual) | 10.55247 .4243647 9.752664 11.41787
------------------------------------------------------------------------------
LR test vs. linear regression: chi2(3) = 214.13 Prob > chi2 = 0.0000
In my study, there are eleven response variables and one independent variable.
Is there a way to analyse this (using longitudinal ordinal-response GEE models)?
Hi, I ran an experiment on a miRNA and its targets, but I have a problem calculating the significance of the data. Can you help me with which test I should use, and how to calculate the p value or z score for these data?
negative (no vector)
Positive (null miRNA vector + gene)
G1, G2, G3 (miRNA vector + gene)
Is there a way to easily check the "geeglm" model (with an ar1 correlation structure, in R), for example its residuals or other diagnostics?
Thanks a lot
Dear all,
I am running an MLM where I am interested in individual and regional effects, but I only want to control for country. However, once I insert the country dummies at Level 3, while none of the previous results change, the LR test now indicates that I should rather use a logit model.
Thus I am wondering: Is the variance of the dummies already accounted for? i.e. do I need the dummies in the first place to control for the nesting of regions in countries and take out all country fixed effects, or is that already done?
Thank you very much for your time!
Best, Jo
PS: I am using Stata's built-in command, which takes forever. I have tried MLwiN but was disappointed since it crashed very often. If you have a program you would recommend, please do let me know.
What is the best type of study to capture changes in well-being within a city's population over a period of 3 months? Cross-sectional or longitudinal?
The population will be surveyed twice: once before a major event, and a second time after the event. The time between the two surveys will be 3 months.
I have been conducting a longitudinal study examining the association between aerobic fitness, weight status, and academic achievement. Recently, the standardized testing measure used to assess academic achievement among youth was changed due to a switch in government contract. Is there any way to continue the longitudinal analysis despite this switch in the measurement tool for the outcome variable of academic achievement?
What is the best method to examine the dynamics of cattle colonization by antimicrobial-resistant microorganisms? The data consist of 188 cattle measured at four equally spaced time points over the course of a year. The same cows were followed, and both a binary outcome (AMR present/absent) and the number of bacteria present (log colony-forming units) are available. I am interested in exploring dynamic or longitudinal models to understand the underlying process of colonization in the herd over time. Please provide references explaining any relevant methodologies, or other examples.