Generalized Linear Models - Science topic
Questions related to Generalized Linear Models
Hi, I am doing a study for my master's, in which I ran linear regressions and checked whether all the assumptions were met. All the regressions came out significant, including those from the mediators (I have 4 mediators) to the dependent variable. But when I ran the mediation analysis, the indirect effect was not significant although the total effect was. I don't understand why that is the case. Please help.

If I use a Generalized Linear Model (GLM) in SPSS, how should I arrange my data (2 years) and interpret the results? Is there a reliable source for understanding this process?
I am looking for a way to analyze repeated measurements.
I have data of subjects who have varying number of measurements (from 2 to 10) over fixed time periods (Week 1, week 2, etc)
The subjects are divided into two groups (A and B).
What I want to check in my analysis is:
1. For all subjects together, does Week1 differ from Week 2, Week 2 from Week 3, etc
2. Are the changes over time in group A different from the changes in group B? I expect from my data for the group A to have higher deltas between timepoints in comparison with group B.
Some issues with my data:
1. some values are missing for most individuals;
2. plotting the data over time reveals non-linear trend: There is a trend of increasing values during the first 3 weeks, and from weeks 3 to 10 - a decrease of the values;
3. measurements of different patients are quite variable (think body weight - from 50kg to 120 kg) - a wide variation
- Repeated measures ANOVA is not a good option as I understand because it cannot deal with missing data.
- Linear mixed models (LMM) seem to be a good fit, as they allow for missing data and allow entering the subjects as a random factor (so each subject has their own intercept).
The problem I see is the slope - it is not linear.
I know SPSS does not have a non-linear mixed effects model at all and I am not skilled in any of the other statistical programs. Is there any other solution for my data or workaround to use LMM?
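One workaround sometimes used when a trend changes direction at a roughly known point is a piecewise-linear ("broken-stick") coding of time, which keeps the model linear in its parameters so an ordinary LMM can still be fitted. The following is only a minimal sketch: the knot at week 3 is taken from the description above, and the column names and values are made up for illustration. The two derived variables would enter the LMM as fixed effects (with their interactions with group); the same two variables could be computed in any package. A quadratic term for week is another common option.

```python
def piecewise_time(week, knot=3):
    """Split a week number into two linear segments joined at `knot`."""
    early = min(week, knot)      # rises up to the knot, then stays constant
    late = max(week - knot, 0)   # zero up to the knot, then counts weeks past it
    return early, late


# Long format: one row per subject and week; missing weeks are simply absent.
# Subjects, groups and values below are hypothetical.
rows = [
    {"subject": "s01", "group": "A", "week": 1, "value": 61.0},
    {"subject": "s01", "group": "A", "week": 3, "value": 66.5},
    {"subject": "s01", "group": "A", "week": 7, "value": 63.0},
    {"subject": "s02", "group": "B", "week": 2, "value": 98.0},
    {"subject": "s02", "group": "B", "week": 10, "value": 95.5},
]

for row in rows:
    row["week_early"], row["week_late"] = piecewise_time(row["week"])
```

With this coding, the coefficient of `week_early` describes the slope before the knot and the coefficient of `week_late` the slope after it.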
Hello,
I have a dataset for which I want to see which of the independent variables affect the dependent variable. I standardized the dataset in RStudio. I was thinking of using a GLM in RStudio to address the question. According to the Shapiro-Wilk test, the dependent variable is not normally distributed, so I cannot use the Gaussian family in the GLM. The dataset contains negative values after standardization, so I cannot use a Poisson distribution in the GLM either. I was wondering if anybody could suggest how I could proceed with running the GLM.
Thanks in advance,
Best wishes,
Sraman
I am looking for a published article that uses a SAS or SPSS generalized linear model for trial/event data (not survival analysis). Both software packages offer the option of modelling the number of successes out of the number of trials, but I cannot find a published article or reference.
I have two types of behavioural data; (i) duration of behaviours from focal sampling and (ii) frequency of behaviours from scan sampling. There are 6 trials of scan samplings. Both of these data (focal & scan samplings) are not normally distributed. The focal sampling data is zero-inflated, continuous data. I would like to assess the interaction between factors (like sex, age, time etc) and behaviours, so I opted to use Generalized Linear Models (GLM). My concern is, which model suits the data from focal sampling best? I tried using a linear model for focal sampling data, but based on my reading, a linear model is for normally distributed data. I tried using Gamma as well but multiple warnings appeared. I even tried to transform my data, but it's still not normal. I did get results, but I'm unsure if it's accurate to use them. Can anyone advise me on which model I could try that suits my data best. Thanks a lot.
Generally speaking, I was checking if the presence of animal carcasses could be determined based on the chemical composition of the soil. I know that in the context of ecological modeling and environmental data analysis, the moderate predictive power of models, as suggested by the R² values presented in my analysis (around 38%), is often considered acceptable. I used GLM. I am having trouble finding scientific articles to support this... I am only finding articles from the "social sciences". Can you help?
Hello Research Community,
I am currently working on a gene expression study involving two factors: treatment (control vs. treatment) and time points. The results from a two-way ANOVA using GLM have indicated significant interactions.
Typically, qPCR experimental designs focus on single factors. Has anyone dealt with this kind of data analysis? Should I separate the comparisons between and within groups and apply a one-way ANOVA to each? Or have a better suggestion.
And which multiple correction test method is recommended?
Thanks in advance for your assistance!
Xin
Dear colleagues, I wish to compare two paired samples with a Poisson distribution through a generalized linear model, but I don't know if this is correct. I would very much appreciate your help.
Cheers,
Hi all,
I'm currently working on a logistic regression model in which I've included year as a random variable, so in the end I am working with a Generalized Linear Mixed Model (GLMM). I've built the model, I got an output and I've checked the residuals with a very handy package called 'DHARMa' and everything is ok.
But looking for bibliography and documentation on GLMMs, I found out that a good practice for evaluating logistic regression models is the k-fold cross-validation (CV). I would like to perform the CV on my model to check how good it is, but I can't find a way to implement it in a GLMM. Everything I found is oriented for GLM only.
Could anyone help me? I would be very thankful!!
Iraida
A question to the statistics savvy people on
I'm interested to evaluate whether a specific disease state manifests as imaging-derived parameter across a range of ages.
I have 2 samples (+disease/-disease) of about 500 subjects each, across a similar range of ages (40-70)
I played with the data and noticed that for younger subjects the said parameter does not significantly (t-test, p>.05) separate the 2 groups, while in older subjects it does. I was wondering if there is a correct way to determine the critical age (or age range) at which an effect becomes significant. I used a sliding window (8 years) that showed the effect becomes significant from the 47-55 age group onward. I am not sure that this is the correct approach, as in the sliding window analysis the samples (i.e. windows) are not independent.
Follow-up question – If I want to control for potential confounds such as different ratio of male/female in each sliding window, should I use a GLM?
Thanks in advance
Hope this leads to an interesting discussion.
Greg
In a random effects regression we have the assumption that the individual specific heterogeneity is not correlated with the predictor variables:
$Y_{it} = \beta_1 X_{it,1} + \beta_2 X_{it,2} + \dots + \beta_k X_{it,k} + \alpha_i + u_{it}$
$i$ = entity (individual)
$t$ = measurement at time $t$
$\alpha_i \sim N(0, \sigma_\alpha)$, $i = 1, \dots, n$, is the unknown intercept for each entity ($n$ entity-specific intercepts)
$Y_{it}$ is the dependent variable, where $i$ = entity and $t$ = time
$X_{it}$ is an independent variable
$u_{it}$ is the idiosyncratic error
Assumption: $\mathrm{cov}(\alpha_i, X_{it}) = 0$
Do we also make this assumption when using linear mixed effects models?
Hello everyone,
I'm currently conducting genetic analyses using SSR markers, and I'm looking for software recommendations that support GLM (General Linear Model) and MLM (Mixed Linear Model) analyses with SSR markers. I've previously used TASSEL, but I understand it no longer supports SSR markers. Could anyone suggest alternative software or tools that I could use for my analysis? Any recommendations or insights would be greatly appreciated.
Thank you in advance!
Originally, I intended to conduct an independent samples t-test and a one-way ANOVA in SPSS to analyse my data. However, when I examined the normality of the dependent variable (DV) to check their assumptions, it showed that my DV is highly skewed, as shown in the attached photo.
My DV is measured through an open-ended question and it is a continuous variable about participants’ predictions of the duration of an emotion, ranging from 0 to almost 300 unit. Sample size is around 1000.
Since the normality assumption is violated, I am wondering:
- Whether I should conduct (1) nonparametric tests instead (e.g. Mann-Whitney u test and Kruskal-Wallis H test) or (2) a generalised linear model (which allows for the DV to have a non-normal distribution) by specifying the model as gamma with log link instead? Or (3) conduct both but when reporting the results, I say something like “because the results are similar, I will only report the parametric ones” (which is the generalised linear model)?
- I am also interested in examining the moderating effect of a categorical variable on the relationship between a continuous IV and a DV. Therefore, even if the answer to question 1 is "conducting non-parametric tests", I still would like to know whether I can fit the generalised linear model and examine the interaction of the IV and the moderator, even though my DV is not normally distributed?
Thank you in advance for your advice.

Dear Community,
I am interested in reporting the relative importance of my predictors in a GLM (Gamma distribution, log link). In other words, among significant predictors, I would like to know which ones influence the response variable the most. The model includes both numeric and categorical predictors.
I have read that partial R-squared could be appropriate to assess the relative importance of predictors in a GLM. Using the R package rsq (Zhang 2023), I managed to extract these partial R-squared values from my model. Yet, what worries me is that some of these values are negative - even from predictors that were returned as significant by the model. How is that possible, as partial R-squared explain some parts of the variance?
Do you know what it means exactly? As I am not sure to have fully comprehended the meaning of partial R-squared anyway, any reminder on this topic would be very helpful as well.
Thank you so much for your help,
Antoine
I am a doctoral student working on my dissertation. In my data set, some data have bimodal distribution and I'm wondering how I should handle this data using a GLM?
Do Monte Carlo methods and Generalized Linear Models do the same thing?
What are their differences?
Which is best for count data, and why?
Hello everyone! I need some help with a Generalized Linear Mixed-effect Model.
Here's the problem:
In my GLMM, I have two cluster variables (Stimuli and Participants) that I have modeled as random effects (both random intercepts and random slopes).
The question is:
Can I include fixed effects variables in my model that do not vary within each Participant ?
Currently, my model looks like this (please ignore the syntax, it's a mix of GAMLj and lme4)
DV ~ 1 + Shape + gender + GAAIS + Presentation order + Shape * gender + Shape * GAAIS + Presentation order * Shape + (1 + Shape | Stimulus) + (1 + Shape | ResponseId)
The "issue" is that GAAIS, Presentation order, and gender are three variables that only have one value within each Participant. Obviously, each participant has only one gender, one value of GAAIS, and one value for Presentation order.
I wonder if what I am doing makes sense theoretically or if there is a risk of encountering problems with quasi-separation and/or multicollinearity between the fixed and random effects.
Any help is more than welcome! Thanks in advance!
Association mapping, GAPIT, GLM and MLM, GWAS
GLM, MLM, association mapping, population structure, Kinship data, etc.
Hello everyone, I studied the diversity of an orchard using molecular markers (molecular diversity) and some morphological traits measured over two years. In the first year I ran a PCA on the morphological traits, and my supervisor now suggests including the data from both years, split by year, using a GLM. So I have genotypes, first-year data and second-year data (genotype*year). I couldn't figure out how to use a GLM to analyze all of this. My question is whether GLM is the right analysis for this case (a diversity study), or whether I should just use PCA and combine the results of the two years on one
Hi. Please can anyone help?
I have two dependent variables, a length and a width measurement, that are related (Spearman's r = 0.54). I have three fixed factors (dog breed, assessor and trial group) and three covariates (age, days since intervention and height*weight).
I was planning on running a multivariate GLM in SPSS but despite lots of reading and chatting with my supervisors, I am still not clear whether this is suitable for my data. Please can someone help?
I followed this tutorial https://www.youtube.com/watch?v=J0FeyWJgHiU but included all my two way interactions between fixed factors and covariates. Some of them were significant. Does this mean that the assumptions are violated and I should not run the multivariate model?
Thanks in advance for any assistance!
Dear all,
I have conducted research on snake chemical communication in which I test the reaction of a few adult snake individuals (both males and females) to different chemical compounds. Every individual is tested 3 times with each of the compounds. Basically, I put a soaked paper towel in each of the individual terrariums and record the behavior for 10 minutes with a camera. The compounds are presented to the individuals in random order.
My grouping variable represents the reactions to each of the compounds for each of the sexes. For example, in the grouping variable I have categories titled “male reactions to compound X”, “male reactions to compound Y” etc. I have three dependent variables as follows: 1) whether there is an interest towards the compound presented or not (binary), 2) chin rubbing behavior recorded (I record how many times this behavior is exhibited) and 3) tongue-flick rate (average tongue-flicks per minute). The distribution is not normal.
What I would like to test is 1) whether there is a difference in behavior between males and females, 2) whether there is a difference in the behavior of male snakes towards the different compounds (basically, whether males react more to compound X than to compound Y), and the same for females, and finally 3) whether males exhibit different behavior towards different types of compounds (I want to combine, for example, compounds X, Y and Z, because they are lipids, and A, B and C, because they are alkanes, and check the difference in male responses).
I thought that PERMANOVA will be enough, since it is a multivariate non-parametric test, but two reviewers wrote that I have to use Generalized linear mixed models, because of the repeated measures (as mentioned, I test each individual with each of the compounds 3 times). They think there might be some individual differences that could affect the results if not taken into consideration.
Unfortunately, I am a newbie in GLMM, and I do not really see how such model can help me answer my questions and test the respective hypotheses. Could you, please, advise me on that? And how should I build the data matrix in order to test for such differences?
Isn’t it also possible to check for differences between individuals with the Friedman test and then use PERMANOVA?
Thank you very much in advance!
So I have parasite prevalence data (proportion data) and host species identity. I want to check whether host species affects parasite prevalence. I'm referring to a paper where they compared two models using an F-test. Can someone please give me step-by-step R code to build the model? I'm very new to R programming.
Hello Dear All,
I know how to use the forward, backward, and stepwise methods in linear regression models to eliminate insignificant variables in SPSS. However, I don't know how to use those methods in Poisson and negative binomial models to determine the best model.
If someone knows, please help me with that.
Thanks
Best Regards
Fathi
Hi everyone.
We need to create a model. Our dataset contains a bunch of daily values of covariates, including wind direction. But the response variable, which is count data, is provided on a weekly basis. We are planning to calculate weekly averages for all of the variables so that we have equal time frames for both the covariates and the response variable. First of all, we're not even sure whether using the average is sufficient. But assuming the covariates' weekly means can provide a proper and unbiased input for our model, how can we calculate the mean of wind direction?
Thanks a lot in advance. :))
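Because wind direction is an angle, a plain arithmetic mean can be misleading: the average of 350° and 10° comes out as 180°, the opposite direction. A common alternative is the circular mean, which averages the unit vectors instead. A minimal sketch, assuming directions in degrees and ignoring wind speed (weighting by speed would be a separate modelling decision); the function name is only illustrative:

```python
import math


def circular_mean_deg(directions_deg):
    """Mean direction in degrees, computed by averaging unit vectors."""
    sin_sum = sum(math.sin(math.radians(d)) for d in directions_deg)
    cos_sum = sum(math.cos(math.radians(d)) for d in directions_deg)
    if math.hypot(sin_sum, cos_sum) < 1e-12:
        return None  # directions cancel out, so no mean direction is defined
    angle = math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0
    # Floating-point rounding can push a tiny negative angle up to 360.0.
    return 0.0 if angle >= 360.0 else angle


# Hypothetical week of daily readings scattered around north
week = [350, 355, 5, 10, 0, 345, 15]
weekly_direction = circular_mean_deg(week)
```

If the directions within a week are spread almost evenly around the compass, the mean direction carries little information, which is worth checking before using it as a covariate.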
Hello,
I'm currently working on analyzing the influence of environmental variables on the distribution of two copepod species. I first did a variance partitioning using RDA analysis, where I partitioned the variation in the species matrix into 4 categories:
a) Pure environmental variation
b) Spatially and seasonally structured environmental variation
c) Pure spatial and seasonal variation
d) Unexplained variation and stochastic fluctuations
This gave me an idea of the variance explaining the distribution of the whole community. Now I want to narrow it down to my two species of interest. I thought of doing a GLM with the environmental factors as my explanatory variables, but I also want to integrate the spatial and seasonal matrix in the model as covariates.
The question is simple: can I do that with a GLM or is there another analysis that you think is more appropriate here ? I am new to this whole statistical world so I'm a little bit lost.
Thank you in advance
Hello, I would like to make a generalized linear model (GLM), using functional richness as a response variable and different elements of the bottom cover (coral, algae, rocks, etc.) as explanatory variables.
But I have the inconvenience that the functional richness data is greater than 1 and with decimals, so I could not use the Poisson distribution.
Could you round the data? Or what other distribution family could I use in this case?
I have run a poisson generalized linear model and am now looking to do a post-hoc test to look at differences within my independent variable.
My independent variable is categorical with 6 categories and my dependent variable is count. Can I use a Least Significant Difference test for this? If not then which would be the best to use?
Thanks in advance!!
My independent variable is categorical, and my dependent variable is count. I have run a poisson log-linear generalized linear model which shows my independent variable is having a significant effect on my dependent. I now want to compare the categories to see if there are statistically significant differences in the count data between them. Does anybody know what test I can use here?
I would like to evaluate the effect of the independent variable (ratio of females to males) on the dependent variable Y in R. We often use cbind(y, N - y) as a dependent variable in GLM with binomial distribution. Similarly, can we use cbind() as independent variable?
I have written the following equation:
res <- glmer(Y ~ cbind(female, male) + (1|location), data = DF, family = binomial)
Is this correct?
Thank you in advance.
Kei
I conducted a magnitude estimation experiment to find out the difference between multiple conditions. Twelve people participated in the experiment, and three experimental conditions were given. Each participant performed five evaluations for each condition. Since the evaluation orders are randomly assigned, the order does not have any meaning.
I have one dependent variable (evaluation score) and two independent variables (fixed: condition, random: participant), so I think I should analyze the data with the "General Linear Model - Univariate" method. However, the raw data violates the homogeneity of variances assumption, and the SPSS disables the bootstrap option when I set the participant to a random variable. Should I use another analysis method, or can I preprocess the data to use the GLM Univariate method?
Thank you for sparing your valuable time.
Joyoung Han
Say I am interested in how mental health changes in a ten-year period using a Linear Fixed Effect Model, would it be possible to control for changes in demographics over years (e.g., income) or should I just use demographics in the first year as co-variate in the model?
Hello scientists, if anybody knows how to analyse data using a Generalized Linear Model (GLM), please let me know. I will be very thankful to you.
I am currently working on a project that aims to characterise in R on a pool of 500 bird species the traits that may be at the origin of their introduction outside their natural habitat and thus allowing them to become invasive or not.
Thus, out of my pool of 500 species, I ended up with 150 bird species that were introduced elsewhere (introduction = 1) versus 350 others that were not introduced (introduction = 0), with approximately 80 life history traits for each of them.
My idea was therefore to use PGLS (linear models correcting for the phylogenetic effect of species on their traits) on my pool of 500 species and see which traits could explain the "introduction" variable.
The problem is that by doing this my results are biased by the presence of many more non-introduced birds than introduced birds. My initial idea was to use bootstrapping to resample my n=350 birds to n=150 and run my PGLS on this new pool of 300 species (n=150 introduced and n=150 unintroduced), repeat it and then do some model averaging.
However by doing this my final models obtained in this way are completely different at each of my R sessions. I have tried increasing the number of bootstrap runs to 10,000 but this does not solve the problem. When I do this with basic GLMs I do not encounter this problem of non-repeatability.
Would you have a solution to solve this problem of repeatability with the PGLS in my process?
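Whatever the cause of the instability turns out to be, the resampling step itself can be made reproducible by giving every replicate its own fixed seed. The sketch below only illustrates that step; it assumes sampling without replacement, uses placeholder species IDs, and does not fit any PGLS model. Whether unseeded sampling is what makes the results differ between sessions is an assumption, not something the description confirms.

```python
import random


def balanced_subsample(introduced, not_introduced, seed):
    """Draw as many non-introduced species as there are introduced ones."""
    rng = random.Random(seed)  # private generator, fully determined by the seed
    drawn = rng.sample(not_introduced, k=len(introduced))
    return introduced + drawn


# Placeholder IDs standing in for the 150 introduced and 350 other species
introduced = [f"sp{i:03d}" for i in range(150)]
not_introduced = [f"sp{i:03d}" for i in range(150, 500)]

# One seed per replicate, so the whole series can be regenerated in any session.
replicates = [
    balanced_subsample(introduced, not_introduced, seed=s) for s in range(5)
]
```

Storing the seeds alongside the fitted models makes it possible to check whether the remaining variation comes from the subsamples or from the model fitting and averaging.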
I have used column-bind (cbind) function in R to create a dependent variable of two different data columns, and would like to apply Bayesian GLM analysis to it with several other independent variables. Is this doable?
Why is the changing variance of the response variable (heteroscedasticity) important for a GLM, e.g. Poisson? Why not just fit an exponential curve?
The histogram is the distribution of the response (i.e., subjective social class), whereas the second image is the P-P plot of the residuals after fitting the GLM with predictors including demographics such as age, gender, education, income... Is GLM still suitable in this context? If not, what would be the best alternatives?


Why is the changing variance of the response variable (heteroscedasticity) important for a GLM, e.g. Poisson? What about the exponential curve? Can we fit the exponential curve instead?
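For reference, the contrast the question draws can be written out. Both approaches can share the same exponential mean curve; they differ in what they assume about the variance, and therefore in how strongly each observation is weighted. A sketch in standard notation (the symbols are not from the original post):

```latex
% Poisson GLM with log link: the variance grows with the mean
\log \mu_i = x_i^\top \beta, \qquad
Y_i \sim \mathrm{Poisson}(\mu_i), \qquad
\operatorname{Var}(Y_i) = \mu_i

% Least-squares fit of the same exponential curve: constant variance assumed
Y_i = \exp(x_i^\top \beta) + \varepsilon_i, \qquad
\operatorname{Var}(\varepsilon_i) = \sigma^2
```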
What MATLAB function should I use to normalize this data that is right skewed before GLM? This is the predicted variable. Please see the attached picture

Would this be considered as normally distributed and thus could be modeled using GLM? See attached image.

Predictive models that use ordinary least squares (OLS) for parameter estimation must show residuals with normal distribution and constant variance (homoscedastic).
However, in most scientific articles (in engineering-related areas, at least) I don't see a concern with meeting these assumptions. In your opinion, why does this happen? In the end, the results do not change that much when we make the necessary transformations so that these assumptions are met?
If you have had any experience with this topic, please feel free to share.
Basically having this exact same problem
I've got 4 BOLD runs, each 190 volumes, but I guess SPM interprets them as just one volume?
I've used dcm2nii to convert 190 DICOMs to just one 4D NIfTI file and now, after warping, co-registering, etc., I'm trying to run the GLM model. I inserted the onset times, but it says
21-Aug-2019 22:08:49 - Running job #2
------------------------------------------------------------------------
21-Aug-2019 22:08:49 - Running 'fMRI model specification'
21-Aug-2019 22:08:49 - Failed 'fMRI model specification'
Error using spm_run_fmri_spec (line 131)
Not enough scans in session 1.
In file "C:\Program Files\Polyspace\R2019a\spm\spm12\config\spm_run_fmri_spec.m" (v6562), function "spm_run_fmri_spec" at line 131.
The following modules did not run:
Failed: fMRI model specification
not sure how to specify that there's 190 volumes in each .nii file.
Hi researchers,
Is it possible to perform nested GLM models or nested ANOVA of GLM models with >2 categorical variables in R, or does it make sense?
Thank you
Dear SPM-Experts,
I have set up a full factorial model on the second level in which I compare two groups (factor 1: experimental group vs. control) and two time points: pre vs post (factor 2).
I was wondering whether the contrast images for factor 2 can stem from the same first-level GLM? Specifically, there, I modelled the pre- and post-session within one GLM and defined various contrasts already on the first level, for instance, differential contrast pre vs. post (1 -1 and other regressors 0), mean activity pre (1 0), mean activity post (0 1). For the second-level full factorial model , I guess I would use then the mean activity pre (1 0) and mean activity post (0 1) contrast images from the first level as inputs (for the two levels of factor 2).
Alternatively, one could run two first-level GLMs - one for only the pre session, and one for only the post-session, and then use those mean activity contrast images for the second-level model.
However, I would prefer the first option, because then one could already compare the pre vs. post activity on the first-level. This way one could also use this difference contrast on the second level and run for instance a simple two-sample t-test, comparing the two groups.
I hope that my preferred variant (modeling both pre- and post-session in one first-level model) is also fine for the second-level full factorial model. Or is the alternative variant (separate first-level models for pre and post) preferable? And if so, why? I am not completely sure if these two variants should produce the same or similar results anyway. My intuition is yes, at least similar. On the other hand, I think that when modeling both pre- and post-session in one model, the mean baseline activity to which, for instance, pre-activity-only is compared is quite different (because it also includes post-activity) than when keeping them separate.
Maybe someone can help. Thanks a lot!
Stefan
I'm trying to fit a glm in R
My code is the following:
glm(formula = DV ~ IV1between*IV2within, data = df).
The DV distribution (not the residual!) is approximately an ex-gaussian.
Which "family" argument should I use?
Any tips are welcome, thanks!
I was wondering whether anyone had any suggestions as to what statistic to use (I use R Studio).
I have some muscle cell circumference measurements (response) and I have two categorical explanatory variables, diet (Control/Diet) and exercise (Yes/No), and I need the option to run an interaction between these explanatory variables (as I had 4 groups) as well as each separately. A Bartlett's test states that it's non-parametric data, so does anyone have any suggestions as to what statistic/model I could use? I have looked into GLMs but as far as I can see, they don't work with interactions/non-binomial data.
Thanks in advance for any recommendations.
Hello,
I am attempting a two-part model on semi-continuous data (zero inflated).
As I understand, the first part is a binary logistic regression (or probit) model for the dichotomous event of having zero or positive values.
$\operatorname{logit}[P(Y_i = 0)] = x\beta$ (Eq. 1)
Conditional on a positive (non-zero) value, the second part (continuous) can be modelled using either an OLS regression (with or without log transformation of the outcome variable) or a generalized linear model (GLM).
$\log(y_i \mid y_i > 0) = x\beta_2 + e$, where $e$ is normally distributed (Eq. 2)
Combining the two parts, the overall mean can be written as the product of the expectations from the first and second parts of the model (Eq. 1 and 2):
$E(y \mid x) = \Pr(y > 0 \mid x) \times E(y \mid y > 0, x)$ (Eq. 3)
I could find a number of papers which have employed the 'twopm' command of the Stata package. However, I am using SPSS. I have conducted the two parts (binary and continuous) using the same set of predictors.
Please suggest how to multiply the results from both the parts using SPSS.
Thank you.
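The multiplication in Eq (3) is done case by case on the predicted values from the two parts, so the same arithmetic could in principle be carried out in any package on saved predictions. A minimal sketch under stated assumptions: the logistic part models P(Y = 0) as in Eq (1), so its complement is needed; the second part is an OLS regression on log(y), and the back-transformation assumes normally distributed errors with constant variance on the log scale. The function and argument names are illustrative only.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def two_part_mean(lin_pred_zero, lin_pred_log, sigma2):
    """Overall mean E(y|x) for one case, combining the two parts as in Eq (3).

    lin_pred_zero: linear predictor of the logistic part, which models P(Y = 0)
    lin_pred_log:  linear predictor of the regression on log(y) for y > 0
    sigma2:        residual variance of that log-scale regression
    """
    p_positive = 1.0 - expit(lin_pred_zero)  # Pr(y > 0 | x)
    # Back-transformation under the normal-error assumption (lognormal mean):
    mean_positive = math.exp(lin_pred_log + sigma2 / 2.0)  # E(y | y > 0, x)
    return p_positive * mean_positive
```

If the second part is instead a GLM with a log link, its predicted mean is already on the original scale and would be multiplied by the probability directly, without the variance correction.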
Below the formula and output of a glm to determine if there is a relation between hatching success (proportional data from 0 to 1 + skewed towards the 1) and some other variables such as species, location and average temperature. With aid of the AIC I want to figure out which formula fits best (thus, to get the smallest AIC), but I don't get any AIC value at all! What went wrong?
Call:
glm(formula = Success ~ Species + Location + `Average temperature`,
    family = quasibinomial("logit"), data = dd)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.4514  -0.5189   0.2341   0.6552   0.9869

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
(Intercept)            -5.99750   18.90781  -0.317    0.753
SpeciesRicordii        -0.36436    0.60982  -0.597    0.553
LocationPuente Arriba  -0.69432    0.90582  -0.767    0.448
LocationTierra         -0.85959    0.71708  -1.199    0.237
`Average temperature`   0.08536    0.21218   0.402    0.689

(Dispersion parameter for quasibinomial family taken to be 0.471929)

    Null deviance: 24.064  on 47  degrees of freedom
Residual deviance: 23.315  on 43  degrees of freedom
  (178 observations deleted due to missingness)

AIC: NA

Number of Fisher Scoring iterations: 4
Hi, I am not really sure if this question is valid nor makes any sense.
But for example we have a single (imaginary) species, let's say Pikapika pii, and determined its genetic diversity. PCA and STRUCTURE clustering showed three groups, GRP1, GRP2, and GRP3.
My question is, can I treat these three groups as "separate species" and use it to run a multispecies SDM, or run an ensemble of single species SDM, or this is not valid/possible at all.
I would appreciate any help/correction with this thought. If possible, you can also refer me to publications that I can read, or experts that I can directly consult/talk with.
Thank you so much for your time and help.
Hi everyone! I have a statistical problem that is puzzling me. I have a very nested paradigm and I don't know exactly what analysis to employ to test my hypothesis. Here's the situation.
I have three experiments differing in one slight change (Exp 1, Exp 2, and Exp 3). Each subject could only participate in one experiment. Each experiment involves 3 lists of within-subjects trials (List A, B, and C), namely, the participants assigned to Exp 1 were presented with all the three lists. Subsequently, each list presented three subsets of within-subjects trials (let's call these subsets LEVEL, being I, II, and III).
The dependent variable is the response time (RT) and, strangely enough, is normally distributed (Kolmogorov–Smirnov test's p = .26).
My hypothesis is that no matter the experiment and the list, the effect of this last within-subjects variable (i.e., LEVEL) is significant. In the terms of the attached image, the effect of the LEVEL (I-II-III) is significant net of the effect of the Experiment and Lists.
Crucial info:
- the trials are made of the exact same stimuli with just a subtle variation among the LEVELS I, II, and III; therefore, they are comparable in terms of length, quality, and every other aspect.
- the lists are made to avoid that the same subject could be presented with the same trial in two different forms.
The main problem is that it is not clear to me how to conceptualize the LIST variable, in that it is on the one hand a between-subjects variable (different subjects are presented with different lists), but on the other hand, it is a within-subject variable, in that subjects from different experiments are presented with the same list.
For the moment, here are the solutions I've tried:
1 - Generalized Linear Mixed Model (GLMM). EXP, LIST, and LEVEL as fixed effect; and participants as a random effect. In this case, the problem is that the estimated covariance matrix of the random effects (G matrix) is not positive definite. I hypothesize that this happens because the GLMM model expects every subject to go through all the experiments and lists to be effective. Unfortunately, this is not the case, due to the nested design.
2 – Generalized Linear Model (GLM). Same family of model, but without the random effect of the participants’ variability. In this case, the analysis runs smoothly, but I have some doubts on the interpretation of the p values of the fixed effects, which appear to be massively skewed: EXP p = 1, LIST p = 1, LEVEL p < .0001. I’m a newbie in these models, so I don’t know whether this could be a normal circumstance. Is that the case?
3 – Three-way mixed ANOVA with EXP and LIST as between-subjects factors, and LEVEL as the within-subjects variable with three levels (I, II, and III). Also in this case, the analysis runs smoothly. Nevertheless, together with a good effect of the LEVEL variable (F= 15.07, p < .001, η2 = .04), I also found an effect of the LIST (F= 3.87, p = .022, η2 = .02) and no interaction LEVEL x LIST (p = .17).
The result seems satisfying to me, but is this analysis solid enough to claim that the effect of the LEVEL is by no means affected by the effect of the LIST?
Ideally, I would have preferred a covariation perspective (such as ANCOVA or MANCOVA), in which the test allows an assessment of the main effect of the between-subjects variables net of the effects of the covariates. Nevertheless, in my case the classic (M)ANCOVA variables pattern is reversed: “my covariates” are categorical and between-subjects (i.e., EXP and LIST), so I cannot use them as covariates; and my factor is in fact a within-subject one.
To sum up, my final questions are:
- Is the three-way mixed ANOVA good enough to claim what I need to claim?
- Is there a way to use categorical between-subjects variables as “covariates”? Perhaps moderation analysis with a not-significant role of the moderator(s)?
- Do you propose any better ways to analyze this paradigm?
I hope I have been clear enough, but I remain at your total disposal for any clarification.
Best,
Alessandro
P.S.: I've run a nested repeated measures ANOVA, wherein LIST is nested within EXP and LEVEL remains the within-subjects variable. The results are similar, but the between-subjects nested effect of LIST within EXP is significant (p = .007, η2 = .06). Yet the question of whether I can claim what I need to claim remains.

I am currently doing a PCA on microbial data. After running a Parallel Analysis to determine the number of factors to retain from the PCA, the answer is 12. Since my idea is to save the factor scores and use them as independent variables for a GLM together with other variables, I was wondering:
- Should I definitely save the factor scores of all 12 factors (which would become too many variables), or can I save only a few of them (e.g., the first 3, which together explain 50% of the variance) for the GLM?
- If I can save a lower number, should I re-run the PCA retaining only that lower number (e.g., 3), or just use the factor scores already obtained when retaining all 12?
Thank you all for your time and help!
I am exploring some data and the possibilities of a quasibinomial GLM. The data is less than perfect. Nonetheless, the target variable ranges from 0 to 1, and to my knowledge it seems okay (is it? perhaps there are other options?) to use a quasibinomial GLM. However, visualizing the predictions, it seems that an oversaturation of values at 0.5 pulls the fitted curve down somewhat (Fig A, solid line; if this is the right description). It is possible to weight the observations as weight = abs(values - 0.5), which seemingly improves the visual fit (Fig A, dashed lines). Yet then I am disregarding the 0.5 values, and base R does not return an AIC for these models, so I am not really sure how to compare them (besides the residuals, Fig B-C). One other option would be to correlate the predicted with the actual values; this correlation is higher for the weighted model. Yet I am completely ignorant of the implications of weighting a quasibinomial model, and it is relatively difficult to find information on this, whereas for WLS it is easy.
The questions are:
(1.) Is a quasibinomial GLM reasonable or are there better options?
(2.) Is weighting the GLM reasonable?
(3.) What are the implications of (2.) or are they similar to WLS?
(4.) Is it possible to compare the models by comparing the correlation between predicted and actual values?
Thank you in advance

I want to test one-sample data of mate choice between two species. Each female has spawned X times with one species and Y times with another. With a one-sample t-test, I can test whether they spawn assortatively or at random. However, I would like to test this in a Generalized Linear Model with a binary response, with X as the dependent variable and the total number of spawnings (X+Y) as the number of trials. I want to test this against 50:50 (0.5) but also against 40:60 (0.6). This is easy to do in SPSS for a one-sample t-test, where I am asked for the test value. However, I can't find out how to do it in the Generalized Linear Model module. Is it doable? If anyone knows how to do it in R, I can try that as well.
GLM is an advanced quantitative data analysis method, but I could not find enough materials for SPSS. Please suggest some good resources for teaching and learning GLM using SPSS.
A question: Pseudo-observations in Lasso inferential model
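One way to approach the question above, sketched under stated assumptions: an intercept-only binomial GLM with a logit link, with logit(p0) entered as an offset, tests whether the overall proportion differs from the test value p0. For an intercept-only model the estimate has a closed form, so the sketch below computes it directly in Python instead of calling a GLM routine. The function name and the spawning counts are hypothetical, and pooling across females assumes there is no extra between-female variation.

```python
import math

def intercept_only_binomial_test(successes, trials, p0):
    """Wald z-test of H0: p = p0 for an intercept-only binomial GLM (logit link).

    With logit(p0) as an offset, the fitted intercept equals
    logit(p_hat) - logit(p0), where p_hat is the pooled proportion.
    Not defined when p_hat is exactly 0 or 1.
    """
    x = sum(successes)
    n = sum(trials)
    p_hat = x / n

    def logit(p):
        return math.log(p / (1 - p))

    intercept = logit(p_hat) - logit(p0)
    se = math.sqrt(1.0 / (n * p_hat * (1 - p_hat)))  # Wald SE on the logit scale
    z = intercept / se
    p_value = math.erfc(abs(z) / math.sqrt(2))       # two-sided normal p-value
    return intercept, se, z, p_value

# Hypothetical data: X spawnings with species 1 out of X + Y per female
x_counts = [7, 5, 9, 6]
totals = [10, 10, 12, 9]

for test_value in (0.5, 0.6):
    est, se, z, p = intercept_only_binomial_test(x_counts, totals, test_value)
    print(f"p0={test_value}: intercept={est:.3f}, z={z:.2f}, p={p:.3f}")
```

If females differ strongly in their preference, this standard error will be too small; a quasibinomial or mixed model would be the usual remedy in that case.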
Dear All,
I hope all is well.
I have found a paper titled "Regression analysis of censored data using pseudo-observations: An update".
The pseudo-observations express the survival function at one point as a GLM.
I wonder if it is valid to use the pseudo-observations as the dependent variable in a lasso inferential model for a continuous variable.
My reasoning for this is that the pseudo-observations express the survival function at one point as a GLM. At the same time, lasso is a regularisation of the GLM.
So, would it be valid to use the pseudo-observations in a lasso inferential model?
Thank you very much for your support
Looking forward to hearing back from you
Kind regards
I would like to model the results of a symptom severity questionnaire of a certain disorder using GLM. However, I have a big problem with what distribution to adopt for the questionnaire results. The tool consists of 47 questions using a 1-5 scale, so the score is always positive, takes only discrete values, and has a finite range of values it can take (47 - 47*5). The empirical distribution is additionally strongly right-skewed. Is there a "classical" probability distribution that I can use to model such a variable?
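As a hedged illustration of one option for a bounded, discrete total score: the score can be shifted so that it counts points above the minimum out of the maximum possible, which is the form a binomial-type family (or a beta-binomial, if the scores are overdispersed) expects. Whether such a family actually fits these data would need checking; the function name below is hypothetical.

```python
N_ITEMS = 47
MIN_ITEM, MAX_ITEM = 1, 5

def to_binomial_count(total_score):
    """Shift a 47..235 total so it reads as 'points earned' out of 188 possible."""
    low = N_ITEMS * MIN_ITEM                  # 47, the lowest possible total
    high = N_ITEMS * MAX_ITEM                 # 235, the highest possible total
    trials = N_ITEMS * (MAX_ITEM - MIN_ITEM)  # 188 points above the minimum
    if not low <= total_score <= high:
        raise ValueError("score outside the questionnaire's range")
    return total_score - low, trials

points, trials = to_binomial_count(60)
print(points, trials, points / trials)
```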
Multivariate linear regression via IBM SPSS GLM syntax provides the following results:
1. Multivariate Tests: four test results (Pillai's trace, Wilks' Lambda, Hotelling's trace and Roy's largest root).
2. Tests of Between-Subjects Effects: Sums of Squares, Mean Squares, F and Sig.
3. Parameter Test
4. Contrast Results (K-Matrix).
I am running multivariate tests for 2 models:
1st Model: 1 IV with 2 DVs
2nd Model: 7 IVs with 2 DVs
Sub-question 1: If I want to reject the null hypothesis, which table should I look at?
Sub-question 2: Which figure should I use for hypothesis testing of the relationship between each predictor and the two dependent variables?
Sub-question 3: As per the headline question, where can I find the beta values for each predictor and each dependent variable in the multivariate regression analysis?
Thank you very much for your kind suggestion.
Regards,
Sitthimet
I am looking to run an analysis with multiple IVs (nominal) and DVs (all ordinal). I would like to have something similar to a hierarchical regression, where adding the additional variables can indicate change in model fit. While a multivariate GLM will allow me to look at multiple DVs/IVs, I'm not sure if there is a way to do this step-wise. Multinomial logistic regressions appear to only allow for one DV.
Hello, everyone, thank you for clicking this post. I want to analyze the effect of Corporal Punishment Myth on Corporal Punishment behavior (measured by CTS-PC). However, I want to know whether the Gender of participants is a confounding variable or not. Should I add the gender variable to the model, or should I run a separate analysis?
Thank you :)
Can I include the Generalized Linear Model in Phytoplankton Community Structure studies? Please share some related reference papers if someone found the same. Thank you in Advance. Stay Safe and Stay Happy dear Researchers.
Kind Regards
Anila P Ajayan
Rewriting this with more detail...
Hi!
I ran an experiment where participants took a test of 16 questions, yes/no (binary).
They were tested at either 7-days or 28-days - two groups between subjects.
Analyzed by subjects, their test results are higher at RI-7.
RI-7: 5.5
RI-28: 3.6
t-tests confirm that this difference is sig - t = 2.171, p = .032
Analyzed as items, RI-7 scores are still higher.
RI-7: 515 incorrect - 269 correct
RI-28: 596 incorrect - 172 correct (more incorrect, fewer correct).
However
When put into a GLMM (binomial, logit), RI-7 scores come out sig lower, no matter what I do and what other variables I put in or take out.
An example from one model:
RI effect: F = 28.542, p < .001
RI-7->RI-28 coefficient -.625, OR .535, p < .001
EMMs: RI-7: .668; RI-28: .790
(more detailed stats in attachment)
Can anyone explain how this is possible or give ideas for something I can check?
Thanks in advance!!
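One thing that may be worth checking, sketched here from the item counts quoted above: the raw odds ratio for a correct answer (RI-28 vs RI-7) points the same way as the reported coefficient, while the reported EMMs (.668 and .790) sit close to the raw proportions of incorrect answers. That pattern would be consistent with the software treating "incorrect" as the modeled event, though this is only a guess that the model output would have to confirm. The dictionary layout and function names are hypothetical.

```python
# Item-level counts quoted in the question
counts = {
    "RI-7": {"correct": 269, "incorrect": 515},
    "RI-28": {"correct": 172, "incorrect": 596},
}

def prop(group, outcome):
    """Raw proportion of one outcome within a group."""
    c = counts[group]
    return c[outcome] / (c["correct"] + c["incorrect"])

def odds(group, outcome):
    """Raw odds of one outcome within a group."""
    p = prop(group, outcome)
    return p / (1 - p)

# Odds ratios for RI-28 relative to RI-7, for each possible definition of the event
or_correct = odds("RI-28", "correct") / odds("RI-7", "correct")
or_incorrect = odds("RI-28", "incorrect") / odds("RI-7", "incorrect")

print("P(incorrect):", round(prop("RI-7", "incorrect"), 3),
      round(prop("RI-28", "incorrect"), 3))
print("OR for correct, RI-28 vs RI-7:", round(or_correct, 3))
print("OR for incorrect, RI-28 vs RI-7:", round(or_incorrect, 3))
```

The two odds ratios are reciprocals of each other, so a sign flip in the coefficient or in the EMMs can come purely from which category and which reference level the model uses.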
Hi all,
I am running a mixed repeated GLM design, in which I have a within-subjects categorical variable with 3 levels (factor called charisma, specific variables called av_none av_mid av_high), and my between-subjects variable is continuous (called looks). I have a significant interaction between these 2 variables in GLM Repeated. Can anyone tell me how to run custom hypothesis on this interaction through the syntax? I have tried compare (doesn't seem to work with a continuous variables), lmatrix and kmatrix (don't seem to work with within subjects), but keep getting errors. Let's say that I want to test the difference between my level 1 and level 2 of the within subjects variable, at -1SD of the between subjects variable, what should I put in the syntax for this specific hypothesis?
Hello everyone,
I have used a multivariate repeated measures ANCOVA (GLM repeated measures) in order to analyse the following:
the effect of the moderator (continuous, measured at baseline) on the effect of an intervention (2 groups with different intervention strategies: condition 1 and 2) on the DVs (2 DVs, both consisting of a pre-intervention and post-intervention measurement).
In this output, I get a significant Sig. for Wilks' Lambda in the Multivariate Tests table for the between subjects effect of my covariate and for the within subjects effect of ''time''. Furthermore, I have significant Sig. for DV1 (in the row of my covariate) in the Tests of Between-Subjects Effects, for DV2 this is not the case.
I am having some trouble interpreting this output. I figured it meant the treatment had a significant effect (because ''time'' is significant)? Also, is it safe to say there is no moderation effect (because there was no significance for time*condition or time*covariate)?
Or should I have conducted a different analysis?
Thanks a lot to anyone willing to help!
kind regards
I performed a meta-analysis of proportions with two different software packages (R x64 4.0.5 and Stata 16).
To examine the potential impact of moderator variables on study effect size, I performed a meta-regression using a continuous candidate variable.
I read this very useful and clear document (Modeling Non-Linear Associations in Meta-Regression - https://www.metafor-project.org/doku.php/tips:non_linear_meta_regression).
I ask:
Besides visual inspection of plots of effect size (in this case probably logit-transformed) against the continuous moderator, are there specific formal tests to evaluate whether all the GLM assumptions are met in the context of meta-regression, or is the approach to regression diagnostics the same as for linear regression outside the context of meta-analysis and meta-regression?
Thanks for your time.
Mario Petretta
We have cross-sectional data (sample = 56k, where A = 9k and B = 47k). We used a generalized linear model (GLM) with a log link and gamma distribution to compare positive medical expenditure (cost > 0) by exposure (A as reference vs. B). Medical expenditure is right-skewed by nature because a few patients have very high costs. We obtained the incremental/marginal effect in dollar terms using Stata. So far so good!
The issue is that we have a fixed-effect covariate for 8 sites (site as a categorical variable, 1-8), but one site has extremely few observations and one other site has no observations with exposure "A". How do we account for this in the model to minimize bias in the estimator associated with these two sites?
Is it better to remove both sites from the regression (together they make up 25% of the data in B)? Or is it better to include all the sites in the regression despite the problem?
Is there any rigorous statistical/econometric modeling approach to account for such issues? What else?
Thanks!
If we want to study arthropod abundance and diversity on various managements/landscapes, we also record the temperature, RH, precipitation, or sometimes specifically, we measure soil properties, host plant's phytochemical components or landscape characteristics.
I read the book by Alain Zuur et al. (Mixed Effects Models and Extensions in Ecology with R), but I could not understand it well because of my inadequate knowledge of statistics. Because many studies use different approaches, I don't know when we should use PCA, CCA, GLM, or LME, or their variants.
If I record arthropod abundance and diversity on various landscapes (identified at least to family level), and I measure the physical environmental properties,
What is the best analysis that could describe the effects of the environmental factors on the arthropod abundance and diversity?
To date, I only use Pearson's correlation to determine the relationship between the observed variables with the environmental factors. But I know that the correlation analysis cannot give robust results or the analysis results could be overestimated. Even in my own experience, if there is no significant correlation, I'll just remove one of the variables/factors.
Thank you.
Hi,
I'm currently conducting a MANCOVA with 3 DVs, 1 IV (treatment) with two groups, and multiple covariates. My question is, how do I control for categorical variables like gender and education when conducting a MANCOVA in SPSS? As I understand it, the covariates need to be continuous?
Thanks in advance.
I am looking for a tool in ArcGIS to help in identifying priority areas for conservation based on the output of species distribution models (maxent, GLM, BRT,..). Something maybe similar to the zonation algorithm.
Hi,
I am reading some papers that use GLM to deal with repeated measures. They often mention that they used "GLM repeated measures ANOVA". There are also F values and p values reported.
For example, in Vlčková (2012): "There was a statistically significant difference in protein contamination (A260/A280 ratios) using the three storing methods (GLM repeated measures ANOVA; F2,20 = 7.20, p < < 0.01)."
When I run a GLM, where should I look for the F value?
I found one answer online. It suggests that we use the aov() function: aov(dv ~ iv, data = mydata, dispersion = NULL, test = NULL). Is this the right way?
Thank you!
I am fitting a GLM to my count data with a Poisson distribution in R, and I was wondering if I could calculate log odds ratios from the selected models. I can't find this anywhere in the literature, nor relevant code for it. Everything points towards binomial distributions. I would appreciate the help.
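A short sketch of why odds ratios do not appear for count models: with a Poisson family and the usual log link, an exponentiated coefficient is a rate ratio (a ratio of expected counts), not an odds ratio. For a single binary predictor the estimate has a closed form, computed below on made-up counts; the function name and the numbers are hypothetical.

```python
import math

def rate_ratio(count_a, exposure_a, count_b, exposure_b, z=1.96):
    """Rate ratio (group b vs group a) with a Wald interval built on the log scale.

    This equals exp(beta) from a Poisson GLM with one binary predictor and a
    log(exposure) offset. Counts must be greater than zero.
    """
    log_rr = math.log((count_b / exposure_b) / (count_a / exposure_a))
    se = math.sqrt(1.0 / count_a + 1.0 / count_b)  # SE of the log rate ratio
    lower = math.exp(log_rr - z * se)
    upper = math.exp(log_rr + z * se)
    return math.exp(log_rr), lower, upper

# Hypothetical example: 10 events in 100 units of effort vs 20 events in 100 units
rr, lower, upper = rate_ratio(10, 100, 20, 100)
print(f"rate ratio = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```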
Hello all,
So I am doing an assignment and have made a GLM with two factors (nominal) for a metric DV.
I am having some confusion with understanding the process to follow here. Please note that I am using SPSS.
The tests of between-subjects effects output suggests that there is a non-sig interaction. It also suggests that for factor 1 there is a sig difference between groups, but for factor 2 there is no sig difference between groups. For factor 2 the post hoc tests support this, but the parameter estimates show a sig result (p < .05) for one of the factor 2 groups.
I am not sure how to interpret this. I would have thought that if the between-subjects tests suggest no sig difference between groups for factor 2, none of the parameter estimates should be statistically sig. The teacher emphasizes that we must interpret the marginal means.
I have spent a lot of time trying to get my head wrapped around this, but have decided it's time to reach out for help.
- on a side note - how would one approach assumption testing for this? - At this stage I have used a combination of simple linear regression (to get standardized predicted values), QQplot (from descriptives) and also the plots given through the GLM function.
Is this okay for categorical variables? Because of course the predicted value plots are rather clumpy due to the groupings...
Thank you kindly for your time :)
There seems to be vagueness about the difference between two-way repeated measures ANOVA and the generalized linear mixed model (GLMM). As far as I know, when there are no missing values in the data for two independent variables, even if one of them is within-subjects and the other is between-subjects, there won't be any problem if two-way repeated measures ANOVA is used instead of a GLMM. In my case, there are no random variables, just fixed ones. One of them is time as a categorical variable. The other is treatment, also as a categorical variable. Treatment is a between-subjects variable, whereas time is a within-subjects variable. So, in this scenario, I think I don't need to use a generalized linear mixed model; I can just use two-way repeated measures ANOVA, since there are no missing values. Is this correct?
Hello everyone, hope you are well.
I am planning to use a Generalized Linear Model in SPSS. I would like to add gender to the model as a covariate, however from what I have read, only continuous variables can be added to the covariates box. I would like to know if I can add gender as a factor in a Generalized Linear Model, even though it is a variable that I want to adjust the model for. I also have continuous covariates (age and income) that I want to add to the model. My independent variable is Health Profile which I will add as a factor.
I have read that a categorical covariate can be added if it is coded as a dummy variable but I am unsure of which approach is best. The covariates gender, age and income are important in the context of my research.
Kind regards,
Hana
Hello,
I am searching for a statistical test to compare two proportions of responsive cells.
I know that the chi-squared test, z-test and Fisher's exact test allow this kind of comparison, but the issue is that I have several proportions in each case (i.e., several samples in the same condition, each of which yields a proportion).
For example:
Drug A = [12/25, 1/100, 40/70]
Drug B = [17/35, 40/200, 20/50]
I suppose that I cannot use a t-test because the data is discrete (1 if the cell responds, 0 if it does not).
I also do not want to average my data because I want to take into account the variation between measurements.
Should I use a GLM with a Bernoulli distribution or something like that?
Really looking forward to your inputs!
Thank you very much in advance :)
Angel
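A rough sketch of why a plain binomial (Bernoulli) GLM may be too optimistic for data like these: if the samples within a drug vary more than binomial sampling alone would predict, the Pearson dispersion statistic ends up well above 1, and a quasibinomial or mixed model is the usual next step. The calculation below uses the six samples quoted above and assumes a model with one pooled proportion per drug; the function names are hypothetical.

```python
# (responsive cells, total cells) per sample, as quoted in the question
drug_a = [(12, 25), (1, 100), (40, 70)]
drug_b = [(17, 35), (40, 200), (20, 50)]

def pooled(samples):
    """Pooled proportion, i.e. the fitted value of a binomial GLM with a drug effect only."""
    return sum(y for y, n in samples) / sum(n for y, n in samples)

def pearson_chi2(samples):
    """Sum of squared Pearson residuals around the pooled proportion."""
    p = pooled(samples)
    return sum((y - n * p) ** 2 / (n * p * (1 - p)) for y, n in samples)

groups = [drug_a, drug_b]
n_obs = sum(len(g) for g in groups)
n_params = len(groups)  # one fitted proportion per drug

# Values near 1 are what a binomial model expects; much larger means overdispersion
dispersion = sum(pearson_chi2(g) for g in groups) / (n_obs - n_params)
print("pooled proportions:", round(pooled(drug_a), 3), round(pooled(drug_b), 3))
print("Pearson dispersion:", round(dispersion, 1))
```

For these numbers the dispersion comes out at roughly 22, so the between-sample variation is far larger than binomial sampling would produce.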
I am running a Generalized Linear Model in SPSS with a linear response. Can I rely only on the AIC criterion to select the best model? Or do I also need to check the deviance/df? If I should, is there a target value to aim for in the case of a continuous, (approximately) normally distributed response?
Hello everyone,
I have SNP data for 3 genes. How can I analyze the association between SNP sites and phenotypic traits, and how do I run GLM and MLM models?
Please let me know. Your guidance will be very useful to me.
Thanks
Configurations:
Rstudio v1.3.1093
Packages: car, lme4, MuMIn, DescTools
Model: myGLM <- glm(interaction ~ Species + Individual + `interaction item type`, family = poisson)
Response variable (461 interactions): count data
Predictor variables ('species' (n = 10), 'individuals' (n = 39), 'interaction item type' (n = 3)): categorical data
Description:
Regardless of how I order the predictor variables, the ANOVA (Anova(myGLM, type = "II", test.statistic = "LR")) outputs zero Df and no other results for the variable 'species'.
'Individual' is fully nested in 'species', since every individual belongs to a certain species. If tested alone, each of the predictor variables 'individual' and 'species' shows a statistically significant effect. The variable 'individual' is therefore a more complex categorical 'version' of 'species', but it is an important focus of my study. Judged by AICc and McFadden's pseudo-R², the model with 'individual' shows a considerably better fit than the model with 'species'. My cautious interpretation at present is that the 'real' effect is individual-specific rather than species-specific, while both predictors are collinear. Is that a decent interpretation? Are there other possible reasons why 'species' gets no ANOVA results? Is there a solution for the missing ANOVA results for 'species'? I wasn't able to find much about this topic. Maybe someone can help me. Thanks! :)
Hello.
I have an outdoor experiment with 2 species, 5 salinity levels and 2 light treatments (50% and 100% daylight). To produce the light treatments we used shade cloth, and for practical reasons we created one large plot with no shade cloth (100%) and another with shade cloth (50%). Within each light treatment, each species was combined with all salinity levels and replicated 8 times. To even out random variation between the two light plots, the shade cloth and all plants were switched over approximately halfway through the experiment (day 36 of a total of 90 days). Initially I analysed the data with a three-way ANOVA, but I have been told that the nesting is a problem and that switching over does not account for this. Therefore my statistical analysis is supposedly flawed. To my understanding, I am not able to conduct a nested analysis, as I only have 1 replicate of each light treatment. I was advised that a GLM is more appropriate due to the design!? Can anyone shed light on why a GLM should be more appropriate? Any alternative analyses?
Please provide references to support your answer and indicate your research field (biology, economics, etc.) in your answer.
Thanks in advance :)
I have used the following negative binomial model:
library(MASS)  # provides the negative.binomial() family
p3 <- glm(formula = Infected ~ log(1 + PopDensity) + log(1 + GDPPPP) + LifeExpectancy, family = negative.binomial(1), maxit = 10000, data = df)
How can I justify the following questions?
1. What is the rationale behind the explanatory variables log(1 + PopDensity) and
log(1 + GDPPPP)? Why do we need such an unusual transformation for those variables?
2. What is the problem if we use PopDensity and GDPPPP as covariates?
My intention was to use log transformation because I had some countries like India with high population density. Some countries had very low population density. I don't know how to justify this log transformation. I appreciate your suggestions! Thanks!
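A small sketch of two properties that are often cited to justify log(1 + x) for a heavily skewed covariate: it is defined at x = 0, and it compresses very large values so that a few dense countries do not dominate the fit. With the log link that negative.binomial() uses by default, the coefficient on a logged covariate also reads roughly as an elasticity. The density values and function names below are hypothetical.

```python
import math

def log1p_transform(values):
    """log(1 + x): defined at zero, and compresses the right tail."""
    return [math.log1p(v) for v in values]

def multiplier_for_doubling(beta):
    """Approximate factor on the expected count when x doubles.

    Assumes a log link and x much larger than 1, so that
    log(1 + 2x) - log(1 + x) is close to log(2).
    """
    return 2.0 ** beta

# Hypothetical population densities (people per square km)
densities = [0.0, 3.0, 450.0, 8000.0]
logged = log1p_transform(densities)

print("raw range:   ", max(densities) - min(densities))
print("logged range:", round(max(logged) - min(logged), 2))
print("effect of doubling density when beta = 0.5:",
      round(multiplier_for_doubling(0.5), 3))
```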
I have conducted a study in which the participants are divided in two separate groups according to the language of instruction. Also, they are divided in three different groups according to age. They write two texts, and based on the structure of the texts, I classify them in seven different categories. There are two questions I want to ask:
- What is the effect of age in each structure category?
- Are there differences between the two language groups?
I applied Generalized Linear Models to answer these questions, with ANOVA post-hocs, but I'm not sure if this is the right decision or not.
Thanks
Dear community,
I'm looking into ways how to do an a-priori power analysis for an fMRI experiment where the main analysis will be a representational similarity analysis (RSA).
The experiment will present the same stimuli in two successive fMRI-sessions (with a behavioral training in between). For each fMRI session, I plan to do a model-based RSA on brain responses elicited by the stimuli. Voxel restriction will be done with a searchlight procedure. The most interesting outcome will be the difference in these results between the two training sessions, as estimated with a GLM contrast.
I think this is not uncommon, as I found other experiments adopting a similar analysis procedure. However, I found no clue as to how to estimate the sample size needed to achieve a certain statistical power (say 80%).
Since this is a bit of a frankenstein made from other common statistical approaches, I'm not sure if the general logic of fMRI-power analysis applies here.
Has anybody experience in this area or can point me to literature that contemplates this issue?
Thanks,
Oliver
I was fitting a binary logistic regression with the glm function and discovered that the standard errors were extremely high. I therefore tried brglm, and the standard errors are now okay.
However, I have an issue computing the confusion matrix for the fitted brglm model.
Let's say, for example's sake, I build in SPSS a multivariate GLM/MANCOVA with four groups (elderly, adult, child, infant), with my dependent variables being various cognitive performance measures. Further, I find significant between-group differences, which leads me to perform a polynomial contrast with contrast weights of 1, 3, -1, -3; my hypothesis being that performance will be lowest in infancy, increase slightly in childhood, clearly peak in adulthood, but slightly diminish as one becomes elderly. For all three cognitive measures I use as DVs, my polynomial contrast is significant for the "linear" condition.
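A minimal sketch of a confusion matrix that does not depend on which fitting function produced the model: it only needs the observed 0/1 outcomes and the fitted probabilities, which are assumed here to have been extracted from the fit beforehand. The function name, the 0.5 threshold and the toy values are illustrative assumptions.

```python
def confusion_matrix(y_true, probs, threshold=0.5):
    """Count TP/FP/TN/FN after thresholding fitted probabilities."""
    table = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for y, p in zip(y_true, probs):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            table["TP"] += 1
        elif predicted == 1 and y == 0:
            table["FP"] += 1
        elif predicted == 0 and y == 0:
            table["TN"] += 1
        else:
            table["FN"] += 1
    return table

# Toy values standing in for observed outcomes and fitted probabilities
observed = [1, 0, 1, 1, 0, 0]
fitted = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1]

table = confusion_matrix(observed, fitted)
accuracy = (table["TP"] + table["TN"]) / len(observed)
print(table, round(accuracy, 3))
```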
This would imply to me that my model supports my hypothesis. However, if I perform an ANCOVA to directly compare the adult and elderly group, I find that there is not a significant between-group difference.
How do I interpret this? Can I truly say my hypothesis is supported, even though direct comparison fails to suggest a difference between the cognitive performance of adults and elderly? Can I chalk this up to simply a power problem (in this version of my real world problem, my group 1 has fewer individuals than the other 3, which seems problematic as I have a key interest in the differences between group 1 & group 2)?
Sorry if I'm missing out on some big fundamentals, I only recently started graduate school and am very novice with stats
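One hedged way to look at the apparent contradiction above is to compare the contrast weights themselves. Assuming the weights 1, 3, -1, -3 map onto (elderly, adult, child, infant) in that order, the custom contrast overlaps heavily with a plain linear trend across the age groups and only weakly with an adult-versus-elderly comparison, so a significant result for it says little about that specific pair. The sketch below measures the overlap as the cosine between weight vectors; the quadratic and pairwise weights are standard choices added for illustration.

```python
import math

# Group order used here: infant, child, adult, elderly
custom = [-3, -1, 3, 1]             # the question's weights, re-ordered (assumed mapping)
linear = [-3, -1, 1, 3]             # plain linear trend
quadratic = [-1, 1, 1, -1]          # inverted-U trend
adult_vs_elderly = [0, 0, 1, -1]    # direct pairwise comparison

def cosine(a, b):
    """Cosine similarity between two contrast weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

for name, weights in [("linear", linear), ("quadratic", quadratic),
                      ("adult vs elderly", adult_vs_elderly)]:
    print(f"overlap of custom contrast with {name}: {cosine(custom, weights):.2f}")
```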
I am trying to regress a continuous dependent variable with a Poisson distribution. Will multiple regression work despite non-normality (I've heard it remains quite robust despite violations of the normality requirement), or do I have to use a generalized linear model?
Some of you might recognize my question and the background, but I have been struggling with this issue for more than a month now and there has been only little progress...
Here some background info:
I have a variable called 'Success' which is derived from dividing the number of hatchlings by the number of eggs. (thus, how many eggs have hatched)
This Success variable is thus a proportion and is skewed towards 1.
Now I want to look if the Success is explained by each Group (n=4). From previous questions it was suggested that, because of the skewed proportional data, I should use a quasibinomial family for my glm. Therefore, my formula looks as follows:
model <- glm(Success ~ Group, family = quasibinomial('logit'), data=dd)
If I summarize the model I get:
Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)           1.0834     0.3422   3.166  0.00195 **
GroupC.cornuta-C      0.6137     0.4513   1.360  0.17635
GroupC.ricordii-A     0.5020     0.4762   1.054  0.29389
GroupC.ricordii-B     0.3679     0.4567   0.806  0.42205
Because of the quasibinomial data I do not get an AIC as a reference which model has the best fit, so instead I look if the Null deviance - Residual deviance is larger or smaller in a model with for example another explanatory variable included.
But for now this question related to the summary:
I see that the success of GroupC.cornuta-B does not significantly differ from the other groups. How can I then see whether, for example, GroupC.ricordii-A differs from GroupC.ricordii-B?
I want to use a linear mixed model in R. I have many explanatory variables and some of them are nested. I have read about this but did not find the correct R code. Is it possible to use the same code as for the generalized linear model but add the random effect? If so, how can I write it in R?
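A sketch of how two non-reference groups can be compared from the summary table quoted above. Each coefficient is a difference from the reference group on the logit scale, so the ricordii-A versus ricordii-B contrast is simply the difference between their coefficients. Its standard error normally needs the coefficient covariance matrix; the shortcut below assumes that Group is the only predictor with default treatment coding, in which case the covariance between any two group coefficients equals the variance of the intercept. Variable names are hypothetical.

```python
import math

# Estimates and standard errors copied from the summary output in the question
coef = {"(Intercept)": 1.0834, "C.cornuta-C": 0.6137,
        "C.ricordii-A": 0.5020, "C.ricordii-B": 0.3679}
se = {"(Intercept)": 0.3422, "C.cornuta-C": 0.4513,
      "C.ricordii-A": 0.4762, "C.ricordii-B": 0.4567}

def expit(x):
    """Inverse logit: maps the linear predictor back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))

# Fitted success proportion per group (the reference group is the intercept alone)
fitted = {"reference": expit(coef["(Intercept)"])}
for group in ("C.cornuta-C", "C.ricordii-A", "C.ricordii-B"):
    fitted[group] = expit(coef["(Intercept)"] + coef[group])

# Contrast ricordii-A minus ricordii-B on the logit scale
diff = coef["C.ricordii-A"] - coef["C.ricordii-B"]
# Assumption: one-factor model, so cov(b_A, b_B) = Var(intercept)
var_diff = (se["C.ricordii-A"] ** 2 + se["C.ricordii-B"] ** 2
            - 2 * se["(Intercept)"] ** 2)
se_diff = math.sqrt(var_diff)
t_stat = diff / se_diff

print({g: round(p, 3) for g, p in fitted.items()})
print(f"A - B = {diff:.4f}, SE = {se_diff:.4f}, t = {t_stat:.2f}")
```

Refitting the model with a different reference level, or using a multiple-comparison routine on the fitted model, should give the same contrast without relying on this shortcut.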
Sincerely,
Yassine.
Hi everyone, I have a methodological question to ask. It's my first time running a GLM in SPM12. I'll be measuring the BOLD signal in a 2x3 ANOVA design. Normally I have 6 conditions; however, if I take into consideration the side on which the stimulus is shown, I end up with 12 conditions.
With 6 conditions, I'll have 6 onset points per condition. But if I go with 12 conditions, I will have only three onset points for each predictor (i.e., condition).
From the fMRI course, we learnt that the fewer parameters you have, the better the estimates you'll get. So six conditions would yield better estimates. However, because it is a visual perception experiment, the BOLD signal I'd get from the whole brain will differ depending on the display side of the stimulus (i.e., on the right or on the left). Thus, having 6 conditions might increase noise and decrease the true signal.
As you can see, I am a bit confused about what to do, could you give any advice?
Cheers~
There are many methods for calculating marker-trait associations, such as stepwise regression, GLM, etc. Before running those methods, how do we define the X (independent variables) and Y (response variables) matrices, especially for SSR markers?
Hi there,
I know I already posted some questions on this issue, but I still cannot perform this GLM according to expectations.
First, I have a dataset with multiple explanatory variables (e.g. nest temperature, nest measurements, location and species) and one skewed, proportional response variable (nest success).
Because it is a proportional response variable, my GLM + summary look as follows:
Call:
glm(formula = Success ~ Species + Location + `Average temperature` +
    `emergence tunnel (cm)`, family = quasibinomial("logit"),
    data = dd)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.4768  -0.5145   0.2655   0.6588   0.8621

Coefficients:
                          Estimate Std. Error t value Pr(>|t|)
(Intercept)             -10.592625  20.056906  -0.528    0.600
SpeciesRicordii          -0.015988   0.722221  -0.022    0.982
LocationPuente Arriba    -0.221543   0.998854  -0.222    0.826
LocationTierra           -0.550702   0.823761  -0.669    0.508
`Average temperature`     0.137862   0.223718   0.616    0.541
`emergence tunnel (cm)`  -0.004118   0.008694  -0.474    0.638

(Dispersion parameter for quasibinomial family taken to be 0.4711331)

    Null deviance: 20.175  on 43  degrees of freedom
Residual deviance: 19.569  on 38  degrees of freedom
  (180 observations deleted due to missingness)
AIC: NA

Number of Fisher Scoring iterations: 4
Now I do get an output, but I just threw some possible explanatory variables in of which I don't know if they really contribute to the model (perhaps I need more or less variables).
Because I used a quasibinomial family, I do not get an AIC to see if this model is good. How can I check if my model is good then? And imagine this glm output is right, what conclusions can you make from it?!
Also when I try to check the normality of my residuals by performing...
hist(residuals.glm(model))
...the histogram shows skewed residuals towards 1.0.
In order to do a GLM I learned that the residuals MUST be normally distributed, but now it does not seem like it...
How should I solve this or am I doing something wrong?
I'm a real newbie to R, so I hope someone could help me by using understandable R-language ;).
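One way to read the output above: a quasibinomial fit has no true likelihood, which is why no AIC is reported, but nested models can still be compared with an F-type analysis of deviance that scales the drop in deviance by the estimated dispersion. The sketch below only redoes that arithmetic from the numbers printed in the summary (in Python rather than R, to keep it dependency-free). The variable names are illustrative, and the comparison is against an intercept-only model, which is an assumption about what "good" should be measured against. In R, `anova()` with `test = "F"` performs this kind of comparison.

```python
import math

# Values copied from the summary output above.
null_deviance, null_df = 20.175, 43
resid_deviance, resid_df = 19.569, 38
dispersion = 0.4711331  # quasibinomial dispersion estimate

# F-type statistic for the full model versus the intercept-only model.
drop_in_deviance = null_deviance - resid_deviance
df_used = null_df - resid_df
f_stat = (drop_in_deviance / df_used) / dispersion

# Coefficients are on the logit scale; exponentiating gives an odds ratio.
temp_coef = 0.137862  # `Average temperature` estimate
odds_ratio_per_unit = math.exp(temp_coef)

print(f"F = {f_stat:.3f} on {df_used} and {resid_df} df")
print(f"odds ratio per unit of temperature = {odds_ratio_per_unit:.3f}")
```

An F statistic well below 1 would be consistent with the large p-values in the coefficient table: taken together, these predictors explain very little of the variation in nest success in the 44 rows that were actually used.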
Hello,
Can I use "years" as a continuous variable (calendar years from 1984 to 2014) to see whether NDVI (normalized difference vegetation index) of the same area at the same time of year (summer) has changed positively or negatively over the years, for example with a GLMM or GLM?
Thanks!
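As a rough illustration of what treating year as a continuous covariate means: the model estimates a single slope, and its sign gives the direction of the trend. The sketch below fits an ordinary least-squares line by hand. The NDVI values are made up for illustration, and a real analysis would also need to consider serial correlation between consecutive years.

```python
# Hypothetical NDVI values for a handful of summers (illustration only).
years = [1984, 1990, 1996, 2002, 2008, 2014]
ndvi = [0.52, 0.54, 0.53, 0.57, 0.58, 0.60]

# Centre the year so the intercept is the expected NDVI in the mean year.
mean_year = sum(years) / len(years)
mean_ndvi = sum(ndvi) / len(ndvi)
x = [y - mean_year for y in years]

# Ordinary least-squares slope and intercept.
sxx = sum(xi * xi for xi in x)
sxy = sum(xi * (vi - mean_ndvi) for xi, vi in zip(x, ndvi))
slope = sxy / sxx
intercept = mean_ndvi

print(f"estimated change in NDVI per year: {slope:+.5f}")
```

Centring the year is a design choice: with raw calendar years the intercept would refer to year 0, which is far outside the data and hard to interpret.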
Hi!
I am processing my data on the effects of combining two stressors on growth rate, mortality, survival, etc. of aquatic insects. In some cases I used GLMs. I am quite confused about how to interpret the model results (antagonistic, synergistic or additive effects). For example, in one GLM the best candidate model was the one without the interaction between the stressors. Does that mean there is no antagonistic, synergistic or additive response to the stressors? Or do I need to apply another type of analysis? Thanks for your answers.
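A small worked example may help with the terminology here. A model without an interaction term describes effects that are additive on the scale of the link function, so "no interaction" is itself the additive case, while a non-zero interaction contrast points to synergism or antagonism. The sketch below computes that contrast for a 2 x 2 design on two scales. The four group means are hypothetical and chosen only to show that the answer can depend on the scale.

```python
import math

# Hypothetical mean growth rates in a 2 x 2 factorial design.
control, stressor_a, stressor_b, both = 10.0, 8.0, 5.0, 4.0

# Interaction contrast on the identity (raw) scale.
expected_additive = stressor_a + stressor_b - control
interaction_identity = both - expected_additive

# The same contrast on the log scale, as in a log-link GLM.
interaction_log = (math.log(both) - math.log(stressor_a)
                   - math.log(stressor_b) + math.log(control))

print(f"identity-scale interaction: {interaction_identity:+.2f}")
print(f"log-scale interaction:      {interaction_log:+.2f}")
```

With these numbers the combined reduction is smaller than the sum of the two single reductions on the raw scale, yet the effects are exactly multiplicative, so the interaction vanishes on the log scale. Whether a result counts as additive therefore depends on the link function of the model that was fitted.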
I have GPS collaring data and I have conducted some analysis on habitat utilization using the methods described by Neu et al. (1974) and Byers (1984). Besides these methods, what other analyses can be conducted to study habitat use? I have some environmental variables that (possibly) influence habitat use (for example, slope, elevation, NDVI, distance to roads, etc.). I have run some generalized linear models that tell me which variables are significant, but I don't feel that my study brings out anything new. Any advice or help on what other analyses can be done to gain a deeper insight into what is driving habitat use would be appreciated. Thank you.
And if I use a GLM, which family and link function would make sense for interpreting the data well?
The attached file contains functional and taxonomic diversity indices of macrobenthic fauna, and I want to discuss spatial differences within habitats.

I am able to fit GAM models using the package mgcv in R. However, I need to write out the equation explicitly. How can I do that? The command 'coef' gives me the intercept and coefficients, but I do not know what these coefficients really mean!
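For orientation, the general form of such a model can be written as below. This is the generic textbook structure of a GAM built from penalized regression splines, not the equation of any particular fitted model. The symbols $g$ (link function), $f_j$ (smooth of predictor $j$), $b_{jk}$ (basis functions) and $K_j$ (basis dimension) are notation introduced here.

```latex
g\bigl(\mathbb{E}[y_i]\bigr) = \beta_0 + \sum_{j} f_j(x_{ij}),
\qquad
f_j(x) = \sum_{k=1}^{K_j} \beta_{jk}\, b_{jk}(x)
```

Under this reading, the reported coefficients are the weights $\beta_{jk}$ on the spline basis functions rather than slopes, which is why they are hard to interpret one at a time and why smooths are usually reported as plots rather than closed-form equations.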
Hello,
I am using generalized linear (mixed?) models and ranking selected candidate models with AIC (corrected for small sample size) to determine how weather influences the migration of birds (Hook-billed Kites). My predictor variables are precipitation and temperature over various time periods and regions. My response variable for the number of kites that migrate (magnitude) is the number of kites counted at a single count site annually divided by the number of hours counted (kites/hour, a density or rate), so it is continuous and there are NO zeros or negative numbers. My response variable for timing (phenology) is the Julian day by which 50% of the counted population has been counted. I am using data from two count sites: Mexico, 1995-2019 (25 years), and Belize, 2013-2019 (7 years). I am using different candidate model sets and running separate analyses for each response variable and for each site.
My research question is how precipitation and temperature at certain times of the year (breeding season, month prior to migration, entire year, year prior) influence the migration of kites, since both response variables vary strongly from year to year. In some years 700 kites migrate and in others 8000, and the timing varies between years.
My questions:
1) If I use a GLMM, my random effect would be year, since I am really only interested in how weather influences migration and the years that were counted were "random." Is this the right approach? If I used a GLM with year as a fixed effect, would this dredge my modeling?
2) I am considering using a Poisson or Negative Binomial distribution family, since the response is a density and continuous. It seems like NB is more for count data that is discrete. Is this accurate, and what is the best way to test which family to use for a GL(M)M?
3) Now the issue is overdispersion. Is the best way to test this to run the GLM and divide the residual deviance by the residual df, with a ratio greater than 1 indicating overdispersion? Or is there a better way to determine this?
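The check described in question 3 can be sketched as follows, using the Pearson chi-square statistic instead of the residual deviance (both are commonly divided by the residual degrees of freedom). The counts are hypothetical, and the model is the simplest possible one, an intercept-only Poisson fit whose fitted mean is the sample mean. Note that Poisson and negative binomial families expect integer counts, so a rate such as kites/hour is usually handled by modelling the raw count with the log of hours as an offset.

```python
# Hypothetical annual counts (illustration only).
counts = [700, 1500, 2300, 8000, 1200, 3100, 950, 4200]

n = len(counts)
mu = sum(counts) / n  # fitted mean of an intercept-only Poisson model

# Pearson chi-square statistic and its residual degrees of freedom.
pearson_chi2 = sum((y - mu) ** 2 / mu for y in counts)
resid_df = n - 1
dispersion = pearson_chi2 / resid_df

print(f"Pearson dispersion estimate: {dispersion:.1f}")
```

A ratio close to 1 is what a Poisson model assumes. Values far above 1, as with counts ranging from hundreds to thousands, suggest overdispersion and point towards a negative binomial or quasi-Poisson alternative.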
Dear experts and peers in NeuroImaging field,
Hello, this is Jean, and I have just begun doing structural connectivity analysis on DTI data.
So far, I have constructed an FA-weighted network in DSI Studio (nodes: AAL2; edges: mean FA along the trajectories ending in the ROIs).
What I am doing now is a t-test between group A and group B after a GLM.
I ran the GLM for those groups and want to do the t-tests, but I noticed that I really need to correct for multiple comparisons.
Questions:
1. Is there any way to figure out which type of multiple comparison correction I should use (based on my data distribution, parametric versus non-parametric, or any further consideration)?
2. I was thinking about the Bonferroni correction. Do you think it would be enough, or do I need a more robust correction method?
It is all about statistical analysis, so if you have any kind of comments or advice, PLEASE PLEASE help me out!!
Thank you.!
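To make the choice in question 2 concrete, here is a minimal sketch of two common corrections: Bonferroni, which controls the family-wise error rate, and Benjamini-Hochberg, which controls the false discovery rate and is typically less conservative when many edges are tested. The p-values are hypothetical, and the function names are illustrative rather than taken from any neuroimaging toolbox.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values (controls the family-wise error rate)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from edge-wise t-tests (illustration only).
p_vals = [0.001, 0.008, 0.02, 0.03, 0.20]
bonf = bonferroni(p_vals)
bh = benjamini_hochberg(p_vals)

print("Bonferroni significant at 0.05:", sum(p < 0.05 for p in bonf))
print("BH significant at 0.05:        ", sum(p < 0.05 for p in bh))
```

With only five tests the gap is already visible. With the thousands of edges in an AAL2-based network, Bonferroni becomes very strict, which is the usual argument for FDR-based or permutation-based approaches.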
Dear all,
I would like to start a discussion here on the use of generalised mixed effect (or additive) models to analyse count data over time. Below I report the few analyses I know of in R, with the GOOD (things) and the LIMITS/DOUBTS I found for each. Please feel free to add or comment with further information and additional approaches to analysing such a dataset.
That said, generalised mixed effect modelling still requires further understanding (at least from me) and my knowledge is limited, so I would like to start a fruitful discussion here that includes both people who would like to know more about this topic and people who know more.
About my specific case: I have count data (i.e., taxa richness of fish) collected over 30 years at multiple sites (each site sampled multiple times). My idea is therefore to fit a model to predict trends in richness over the years using generalised (Poisson) mixed effect models with the fixed factor "Year" (plus a couple of other environmental factors such as elevation and catchment area) and the random factor "Site". I also believe that, since I am dealing with data collected over time, I need to account for potential serial autocorrelation (let us leave spatial correlation aside for the moment!). Here are some GOOD (things) and LIMITS I found with the different approaches:
glmer (lme4):
GOOD: good model residual validation plot (fitted values vs residuals) and good estimation of the richness over years, at least based on the model plot produced.
LIMITS: i) it is not possible to include a correlation structure (e.g., corARMA) for autocorrelation.
glmmPQL(MASS):
GOOD: possible to include corARMA in the model
LIMITS: i) bad final residuals vs fitted validation plot and a completely different estimate of richness over the years compared to glmer; ii) how to compare different models, e.g., to find the best autocorrelation structure (as far as I know, no AIC or BIC is produced)? iii) I read that glmmPQL is not recommended for Poisson distributions (?).
gamm (mgcv):
GOOD: possible to include corARMA, and smoothers for specific predictor variables (e.g., year) to add the non-linear component.
LIMITS (DOUBTS): i) How to obtain a residual validation plot (residuals vs fitted)? ii) Double output summary ($gam; $lme): which one to report? iii) In the $gam output, no coefficient estimates are shown for variables with smoothers (only degrees of freedom and significance are given). Are these reported somewhere else?
If you have any comments, please feel free to answer this question. Also, feel free to suggest different methodologies.
Just try to keep the discussion at a level that is understandable for most readers, including non-experts.
Thank you and best regards
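Before choosing between the approaches listed above, it can help to check whether serial autocorrelation is actually present in the residuals of a model fitted without any correlation structure. The sketch below computes the lag-1 sample autocorrelation for one site's residuals ordered by year. The residual values are invented for illustration, and the helper name is hypothetical.

```python
def lag_autocorrelation(series, lag=1):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    numer = sum((series[t] - mean) * (series[t + lag] - mean)
                for t in range(n - lag))
    return numer / denom


# Hypothetical residuals for one site, ordered by year (illustration only).
residuals = [0.8, 0.6, 0.5, 0.1, -0.2, -0.4, -0.7, -0.5, -0.1, 0.3]
r1 = lag_autocorrelation(residuals, lag=1)

print(f"lag-1 autocorrelation: {r1:.2f}")
```

A value near zero would suggest that the simpler glmer fit may be adequate, while a clearly positive value, as in this made-up series, would support including a structure such as corARMA.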