Advanced Statistical Analysis - Science topic
Explore the latest questions and answers in Advanced Statistical Analysis, and find Advanced Statistical Analysis experts.
Questions related to Advanced Statistical Analysis
To my knowledge, the total effect in mediation reflects the overall impact of X on Y, including the part transmitted through the mediator (M). A mediator is assumed to account for part or all of this impact. In mediation analysis, statistical software typically calculates the total effect as:
Total effect = Direct effect + Indirect effect.
When all the effects are positive (i.e., the direct effect of X on Y (c’), the effect of X on M (a), and the effect of M on Y (b)), the interpretation of the total effect is straightforward. However, when the effects have mixed or negative signs, interpreting the total effect can become confusing.
For instance, consider the following model:
X: Chronic Stress, M: Sleep Quality, Y: Depression Symptoms.
Theoretically, all paths (a, b, c’) are expected to be negative. In this case, the indirect effect (a*b) should be positive. Now, assume the indirect effect is 0.150, and the direct effect is -0.150. The total effect would then be zero. This implies the overall impact of chronic stress on depression symptoms is null, which seems illogical given the theoretical assumptions.
Let’s take another example with mixed signs:
X: Social Support, M: Self-Esteem, Y: Anxiety.
Here, the paths for a and c’ are theoretically positive, while b is negative. The indirect effect (a*b) should also be negative. If the indirect effect is -0.150 and the direct effect is 0.150, the total effect would again be zero, suggesting no overall impact of social support on anxiety.
This leads to several key questions:
1. Does a negative indirect effect indicate a reduction in the impact of X on Y, or does it merely represent the direction of the association (e.g., social support first improves self-esteem, which in turn reduces anxiety)? If the second case holds, should we consider the absolute value of the indirect effect when calculating the total effect? After all, regardless of the sign, the mediator still helps to explain the mechanism by which X affects Y.
2. If the indirect effect reflects a reduction or increase (depending on the coefficient's sign) in the impact of X on Y, and this change is explained by the mediator, should the indirect effect then be added to the direct effect regardless of its sign, so as to accurately represent the overall impact of both X and M?
3. My main question is: Should I use the absolute values of all coefficients when calculating the total effect?
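For concreteness, here is a minimal R sketch of the decomposition (simulated data; variable names and coefficients are purely illustrative) in which a and b are both negative, so the indirect effect a*b is positive and can cancel a negative direct effect:

# Simulate a single-mediator model with negative a, b and c' paths.
set.seed(1)
n <- 5000
X <- rnorm(n)                        # e.g., chronic stress
M <- -0.5 * X + rnorm(n)             # path a (negative)
Y <- -0.3 * M - 0.15 * X + rnorm(n)  # paths b and c' (both negative)

a     <- coef(lm(M ~ X))["X"]        # X -> M
fit_y <- lm(Y ~ X + M)
b     <- coef(fit_y)["M"]            # M -> Y, controlling for X
c_dir <- coef(fit_y)["X"]            # direct effect c'
c_tot <- coef(lm(Y ~ X))["X"]        # total effect c

a * b           # indirect effect: positive, since negative * negative
c_dir + a * b   # equals c_tot up to sampling error; signs can cancel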
What are the processes to extract ASI microdata with STATA and SPSS?
I have the microdata in Stata and SPSS formats and want to know the extraction process. Is there any tutorial on YouTube for ASI microdata?
I want to use SPSS Amos to calculate SEM because I use SPSS for my statistical analysis. I have already found some workarounds, but they are not useful for me. For example, using a correlation matrix with the weights already applied seems far too confusing and is really error-prone, since I have a large dataset. I have also thought about using lavaan alongside SPSS, because I read somewhere that you can apply weights in the lavaan syntax, but I don't know whether this is true and whether it will work with SPSS data. Furthermore, to be honest, I'm not too keen on learning yet another syntax.
So I hope I'm not the first person who has problems adding weights in Amos (or SEM in general) - if you have any ideas or workarounds I'll be forever grateful! :)
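For what it is worth, recent lavaan versions do accept sampling weights directly, so the weighting need not be baked into a correlation matrix. A minimal sketch (model, variable and weight names are hypothetical):

library(lavaan)
model <- '
  latent  =~ item1 + item2 + item3
  outcome ~ latent
'
# "wt" is assumed to be a column of survey weights in the exported data.
fit <- sem(model, data = mydata, sampling.weights = "wt")
summary(fit, fit.measures = TRUE)

The SPSS file can be read into R with, e.g., the haven package, keeping the weight variable as an ordinary column.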
Hi everyone,
If you have written or come across any papers where Generalised Linear Mixed Models are used to examine intervention (e.g., in mental health) efficacy, could you please share the link/s? I'd love to see how the results are laid out and reported.
Thank you!
Hi everyone,
I ran a Generalised Linear Mixed Model to see if an intervention condition (video 1, video 2, control) had any impact on an outcome measure across time (baseline, immediate post-test and follow-up). I am having trouble interpreting the Fixed Coefficients table. Can anyone help?
Also, why are the last four lines empty?
Thanks in advance!
Hello everyone, I hope you're doing well.
I recently conducted a test on simulating near-field reflections, using as the hidden reference a measured dataset of OBRIRs from a KEMAR HATS in an anechoic chamber facing a reflective surface at distances of 0.25 m and 0.5 m. I then created a simulated room and generated OBRIRs using the AKTools room simulation software, with various near-field HRTFs (matching the 0.25 m and 0.5 m distances) and a far-field measurement (at 2 m overall).
These were then presented to listeners over headphones with head tracking, convolved with separate male and female voice stimuli that had been modelled to sound as though they came from the listener's own mouth (listeners were asked to imagine that they did). For each comparison, listeners were asked to pick which of the three options (the measured reference, the near-field HRTF, the far-field HRTF) they thought was the most real/believable/plausible and then rate it on a scale from 1 to 6, 1 being not at all plausible and 6 being very plausible. The options were randomised on every comparison, so that listeners would not get used to picking the same one. This was repeated 3 times for each voice, then another 3 times for the other distance, giving a total of 12 judgements per listener (3 male 0.25 m, 3 female 0.25 m, 3 male 0.5 m, 3 female 0.5 m).
My hypothesis was that the three options would be equally plausible, so listeners' choices would split evenly overall, a presumed 1/3 for each option. I thought a chi-square test would be suitable; however, its independence assumption does not hold, as the data contain multiple answers from each listener.
I can't seem to find an analysis method that fits this setup. I thought about taking only each listener's initial response for male 0.25 m, female 0.25 m, male 0.5 m and female 0.5 m and comparing those, somehow using chi-square?
I am also curious whether the distance and the voice had an effect on which option listeners preferred.
There does seem to be a slight difference in which option was preferred: across 22 listeners, the far-field HRTF was chosen 105 times, compared with 71 for the reference and 88 for the near-field. I'm mostly looking for tests that can say whether this is statistically significant, although with a sample of 22 listeners I doubt I can make any strong claims. Some listeners also caught on to which option they preferred and gave the same answer every time; I'm not sure whether to exclude them or keep them.
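For concreteness, a hedged sketch of the naive version of the test, using the aggregate counts above; note that it treats all 264 choices as independent, which they are not, since each of the 22 listeners contributed 12 choices:

# Goodness-of-fit test of the choice counts against an equal 1/3 split.
observed <- c(reference = 71, near_field = 88, far_field = 105)
chisq.test(observed, p = rep(1/3, 3))

# A more defensible variant along the lines suggested above: keep only
# each listener's first response per condition, giving independent counts.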
Any advice you can provide is greatly appreciated, any further questions or information you need please let me know!
Thank you!
We are looking for a highly qualified researcher with expertise in advanced statistical analysis to contribute to a scientific article to be submitted to a prestigious journal by the end of the year (2024). The article will focus on the adoption of digital innovations in agriculture.
Key responsibilities:
- Carry out in-depth statistical analysis using a provided database (the dataset is ready and available in SPSS format).
- Apply advanced statistical techniques, including structural equation modelling and/or random forest models.
- Work closely with the team to interpret the results and contribute to the manuscript.
The aim is to fully analyse the data and prepare it for publication.
If you are passionate about agricultural innovation and have the necessary statistical expertise, we would like to hear from you.
Hi everyone.
When running a GLMM, I need to turn the data from wide format to the long format (stacked).
When checking for assumptions like normality, do I check them for the stacked variable (e.g., outcomemeasure_time) or for each variable separately (e.g., outcomemeasure_baseline, outcomemeasure_posttest, outcomemeasure_followup)?
Also, when identifying covariates via correlations (Pearson's or Spearman's), do I use the separate variables or the stacked one?
Normality: say normality is violated for outcomemeasure_baseline but not for the others (outcomemeasure_posttest and outcomemeasure_followup), and not for the stacked variable either. In this case, when running the GLMM, do I adjust for normality violations because normality was violated for one of the separate measures?
Covariates: say age was identified as a covariate for outcomemeasure_baseline but not for the others (separately: outcomemeasure_posttest and outcomemeasure_followup, or the stacked variable). In this case, do I include age as a covariate since it was identified as one for one of the separate variables?
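As background for the reshaping step, a minimal tidyr sketch (variable names follow the examples above; the wide data frame name is hypothetical):

library(tidyr)
# Stack the three outcome columns into one outcome plus a time indicator.
long <- pivot_longer(
  wide_data,
  cols = c(outcomemeasure_baseline,
           outcomemeasure_posttest,
           outcomemeasure_followup),
  names_to = "time",
  values_to = "outcomemeasure"
)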
Thank you so much in advance!
My student has a question that I cannot answer either. She is analysing the effect of ICT on labour productivity using an 8-year data panel with 4 independent variables in EViews 13. Frankly, I was quite surprised that the R-squared in her results is 0.94 with only 2 significant variables; in a simple regression model, such a high R-squared usually signals statistical problems. I recently asked her to rerun the analysis in Stata, and the results show an R-squared of only 0.51 with exactly the same coefficients.
I've searched for articles about this: some say EViews might be wrong, and some say Stata is wrong. Can someone explain what I should do and which software to use?
note:
1. Some articles say to use the areg command in Stata to obtain a value similar to EViews, but I have doubts, because areg is meant for regressions absorbing a categorical variable and does not quite fit a panel regression model.
2. Some say that EViews calculates it incorrectly.
Hi everyone,
Does anyone have a detailed SPSS (v. 29) guide on how to conduct Generalised Linear Mixed Models?
Thanks in advance!
I'm doing a research proposal and want to compare bilingual people to monolingual people on a dot perspective task.
- The first IV (IV1) will be language ability with two levels: monolingual (control) /bilingual
- The second IV (IV2) will ONLY be applied to the bilingual group: Ps are informed the avatar is bilingual or not (so two levels again, repeated measures with counterbalancing)
The DV is reaction times in the dot perspective task.
I am just wondering how I would go about analysing this? I was thinking an ANOVA, but as the control group are not exposed to IV2 do I just simply compare the means of all groups?
I want to compare
- Control group reaction times to BOTH levels of IV2 combined (overall RT for bilinguals)
- Control group reaction times to each level of IV2
- Level 1 vs level 2 of IV2 (whether avatar is said to be bilingual or not)
Is it best to split this study into 2 experiments or is it possible to keep it as one and analyse it as one?
Suppose that we have three variables (X, Y, Z). According to past literature, Y mediates the relationship between X and Z, while X mediates the relationship between Y and Z. Can I analyze these interrelationships in a single SEM using a duplicate variable for either X (i.e., Xiv and Xdv) or Y (Yiv and Ydv)?
What are the possible ways of rectifying a lack-of-fit test that shows up as significant? Context: optimization of nanoparticle-mediated dilute acid hydrolysis of lignocellulosic biomass.
Hello,
I have the following problem. I have made three measurements of the same event under the same measurement conditions.
Each measurement has a unique probability distribution. I have already calculated the mean and standard deviation for each measurement.
My goal is to combine my three measurements to get a general result of my experiment.
I know how to calculate the combined mean: (x_comb = (x1_mean+x2_mean+x3_mean)/3)
I don't know how to calculate the combined standard deviation.
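Assuming each of the three measurements is based on the same number of observations n, the usual pooling formula combines the average within-measurement variance with the spread of the three means; a hedged sketch in R:

# Combined SD from per-measurement summaries (equal per-measurement size n):
# total sum of squares = within-measurement SS + between-measurement SS.
combined_sd <- function(means, sds, n) {
  grand      <- mean(means)
  ss_within  <- sum((n - 1) * sds^2)
  ss_between <- sum(n * (means - grand)^2)
  sqrt((ss_within + ss_between) / (length(means) * n - 1))
}

This reconstructs the SD that would be obtained by pooling all the raw values; it is generally larger than the average of the three SDs when the means differ.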
Please let me know if you can help me. If you have any other questions, don't hesitate to ask me.
Thank you very much! :)
Dear Community,
I would like to ask a question regarding the use of Partial Least Squares Regression (PLSR). Basically, I am confused about units. For example, I have land cover for Year 1, Year 2 and Year 3, and water balance components. The water balance components are in mm, while each land-cover type for years 1, 2 and 3 is in square km. I am confused about how PLSR will handle the different units.
Alternatively, I could use the % difference in each year, or the % of each land-cover type relative to the total area of the basin, and similarly convert the water balance variables from mm to percentages.
I am looking for guidance.
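One common way to sidestep the mixed units is to standardize all variables, which the pls package can do internally; a hedged sketch (data frame and column names are hypothetical):

library(pls)
# scale = TRUE standardizes each variable to unit variance, so square km
# and mm enter the model on a common, unitless scale.
fit <- plsr(water_balance ~ lc_forest + lc_cropland + lc_urban,
            data = basin_data, scale = TRUE, validation = "CV")
summary(fit)

On a standardized scale the coefficients are read in SD units rather than mm per square km, which may or may not suit the research question; the percentage transformations described above remain a reasonable alternative.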
Regards
Hello everyone,
I am performing multiple comparisons at the same time (post hoc tests), but among all the available p-value adjustments (Bonferroni, Holm, Hochberg, Sidak, Bonferroni-Sidak, Benjamini-Hochberg, Benjamini-Yekutieli, Hommel, Tukey, etc.), I don't know which one to choose, and I want to be statistically correct for the comparisons I am making in my experiment.
In my experiment there are 4 groups (say A, B, C, D), but I only want to compare A vs B and C vs D. So, after performing Wilcoxon tests, the non-parametric equivalent of a t test (because I have so few replicates per group (n = 6), plus non-normality in some groups), for A vs B and C vs D, I don't know which p-value adjustment should be applied.
I would like to understand 1. which adjustment I should perform here, and 2. how to decide which adjustment to use in any other analysis (what the reasoning is).
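One defensible recipe for this specific design, sketched in R: since A vs B and C vs D are just two planned, non-overlapping comparisons, run the two Wilcoxon tests and adjust the two p-values together.

# Two planned comparisons, Holm-adjusted (Holm is valid under the same
# conditions as Bonferroni and never less powerful).
p_ab <- wilcox.test(A, B)$p.value
p_cd <- wilcox.test(C, D)$p.value
p.adjust(c(p_ab, p_cd), method = "holm")

Tukey-type adjustments are designed for all pairwise comparisons after an omnibus test, which is not the situation here.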
Thanks in advance for your response,
I have a dataset that includes 1,900 companies, with 10 employees surveyed for each company, including a question about each employee's risk preference. I now need to calculate the ICC1 and ICC2 values for the companies. I have already coded each company, so each has a unique company_id; the employee dataset therefore has 19,000 rows, each matched to a company via company_id. In this case, how do I get the ICC1 and ICC2 values in R? I have been trying for a few days and hope someone can resolve my problem.
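A minimal sketch with Bliese's multilevel package (column names are hypothetical); note that ICC1 and ICC2 as usually defined describe the whole grouping structure, computed across all companies at once rather than once per company:

library(multilevel)
# One-way ANOVA with company as the grouping factor.
mod <- aov(risk_pref ~ as.factor(company_id), data = employees)
ICC1(mod)  # proportion of variance attributable to company membership
ICC2(mod)  # reliability of the company mean ratings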
In the case of the constant (intercept) coefficient, what does a VIF greater than 10 mean? Do all the variables in the model exhibit multicollinearity? How can multicollinearity be reduced? It could be reduced by removing variables with VIF > 10, but I don't know what to do with the constant coefficient.
Thank you very much
I want to repeat a statistical function (like lm(), glm() or glmgee()) over a lot of variables, but it does not seem to work for statistical functions (example 1), while it works for simple functions (example 2).
Important: I do not mean multivariate regression using cbind()!
Example 1:
a = rnorm(10, 5, 1)
b = rnorm(10, 7, 1)
c = rnorm(10, 9, 1)
d = rnorm(10, 10, 1)
i = list(a, b, c)
for (x in i) {
lm(x~d)
}
Example 2:
a = rnorm(10, 5, 1)
b = rnorm(10, 7, 1)
c = rnorm(10, 9, 1)
d = rnorm(10, 10, 1)
i = list(a, b, c)
for (x in i) {
plot(x+d)
}
You can check this on this site: https://rdrr.io/snippets/
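For what it is worth, the loop in example 1 most likely runs without error but silently discards its results: inside a for loop R suppresses auto-printing, and plot() only appears to "work" because it draws as a side effect. A sketch of two ways to keep the fits:

# Store the models in a list (idiomatic):
models <- lapply(list(a, b, c), function(x) lm(x ~ d))
lapply(models, summary)

# Or keep the for loop but print/store explicitly:
fits <- list()
for (k in seq_along(i)) {
  fits[[k]] <- lm(i[[k]] ~ d)
  print(summary(fits[[k]]))
}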
Dear all,
I hope this message finds you well. I am currently in the process of applying for an Alexander von Humboldt Foundation fellowship, and I am actively seeking a host professor in Germany who shares my research interests and expertise.
As an experienced epidemiologist, my primary research focus lies in the fields of obesity and diabetes from a life course perspective. Over the years, I have honed my skills in the intricate handling of complex data and advanced statistical analysis, including the application of multilevel growth models and causal mediation analysis.
I would be honored to explore the possibility of collaborating with you as my host professor in Germany. Your expertise and research interests align well with my background, making you an ideal candidate for this partnership.
If you are open to discussing the potential of hosting me as a fellow in your research group, I would greatly appreciate the opportunity to engage in a more detailed conversation about our research synergies.
Thank you for considering my inquiry, and I look forward to your response.
Best,
Jie
There are a lot of researchers who go by the book: they follow the right approach, write up the results and observations in their field of work, confirm existing information or suggest improvements to the experiment for better analysis, and so on; very hard-working. Then there are others, the crazy thinkers, always suggesting things with little backup from existing experiments or known facts, always radical in their reading of results; these people mostly get dismissed as a blip by the first category of researchers.
So, if I may ask your opinion: who would you back to hit gold, the one who is methodical and hard-working, or the crazy thinker?
Is it possible to run a regression with both secondary and primary data in the same model? I mean, when the dependent variable is primary data sourced via questionnaire and the independent variable is secondary data gathered from published financial statements?
For example: if the topic is capital budgeting moderators and shareholders' wealth (SHW), the capital budgeting moderators are proxied by inflation, management attitude to risk, economic conditions and political instability, while SHW is proxied by market value, profitability and retained earnings.
Hi all,
I am trying to calculate the curvatures of the cornea and compare them with Pentacam values. I have the Zernike equation in polar coordinates (Zfit = f(r, theta)). Can anybody let me know the equations for calculating the curvatures?
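For reference, if what is wanted is the meridional (tangential) and sagittal (axial) curvature along a fixed meridian of the fitted surface z = f(r, theta), the standard differential-geometry expressions are as below; Pentacam's own curvature definitions should be checked before any direct comparison:

\kappa_{\mathrm{tangential}}(r) = \frac{\partial^2 f / \partial r^2}{\bigl(1 + (\partial f / \partial r)^2\bigr)^{3/2}},
\qquad
\kappa_{\mathrm{sagittal}}(r) = \frac{\partial f / \partial r}{r \bigl(1 + (\partial f / \partial r)^2\bigr)^{1/2}}

Curvatures are usually converted to dioptric power maps via the keratometric refractive index, which is another place where instrument conventions differ.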
Thanks & Regards.
Nithin
Hi! I have a dataset with a list of sponge species (Porifera) and the number of specimens found for each species at three different sites. I add here a sample from my dataset. Which test should I use to compare the three sites, showing both which species were found at each site and their abundance? I was also thinking of a visual representation showing just the difference between sites in terms of species diversity (not abundance), so that it is possible to see which species occurred at only one site and which were shared. For this last purpose I thought about doing an MDS, but I am not sure it is the right method, nor how to run it in R and how to set up the dataset. Can you help me find a script, and show the shape the dataset should have? Any advice in general would be great, thank you!
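A hedged vegan sketch of the layout and the two standard comparisons (file and object names are hypothetical; rows are sites, columns are species, cells are specimen counts):

library(vegan)
comm <- read.csv("sponges.csv", row.names = 1)  # sites x species counts

# Composition plus abundance:
vegdist(comm, method = "bray")                  # Bray-Curtis dissimilarity

# Composition only (presence/absence), matching the second aim:
vegdist(comm, method = "jaccard", binary = TRUE)

# Non-metric MDS of the sites; with only three sites the ordination is of
# limited use, and the dissimilarity matrices above may already say it all.
ord <- metaMDS(comm)
plot(ord, type = "t")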
Is it possible to test a 2x2x2 design where the first two variables are manipulated high/low categories and the third variable is a measured continuous variable?
Would it be suitable to convert the measured continuous variable to a categorical variable to create a 2x2x2 design?
If so, I would now have 8 categories with multiple high/low combinations.
What test would I use to identify the differences across these groups in a dependent variable, if I want to hypothesize that the DV varies as a function of the high/low values of the (categorized) third variable?
I am conducting a study with 3 IVs (POP, SOE, PI) and 1 DV (COO). Two of the IVs (POP and SOE) are manipulated high/low variables, making 4 groups. However, the third IV (PI) is a measured, continuous variable, which means I cannot manipulate it to create high/low conditions.
Should I convert the continuous IV (PI) to high/low conditions to make a 2x2x2 design?
If yes, what values for the high/low levels do I enter into my data sheet?
If no, what options do I have for my analysis?
Someone told me it is not a good idea to convert the continuous third IV to a categorical variable; they said my options are either hierarchical regression analysis or multiple regression analysis with interaction terms.
I would like to mention that I also want to see the interactive effects of all three IVs on the DV, not only combinations of two IVs. I want to hypothesize that COO will be highest for the combination of high POP, high SOE and high PI; alternatively, the COO outcome should vary with high PI when POP and SOE are high.
I would like suggestions to gain clarity on the best approach and the tests my study needs (see the sketch below). For any analysis, what values do I enter in my data sheet for the high/low levels of the two categorical IVs?
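Here is the sketch: a hedged illustration of the suggested moderated multiple regression, keeping PI continuous (the data frame and column names are hypothetical; POP and SOE are entered as high/low factors exactly as assigned in the data sheet):

d$PI_c <- as.numeric(scale(d$PI, center = TRUE, scale = FALSE))  # center PI
fit <- lm(COO ~ POP * SOE * PI_c, data = d)
summary(fit)  # the POP:SOE:PI_c row tests the three-way interaction

Simple-slope follow-ups can then probe the PI effect within each POP x SOE cell, which addresses the "highest for high/high/high" hypothesis without dichotomizing PI.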
Greetings,
I am currently in the process of conducting a Confirmatory Factor Analysis (CFA) on a dataset consisting of 658 observations, using a 4-point Likert scale. As I delve into this analysis, I have encountered an interesting dilemma related to the choice of estimation method.
Upon examining my data, I observed a slight negative kurtosis of approximately -0.0492 and a slight negative skewness of approximately -0.243 (please refer to the attached file for details). Considering these properties, I initially leaned towards utilizing the Diagonally Weighted Least Squares (DWLS) estimation method, as existing literature suggests that it takes into account the non-normal distribution of observed variables and is less sensitive to outliers.
However, to my surprise, when I applied the Unweighted Least Squares (ULS) estimation method, it yielded significantly better fit indices for all three factor solutions I am testing. In fact, it even produced a solution that seemed to align with the feedback provided by the respondents. In contrast, DWLS showed no acceptable fit for this specific solution, leaving me to question whether the assumptions of ULS are being violated.
In my quest for guidance, I came across a paper authored by Forero et al. (2009; DOI: 10.1080/10705510903203573), which suggests that if ULS provides a better fit, it may be a valid choice. However, I remain uncertain about the potential violations of assumptions associated with ULS.
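For concreteness, a minimal lavaan sketch of the comparison being described (the model string and data object names are placeholders):

library(lavaan)
# With ordinal items; in recent lavaan versions ordered = TRUE declares all
# indicators ordinal (otherwise pass a character vector of item names).
fit_dwls <- cfa(model, data = d, ordered = TRUE, estimator = "DWLS")
fit_uls  <- cfa(model, data = d, ordered = TRUE, estimator = "ULS")
fitMeasures(fit_dwls, c("cfi", "tli", "rmsea", "srmr"))
fitMeasures(fit_uls,  c("cfi", "tli", "rmsea", "srmr"))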
I would greatly appreciate your insights, opinions, and suggestions regarding this predicament, as well as any relevant literature or references that can shed light on the suitability of ULS in this context.
Thank you in advance for your valuable contributions to this discussion.
Best regards, Matyas
I have a longitudinal model and the stability coefficients for one construct change dramatically from the first and second time point (.04) to the second and third time point (.89). I have offered a theoretical explanation for why this occurs, but have been asked about potential model bias.
Why would this indicate model bias? (A link to research would be helpful).
How can I determine whether the model is biased or not? (A link to research would be helpful).
Thanks!
Hello !
Please, regarding the significance level of a test, I would like to know when we can increase the significance level α to 0.1.
Thank you in advance.
First, I sent these numbers to the Bard chatbot:
6020 5603 54830 6750 23000 12370 10000
It responded with this:
You have listed 7 numbers.
- 6020
- 5603
- 54830
- 6750
- 23000
- 12370
- 10000
The sum of these numbers is 103573.
The average of these numbers is 14825.
The median of these numbers is 6750.
The mode of these numbers is 10000, which appears twice.
The range of these numbers is 23000 - 10000 = 13000.
The standard deviation of these numbers is 3844.74.
Is there anything else I can help you with?
Then I asked it to standardize them.
The response was this:
Sure. Standardizing the numbers will mean subtracting the mean and then dividing by the standard deviation. This will transform the numbers so that they have a mean of 0 and a standard deviation of 1.
The standardized numbers are:
- -2.18
- -2.88
- 2.44
- -1.38
- 3.22
- -0.78
- 0.00
----------------------------------------
Bing Chat responded to the same question with this:
I can standardize the numbers you provided. Standardization is the process of transforming data to have a mean of 0 and a standard deviation of 1. This is done by subtracting the mean from each data point and then dividing by the standard deviation.
Here are the standardized values for the numbers you provided:
- 6020: -0.705
- 5603: -0.801
- 54830: 2.891
- 6750: -0.524
- 23000: 0.693
- 12370: -0.191
- 10000: -0.363
Is there anything else I can help with? 😊
------------------------------------------
And Perplexity's response was this:
Based on the search results, it is possible to standardize the numbers provided by converting them into z-scores. Standardizing data is a common practice in statistics to compare variables that have different scales or units of measurement. To standardize the numbers provided, one would need to calculate the mean and standard deviation of the data and then transform each value to its corresponding z-score
Here are the steps to standardize the numbers:
- Calculate the mean of the data: Mean = (6020 + 5603 + 54830 + 6750 + 23000 + 12370 + 10000) / 7 = 15038.43
- Calculate the standard deviation of the data: Standard deviation = sqrt(((6020 - 15038.43)^2 + (5603 - 15038.43)^2 + (54830 - 15038.43)^2 + (6750 - 15038.43)^2 + (23000 - 15038.43)^2 + (12370 - 15038.43)^2 + (10000 - 15038.43)^2) / 6) = 17523.22
- Calculate the z-score for each value: Z-score = (X - Mean) / Standard deviation
Z-score for 6020 = (6020 - 15038.43) / 17523.22 = -0.706
Z-score for 5603 = (5603 - 15038.43) / 17523.22 = -0.768
Z-score for 54830 = (54830 - 15038.43) / 17523.22 = 2.238
Z-score for 6750 = (6750 - 15038.43) / 17523.22 = -0.426
Z-score for 23000 = (23000 - 15038.43) / 17523.22 = 0.455
Z-score for 12370 = (12370 - 15038.43) / 17523.22 = -0.152
Z-score for 10000 = (10000 - 15038.43) / 17523.22 = -0.853
By standardizing the numbers, we have transformed them into a common scale that allows for easier comparison and analysis. It is important to note that the interpretation of z-scores depends on the context and the distribution of the data.
-----------------------------------------------------
So, as you can see, it can't even get the sum of the data right. My question is: why can't these models get the sum of these numbers right, even when they apply all the correct steps?
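For reference, the same computation done exactly in R, which makes the arithmetic errors above easy to verify:

x <- c(6020, 5603, 54830, 6750, 23000, 12370, 10000)
sum(x)               # 118573 (Bard reported 103573)
mean(x)              # 16939 (Bard reported 14825, Perplexity 15038.43)
sd(x)                # sample standard deviation
as.vector(scale(x))  # z-scores: (x - mean(x)) / sd(x)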
Hello, could someone assist me in interpreting the results of the sequential Mann-Kendall test (Sneyers)? According to Dufek (2008: Precipitation variability in São Paulo State, Brazil), "In the absence of any trend, the graphical representation of the direct series (u(t)) and the backward series (u'(t)) obtained with this method yields curves that overlap several times." In my case, I observe two to three overlaps, often in sequences that exhibit significant trends. Should I conclude that there is an absence of trend in my dataset?
In plant breeding, what are the uses of the discriminant function?
I am looking for a graphical tool, like Visual Basic, for wiring R code to interactive buttons and text boxes.
For example, I want to design a Windows application with a graphical front end for calculating body mass index (BMI): two input boxes for weight and height, and a Run button. When the button is clicked, I want the code below to be run.
BMI <- box1/(box2^2)
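The shiny package is the usual R route to this kind of interactive front end; a minimal sketch of the BMI example (widget IDs reuse the box1/box2 names above, labels are illustrative):

library(shiny)

ui <- fluidPage(
  numericInput("box1", "Weight (kg)", value = 70),
  numericInput("box2", "Height (m)", value = 1.75),
  actionButton("run", "Run"),
  textOutput("bmi")
)

server <- function(input, output) {
  # Recompute only when the button is clicked.
  bmi <- eventReactive(input$run, input$box1 / (input$box2^2))
  output$bmi <- renderText(paste("BMI:", round(bmi(), 1)))
}

shinyApp(ui, server)

Note that shiny apps run in a browser rather than as native Windows executables; for a desktop look, the tcltk package is an alternative.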
Some of the people who consult are only users of statistics, while others are the ones who develop statistics, and we would love for people to use it correctly.
But, "I believe", many arrive late, always after the experimentation, asking "what statistical process can I do or apply?". Perhaps they do not know that they should always consult with the question or hypothesis they wish to answer or verify in hand, since that would allow a better answer.
On the other hand, some come with simple queries, but usually a statistics class is given as an answer, which I feel in some cases comes too late. In some cases it is extremely necessary, but in others it opens a debate that leads to serendipity.
Wouldn't it be better to try to advise them in a more precise way?
I look forward to reading your thoughts:
Dear colleagues,
I analyzed my survey data using binary logistic regression, and I am assessing the results by looking at the p-value, B, and Exp(B) values. However, the task also requires specifying the significance of the marginal effects. How do I interpret the results of a binary logistic regression in terms of the significance of the marginal effects?
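One way to obtain average marginal effects with significance tests in R is the margins package (a hedged sketch; the model and variable names are hypothetical):

library(margins)
fit <- glm(y ~ x1 + x2, family = binomial, data = d)
summary(fit)           # B and Exp(B) scale: coef(fit), exp(coef(fit))
summary(margins(fit))  # average marginal effects with SEs, z and p values

The marginal effect translates Exp(B) into a change in predicted probability, so its p-value is read in the usual way: a significant AME means the covariate shifts the outcome probability by a nonzero amount on average.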
Best,
I constructed a linear mixed-effects model in Matlab with several categorical fixed factors, each having several levels. fitlme calculates confidence intervals and p values for n-1 levels of each fixed factor compared to a selected reference. How can I get these values for other combinations of factor levels (e.g., level 1 vs. level 2, level 1 vs. level 3, level 2 vs. level 3)?
Thanks,
Chen
Has anyone conducted a meta-analysis with Comprehensive Meta-Analysis (CMA) software?
I have selected: comparison of two groups > means > Continuous (means) > unmatched groups (pre-post data) > means, SD pre and post, N in each group, Pre/post corr > finish
However, it is asking for pre/post correlations, which none of my studies report. Is there a way to calculate this manually or estimate it somehow?
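If any of the studies report the SD of the change score, the pre/post correlation can be backed out of the variance identity sd_change^2 = sd_pre^2 + sd_post^2 - 2 * r * sd_pre * sd_post; a one-line R helper:

r_prepost <- function(sd_pre, sd_post, sd_change) {
  (sd_pre^2 + sd_post^2 - sd_change^2) / (2 * sd_pre * sd_post)
}

When no study reports even that much, a common fallback is a sensitivity analysis: run the meta-analysis under a few assumed correlations (e.g. 0.5 and 0.7) and report whether the conclusions change.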
Thanks!
Hello,
I’m working with ANCOVA models in R-studio. I’ve constructed a model as follows:
fit<-aov(outcome~factor1+cov1+cov2+cov3)
Where “outcome” is a normally distributed continuous variable; “factor1” has 3 levels; “cov1” and “cov2” are continuous variables; and “cov3” is a 2-level variable.
The model fits well, but I want to perform multiple comparisons between the levels of my factor. That is:
1 vs 2
2 vs 3
3 vs 1
Therefore I’ve been trying the “glht” function:
postHocs<-glht(fit, linfct = mcp(factor = "Tukey"))
And I receive this error:
Error: unexpected '=' in "postHocs<-glht(fit[…]
I’ve also tried using “as.factor” on my factor to avoid problems related to the type of the variable, but I get the same error.
I would appreciate any help.
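A hedged guess at the two likely culprits, with a corrected sketch: the mcp() label must match the variable's name in the model (factor1, not factor), and quotes pasted from a word processor as curly quotes produce exactly this kind of "unexpected" syntax error.

library(multcomp)
d$factor1 <- as.factor(d$factor1)           # convert before fitting
fit <- aov(outcome ~ factor1 + cov1 + cov2 + cov3, data = d)
postHocs <- glht(fit, linfct = mcp(factor1 = "Tukey"))  # plain ASCII quotes
summary(postHocs)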
Thanks in advance!
Joan.
Hello everyone,
I'm going to conduct a meta-analysis of psychological interventions relevant to a topic via Comprehensive Meta-Analysis (CMA) software. I have a few questions/points for clarification:
- From my understanding, I should only meta-analyse interventions that used a pre-test/post-test design (with and/or without follow-up), as meta-analysing post-test-only designs together with the others is not sound. Is my understanding correct?
- Can I combine between-subjects and within-subjects designs together or do I need to meta-analyse them separately?
Thanks in advance!
I have ordinal data on happiness of citizens from multiple countries (from the European Value Study) and I have continuous data on the GDP per capita of multiple countries from the World Bank. Both of these variables are measured at multiple time points.
I want to test the hypothesis that countries with a low GDP per capita will see more of an increase in happiness with an increase in GDP per capita than countries that already have a high GDP per capita.
My first thought to approach this is that I need to make two groups; 1) countries with low GDP per capita, 2) countries with high GDP per capita. Then, for both groups I need to calculate the correlation between (change in) happiness and (change in) GDP per capita. Lastly, I need to compare the two correlations to check for a significant difference.
I am stuck, however, on how to approach the correlation analysis. For example, I don't know how (and whether) I have to include the repeated measures from the different time points. If I just base my correlations on one time point, I feel I am not really testing my research question, since it concerns an increase in happiness and an increase in GDP, which is change over time.
If anyone has any suggestions on the right approach, I would be very thankful! Maybe I am overcomplicating it (it wouldn't be the first time)!
Hello! I would like to ask the experts about conducting a statistical analysis using only nominal variables. Specifically, I would like to compare the responses of survey participants who answered "Yes" or "No" to whether they take certain medications, and analyze the data against criteria such as education level, economic status, marital status, etc. I have conducted a chi-squared test to determine whether there is a significant difference between the variables, but now I would like to compare the answers on whether this medicine is taken across the groups of each criterion, for example within the education variable (higher, secondary, vocational and basic education). Is there a statistical test, similar to Tukey's test, that is suitable for nominal variables? I would also like to know whether it is possible to create a column chart with asterisks above the columns indicating the significant differences between them based on such a test.
I usually use StatSoft Statistica and RStudio, but none of my attempts at a post-hoc analysis for nominal variables succeeded in either. In RStudio I tried pairwise.prop.test(cont_table, p.adjust.method = "bonferroni")
But I got an error:
Error in pairwise.prop.test(cont_table, p.adjust.method = "bonferroni") :
'x' must have 2 columns
I assume this is because one of my variables has several groups rather than two.
What should I do?
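For what it is worth, pairwise.prop.test() expects a matrix with one row per group and exactly two columns (counts of successes and failures), so a contingency table oriented the other way around raises exactly this error. A sketch with hypothetical column names:

# Rows = education levels (4 groups), columns = Yes/No counts.
cont_table <- table(d$education, d$takes_medication)
cont_table                       # check the shape: 4 x 2
pairwise.prop.test(cont_table, p.adjust.method = "bonferroni")

If the table comes out as 2 x 4, transpose it with t(cont_table) first.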
Thank you in advance for your help!
The variables I have, a vegetation index and plant disease severity scores, were not normal. So I applied a log10(y+2) transformation to the vegetation index and a sqrt(log10(y+2)) transformation to the plant disease severity scores. Plant disease severity is on a 0, 10, 20, 30, ..., 100 scale, scored from visual observations. Even after the combined transformation, the disease severity data remain non-normal, but the transformation improves the CV in simple linear regression.
Can I proceed with the parametric test, a simple linear regression between the log-transformed vegetation index (normally distributed) and the combined-transformed (non-normal) disease severity data?
Hi everyone! I need to examine interactions between categorical and continuous predictors in a path analysis model. Which strategy would be more accurate: 1) including the categorical variable, the continuous one and their interaction as separate terms, or 2) running a multigroup analysis?
I have the same problem in several models, for instance when examining potential differences in the effects of executive function (continuous predictor) on reading comprehension (outcome variable) among children from different grades (categorical predictor).
Thank you so much for your help!
I want to study the relationship between physical activity parameters across the lifespan and a binary pain outcome. I have longitudinal data with four measurement waves, hence repeated measures.
Should I use a GEE or a mixed model? And does anyone have guidance on how to rearrange my dataset so it will fit these methods? I have tried the GEE with the data in long and in wide format, but I keep getting errors.
To clarify, my outcome is binary (at the last measurement), and my independent variables are measured at four time points (with the risk of them being correlated).
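A hedged geepack sketch, assuming long-format data with one row per person-visit and hypothetical column names (id, pain, activity, time):

library(geepack)
fit <- geeglm(pain ~ activity + time,
              id = id, data = long_data,
              family = binomial, corstr = "exchangeable")
summary(fit)

Note this assumes the pain outcome is observed at every visit; if it is genuinely measured only at the final wave, a single-outcome logistic regression with a suitably summarized activity history may be the simpler and more honest model.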
How can I define a graphics layout to make plots like the attached figure using the base graphics package in R?
I need help positioning (centering) each panel using the "mar" argument.
Reghais.a
I am wrestling with the problem below; it would be a pleasure to have your ideas.
I've written the same program in two languages, Python and R, but each came to a completely different result. Before jumping to a conclusion, I declare that:
- Every word of the code in two languages has multiple checks and is correct and represents the same thing.
- The used packages in two languages are the same version.
So, what do you think?
The code is about applying deep neural networks for time series data.
Hi, I am looking for a way to derive standard deviations from estimated marginal means using linear mixed models in SPSS. I have already figured out where SPSS provides the pooled SD needed to calculate the SMD; however, I still need the SDs of the means. Any help is appreciated!
I have data in a 30 x 1 matrix. Is it possible to find the best optimized value using a gradient descent algorithm? If yes, please share the procedure, or a link to the detailed background theory behind it; that will help me proceed further in my research.
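A hedged toy sketch of the idea in R: run gradient descent on a 30 x 1 data vector to minimize the mean squared deviation f(m) = mean((x - m)^2), whose optimum is known to be mean(x), so the answer is easy to check:

set.seed(42)
x  <- rnorm(30, mean = 10)   # stand-in for the 30 x 1 matrix
m  <- 0                      # starting value
lr <- 0.1                    # learning rate
for (step in 1:200) {
  grad <- -2 * mean(x - m)   # derivative of f with respect to m
  m    <- m - lr * grad
}
c(gradient_descent = m, closed_form = mean(x))  # should agree closely

Whether gradient descent is appropriate depends entirely on what "best optimized value" means, i.e. on the objective function being minimized.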
I want to display the bivariate distribution of two (laboratory) parameters in sets of patients. I have available the N and mean ± SD of the first and second parameters. I am looking for software that could draw a bivariate distribution, i.e. an ellipse, from the given parameters. Can someone help me? Thank you.
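A caveat and a hedged base-R sketch: a bivariate normal ellipse is determined by the two means, the two SDs and the correlation r between the parameters; with only means and SDs, r has to be assumed or obtained from elsewhere.

# Draw a (1 - alpha) normal ellipse from summary statistics.
draw_ellipse <- function(mx, my, sdx, sdy, r, level = 0.95, ...) {
  theta <- seq(0, 2 * pi, length.out = 200)
  circ  <- rbind(cos(theta), sin(theta))          # unit circle
  Sigma <- matrix(c(sdx^2, r * sdx * sdy,
                    r * sdx * sdy, sdy^2), 2, 2)  # covariance matrix
  rad   <- sqrt(qchisq(level, df = 2))            # Mahalanobis radius
  pts   <- t(chol(Sigma)) %*% (rad * circ) + c(mx, my)
  lines(pts[1, ], pts[2, ], ...)
}

plot(NA, xlim = c(0, 20), ylim = c(0, 25),
     xlab = "parameter 1", ylab = "parameter 2")
draw_ellipse(10, 12, 2, 3, r = 0.5)   # r assumed for illustration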
Hi,
There is an article for which I want to know which statistical method was used: regression or Pearson correlation.
However, the authors don't say which one. They show a correlation coefficient and a standard error.
Based on these two parameters, can I tell whether they used regression or Pearson correlation?
How can I run the bootstrap method to estimate the error rate in linear discriminant analysis using R code?
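A hedged sketch with MASS and boot, using iris as a stand-in dataset; note this is the plain bootstrap of the apparent error rate, not the refined .632 estimator:

library(MASS)
library(boot)

err_fun <- function(data, idx) {
  fit  <- lda(Species ~ ., data = data[idx, ])  # fit on the bootstrap sample
  pred <- predict(fit, newdata = data)$class    # classify all original cases
  mean(pred != data$Species)                    # misclassification rate
}

set.seed(1)
boot(iris, err_fun, R = 200)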
Best
reghais.A
Dear all,
I want to know your opinions.
Also, there is a good paper here.
How can I add robust 97.5% confidence ellipses to variation diagrams (XY, ilr-transformed) using the robCompositions or compositions packages?
Best
Azzeddine
Respected Sir/Madam,
I am working as a Scientist (Horticulture), and my research focus is the improvement of tropical and semi-arid fruits. I am also interested in working out the role of nutrients in fruit-based cropping systems.
I am looking for collaborators from the fields of Genetics and Plant Breeding, Horticulture, Agricultural Statistics, Soil Science and Agronomy.
I am currently working on genetic analysis of fruit traits in Jamun (Indian blackberry).
I am testing hypotheses about the relationships between CEA and innovation performance (IP). If I am testing the relationship of one construct, say management support, to IP, is it OK to use simple linear regression? Or should I test it in a multiple regression with all the constructs?
What are current recommendations for reporting effect size measures from repeated measures multilevel model?
Concerning the analytical approach, I have followed the procedure by Garson (2020), with a diagonal covariance matrix for the repeated measures and a variance-components matrix for the random effects.
In advance, thank you for your contributions.
Merry Christmas everyone!
I used the Interpersonal Reactivity Index (IRI) subscales Empathic Concern (EC), Perspective Taking (PT) and Personal Distress (PD) in my study (N = 900). When I calculated Cronbach's alpha for each subscale, I got .71 for EC, .69 for PT and .39 for PD. The value for PD is very low. The analysis indicated that if I deleted one item, alpha would increase to .53, which is still low but better than .39. However, as my study does not focus mainly on the psychometric properties of the IRI, what arguments can I make that the results are still valid? I did say the PD findings should be taken with caution, but what else can I say?
I have done my qPCR experiments and obtained some results. I used the DDCt method, calculated 2^(-DDCt), log10-transformed the data, and separated my samples into controls and patients. If I see, for example, a fold change 4 times higher in patients for my gene of interest, do I use a one-tailed or a two-tailed t-test? And what if the distribution is not normal: should I run a non-parametric test, or can I drop the outliers and run the t-test? I am very confused by this statistical conundrum.
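For the comparison itself, a hedged sketch on the log-transformed fold changes (column names are hypothetical; a two-sided test is the safer default unless the direction was fixed before seeing the data, and outliers should not be dropped just to rescue normality):

t.test(logfc ~ group, data = d)        # Welch two-sample t test, two-sided
wilcox.test(logfc ~ group, data = d)   # non-parametric fallback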
Dear all,
I have conducted research on snake chemical communication, testing the reactions of a few adult snakes (both males and females) to different chemical compounds. Every individual is tested 3 times with each compound. Basically, I put a soaked paper towel in each individual's terrarium and record the behaviour for 10 minutes with a camera. The compounds are presented to the individuals in random order.
My grouping variable represents the reactions to each of the compounds for each sex; for example, the grouping variable has categories titled “male reactions to compound X”, “male reactions to compound Y”, etc. I have three dependent variables: 1) whether there is interest in the presented compound or not (binary), 2) chin-rubbing behaviour (a count of how many times the behaviour is exhibited) and 3) tongue-flick rate (average tongue-flicks per minute). The distribution is not normal.
What I would like to test is 1) whether behaviour differs between males and females, 2) whether male behaviour differs between compounds (basically, whether males react more to compound X than to compound Y), and the same for females, and 3) whether males respond differently to different types of compounds (for example, combining compounds X, Y and Z because they are lipids, and A, B and C because they are alkanes, and testing for differences in male responses).
I thought PERMANOVA would be enough, since it is a multivariate non-parametric test, but two reviewers wrote that I have to use generalized linear mixed models because of the repeated measures (as mentioned, I test each individual with each compound 3 times). They think there might be individual differences that could affect the results if not taken into account.
Unfortunately, I am a newbie with GLMMs, and I do not really see how such a model can help me answer my questions and test the respective hypotheses. Could you please advise me on that? And how should I build the data matrix in order to test for such differences?
Isn't it also possible to check for differences between individuals with the Friedman test and then use PERMANOVA?
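For orientation, a hedged lme4 sketch of what the reviewers are asking for: one model per response, with a random intercept for individual to absorb the repeated testing (column names are hypothetical):

library(lme4)
m1 <- glmer(interest ~ compound * sex + (1 | individual),
            family = binomial, data = d)   # binary interest
m2 <- glmer(chin_rubs ~ compound * sex + (1 | individual),
            family = poisson, data = d)    # count of chin-rubbing events
m3 <- lmer(log(tf_rate + 1) ~ compound * sex + (1 | individual),
           data = d)                       # tongue-flick rate

The data matrix is simply long format: one row per trial, with columns for individual, sex, compound (and its chemical class) and the three responses. Contrasts between compound classes (lipids vs alkanes) can then be tested by recoding compound into a class factor.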
Thank you very much in advance!
Holzinger (Psychometrika, vol. 9, no. 4, Dec. 1944) and Thurstone (Psychometrika, vol. 10, no. 2, June 1945; vol. 14, no. 1, March, 1949) discussed an alternative method for factoring a correlation matrix. The idea was to enter several clusters of items (tests) in the computer program beforehand, and then test them, optimize them and produce the residual matrix (which may show the necessity of further factoring). These clusters could stem from theoretical and substantive considerations, or from an inspection of the correlation matrix. It was an alternative to producing one factor at a time until the residual matrix becomes negligible, and was attractive because it spared much calculation time for the computers in that era. That reason soon lapsed but the method is still interesting as an alternative kind of confirmatory factor analysis.
My problem is: I would like to know the exact procedure (especially Holzinger's), but I cannot get hold of these three original publications (except the first two pages) without great expense, nor can I find a thorough discussion of them in another publication, except perhaps in H. H. Harman (1976), Modern Factor Analysis, Section 11.5; but that book has disappeared from the university library, and on Google Books it is incomplete. Does anyone have a copy of these publications, or is anyone familiar with this type of factor analysis?
Please share this question with an expert in statistics if you don't know the answer.
I am stuck here. I am working on a therapy and trying to evaluate the changes in biomarker levels. I selected 5 patients and analysed their biomarker levels prior to therapy, after the first therapy, and after the second therapy. When I apply ANOVA, the results show a difference in the mean values, but because of the large differences in the standard deviations I am getting non-significant results,
as in the table below.
Sample    N   Mean     SD          SE of Mean
vb bio    5   314.24   223.53627   99.96846
cb1 bio   5   329.7    215.54712   96.3956
CB II     5   371.6    280.77869   125.56805
So I would like to ask those statisticians who are well aware of clinical trial studies. Please suggest:
Am I performing the statistics correctly?
Should I worry about the non-significant results?
Which statistical tests should I use?
How should I represent my data for publication purposes?
Please be elaborate in your answers, as if teaching someone new to this field; a sketch follows below.
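Here is the sketch: since the same five patients were measured at all three time points, a repeated-measures analysis uses the within-patient pairing, which a between-groups ANOVA on the summaries above throws away; the large between-patient SDs then swamp the effect. Hedged R code on long-format data (columns patient, time, biomarker are hypothetical; both patient and time must be factors):

fit <- aov(biomarker ~ time + Error(patient/time), data = long_d)
summary(fit)

# Non-parametric alternative, sensible for n = 5 with skewed values:
friedman.test(biomarker ~ time | patient, data = long_d)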
I have a distribution map produced with presence-only data, and a number of presence records that were not included in the model at all. How can I evaluate the agreement between these held-out presence records and the predicted values at those points on the potential distribution map? One can also think of it like this: I have two columns; the first column contains only 1s, and the second contains the predicted values. Which method would be the best approach for examining the relationship between these two columns?
Hello, I currently have a set of categorical variables, coded as Variable A, B, C, etc. (Yes = 1, No = 0). I would like to create a new variable called severity. To create severity, I know I'll need a coding scheme like so:
if Variable A = 1 and all other variables = 0, then severity = 1.
if Variable B = 1 and all other variables = 0, then severity = 2.
So on, and so forth, until I have five categories for severity.
How would you suggest I write a syntax in SPSS for something like this?
I am using -corr2data- to simulate raw data from a correlation matrix. However, some of the variables I need should be binary. How can I convert them?
Is it acceptable to convert the higher values to 1 (and the others to 0) so as to reproduce the same mean? How should I do it?
Is there a way in R?
(I want to perform a GSEM on a correlation matrix.)
(I know the -faux- package in R, but my problem is that only some of my variables are binary.)
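A hedged R sketch of the whole round trip (the correlation matrix and the target proportion are illustrative):

library(MASS)
R <- matrix(c(1.0, 0.4, 0.3,
              0.4, 1.0, 0.5,
              0.3, 0.5, 1.0), 3, 3)
d <- as.data.frame(mvrnorm(1000, mu = rep(0, 3), Sigma = R,
                           empirical = TRUE))   # matches R exactly

p <- 0.35                                       # target mean of binary V1
d$V1 <- as.integer(d$V1 > quantile(d$V1, 1 - p))
colMeans(d)                                     # V1 is now ~ p

Caveat: thresholding a normal variable attenuates its correlations with the other variables (they become point-biserial), so the binarized data will not reproduce the input correlation matrix exactly.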
I am creating a hypothetical study in which two drugs are being tested. I have taken 60 participants and randomly split them into three groups: drug A, drug B and a control group. A YBOCS score will be taken before the trial, after the trial has ended, and again at a 3-month follow-up. Which statistical test should I use to compare the three groups and find out which drug was most effective?
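One standard option, sketched in R with hypothetical column names: a 3 (group) x 3 (time) mixed ANOVA on long-format data, where the group-by-time interaction is the test of differential drug effectiveness.

# subject and time as factors; one row per subject per time point.
fit <- aov(ybocs ~ group * time + Error(subject/time), data = long_d)
summary(fit)  # the group:time interaction tests differential change

A linear mixed model, e.g. lme4::lmer(ybocs ~ group * time + (1 | subject)), is a common alternative that tolerates missing follow-ups.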
For example:
If there are 40 species identical between two sites, the sites are the same. However, two sites can each have 40 species but none in common; by species number they are identical, yet by species composition they are 0% alike.
How can I calculate or show the species composition of the two sites over time?
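A hedged vegan sketch: presence/absence (Jaccard) similarity captures exactly the composition point above, where two sites with 40 species each but none shared score 0.

library(vegan)
comm <- rbind(site1 = c(3, 0, 5, 2),    # toy counts: columns are species
              site2 = c(0, 4, 5, 1))
vegdist(comm, method = "jaccard", binary = TRUE)  # composition only
vegdist(comm, method = "bray")                    # composition + abundance

Computing the chosen index for each pair of time points and plotting it against time shows how compositional similarity between the sites evolves.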
During the lecture, the lecturer mentioned the following properties of frequentist estimators:
Unbiasedness is only one of the frequentist properties — arguably, the most compelling from a frequentist perspective and possibly one of the easiest to verify empirically (and, often, analytically).
There are however many others, including:
1. Bias-variance trade-off: we would consider optimal an estimator with little (or no) bias, but we would also value one with small variance (i.e., more precision in the estimate). So, when choosing between two estimators, we may prefer one with very little bias and small variance over one that is unbiased but has large variance.
2. Consistency: we would like an estimator to become more and more precise and less and less biased as we collect more data (technically, when n → ∞).
3. Efficiency: as the sample size increases indefinitely (n → ∞), we expect an estimator to become increasingly precise (i.e., its variance to reduce to 0 in the limit).
Why do frequentist estimators have these kinds of properties, and can we prove them? I think these properties can be applied to many other statistical approaches.
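For reference, the first two properties have standard formal statements that can be proved for particular estimators: the bias-variance trade-off is the mean-squared-error decomposition, and consistency is convergence in probability,

\mathrm{MSE}(\hat{\theta}) = \mathbb{E}\bigl[(\hat{\theta} - \theta)^2\bigr]
  = \mathrm{Var}(\hat{\theta}) + \bigl[\mathrm{Bias}(\hat{\theta})\bigr]^2,
\qquad
\hat{\theta}_n \xrightarrow{p} \theta \quad (n \to \infty),

proved case by case, e.g. for the sample mean via the law of large numbers. Nor are the properties exclusive to frequentism; they are criteria by which any point estimator, Bayesian ones included, can be judged.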
Assuming that a researcher does not know the nature of the population distribution (neither its parameters nor its type, e.g. normal, exponential), is it possible for the sampling distribution to indicate the nature of the population distribution?