
# Research Statistics - Science topic

Questions related to Research Statistics
• asked a question related to Research Statistics
Question
I have been trying to find some tools to apply statistics to my data, but most of them, such as OriginLab and Prism, are paid tools. Can anyone suggest some free tools that can help with data analysis?
• asked a question related to Research Statistics
Question
Our group conducted undergraduate research on fiber-polypropylene composite panels. We have four independent variables, fiber ratio (0%, 5%, 10%), fiber length (1 cm / 2 cm), fiber age (old/young) and fiber treatment (treated/untreated), and we measure the strength (dependent variable) of the composite panel. Will a factorial ANOVA do, or should we use ANCOVA? If neither, what statistical treatment would be appropriate for our research?
Neil Patrick raguine Razon, I think a factorial ANOVA is the appropriate choice here. ANCOVA is relevant only when one of your predictors is continuous (a covariate), but in this case all your independent variables are categorical.
Best!!
AN
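As a quick check on what a full factorial ANOVA involves for this design, the Python sketch below (standard library only) enumerates the 3 × 2 × 2 × 2 = 24 treatment combinations and the degrees of freedom of every main effect and interaction. The number of panels per cell (`reps=3`) is a hypothetical assumption; replace it with your actual replication. Fitting the ANOVA itself would still be done in a statistics package.

```python
from itertools import combinations, product
from math import prod

# Factors and levels as described in the question
factors = {
    "ratio": ["0%", "5%", "10%"],
    "length": ["1cm", "2cm"],
    "age": ["old", "young"],
    "treatment": ["treated", "untreated"],
}


def factorial_design(factors, reps):
    """Return the cells and the df of each term of a full factorial ANOVA.

    Assumes a balanced design with `reps` panels tested in every cell.
    """
    cells = list(product(*factors.values()))  # every combination of levels
    names = list(factors)
    df = {}
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            # df of a term = product of (levels - 1) over the factors in it
            df[":".join(combo)] = prod(len(factors[f]) - 1 for f in combo)
    df["Residual"] = len(cells) * reps - len(cells)
    return cells, df


cells, df = factorial_design(factors, reps=3)  # reps=3 is hypothetical
print(len(cells))  # 24
for term, d in df.items():
    print(f"{term:<28} {d}")
```

The model terms' df sum to 23 (cells minus one), so with all 15 terms included the three- and four-way interactions take up part of the design and can be hard to interpret.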
• asked a question related to Research Statistics
Question
We have two Likert-type scales: Brief-COPE and Perceived Stress Scale.
Likert
• asked a question related to Research Statistics
Question
I often read research articles that do not describe the sampling procedure appropriately. For example, the authors mention that the sampling was accidental or purposive but don't explain why they had only 10 participants, or they say that the sampling was theoretical but describe socio-demographic characteristics of their subjects that were established a priori.
I think the sampling procedure is one of the most important elements of a study, that research should be evaluated according to it, and that many studies should be rejected because their sampling is inappropriate.
The sampling procedure should state the sampling criteria and should justify the number of participants.
There are two kinds of sampling in social research: statistical sampling, probability or non-probability (the sampling criteria and the number of participants are established before entering the field, according to fixed sampling rules), and theoretical sampling (the sampling criteria and the number of participants are flexible and decided during the research process, according to relevance and saturation rules).
Very useful discussion. I remember the saying: "it is better to study one rat for 1000 hours than 1000 rats for one hour if you need to study the behavior of rats". But that particular rat should be selected carefully, so as not to choose a 'mad' or 'disabled' rat. By studying this carefully selected rat (theoretical sampling) we can generalize certain behavioral patterns of rats and gain a deep understanding of their behavior in that time and space. Observing 1000 rats, randomly or even purposively (statistical sampling), for one hour would help to identify one or two features of rats in that particular context. So the sampling technique depends on the nature (not the purpose) of the research.
• asked a question related to Research Statistics
Question
Dear Colleagues
Have you ever had a publishing experience in which the editor evaluated your qualitative research against the standards of rigor of quantitative research?
My story:
The editor of one journal assessed my article, a qualitative case study, negatively according to the standards of quantitative research rigor. He stated:
"The analysis falls well short of the rigor required of an academic publication. No evidence is provided about whether the results would generalize to other settings, apparently there were no control groups, there are no statistical tests, the results did not generate enough clicks or sales to have a clear interpretation, etc. "
I replied (and am waiting for a reaction):
"...you assessed it according to the rigor of the quantitative research ("no statistical test", "to generalize"), while my research is qualitative ("to identify categories and/or patterns") and implemented according to the procedure suggested by K. Eisenhardt.
Please read the classic article by Bourgeois and Eisenhardt (attached), which analyzes four observations (stories) and doesn't contain tests. Eisenhardt is quoted in almost all articles made with the case study (qualitative) research method. Please rate my article according to the principles of the qualitative research rigor."
Bourgeois LJ III and Eisenhardt KM (1988) Strategic decision processes in high velocity environments: Four cases in the microcomputer industry. Management Science, 34(7): 816-835.
Have you ever had an editor or reviewer use the rigor criteria of quantitative research to assess qualitative research?
What has your experience been with publishing qualitative research?
Regards
Richard Kleczek
Hello!
Unfortunately, I faced another case of an editor using quantitative evaluation criteria to evaluate my qualitative study: "There is a concern that the sample size in your study is rather small – 33 individual interviews. It may not produce robust results." By the way, one classic single-case study (Echeverri and Skålén 2011) drew on a similar number of interviews: 'The present article draws mostly on the 34 interviews conducted during the second and third rounds' (Echeverri and Skålén 2011, p. 13).
Echeverri P and Skålén P (2011) Co-creation and co-destruction: A practice-theory based study of interactive value formation. Marketing Theory, 11(3): 351-373.
What do you think of it?
• asked a question related to Research Statistics
Question
Hi everyone,
What software, and which steps, should I use to analyse ADV (acoustic Doppler velocimeter) measurement data?
I recommend WinADV: you can easily sort and analyse ADV data with it, and it's really useful for people working with ADVs.
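One common workflow, sketched here in plain Python as an illustration rather than a description of WinADV's internals: screen out samples with low beam correlation or low signal-to-noise ratio, then compute time-averaged velocities, RMS fluctuations and the Reynolds shear stress term from the remaining samples. The tuple layout and the thresholds (70% correlation, 15 dB SNR) are assumptions; check the values recommended for your instrument. Spike removal (e.g. phase-space thresholding) is a separate step not shown.

```python
from statistics import fmean, pstdev


def adv_stats(samples, min_corr=70.0, min_snr=15.0):
    """Basic turbulence statistics from one ADV record.

    samples: iterable of (u, v, w, corr, snr) tuples, velocities in m/s,
    beam correlation in % and signal-to-noise ratio in dB.
    min_corr / min_snr are assumed screening thresholds, not instrument defaults.
    """
    # 1. Discard samples with poor correlation or SNR
    good = [s for s in samples if s[3] >= min_corr and s[4] >= min_snr]
    u = [s[0] for s in good]
    w = [s[2] for s in good]
    # 2. Time-averaged velocities
    u_mean, w_mean = fmean(u), fmean(w)
    # 3. Fluctuations about the mean
    u_fluct = [x - u_mean for x in u]
    w_fluct = [x - w_mean for x in w]
    return {
        "n_kept": len(good),
        "u_mean": u_mean,
        "u_rms": pstdev(u),  # RMS of the streamwise fluctuations
        # <u'w'>; the kinematic Reynolds shear stress is -<u'w'>
        "uw": fmean([a * b for a, b in zip(u_fluct, w_fluct)]),
    }


record = [(0.5, 0.0, 0.1, 90, 20), (0.7, 0.0, -0.1, 90, 20), (5.0, 0.0, 0.0, 40, 20)]
print(adv_stats(record))  # the third (low-correlation) sample is discarded
```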
• asked a question related to Research Statistics
Question
Statistics is very important for conducting research. Without proper knowledge of statistical analysis, data cannot be interpreted correctly. Good research alone is not enough if the results are not presented in an accessible and accurate way, and only statistical knowledge can help in this regard.
You may want to consider this free online course, which starts with the basics and becomes quite advanced; it was vetted and approved by reviewers external to our Centre.
It can be followed in MLwiN, Stata and R; each module is divided into two parts, conceptual and practical, and we provide the data for the practical. There are quizzes that help you evaluate your understanding.
1. Using quantitative data in research
2. Introduction to quantitative data analysis
3. Multiple regression
4. Multilevel structures and classifications
5. Introduction to multilevel modelling
6. Regression models for binary responses
7. Multilevel models for binary responses
8. Multilevel modelling in practice: Research questions, data preparation and analysis
9. Single-level and multilevel models for ordinal responses
10. Single-level and multilevel models for nominal responses
11. Three-level multilevel models
12. Cross-classified multilevel models
13. Multiple membership multilevel models
14. Missing Data
15. Multilevel Modelling of Repeated Measures Data
• asked a question related to Research Statistics
Question
Dear researchers
Is regression analysis a suitable scientific research method for a data set obtained from secondary sources (indices, annual reports, statistics published by institutions, etc.)?
Best Regards...
Thank you for your answer. I think it is very useful for me.
Best Regards...
• asked a question related to Research Statistics
Question
Hello my expert friends
I am currently looking for articles, research or statistics on the moderating effect of legal knowledge on behavioral intention.
I would be obliged if any of you could kindly share your research, or any research you know of, that specifically addresses this issue.
God bless. Thank you very much.
Thank you, Sir @Ariful Islam.
• asked a question related to Research Statistics
Question
Hello my expert friends
I am currently looking for articles, research or statistics on the moderating effect of ethical knowledge on behavioral intention.
I would be obliged if any of you could kindly share your research, or any research you know of, that specifically addresses this issue.
God bless. Thank you very much.
Regards
Cikgu Armand
Thanks for the response... sure.
• asked a question related to Research Statistics
Question
Hey!
I'm currently doing research on academic spin-offs. I sent my survey to the complete population (196 companies/founders). Normally I would need a sample size of 132, but I'm sure that is impossible to reach. What should I do with my research if I only get about 75 responses: do I just have to change my margin of error, or is the whole study no longer statistically valid? I'm confused.
That answer makes a huge difference.
In my opinion, it is acceptable in a thesis to discuss one or more projects that failed. The important part is what you learned from the failure. With 4 questions and 23 completed surveys you can analyze the data; the next step is to interpret the analysis. The discussion is then about what failed, why, and how to change things to increase the chances of success, and within this you can also talk about what the data might say. Even if it turns out to be unpublishable, this project could still be an outstanding success.
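The numbers in the spin-off question can be reproduced with two standard formulas. As an assumption (the question does not say which calculator was used), the 132 matches Yamane's simplified formula n = N / (1 + N·e²) for N = 196 and a 5% margin; conversely, the margin of error actually achieved with about 75 responses can be computed with the finite population correction. This sketch treats the respondents as a simple random sample of the population, which nonresponse may well violate.

```python
import math


def yamane_sample_size(N, e):
    """Yamane's simplified formula n = N / (1 + N * e**2), rounded up."""
    return math.ceil(N / (1 + N * e ** 2))


def margin_of_error(n, N, p=0.5, z=1.96):
    """95% margin of error for a proportion, with finite population correction.

    Assumes the n respondents behave like a simple random sample of the N units;
    p = 0.5 gives the most conservative (largest) margin.
    """
    se = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((N - n) / (N - 1))
    return z * se * fpc


N = 196
print(yamane_sample_size(N, 0.05))       # 132
print(round(margin_of_error(75, N), 3))  # 0.089
```

So with about 75 of 196 responses the study remains quantitative, but the nominal margin widens from 5% to roughly 9 percentage points.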
• asked a question related to Research Statistics
Question
Good day,
I am currently writing my bachelor thesis which involves quite a bit of research and statistics.
Could anyone please help me understand how highly skewed data and outliers/extreme values affect the performance of the following two tests:
- the Shapiro-Wilk and Kolmogorov-Smirnov tests of normality
- Levene's test for homogeneity of variance across two or more groups.
Would removing the outliers/extremes and log-transforming the data before these tests make them more or less accurate? Or would a log transformation increase the chance of a Type I error?
Throughout my previous statistics courses, 99% of the time the problems were extremely simple and we could always assume that the data were normally distributed. When using a t-test, we could simply use Welch's test (with adjusted df) when the variances could not be assumed equal. However, a real-world situation is quite different. My analysis focuses on the current performance across multiple entities of the same organization: detecting where the biggest differences are, how big they are (CI for the difference between means) and why (factors).
For this, of course, I need to look into non-parametric alternatives to ANOVA and the t-test.
For ANOVA, I have found that the Kruskal-Wallis test is the most commonly used non-parametric alternative. For post hoc tests: Dunn's test with Bonferroni correction, Mann-Whitney U, and Games-Howell. I can also use Mann-Whitney U as a non-parametric t-test.
Looking forward to all the responses and suggestions.
Best regards,
Janis Frisfelds
My first problem is that I don't understand why you need to test normality and homoscedasticity if you already know that the distribution is skewed.
If you transform a variable, the tests (Shapiro-Wilk or Levene) are about the distribution of the transformed variable. That does not make them more or less accurate.
Removing outliers is only sensible if these values are "bad values", that is, when they are extremely implausible (or even impossible, e.g. as the result of a typo or some other mistake) or if there are external reasons that invalidate them (e.g. it is known that the animal from which that outlying value was taken was unintentionally sick, or that a person being interviewed turned out to have misunderstood the question, was drunk, or was paid to give a manipulated answer, etc.).
Your current study doesn't seem to involve experiments (it seems you are using observational data/questionnaires). I personally don't see much sense in significance testing in such cases, particularly not if the task is to explore different factors (their correlation with the outcome measure) and to identify those with the "biggest effects" (these could be the statistically non-significant ones!).
When using rank-based methods (presumably what you call "non-parametric") you lose the quantitative relationship between the variables, which I consider not really helpful (at least in 99% of the cases, to make up a statistic, too :)). Be careful when talking about "non-parametric alternatives (to parametric tests)". These "alternatives" test different hypotheses. For instance, the MWU test and the t-test do not test the same hypothesis (except where the distribution is in fact normal, in which case you would not think about any "alternative" at all). The sad truth is that most people repeating these textbook phrases just don't know what hypothesis the MWU test (or a similar non-parametric test) actually tests. As an analogy for this absurdity: I ask you whether the apple or the pear I have is heavier (testing the null hypothesis that both fruits weigh the same). Now you say: "Ah, sorry, man, I don't have a good weighing machine for these things, so I wouldn't trust it. But, hey! I can easily spot the color, and let me tell you: the apple is significantly more reddish than the pear (p < 0.01). You have that nice p-value. That should answer your question!"
OK, it's not that stupid, but the direction is right, I think.
Take-home message: Don't let tests dictate what you should do. Think thoroughly about what you want, what data you need to provide the relevant information, and how you can extract, visualize and communicate that information. Tests may not be helpful at all. Look for plausibility rather than (statistical) significance.
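The point that the MWU test and the t-test address different hypotheses can be made concrete. The sketch below uses two small hypothetical samples with identical means, so the t-test's null hypothesis (equal means) is exactly true; yet the quantity the Mann-Whitney U statistic estimates, P(X > Y) + 0.5·P(X = Y), is 0.2 rather than 0.5. Only the U statistic is computed here, not a p-value.

```python
from statistics import fmean


def mann_whitney_u(x, y):
    """U statistic for sample x: pairs (xi, yj) with xi > yj, ties counted as 1/2."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def prob_superiority(x, y):
    """U / (n_x * n_y): estimates P(X > Y) + 0.5 * P(X = Y)."""
    return mann_whitney_u(x, y) / (len(x) * len(y))


x = [1, 1, 1, 1, 16]  # hypothetical, strongly right-skewed
y = [4, 4, 4, 4, 4]   # hypothetical, no spread at all
print(fmean(x) - fmean(y))     # 0.0: equal means, the t-test's null holds
print(prob_superiority(x, y))  # 0.2: far from 0.5, which is what MWU reacts to
```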
• asked a question related to Research Statistics
Question
The research group OASYS (Optimization and Analytics for Sustainable EnergY Systems) at the University of Málaga (Málaga, Spain) is currently looking for talented and entrepreneurial candidates to fill a postdoctoral position on the topic "Data-driven optimization for decision making in energy."
The postdoctoral researcher is expected to hold a PhD, preferably in mathematics (operations research, statistics, mathematical programming), control or electrical engineering.
The appointment includes a competitive salary (commensurate with qualifications), healthcare and social security insurance.
Detailed information on the position we are offering and on the application procedure can be found at https://sites.google.com/view/groupoasys/open-positions?authuser=0
Applications should be received no later than December 2, 2018.
The question was asked two months ago, and I only saw it now :(
• asked a question related to Research Statistics
Question
Which test should be run to determine whether two correlations (r values) differ statistically when the correlations are themselves correlated (dependent correlations)?
I am looking to see if there are any differences between a placebo and an intervention in a pre-post crossover study design.
I have read that Steiger's Z test should be used; however, my samples are not independent.
Any help would be appreciated.
Hello Terun. You listed SPSS as one of the topics for your question. I suspect that either file 6 or file 7 from the page linked below will give you what you want.
HTH.
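For reference, Steiger's (1980) Z is itself a test for dependent correlations measured on the same sample, so non-independence alone does not rule it out. The sketch below implements only the overlapping case, where the two correlations share variable j, using the pooled-r form of the covariance term commonly attributed to Steiger (1980). The non-overlapping case and the crossover structure in the question would need a different formula, so please check any result against an established implementation before relying on it.

```python
import math
from statistics import NormalDist


def steiger_z(r_jk, r_jh, r_kh, n):
    """Steiger's Z for two dependent correlations sharing variable j.

    r_jk, r_jh: the correlations being compared; r_kh: correlation between
    the two non-shared variables; n: sample size (same subjects for all).
    """
    z_jk, z_jh = math.atanh(r_jk), math.atanh(r_jh)  # Fisher z-transforms
    r_bar = (r_jk + r_jh) / 2
    psi = r_kh * (1 - 2 * r_bar**2) - 0.5 * r_bar**2 * (1 - 2 * r_bar**2 - r_kh**2)
    c = psi / (1 - r_bar**2) ** 2  # estimated correlation between the two z's
    z = (z_jk - z_jh) * math.sqrt(n - 3) / math.sqrt(2 - 2 * c)
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return z, p


print(steiger_z(0.6, 0.3, 0.4, 50))  # hypothetical correlations and n
```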
• asked a question related to Research Statistics
Question
I am currently working on a project on the retention of bedside nurses. I would like to know if anyone has statistical data describing the experience level of ICU nurses in the US.
Hi Heidi,
Given your within-US parameters, I believe that with a bit of digging you can find the descriptive data you're seeking from the US Bureau of Labor Statistics (BLS), URL --> https://www.bls.gov/opub/ted/employment.htm?view_full
I suggest a keyword search (variable-name scope) for 'work experience' OR 'TENURE'.
I hope this is of some help to you, and best of luck with your study!
With kind regards and cordially yours,
Matthew J. Kerry
• asked a question related to Research Statistics
Question
Hi,
I have used MANOVA to test a hypothesis. However, I am not very confident about interpreting it.
Any help from you guys is highly appreciated.
Dear Abdullah
The attached documents are very helpful in calculating and interpreting the measures.
Regards,
Zuhair
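All four multivariate statistics that SPSS reports for each MANOVA effect are functions of the same eigenvalues λ of E^-1 H (E = error SSCP matrix, H = hypothesis SSCP matrix for that effect), which helps when reading the output: a small Wilks' lambda, or a large Pillai, Hotelling-Lawley or Roy value, indicates a stronger multivariate effect. The eigenvalues in the usage line are hypothetical, and some programs report Roy's root as λ_max / (1 + λ_max) rather than λ_max.

```python
import math


def manova_statistics(eigenvalues):
    """Classic MANOVA test statistics from the eigenvalues of E^-1 H."""
    return {
        "Wilks' lambda": math.prod(1 / (1 + lam) for lam in eigenvalues),
        "Pillai's trace": sum(lam / (1 + lam) for lam in eigenvalues),
        "Hotelling-Lawley trace": sum(eigenvalues),
        "Roy's largest root": max(eigenvalues),
    }


for name, value in manova_statistics([1.0, 0.25]).items():  # hypothetical eigenvalues
    print(f"{name}: {value:.3f}")
```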
• asked a question related to Research Statistics
Question
Hi, I'm doing my final-year project in psychology, but I've never learnt research statistics, and my professor isn't really helping either. Anyway, I'm investigating the interaction between participant gender, victim gender and participant country on the level of tolerance for domestic violence. My question is: do I use a three-way ANOVA? Or a mixed design? Or repeated measures? I googled and saw these terms a lot, but I'm not sure which is suitable for my research. Please help!
• asked a question related to Research Statistics
Question
I want to conduct research on malnutrition (stunting, wasting and underweight) among children under five.
/*(For NFHS-3)
For Underweight (-SD), Stunting (-SD), Wasting (-SD)*/
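* IAKR52FL: NFHS-3 (DHS India) children's recode file
use IAKR52FL, clear
* hw70-hw72 store z-scores multiplied by 100; values 9996 and above are flag/missing codes
gen haz06=hw70
replace haz06=. if hw70>=9996
gen waz06=hw71
replace waz06=. if hw71>=9996
gen whz06=hw72
replace whz06=. if hw72>=9996
* Below -2 SD (z-scores are x100, so -200 = -2 SD)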
gen below2_haz = ( haz06 < -200)
replace below2_haz=. if haz06==.
gen below2_waz = ( waz06 < -200)
replace below2_waz=. if waz06==.
gen below2_whz = ( whz06 < -200)
replace below2_whz=. if whz06==.
* Below -3 SD (-300 = -3 SD)
gen below3_haz = ( haz06 < -300)
replace below3_haz=. if haz06==.
gen below3_waz = ( waz06 < -300)
replace below3_waz=. if waz06==.
gen below3_whz = ( whz06 < -300)
replace below3_whz=. if whz06==.
label variable haz06 "Length/height-for-age Z-score (stunting)"
label variable waz06 "Weight-for-age Z-score (Underweight)"
label variable whz06 "Weight-for-length/height Z-score (Wasting)"
label variable below2_haz "Stunting (-2SD)"
label variable below3_haz "Stunting (-3SD)"
label variable below2_waz "Underweight (-2SD)"
label variable below3_waz "Underweight (-3SD)"
label variable below2_whz "Wasting (-2SD)"
label variable below3_whz "Wasting (-3SD)"
save IAKR52FL, replace
• asked a question related to Research Statistics
Question
Can anybody help me with how to calculate the mean absolute error (MAE) in pose estimation? I know it is the error between the ground-truth pose and the predicted pose, but how do I calculate it? Thanks.
Hello Khalil Khan, how did you solve the problem? Please share the solution.
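A minimal sketch, assuming the pose is represented as Euler angles (yaw, pitch, roll) in degrees, as in head-pose estimation: the MAE is the mean over samples of |predicted - ground truth|, computed per angle and often also averaged over the three angles. Wrapping each difference to at most 180° is an assumption that only matters near ±180°. For keypoint-based (body) pose, the analogous measure is usually the mean Euclidean distance per joint instead.

```python
def angular_diff(pred, truth):
    """Absolute difference between two angles in degrees, wrapped to [0, 180]."""
    d = abs(pred - truth) % 360.0
    return min(d, 360.0 - d)


def pose_mae(predicted, ground_truth, wrap=True):
    """Per-angle MAE over N samples.

    predicted, ground_truth: lists of equal-length tuples of angles in degrees,
    e.g. (yaw, pitch, roll) for each image.
    """
    n = len(predicted)
    k = len(predicted[0])
    diff = angular_diff if wrap else (lambda p, t: abs(p - t))
    return [sum(diff(p[i], t[i]) for p, t in zip(predicted, ground_truth)) / n
            for i in range(k)]


pred = [(10.0, -5.0, 2.0), (178.0, 0.0, 1.0)]   # hypothetical predictions
gt = [(12.0, -4.0, 2.0), (-179.0, 2.0, -1.0)]   # hypothetical ground truth
per_angle = pose_mae(pred, gt)
print(per_angle)                        # [2.5, 1.5, 1.0]
print(sum(per_angle) / len(per_angle))  # overall MAE across the three angles
```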
• asked a question related to Research Statistics
Question
I am researching the statistical theory of reliability and life testing.
My problem is the assignment of reliability for an industrial product that has several parts acting in series.
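For parts acting in series, the standard starting point is that the system works only if every part works, so with independent failures R_sys = R_1 · R_2 · ... · R_n. The sketch below adds two common companions: equal apportionment of a system target (each part then needs R_target^(1/n)) and the constant-failure-rate (exponential) case, where R_sys(t) = exp(-(λ_1 + ... + λ_n)·t). Independence of failures and equal apportionment are simplifying assumptions; other allocation methods weight parts by complexity or failure rate.

```python
import math


def series_reliability(reliabilities):
    """Series system: works only if all parts work (independent failures assumed)."""
    return math.prod(reliabilities)


def equal_allocation(target, n_parts):
    """Equal apportionment: reliability each of n series parts needs to meet target."""
    return target ** (1 / n_parts)


def exponential_series(failure_rates, t):
    """Constant failure rates: R_sys(t) = exp(-(sum of lambda_i) * t)."""
    return math.exp(-sum(failure_rates) * t)


print(series_reliability([0.99, 0.95, 0.98]))  # hypothetical part reliabilities
print(equal_allocation(0.9, 3))                # per-part target for R_sys = 0.9
print(exponential_series([0.001, 0.002], 100))  # hypothetical rates per hour, t = 100 h
```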
• asked a question related to Research Statistics
Question
I have a question about the use of publication bias modeling approaches in meta-analyses of proportions.
The traditional approaches to assessing publication bias, such as the rank correlation test, Egger's regression model and weight-function approaches, all assume that the likelihood of a study being published depends on its sample size and statistical significance (Coburn and Vevea, 2015). Although empirical research has confirmed that statistical significance plays a dominant role in publication (Preston et al., 2004), it is not the whole story. Cooper et al. (1997) demonstrated that the decision whether to publish a study is influenced by a variety of criteria created by journal editors, regardless of methodological quality and significance, including but not limited to the source of research funding and social preferences at the time the research is conducted. Clearly, the traditional methods fail to capture the full complexity of the selection process.
In practice, authors of meta-analyses of proportions have used these methods in an attempt to detect publication bias. But the studies included in meta-analyses of proportions are non-comparative, so there are no "negative" or "undesirable" results, or study characteristics such as significance levels, that could have biased publication (Maulik et al., 2011).
Therefore, in my opinion, these traditional methods may not be able to fully explain an asymmetric distribution of effect sizes on funnel plots. They may also fail to identify publication bias in meta-analyses of proportions, because publication bias in non-comparative studies may arise for reasons other than significance.
I'm not sure if my reasoning is correct. Can someone shed some light on this? If someone could point me to some papers regarding this topic, that'd be wonderful.
References:
Coburn, K. M., & Vevea, J. L. (2015). Publication bias as a function of study characteristics. Psychological Methods, 20(3), 310.
Cooper, H., DeNeve, K., & Charlton, K. (1997). Finding the missing science: The fate of studies submitted for review by a human subjects committee. Psychological Methods, 2(4), 447.
Preston, C., Ashby, D., & Smyth, R. (2004). Adjusting for publication bias: Modelling the selection process. Journal of Evaluation in Clinical Practice, 10(2), 313-322.
Maulik, P. K., Mascarenhas, M. N., Mathers, C. D., Dua, T., & Saxena, S. (2011). Prevalence of intellectual disability: A meta-analysis of population-based studies. Research in Developmental Disabilities, 32(2), 419-436.
Hello Naike,
When the goal is estimation of a parameter (like a population proportion) rather than a comparison of conditions/treatments/methods, then publication bias (due to nonsignificant results) really isn't pertinent, as you correctly suggest. You can, of course, generate funnel plots and use checks such as Egger's to see whether the distribution of sample estimates follows what one would ordinarily anticipate (less variation with higher N, roughly symmetric dispersion about the middle values). Past that, though, the exercise is not very informative.
Proportions, though, are often transformed to behave more like continuous (e.g., unconstrained) variates; the arcsine transform, 2·arcsin(sqrt(p)), and the logit transform, ln(p/(1 - p)), are common methods (see http://strata.uga.edu/8370/rtips/proportions.html).
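A minimal sketch of the two transforms mentioned above, each with its usual large-sample variance (1/(np) + 1/(n(1 - p)) for the logit; about 1/n for 2·arcsin(sqrt(p))), plus the point estimate of Egger's regression intercept (standardized effect regressed on precision), the check referred to in the answer. The logit is undefined for 0 or n events; a continuity correction for that case is left out. The Egger function returns only the intercept; a significance test also needs its standard error.

```python
import math


def logit_effect(events, n):
    """Logit-transformed proportion and its approximate sampling variance."""
    p = events / n
    return math.log(p / (1 - p)), 1 / (n * p) + 1 / (n * (1 - p))


def arcsine_effect(events, n):
    """2 * asin(sqrt(p)) and its approximate sampling variance 1/n."""
    p = events / n
    return 2 * math.asin(math.sqrt(p)), 1 / n


def egger_intercept(effects, variances):
    """OLS intercept of (effect / se) regressed on (1 / se); near 0 if symmetric."""
    se = [math.sqrt(v) for v in variances]
    x = [1 / s for s in se]                      # precision
    z = [y / s for y, s in zip(effects, se)]     # standardized effect
    mx, mz = sum(x) / len(x), sum(z) / len(z)
    slope = (sum((a - mx) * (b - mz) for a, b in zip(x, z))
             / sum((a - mx) ** 2 for a in x))
    return mz - slope * mx


studies = [(12, 80), (30, 150), (9, 40), (55, 400)]  # hypothetical (events, n)
ys, vs = zip(*(logit_effect(e, n) for e, n in studies))
print(egger_intercept(ys, vs))
```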
• asked a question related to Research Statistics
Question
Is it disjoint, as in the US, or coordinated, as in the case of Statistics Canada, or the Italian National Institute of Statistics?  What advantages and/or disadvantages do you see?
Very good point, Adrian and Peter, regarding one statistical program checking another. (I was not aware of the details on Canada that you provided, Adrian. Thank you.) But in the US, agencies generally are not allowed to share data, so even if data from one agency could inform another, for example as regressor/auxiliary data or as enough information to stratify better when sampling, cooperation may be difficult or impossible. If, say, a manufacturers' energy consumption survey would be more efficient with a survey methodology and estimators based on supplementary information, such as number of employees or expenditures, which may establish a size measure for each member of the population, that kind of thing may not be possible. US agencies use memoranda of understanding (MOUs) to try to work together when allowed, but even then it can be tedious and inefficient.
• asked a question related to Research Statistics
Question
I need a list of research designs and the best statistics for analysing each.
Answers in the form of e-books, texts or references are welcome.
There are many textbooks that could help such as
Kellar, Stacey & Kelvin, Elizabeth. (2013). Munro's Statistical Methods for Health Care Research (6th ed.). Philadelphia: Lippincott.
I use this book to teach applied statistics for master students.
If you specify your research questions, it will become easier to guide you to the best statistical analysis.
• asked a question related to Research Statistics
Question
The manuals I have seen are a bit frustrating. They are either too simple, or too complex. Is there a manual that has a strong intro to R, good explanation of multivariate stats, and good intro to graphics? Should I look for separate manuals then?
Hi Gerardo,
A Handbook of Statistical Analyses Using R is my favorite.
• asked a question related to Research Statistics
Question
I want to study climatic and hydrologic variables, and I need to find the specific time period of the trend, as shown by the normalized forward and backward sequences. I am not sure about the procedure: how do I conduct it?
If you have the R software (which is open source) installed, the following steps can help perform seqMK test:
1. Write the command: install.packages('pheno')
• A list of CRAN mirrors will be displayed; choose a mirror in 'INDIA' (I think this is the location closest to you). Installing the necessary packages will take some time.
2. Write the command: library(pheno)
• This loads the package into the current session.
3. Write the command: seqMK(c(data values separated by commas))
• This will give the progressive/forward and retrograde/backward series, and identifies the change point as a 'TRUE' value in the time series.
• Also, by plotting the progressive/forward and retrograde/backward series, the change point can be determined at the intersection between the curves. If the intersection occurs within the confidence interval (e.g. +/- 1.96 for α = 0.05), it indicates a change point.
Hope that helps!!
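To show what the forward and backward series are, here is a plain-Python sketch of the sequential Mann-Kendall statistic in the usual Sneyers formulation. It is an illustration, not the pheno package's code: ties are ignored, and the sign and reversal convention for the retrograde series is assumed, so output may differ in detail from seqMK. The crossing check mirrors the answer's rule of an intersection inside ±1.96.

```python
import math


def progressive_u(x):
    """Sneyers' progressive statistic u(t) of the sequential Mann-Kendall test."""
    u, t = [0.0], 0
    for k in range(1, len(x)):
        # n_k = number of earlier values smaller than x[k] (ties not counted)
        t += sum(1 for j in range(k) if x[j] < x[k])
        m = k + 1  # number of observations up to and including x[k]
        mean = m * (m - 1) / 4
        var = m * (m - 1) * (2 * m + 5) / 72
        u.append((t - mean) / math.sqrt(var))
    return u


def sequential_mk(x):
    """Return the progressive (forward) and retrograde (backward) series."""
    forward = progressive_u(x)
    # retrograde: same statistic on the reversed series, sign flipped, re-reversed
    backward = [-v for v in progressive_u(x[::-1])][::-1]
    return forward, backward


def crossings(forward, backward, bound=1.96):
    """Indices where the curves cross while |u| < bound (candidate change points)."""
    out = []
    for i in range(1, len(forward)):
        d0 = forward[i - 1] - backward[i - 1]
        d1 = forward[i] - backward[i]
        if (d0 < 0) != (d1 < 0) and abs(forward[i]) < bound:
            out.append(i)
    return out


series = [5.1, 4.8, 5.0, 5.3, 5.2, 5.9, 6.1, 6.0, 6.4, 6.3]  # hypothetical data
f, b = sequential_mk(series)
print(crossings(f, b))
```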
• asked a question related to Research Statistics
Question
A two-way ANOVA on a dataset (code and results given below) showed no interaction. However, the post hoc output from the lsmeans package showed a compact letter display that looks like an interaction: the letter for "treatment 2" is "c" at "week1", whereas it is "ab" at week2 and week3. Does this show an interaction (the effect of treatment changes with week), or am I misinterpreting the result? The code is given below.
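library(lsmeans)
# Response "effect" by week ("time") and treatment, 3 replicates per cell
dat1 <- read.table(header = TRUE, text = "
time treatment effect
week1 1 664
week1 1 617
week1 1 647
week1 2 732
week1 2 819
week1 2 843
week1 3 850
week1 3 670
week1 3 722
week2 1 561
week2 1 581
week2 1 586
week2 2 597
week2 2 669
week2 2 654
week2 3 747
week2 3 708
week2 3 705
week3 1 630
week3 1 630
week3 1 664
week3 2 576
week3 2 666
week3 2 716
week3 3 776
week3 3 773
week3 3 726
")
dat1$treatment <- as.factor(dat1$treatment)
dat1$time <- as.factor(dat1$time)
m1 <- aov(effect ~ time * treatment, data = dat1)
summary(m1)
# Tukey-adjusted pairwise comparisons among all treatment x time cells
tk <- lsmeans(m1, pairwise ~ treatment * time, adjust = "tukey")
# Compact letter display for the cell means (the "lsmeans" element of the result)
cld(tk$lsmeans, Letters = letters, sort = FALSE)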
Without knowing more about your data: it is possible for the overall interaction test to be nonsignificant while main effects or simple main effects are significant.
Check power and effect sizes.
Marcie Zinn
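To see where the interaction test comes from, here is a standard-library Python sketch of the balanced two-way ANOVA decomposition applied to the 27 observations in the question. The interaction sum of squares is whatever spread among the nine cell means remains after removing the time and treatment main effects; the omnibus interaction F pools all of it (4 and 18 df), whereas the compact letter display compares individual cells, which is one way the two can seem to disagree. p-values are not computed because the standard library has no F distribution; since the design is balanced, the sums of squares should agree with `summary(m1)`.

```python
from collections import defaultdict
from statistics import fmean

# (time, treatment, effect) triples copied from the question's data
rows = [
    ("week1", 1, 664), ("week1", 1, 617), ("week1", 1, 647),
    ("week1", 2, 732), ("week1", 2, 819), ("week1", 2, 843),
    ("week1", 3, 850), ("week1", 3, 670), ("week1", 3, 722),
    ("week2", 1, 561), ("week2", 1, 581), ("week2", 1, 586),
    ("week2", 2, 597), ("week2", 2, 669), ("week2", 2, 654),
    ("week2", 3, 747), ("week2", 3, 708), ("week2", 3, 705),
    ("week3", 1, 630), ("week3", 1, 630), ("week3", 1, 664),
    ("week3", 2, 576), ("week3", 2, 666), ("week3", 2, 716),
    ("week3", 3, 776), ("week3", 3, 773), ("week3", 3, 726),
]


def two_way_anova(rows):
    """Sums of squares, df and F ratios for a balanced two-way ANOVA with interaction."""
    grand = fmean([y for _, _, y in rows])
    by_a, by_b, by_cell = defaultdict(list), defaultdict(list), defaultdict(list)
    for a, b, y in rows:
        by_a[a].append(y)
        by_b[b].append(y)
        by_cell[(a, b)].append(y)

    def between(groups):
        # sum over groups of n_g * (group mean - grand mean)^2
        return sum(len(v) * (fmean(v) - grand) ** 2 for v in groups.values())

    ss = {"time": between(by_a), "treatment": between(by_b)}
    # interaction: variation among cell means not explained by the main effects
    ss["time:treatment"] = between(by_cell) - ss["time"] - ss["treatment"]
    ss["Residual"] = sum((y - fmean(by_cell[(a, b)])) ** 2 for a, b, y in rows)

    df = {"time": len(by_a) - 1, "treatment": len(by_b) - 1}
    df["time:treatment"] = df["time"] * df["treatment"]
    df["Residual"] = len(rows) - len(by_cell)

    ms_error = ss["Residual"] / df["Residual"]
    f = {k: (ss[k] / df[k]) / ms_error for k in ("time", "treatment", "time:treatment")}
    return ss, df, f


ss, df, f = two_way_anova(rows)
for term in ss:
    print(term, df[term], round(ss[term], 1), round(f[term], 2) if term in f else "")
```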
• asked a question related to Research Statistics
Question
SPSS is an important statistical tool for research. Some students would like to practise with it in a resource-poor setting: are there any free online study packages for beginners?
Charbel: Didn't the Holy Spirit mention to you that software piracy was illegal, and that it deprived people of their livelihoods? Please don't post links to pirate software, and please pay for what you use.
I notice that you have a chapter on ethics and marketing. This is kinda ironic.
• asked a question related to Research Statistics
Question
I've calculated a value for user satisfaction via the CSI (Customer Satisfaction Index), and the result is expressed as a percentage [%]. My calculated CSI indicates that the general satisfaction of users of an internal system is 76 [%]. Now I am asking myself whether there is any scale or lookup table that expresses this value in a more meaningful way. I wouldn't like to leave my result as a standalone value; I would also like to express it in words and explain in detail what it generally means (e.g., for improvement). Is there any scientifically established way to do this?
There is a growing body of research linking customer satisfaction to outcomes like customer retention, word-of-mouth, sales, stock price and so forth. We provide an overview in our MSI paper, which may be downloaded at the link provided. Hope it helps.
• asked a question related to Research Statistics
Question
Please advise whether one can combine PLS-SEM and CB-SEM (AMOS). In my case, I am using the items of an instrument developed and tested in other cultures/countries. My first objective is to confirm the items and their relevance to our culture; here I wish to run EFA in SPSS and CFA in AMOS. My second objective is to explore the relationships between these variables through a measurement model; here I intend to run the model in SmartPLS.
Please guide me on this, and share relevant literature if possible.
Dear Sougui,
To be clear: there is no problem in using both techniques in the same research. However, there is an adequate context suggested by literature for each one.
When using two different data processing techniques in the same study, bear in mind that, in addition to the double effort of processing, analysing and interpreting the data, you must also build the theoretical support for the interpretations and conclusions drawn from both the CB-SEM and the PLS-SEM analyses.
In other words, you need: 1) a theory to support interpretations with CB-SEM (The goal is theory testing, theory confirmation, or the comparison of alternative theories), and 2) a theory to support interpretations with PLS-SEM (The goal is predicting key target constructs or identifying key "driver" constructs).
Kind Regards
• asked a question related to Research Statistics
Question
I've searched Google and the question database of RG, but couldn't find an answer. Hope you can help.
In a study I use four IVs (linked to product attributes; within individuals, level 1) and several level-2 variables (measuring characteristics between individuals).
I use SPSS 'Mixed > Linear', following Heck et al.'s book on multilevel and longitudinal modeling with SPSS (2014). I aim to analyse how the IVs (level 1) predict the DV and how level-2 variables moderate the IV-DV main effects.
My question is twofold.
First, when trying to examine variability in intercepts across individuals (the relationship between the IVs and the DV), do you need to build a model with only the IVs as fixed-effects parameters? Or should you run a full model (with IVs, control variables and interactions) and then look at the coefficients and significance of the IVs? This confuses me, also because the IV coefficients and significance levels differ between the two models.
Second, when trying to explain variability in an IV-DV slope across individuals (i.e., introducing cross-level interactions to see whether a level-2 variable moderates the IV-DV main effect), you need to specify the level-1 IV-DV slope as a randomly varying parameter in the model. In my case, the IV1-DV, IV2-DV (etc.) slopes are the randomly varying parameters. My question is: do I need to add all IVs (i.e., all slopes) to the model as random slopes? Or should I model the cross-level interactions on each slope one at a time? To be more specific, the SPSS syntax for the two options may look like this:
With only the IV1-DV slope random (see bottom line):
MIXED DV2 WITH IV1 IV2 IV3 IV4 ctrl1 ctrl2 variablelvl2_1 variablelvl2_2
/CRITERIA=CIN(95) MXITER(100) MXSTEP(10) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0,
ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE)
/FIXED=IV1 IV2 IV3 IV4 ctrl1 ctrl2 variablelvl2_1 variablelvl2_2 variablelvl2_1*IV1
variablelvl2_2*IV1 | SSTYPE(3)
/METHOD=REML
/PRINT=G SOLUTION TESTCOV
/RANDOM=INTERCEPT IV1 | SUBJECT(IndividualNr) COVTYPE(VC).
With all four IV-DV slopes as random (see bottom line):
MIXED DV2 WITH IV1 IV2 IV3 IV4 ctrl1 ctrl2 variablelvl2_1 variablelvl2_2
/CRITERIA=CIN(95) MXITER(100) MXSTEP(10) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0,
ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE)
/FIXED=IV1 IV2 IV3 IV4 ctrl1 ctrl2 variablelvl2_1 variablelvl2_2 variablelvl2_1*IV1
variablelvl2_2*IV1 | SSTYPE(3)
/METHOD=REML
/PRINT=G SOLUTION TESTCOV
/RANDOM=INTERCEPT IV1 IV2 IV3 IV4 | SUBJECT(IndividualNr) COVTYPE(VC).
Thank you very much in advance.
It all depends on how much information you have: how variable the X's are and how many level-1 and level-2 units you have. That is, the issue is the statistical power to estimate such a sizeable model. To give you some idea, Raudenbush and Bryk (1992) indicate that with 60 students in each of 160 schools it is possible to estimate 4 random effects at school level (the random intercept and 3 random slopes).
I would proceed pragmatically:
1. Include all the level-1 variables in a random-intercepts model.
2. Add each level-2 variable one at a time, then add them sequentially in order of the biggest reduction in the deviance.
3. Then allow random slopes at level 2, one at a time and then sequentially in order of the biggest reduction in the deviance, to see what can be fitted and to get an idea of what needs to be accounted for at level 2.
4. Finally, include each cross-level interaction one at a time, then sequentially in order of the biggest reduction in the deviance.
This is deliberately exploratory, and you have to be open about it so that people understand what you have done. If you have lots of data, you can use a calibration dataset and a hold-out sample to 'test' the final model.
You can see a published example of this type of approach here
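A minimal sketch of the deviance comparison used in steps 2-4 above, in Python. It assumes you have already read the -2 log-likelihood (deviance) of each fitted model from the output; the deviances and the function names below are hypothetical. When the compared models differ in their fixed effects, both should be fitted with full maximum likelihood (/METHOD=ML) rather than REML. For a single added random-slope variance the null value lies on the boundary of the parameter space, and a commonly used adjustment is to halve the p value; the function does not do that for you.

```python
import math


def lr_test_1df(deviance_reduced, deviance_full):
    """Deviance-difference (likelihood-ratio) test for one added parameter.

    Returns (chi2, p). A non-positive reduction gives p = 1.
    """
    chi2 = deviance_reduced - deviance_full  # reduction in -2 log-likelihood
    if chi2 <= 0:
        return chi2, 1.0
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(chi2 / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


def pick_next_term(current_deviance, candidate_deviances):
    """Choose the candidate whose addition gives the biggest deviance reduction."""
    best = min(candidate_deviances, key=candidate_deviances.get)
    chi2, p = lr_test_1df(current_deviance, candidate_deviances[best])
    return best, chi2, p


# Hypothetical deviances read from the model output
baseline = 5230.4
candidates = {"variablelvl2_1": 5221.9, "variablelvl2_2": 5228.7}
print(pick_next_term(baseline, candidates))
```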
• asked a question related to Research Statistics
Question
I'm doing a Systematic Review of Quantitative research. The research topic is very limited, so in order to have any studies to compare, the homogeneity needed to do a meta-analysis is lost (varying populations, different tools used and different factors measured). Therefore, performing a meta-analysis has been ruled out.
My lecturers are most comfortable with Qualitative research and so have suggested doing a thematic analysis to discuss the findings, but this does not sit right with my theoretical perspective and my use of strictly quantitative data; it feels like forcing a fit rather than something that sits naturally. Why can't I just do a Quantitative data analysis that stays Quantitative, rather than being pushed into Qualitative methods?
Is there an alternative to meta-analysis (for example, Quantitative Research Synthesis, Quantitative Systematic Review, Quantitative Data Synthesis or Quantitative Data Analysis) that I could use as my data analysis method, akin to thematic analysis but for quantitative research? I can't find any literature suggesting an alternative method for quantitative evaluation apart from meta-analysis, and I can't find evidence that thematic analysis can work for quantitative data even if I wanted it to.
In summary, is a Quantitative Research Synthesis (or other term) a viable data analysis method? Or do I have to do thematic analysis if meta-analysis is not possible?
Dear Nicki,
First, if the same underlying construct is measured with different tools in different studies, for example quality of life, it is possible to combine across studies using the SMD (see for example http://handbook.cochrane.org/ section 9.2.3.2 'The standardized mean difference').
Second, if this is not advisable because of different populations etc., I would suggest a simple descriptive analysis. To give it structure, you could use the PICO format (population, intervention, comparator and outcome) when formulating your research question. Then, for the reporting, make a table describing the population, the intervention and the comparator in each study. Group the reporting of the outcomes: for example, if your question is about an intervention to help people quit drinking and one of your outcomes is alcohol consumption, then for each study you report the outcomes you think cover the overall concept of alcohol consumption, for example the number of heavy drinking days, drinks per week etc. Your next outcome might be abstinence; here you could report the number of persons abstinent, or the time to first drink or first binge.
I hope this helps.
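For illustration, a sketch of the standardized mean difference with Hedges' small-sample correction (the adjusted g discussed in that Handbook section). The variance formula is a widely used large-sample approximation; the Handbook's approximation differs slightly, so treat the exact constants as an assumption to check against your meta-analysis software. The function name and the example numbers are hypothetical.

```python
import math


def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference with Hedges' correction.

    Returns (g, variance_of_g).
    """
    # Pooled within-group standard deviation
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (mean1 - mean2) / sd_pooled          # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)          # small-sample correction factor
    g = j * d
    # Large-sample variance of d, rescaled by j**2 for g
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return g, j**2 * var_d


# Hypothetical quality-of-life scores (intervention vs control) from one study
print(hedges_g(52.1, 9.8, 40, 47.3, 10.4, 38))
```

Because g is unit-free, studies that measured the same construct on different instruments can then be pooled on this common scale.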
Cheers,
Britta
• asked a question related to Research Statistics
Question
Is it 'scientifically rigorous' to study the entire population instead of selecting a sample in a study?
To my mind you can never observe the 'population', even if you have a census, as this is a theoretical concept that is not achievable in practice.
I have set out the argument and consequences of this in much more detail in a previous posting
As I say there: "the observed count should be considered to be the outcome of a stochastic process which could produce different results under the same circumstances. It is this underlying process that is of interest and the actual observed values give only an imprecise estimate of this".
In brief I think you are always dealing with a sample.
I know that a lot of people will disagree with me!
• asked a question related to Research Statistics
Question
Hello,
I have a feature set of 195 features and 17 classes. Is there any way to represent my data graphically? I don't want to use a PCA projection.
Hi, You may try a heatmap grid.
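One way to build such a heatmap without PCA is to average the standardized (z-scored) features within each class, giving a 17 x 195 class-by-feature matrix. The sketch below is an assumption about what a "heatmap grid" could look like here; the function name is hypothetical, and the resulting matrix can be drawn with, for example, matplotlib's `imshow`.

```python
import statistics


def class_feature_matrix(X, y):
    """Class-by-feature matrix of mean z-scores, suitable for drawing as a heatmap.

    X: list of samples (each a list of feature values); y: class label of each sample.
    """
    n_features = len(X[0])
    columns = list(zip(*X))
    means = [statistics.fmean(c) for c in columns]
    sds = [statistics.pstdev(c) or 1.0 for c in columns]  # avoid dividing by 0
    classes = sorted(set(y))
    matrix = []
    for cls in classes:
        rows = [x for x, label in zip(X, y) if label == cls]
        matrix.append([statistics.fmean((r[j] - means[j]) / sds[j] for r in rows)
                       for j in range(n_features)])
    return classes, matrix


# Toy example: 4 samples, 2 features, 2 classes
classes, matrix = class_feature_matrix([[1, 10], [3, 10], [5, 20], [7, 20]],
                                       ["a", "a", "b", "b"])
print(classes, matrix)
```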
• asked a question related to Research Statistics
Question
I am completing my dissertation on the attitudes and beliefs of teachers towards students with disabilities. My chair suggested using a survey; however, the program director is challenging the use of a survey.
Thank you for the responses.  It is greatly appreciated.
• asked a question related to Research Statistics
Question
Dear All,
I would like to ask how to work with longitudinal data in R. Sharing any relevant resources (books, articles, tutorials, YouTube links etc.) would be highly appreciated.
Sincerely
Perhaps you would like to take a look at the book by Zuur et al. on mixed models. It is an excellent introduction with lots of examples, including on "zero-inflated" data.
• asked a question related to Research Statistics
Question
I don't know how to interpret the ICIQLUTSqol score. What does it mean that a patient has a score of 52 on this questionnaire? Can you help me? :)
Thank you for your help! I think it is a good solution.
Best regards, Magda
• asked a question related to Research Statistics
Question
Common method variance (CMV) has been relabeled common source bias, and similar three-letter "invocations" continue to crowd out reasonable conversations. Analyses using single datasets are being used to make ambitious generalizations; the importance of theoretical arguments now seems to have receded into the background.
Paul Spector in a 2006 piece in ORM had articulated a balanced perspective on this issue. What are you seeing in your disciplines -- thoughtful balance or labeling and quixotic quests?
Personally, I particularly enjoy the work of Podsakoff on this matter, who presents specific ways to try to cope with CMB. Also, using a single survey-based data source might be the best method when 'both the predictor and criterion variables are capturing an individual’s perceptions, beliefs, judgments, or feelings’ (Podsakoff, MacKenzie, and Podsakoff 2012, 549).
Nevertheless, in one of our recent studies, accepted for publication in Public Money & Management, we coped with CMB as follows: the questions on the independent variables (formality of and participation during strategic planning) were asked of single informants (one per municipality), whereas the question on the dependent variable (strategic-decision quality of the strategic plan) was asked of multiple informants (at least 2 per municipality). This allowed us to cope with CMB to some extent while still using data from a single survey-based source.
Food for thought!
Sincerely,
Bert
• asked a question related to Research Statistics
Question
I have over 5 years of experience developing predictive models, working as a researcher, statistical analyst and data scientist. One thing I have noticed in the big data sphere and the predictive modeling landscape is that much of the emphasis is placed on data quality and quantity, on the experience and expertise of the modeler, and on the kind of system used to build, validate and test the model and to keep monitoring its quality and performance over time. With this in mind, I would like to hear what others here on ResearchGate consider the most challenging tasks in building statistical or predictive models, and what strategies you employed to address those challenges. What tradeoffs did you have to make, and how would you approach a similar situation in the future?
Information provided to this inquiry will be used for personal and professional growth.
Hello,
For me, possibly the most challenging is/was/will be to identify a niche within a vast amount of knowledge to be able to introduce meaningful research questions that would be novel and could enhance understanding of mechanisms underlying psychological disturbances (Yes, I am a scientist and a psychologist).
Sounds like a philosophical statement. But it matches your question - very broad and philosophical too.
Regards, Witold
• asked a question related to Research Statistics
Question
I should say up front that I'm not an expert in statistics. I have two independent random variables, both Weibull distributed with known shape and location. How can I obtain the distribution (or an approximation) of the sum of the two random variables? Any references? Can Monte Carlo simulation be useful to obtain the distribution of the sum empirically?
It is possible to get a very accurate approximation by expanding in a Taylor series ;).
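For the Monte Carlo route asked about in the question, a minimal sketch with Python's standard library. It assumes the two-parameter (shape, scale) Weibull form; if your parametrization also has a location, add it to each draw. The sorted draws give an empirical CDF and quantiles of the sum; the parameter values in the usage lines are placeholders.

```python
import bisect
import random


def simulate_weibull_sum(shape1, scale1, shape2, scale2, n=100_000, seed=1):
    """Sorted Monte Carlo draws of S = X1 + X2 for independent Weibull X1, X2."""
    rng = random.Random(seed)
    # random.weibullvariate(alpha, beta): alpha is the scale, beta the shape
    return sorted(
        rng.weibullvariate(scale1, shape1) + rng.weibullvariate(scale2, shape2)
        for _ in range(n)
    )


def empirical_cdf(sorted_draws, s):
    """Proportion of simulated sums that are <= s."""
    return bisect.bisect_right(sorted_draws, s) / len(sorted_draws)


def empirical_quantile(sorted_draws, q):
    """Simple order-statistic quantile of the simulated sums (0 <= q <= 1)."""
    idx = min(int(q * len(sorted_draws)), len(sorted_draws) - 1)
    return sorted_draws[idx]


draws = simulate_weibull_sum(2.0, 1.0, 1.5, 2.0)  # placeholder parameters
print(empirical_cdf(draws, 3.0), empirical_quantile(draws, 0.95))
```

With 100,000 draws the empirical CDF is usually accurate to about two decimal places, which is often enough to check an analytical approximation.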
• asked a question related to Research Statistics
Question
1) What is the nature of the relationship between EO and the level of revenues collected from SE strategies within the NPO?
2) What is the nature of the relationship between the level of current funding and the ability to meet the organizational mission?
3) What is the relationship between the level of revenues collected from SE strategies and the levels of revenues collected from fundraising?
4) What is the nature of the relationship between education and/or business experience and the level of EO within the organization?
Dear Leigh,
That is not the same as repeating or confirming the other study. This is a highly acceptable and good way to ensure that the results are valid.
• asked a question related to Research Statistics
Question
Hello everyone! I am conducting a Social Life Cycle Assessment (S-LCA) on fresh white asparagus from Peru. The goal of the study is to identify and assess social impacts (positive or negative) along the life cycle of the product, using the S-LCA methodology proposed by UNEP (http://www.unep.org/pdf/DTIE_PDFS/DTIx1164xPA-guidelines_sLCA.pdf).
This methodology defines 5 stakeholder groups (workers, society, local community, suppliers and consumers), which have themes of interest or subcategories (e.g. the subcategories “Child labour” or “Fair salary” for the stakeholder “Workers”). In order to collect the data, indicators for each subcategory have to be designed. I have now started the data collection phase and I am having some trouble defining an acceptable number of elements to interview. I am only planning to interview elements from the stakeholder groups “Workers”, “Local Community” and “Suppliers”, with sampling frames of 250 (workers), 300,000 (inhabitants) and 20 (suppliers). Since I consider this to be a qualitative study, I think the best option is to use saturation to determine sample sizes. However, is this method sufficient for making inferences?
Thank you very much!
As you explain it, it seems as though you are doing a mixed-methods study. You either do a qualitative analysis, look at emerging themes and then use these themes to do a quantitative analysis, which would involve the process of sampling; or you collect data quantitatively, do an analysis to see the correlations and then validate the results of the analysis through interviews. It is advisable to determine the methodology before sampling.
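If the quantitative strand needs a probability sample, one common (but not the only) way to size it is Cochran's formula with a finite population correction. The sketch assumes a proportion-type estimate, 95% confidence (z = 1.96), a 5% margin of error and the most conservative p = 0.5; these defaults are assumptions, not requirements of the S-LCA guidelines.

```python
import math


def cochran_sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Cochran's sample size for a proportion, with finite population correction."""
    n0 = z**2 * p * (1 - p) / margin**2      # size for an infinite population
    n = n0 / (1 + (n0 - 1) / population)     # finite population correction
    return math.ceil(n)


# Sampling frames from the question: workers, inhabitants, suppliers
for frame in (250, 300_000, 20):
    print(frame, cochran_sample_size(frame))
```

For very small frames such as the 20 suppliers, the formula essentially calls for a census.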
• asked a question related to Research Statistics
Question
In multiple regression on SPSS or Mplus I have two predictors
1. Years in education
2. Years of working experience
Based on those two I am predicting annual income.
I have two coefficients for the predictors (both significant). So I report that one of those two is a significant predictor of annual income while controlling for the other predictor. But what exactly does "controlling for" mean?
In multiple regression analysis with two independent variables, you get two partial coefficients. Each coefficient gives the change in the dependent variable for a unit change in that independent variable while the other variable is held constant.
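A small numerical illustration of "holding the other variable constant" (the Frisch-Waugh-Lovell result): the coefficient of education in the two-predictor regression equals the slope obtained after removing the linear effect of experience from both education and income. The data and the helper names are made up purely for illustration.

```python
def mean(v):
    return sum(v) / len(v)


def simple_slope(x, y):
    """OLS slope of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after regressing it on x: the part of y not explained by x."""
    b = simple_slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def two_predictor_ols(x1, x2, y):
    """Intercept and partial slopes of y on x1 and x2 (normal equations)."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


# Made-up data: years of education, years of experience, annual income (thousands)
educ = [10, 12, 12, 14, 16, 16, 18, 20]
exper = [15, 10, 20, 8, 5, 12, 3, 6]
income = [30, 32, 40, 38, 45, 50, 52, 60]

_, b_educ, _ = two_predictor_ols(educ, exper, income)
# Same coefficient from the "experience removed" versions of both variables
b_partial = simple_slope(residuals(exper, educ), residuals(exper, income))
print(b_educ, b_partial)  # the two numbers agree up to rounding
```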
• asked a question related to Research Statistics
Question
One of the reviewers questioned the rationale of 3-way interaction in my article.
In the article I predicted that reporting one’s gender before completing a test and having an opposite-gender test administrator would activate stereotype threat for women, and that women would perform as well as men only in the condition where they reported their gender after testing and were paired with a woman experimenter.
I ran a 2 × 2 × 2 ANOVA with participant’s gender (man vs. woman), experimenter’s gender (man vs. woman), and location of the gender question (before the test vs. after the test) as between-participants factors.
The results of my study confirm my assumptions (significant 3-way interaction): men always performed better than women, with only one exception, where the group of women who reported gender after testing and were paired with a woman experimenter outperformed the group of men assigned to the same condition. Among men, there were no statistically significant differences in scores across conditions, whereas women scored best in the condition where they reported their gender after the test and were paired with a woman experimenter, relative to all the other conditions; no differences emerged among the other groups of women.
I also got one significant main effect (participant’s gender: women achieved lower scores than men) and one significant 2-way interaction (participant’s gender × location of the gender question: when gender was reported before testing, women yielded lower scores than men, whereas they performed just as well as men when gender was reported after testing). The 2-way interaction participant’s gender × experimenter’s gender was non-significant.
I thought that my prediction that women would perform as well as men in only one condition, and would underperform men in the three other conditions, fully justified the 3-way interaction. However, for my reviewers this rationale isn't sufficient. I'm not sure how else I could justify this 3-way interaction or why my current rationale isn't enough.
I would be very grateful for any help!
Reviewers are concerned about fishing for results and then presenting as if you have not.
I think that complex interactions are entirely plausible if you explain what is going on: voting behavior, for example, is now affected by gender by age by class, and not modeling this in full misses important relationships.
This paper
has some arguments at the beginning for the importance of interactions - it also presents a model that automatically guards against over-interpretation as differential effects are shrunk back to no effect if they are based on unreliable estimates.
But we will probably have difficulties getting it accepted too!
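To make the rationale concrete, here is a sketch showing that the 2 x 2 x 2 three-way interaction is a difference of differences of differences of cell means. A pattern in which one cell departs from otherwise parallel cells, as predicted in the question, gives a non-zero contrast, while a purely additive pattern gives zero. The cell means below are hypothetical (coding: first index 1 = woman participant, second 1 = woman experimenter, third 1 = gender reported after the test).

```python
from itertools import product


def three_way_contrast(cell_means):
    """Three-way interaction contrast for a 2 x 2 x 2 design.

    cell_means maps (a, b, c), each level 0 or 1, to a cell mean. The result is
    (m111 - m011 - m101 + m001) - (m110 - m010 - m100 + m000).
    """
    total = 0.0
    for a, b, c in product((0, 1), repeat=3):
        sign = (1 if a else -1) * (1 if b else -1) * (1 if c else -1)
        total += sign * cell_means[(a, b, c)]
    return total


# Hypothetical predicted pattern: men score 10 everywhere, women 7 except in
# the (woman experimenter, gender reported after test) cell, where they score 10
predicted = {k: (10.0 if k[0] == 0 or k == (1, 1, 1) else 7.0)
             for k in product((0, 1), repeat=3)}
print(three_way_contrast(predicted))
```

Note that the same single-cell pattern also produces main effects and two-way interactions, which matches the lower-order effects reported in the question.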
• asked a question related to Research Statistics
Question
I have performed a Plackett-Burman experimental design, but I am unable to calculate the p values and confidence levels. Can anyone give me the formulas to calculate these values?
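A sketch of one common way to obtain p values from a 12-run Plackett-Burman design: columns not assigned to real factors are treated as dummy factors, and their effects estimate the error. It assumes the standard 12-run generator row from Plackett and Burman (1946), -1/+1 coding and one response per run; the function names and the factor/dummy column assignments are hypothetical. Some software reports the "confidence level" as (1 - p) x 100%, which can be read directly from these p values.

```python
import math

# Generator row of the 12-run Plackett-Burman design (Plackett and Burman, 1946)
GENERATOR_12 = [1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1]


def pb12_design():
    """12 runs x 11 columns of -1/+1: cyclic shifts of the generator plus a row of -1."""
    k = len(GENERATOR_12)
    rows = [[GENERATOR_12[(j - i) % k] for j in range(k)] for i in range(k)]
    rows.append([-1] * k)
    return rows


def effects(design, y):
    """Main effect of each column: mean response at +1 minus mean response at -1."""
    half = len(y) / 2
    return [sum(row[j] * yi for row, yi in zip(design, y)) / half
            for j in range(len(design[0]))]


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p value of Student's t, integrating the density by Simpson's rule."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(a)
    area += sum((4 if k % 2 else 2) * pdf(k * h) for k in range(1, steps))
    area *= h / 3  # integral of the density from 0 to |t|
    return max(0.0, 1.0 - 2.0 * area)


def screen(design, y, factor_cols, dummy_cols):
    """(effect, t, p) for each real factor, using dummy-column effects as error."""
    eff = effects(design, y)
    se = math.sqrt(sum(eff[j] ** 2 for j in dummy_cols) / len(dummy_cols))
    df = len(dummy_cols)
    return {j: (eff[j], eff[j] / se, t_two_sided_p(eff[j] / se, df))
            for j in factor_cols}
```

For example, with 7 real factors in columns 0-6 and your 12 measured responses in `y` (in run order), `screen(pb12_design(), y, range(7), range(7, 11))` returns the effect, t statistic and p value (4 df) for each factor.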
• asked a question related to Research Statistics
Question
Hello,
I need urgent help to calculate absolute values of standard deviations from surface & volume calculations of cylindrical shapes. The tricky part is to calculate according to the law of error propagation:
The 2 formulas are the following:
length of mantle $m = \sqrt{(R-r)^2 + h^2}$
mantle surface $M = \pi (R+r)\, m$
measurement values are: r=0,89 R=1,43 h=27,  as well as Dr=0,79 DR=0,08 Dh=0,5
Can somebody help me out with the exact formula for the standard deviation of the measures?
Thanks for help! Verena Hoelzer
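You first calculate the partial derivatives of the two formulas with respect to each measured variable. Then, treating the errors (Dr, DR and Dh) as independent standard deviations, the squared propagated error is the sum of the squared partial derivatives multiplied by the corresponding squared errors; the propagated error is the square root of that sum.
Example for the length of the mantle:
$\partial m/\partial R = (R-r)/\sqrt{w}$
$\partial m/\partial r = -(R-r)/\sqrt{w}$
$\partial m/\partial h = h/\sqrt{w}$
where $w = (R-r)^2 + h^2$. The squared derivatives are
$(\partial m/\partial R)^2 = (R-r)^2/w$
$(\partial m/\partial r)^2 = (R-r)^2/w$
$(\partial m/\partial h)^2 = h^2/w$
The propagated error is
$Dm^2 = (\partial m/\partial R)^2\, DR^2 + (\partial m/\partial r)^2\, Dr^2 + (\partial m/\partial h)^2\, Dh^2$
$Dm^2 = \frac{(R-r)^2}{w} DR^2 + \frac{(R-r)^2}{w} Dr^2 + \frac{h^2}{w} Dh^2$
$Dm = \sqrt{\frac{(R-r)^2 (DR^2 + Dr^2) + h^2\, Dh^2}{w}}$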
I hope I did not make some error somewhere.
You will surely be able to do the same for the simpler formula of the surface.
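A numerical check of the propagated errors, assuming first-order (Gaussian) propagation of independent standard deviations, i.e. the variance of the result is the sum over inputs of (partial derivative x SD)^2. The derivatives are taken numerically by central differences, so the same function also handles the mantle surface $M$, whose partial derivatives include the dependence of $m$ on $R$ and $r$. The helper names are hypothetical; the values are those given in the question (with decimal points instead of commas).

```python
import math


def propagate_sd(f, values, sds, rel_step=1e-6):
    """First-order propagation of independent standard deviations.

    f takes a dict of named inputs; values and sds are dicts with the same keys.
    """
    variance = 0.0
    for name, x in values.items():
        step = rel_step * max(1.0, abs(x))
        up = dict(values, **{name: x + step})
        down = dict(values, **{name: x - step})
        dfdx = (f(up) - f(down)) / (2 * step)   # central-difference derivative
        variance += (dfdx * sds[name]) ** 2
    return math.sqrt(variance)


def mantle_length(v):
    return math.sqrt((v["R"] - v["r"]) ** 2 + v["h"] ** 2)


def mantle_surface(v):
    return math.pi * (v["R"] + v["r"]) * mantle_length(v)


values = {"r": 0.89, "R": 1.43, "h": 27.0}
sds = {"r": 0.79, "R": 0.08, "h": 0.5}
print(mantle_length(values), propagate_sd(mantle_length, values, sds))
print(mantle_surface(values), propagate_sd(mantle_surface, values, sds))
```

If Dr, DR and Dh are maximum errors rather than standard deviations, a linear sum of |derivative| x error is the more conservative choice.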
• asked a question related to Research Statistics
Question
I am investigating the impacts of climate change on water resources in Jordan using statistical downscaling models. How can I downscale and project data from CMIP5 or CORDEX? Are there any related studies or experiments?
Dear Leonard,
Thanks for your response. Actually, I have the observed data from 1961 to 2010, and the projection should extend to 2099.
Maybe CORDEX is another option?
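As one example of a statistical downscaling (bias-correction) technique, here is a sketch of empirical quantile mapping for a single station: each model value is mapped to the observed value at the same quantile of the historical period (e.g. 1961-2010). It assumes the model-observation relationship stays stationary into the projection period and, for simplicity, clamps values outside the historical range; operational methods usually treat tails, seasons and wet-day frequency more carefully. The function names are hypothetical.

```python
import bisect


def _position(sorted_vals, x):
    """Fractional index of x within sorted_vals (linear interpolation, clamped)."""
    n = len(sorted_vals)
    if x <= sorted_vals[0]:
        return 0.0
    if x >= sorted_vals[-1]:
        return float(n - 1)
    i = bisect.bisect_right(sorted_vals, x) - 1
    lo, hi = sorted_vals[i], sorted_vals[i + 1]
    return i + (x - lo) / (hi - lo) if hi > lo else float(i)


def _value_at(sorted_vals, pos):
    """Value at a fractional index, interpolating between neighbours."""
    i = int(pos)
    if i >= len(sorted_vals) - 1:
        return sorted_vals[-1]
    return sorted_vals[i] + (pos - i) * (sorted_vals[i + 1] - sorted_vals[i])


def quantile_map(model_hist, obs_hist, model_values):
    """Map model values onto the observed distribution, quantile by quantile."""
    mod, obs = sorted(model_hist), sorted(obs_hist)
    out = []
    for x in model_values:
        q = _position(mod, x) / (len(mod) - 1)           # quantile in model climate
        out.append(_value_at(obs, q * (len(obs) - 1)))   # same quantile in observations
    return out


# Toy example: the model is biased 2 units high in the historical period
print(quantile_map([3, 5, 6, 9, 12], [1, 3, 4, 7, 10], [5.5, 8.0]))
```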
• asked a question related to Research Statistics
Question
I am also looking for code (Matlab, R, Python) to implement morphological filtering of time series.
Dear Jeff and Iryna, thank you very much for your suggestions!
Best
JP
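For the morphological filtering of a time series, a minimal Python sketch of grey-scale erosion, dilation, opening and closing with a flat, centred window of odd length (windows are truncated at the ends). The function names are hypothetical; if SciPy is available, `scipy.ndimage.grey_opening` and `scipy.ndimage.grey_closing` provide comparable operations.

```python
def erode(x, size):
    """Grey-scale erosion: moving minimum over a centred window of odd length."""
    k = size // 2
    return [min(x[max(0, i - k): i + k + 1]) for i in range(len(x))]


def dilate(x, size):
    """Grey-scale dilation: moving maximum over a centred window of odd length."""
    k = size // 2
    return [max(x[max(0, i - k): i + k + 1]) for i in range(len(x))]


def opening(x, size):
    """Erosion then dilation: removes peaks narrower than the window."""
    return dilate(erode(x, size), size)


def closing(x, size):
    """Dilation then erosion: fills troughs narrower than the window."""
    return erode(dilate(x, size), size)


series = [0, 0, 5, 0, 0, 3, 3, 3, 0, 0]
print(opening(series, 3), closing(series, 3))
```

The window length sets the time scale: features narrower than the window are removed, wider ones are kept.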
• asked a question related to Research Statistics
Question
I want to associate patient perception of hospital quality with staff perception of hospital quality. Two different surveys were used for the 2 different populations. The only thing in common is that they were collected from the same hospitals.
I believe that it's not possible to do correlation or regression, but are there any other statistical methods?
It sounds like you asked different questions in each of your samples (so the difference is more than just 4-point versus 5-point scales). If so, there is no possibility of doing correlations between the two sets of questions because no one answered both sets.
If you are thinking in terms of writing an article, I would organize into what is sometimes known as a Study One and Study Two format, where you would present each set of results separately, and then compare them in the Discussion section.
• asked a question related to Research Statistics
Question
Because of the non-normality of the distribution of the items I'm using to measure a latent variable, I decided to use the ULS - Unweighted Least Squares method for a CFA.
AMOS doesn't produce the RMSEA index, but an editor is requiring it.
I found an equation to calculate it based on the chi-square, the degrees of freedom and the sample size (see the attached file).
My question is: is it correct, or does it not work with ULS?
Any other solution?
Thank you
Diego
RMSEA does not work with the ULS estimator, but you can still evaluate your model with the GFI (goodness-of-fit index). This is because RMSEA and GFI both fall under absolute fit. There are three categories of fit indices: absolute fit (RMSEA, GFI), incremental fit (IFI, CFI, TLI, AGFI, NFI) and parsimonious fit (chisq/df). Thus, GFI is adequate to evaluate your model.
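For reference, the usual point-estimate formula for RMSEA from a chi-square, its degrees of freedom and the sample size is sketched below (some programs use N instead of N - 1). Whether a chi-square obtained under ULS is a valid input is exactly the open question in this thread, so treat any value computed this way with caution; the numbers in the usage line are hypothetical.

```python
import math


def rmsea(chi2, df, n):
    """RMSEA point estimate; zero when the chi-square does not exceed its df."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


print(rmsea(chi2=50.0, df=20, n=201))  # hypothetical values
```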
• asked a question related to Research Statistics
Question
For example, we have germinated seeds of several Arabidopsis genotypes. From that we get only a single number (germination rate in %) per genotype. How can one determine whether the difference is statistically significant? Should we repeat the experiment several times and average the results? Or calculate the germination rate for each pot separately?
You can calculate the significance of the difference between two independent proportions.  If you need to do many tests look into a correction of the criterion p value to protect against Type I errors (e.g., Bonferroni).
Here is a worked example from the internet:
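A minimal sketch of the two-proportion z test with a Bonferroni-adjusted threshold, using only the Python standard library. It needs the counts (germinated and sown) for each genotype, not just the percentages, and assumes that individual seeds are independent; if seeds are clustered in pots, the pots are the real replicates. The counts and the number of comparisons are hypothetical.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """z test for the difference between two independent proportions (pooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                     # pooled proportion under H0
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Hypothetical counts: germinated seeds out of seeds sown for two genotypes
z, p = two_proportion_z(45, 50, 30, 50)
n_comparisons = 3                       # e.g. three mutants compared with wild type
print(z, p, p < 0.05 / n_comparisons)   # Bonferroni-adjusted threshold
```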
• asked a question related to Research Statistics
Question
Trend assumption: Linear deterministic trend
Series: KS NK SS BS
Lags interval (in first differences): 1 to 4
Unrestricted Cointegration Rank Test (Trace)
Hypothesized No. of CE(s)   Eigenvalue   Trace Statistic   0.05 Critical Value   Prob.**
None *                      0.187103     2258.186          47.85613              1.0000
At most 1 *                 0.174294     1608.145          29.79707              1.0000
At most 2 *                 0.156655     1007.168          15.49471              0.0001
At most 3 *                 0.139790     472.5168          3.841466              0.0000
Trace test indicates 4 cointegrating eqn(s) at the 0.05 level
* denotes rejection of the hypothesis at the 0.05 level
**MacKinnon-Haug-Michelis (1999) p-values
Hi,
If your data are stationary in levels, you cannot use JJ (Johansen-Juselius) cointegration; you should use simple regression analysis instead. Alternatively, you may use stock price data for the JJ test.
regards
• asked a question related to Research Statistics
Question
Dear researchers
I have a random variable X that follows a bimodal distribution. When I squared each element of X, I got a distribution shape similar to an exponential or chi-squared distribution. Could you please suggest a reference or explanation showing exactly what the distribution is in this case?
Best regards
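If the bimodal distribution is a mixture of two normal distributions, then the squared variable is a mixture of two scaled non-central chi-square distributions with 1 degree of freedom (these reduce to gamma distributions only when the component means are zero).
$X = I X_a + (1-I) X_b$
with $I \sim \mathrm{Bernoulli}(p)$, $X_a \sim N(\mu_a, \sigma_a^2)$, $X_b \sim N(\mu_b, \sigma_b^2)$
$X^2 = I X_a^2 + (1-I) X_b^2$ (because $I^2 = I$ and $I(1-I) = 0$)
=>
$X_a^2/\sigma_a^2 \sim \chi'^2_1(\mu_a^2/\sigma_a^2)$ and $X_b^2/\sigma_b^2 \sim \chi'^2_1(\mu_b^2/\sigma_b^2)$, so the density of $Y = X^2$ is
$f_Y(y) = \frac{1}{2\sqrt{y}} \left[ p \left( \phi_a(\sqrt{y}) + \phi_a(-\sqrt{y}) \right) + (1-p) \left( \phi_b(\sqrt{y}) + \phi_b(-\sqrt{y}) \right) \right], \quad y > 0,$
where $\phi_a$ and $\phi_b$ are the normal densities of the two components.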
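A sketch that evaluates the density of $X^2$ for a two-component normal mixture through the change of variables $f_Y(y) = [f_X(\sqrt{y}) + f_X(-\sqrt{y})]/(2\sqrt{y})$, plus a simulation whose squared draws can be histogrammed and compared with that density. The function and parameter names are hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def squared_mixture_pdf(y, p, mu_a, sd_a, mu_b, sd_b):
    """Density of X**2 when X is a two-component normal mixture (zero for y <= 0)."""
    if y <= 0:
        return 0.0
    r = math.sqrt(y)

    def component(mu, sd):
        # change of variables: f_Y(y) = [f_X(sqrt y) + f_X(-sqrt y)] / (2 sqrt y)
        return (normal_pdf(r, mu, sd) + normal_pdf(-r, mu, sd)) / (2 * r)

    return p * component(mu_a, sd_a) + (1 - p) * component(mu_b, sd_b)


def simulate_squared_mixture(p, mu_a, sd_a, mu_b, sd_b, n=100_000, seed=0):
    """Squared draws from the mixture, for comparison with the density."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        x = rng.gauss(mu_a, sd_a) if rng.random() < p else rng.gauss(mu_b, sd_b)
        draws.append(x * x)
    return draws


print(squared_mixture_pdf(4.0, 0.3, -2.0, 0.5, 3.0, 1.0))
```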
• asked a question related to Research Statistics
Question
Any good suggested links, books etc. for handling interactions for repeated measures data (mixed model: 3 time points, continuous outcomes, categorical and continuous  predictors). Working in STATA - xtmixed.
Hi Bonnie,
the two best books on longitudinal data in a mixed model (using Stata) context are:
Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling change and event occurrence. Oxford University Press.
Rabe-Hesketh, S., & Skrondal, A. (2012). Multilevel and longitudinal modeling using Stata. Stata Press.
You will also find that the help file in Stata is extremely helpful.
I hope this helps
Ariel
• asked a question related to Research Statistics
Question
I have a dataset of quantitative variables, and I want to study their relationship, considering one of them as dependent, and the others as independent. An examination of scatter plots, I can see that this relationship is not linear. Which approach can I use for my analysis?
Maria,
First, do not dichotomize any of your variables. There are so many problems resulting from dichotomization that you could not trust your results, whether for a linear or a nonlinear relationship.
This article is state-of-the-art and explains the problems meticulously: MacCallum, R. C., Zhang, S., Preacher, K. J., & Rucker, D. D. (2002). On the practice of dichotomization of quantitative variables. Psychological Methods, 7, 19-40.
Second, 13 independent variables are too many for multiple regression. If these variables are correlated, you would have a multicollinearity problem. This means that parameter estimates may be quite small and standard errors may be quite large. Therefore it would be better to reduce the number of independent variables. You could either perform an exploratory factor analysis and use the factor scores as independent variables (or a PCA, if you do not expect latent constructs to be the origin of the correlations), or you could reduce the number of variables based on theory.
Third, check whether your independent variables are censored above or below, or severely skewed; this would introduce artificial nonlinearity. However, your dependent variable should be nonnormally distributed if a nonlinear effect exists in the population. This article (which also explains problems of moderated and quadratic regression) may be helpful: Dimitruk, P., Schermelleh-Engel, K., Kelava, A. & Moosbrugger, H. (2007). Challenges in nonlinear structural equation modeling. Methodology, 3, 100-114.
Then, include quadratic terms (a variable multiplied by itself) in the regression equation as suggested by David and, in the next step, moderator terms (products of two independent variables).
HTH
Karin
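A minimal sketch of that last step: ordinary least squares with centred predictors, their squares and their product, solved via the normal equations. Centring before forming the products is a common way to reduce the collinearity between a variable and its square. The function names and the toy data are hypothetical; in practice your statistics package fits this for you.

```python
def ols(columns, y):
    """Least-squares coefficients via the normal equations (Gaussian elimination).

    columns: list of predictor columns; an intercept is added automatically.
    Returns [intercept, b1, b2, ...].
    """
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    # Augmented system (X'X | X'y)
    A = [[sum(row[a] * row[b] for row in X) for b in range(k)]
         + [sum(row[a] * yi for row, yi in zip(X, y))] for a in range(k)]
    for col in range(k):                       # elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):             # back substitution
        beta[r] = (A[r][k] - sum(A[r][c] * beta[c] for c in range(r + 1, k))) / A[r][r]
    return beta


def centre(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


# Toy data: one outcome, two predictors
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0]
y = [3.1, 4.0, 8.2, 10.1, 17.9, 20.3, 31.0]
c1, c2 = centre(x1), centre(x2)
quad = [a * a for a in c1]                     # quadratic term
inter = [a * b for a, b in zip(c1, c2)]        # moderator (product) term
print(ols([c1, c2, quad, inter], y))
```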
• asked a question related to Research Statistics
Question
Some say that if the trend is stochastic you should use first differences to detrend the data, but if it is deterministic you should use the anomalies as the detrended data.
What do deterministic and stochastic mean?
Does it mean that the trend is stochastic or deterministic?
I want to detrend some climate data and production factors to detect impact of climate change on crop productivity.
There is a nice Wikipedia article about stochastic drift: Go to http://en.wikipedia.org/wiki/Stochastic_drift and scroll down to the section titled "Stochastic drift in economics and finance".
The first model, $y_t = f(t) + e_t$, is the deterministic trend model. Your time series (in your case, climate data and productivity data) is then a deterministic function of time plus a stationary error. "Deterministic" means that it has no random component. One example of such a function is a linear time trend; another example is a quadratic time trend. If you plot your time series data, in the case of a linear deterministic trend you would see the Y values keep increasing (or decreasing, for a negative trend) at a constant rate (in other words, if you draw a straight line that trends up, the $y_t$ observations would randomly jump up and down around that line). For the linear trend, you need to create a column of data with the values t = 1, 2, 3, 4, ... representing the time index, and fit a regression model where your target variable (climate or productivity variable) is the Y variable and the time index t is the X variable. Save the residuals. These will be the "detrended" version of your Y variable. For the quadratic trend, you would also have to create a squared time variable, with values $t^2$ = 1, 4, 9, 16, 25, ... Again, fit a regression model with t and $t^2$ as the two predictors and save the residuals to get the detrended version of your Y.
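A minimal sketch of the deterministic (linear) detrending described above: regress the series on t = 1, 2, ..., n and keep the residuals. The function name is hypothetical; the quadratic version would add $t^2$ as a second predictor.

```python
def linear_detrend(y):
    """Residuals from regressing y on the time index t = 1, 2, ..., n."""
    n = len(y)
    t = list(range(1, n + 1))
    t_mean, y_mean = sum(t) / n, sum(y) / n
    slope = (sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
             / sum((ti - t_mean) ** 2 for ti in t))
    intercept = y_mean - slope * t_mean
    return [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]


# Hypothetical annual series with an upward trend
print(linear_detrend([10.2, 10.9, 11.1, 12.0, 12.4, 13.1, 13.3, 14.2]))
```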
The second model, $y_t = y_{t-1} + c + e_t$, is a random walk with drift. This model is first-order autoregressive: each observation at time t is a function of the observation at time t-1. If you bring these two components to the left-hand side of the equation, you get $y_t - y_{t-1} = c + e_t$. The left-hand side is the first difference; the right-hand side is a constant plus a stationary error $e_t$. If you plot your time series data, in the case of a random walk with drift you would observe a long-term increase (or decrease for negative drift), but in the short run you would see the data go up, up, up, fluctuating randomly up and down a little but continuing upward for a few periods (maybe 6 or 10), then start going down, down, down, again fluctuating a little, for a few more periods, then again up, up, up, etc. So, within 1 or 2 observations you would see random up-and-down fluctuation, within 6-10 observations a consistent upward or downward local trend, and within 50-100 observations a systematic drift, either upward or downward. Based on this model, you can detrend your data by simply computing the first differences! Create a new column of data where $z_t = y_t - y_{t-1}$. Then $z_t$ (day-to-day or month-to-month differences, depending on your periodicity) would be a random error with mean equal to c (i.e., random fluctuation around a horizontal line placed at level c).
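The corresponding sketch for the stochastic-trend case: first differences, checked on a simulated random walk with drift. The drift value, series length and seed are arbitrary choices for illustration; the mean of the differences estimates the drift c.

```python
import random


def first_difference(y):
    """z_t = y_t - y_(t-1); the result has one observation fewer than y."""
    return [b - a for a, b in zip(y, y[1:])]


def random_walk_with_drift(n, c, sd=1.0, seed=0):
    """Simulate y_t = y_(t-1) + c + e_t with normal errors, starting at 0."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(y[-1] + c + rng.gauss(0.0, sd))
    return y


z = first_difference(random_walk_with_drift(500, c=0.3))
print(sum(z) / len(z))  # estimate of the drift c
```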
So, which of these models should you use? The best way is to (1) look at the time series plot for the patterns I explained above, and (2) produce autocorrelation function (ACF) and partial autocorrelation function (PACF) plots. In Minitab, select Stat > Time Series > Autocorrelation, and Stat > Time Series > Partial autocorrelation. In SPSS, select Analyze > Forecasting > Autocorrelations, which will produce both ACF and PACF plots. If all bars in BOTH plots are insignificant (i.e., within the threshold lines), then your data should be modeled using the deterministic trend model. If the PACF plot has one significant bar at lag 1, with all the other bars insignificant, and the ACF plot starts significant at lag 1 and decreases very slowly, staying significant for a few lags, then the stochastic trend model (random walk with drift, or AR(1)) is appropriate. If you see a significant bar at lag 1 on the ACF plot, with all other bars insignificant, and significant bars at lags 1, 2, 3, etc. that decrease slowly on the PACF plot, then you have a different model, called MA(1), or moving average.
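To see what the ACF bars represent, here is a sketch that computes sample autocorrelations together with the approximate white-noise bound of ±1.96/√n. Packages such as SPSS and Minitab may draw wider (Bartlett) bounds at higher lags, and the PACF is not computed here; the function name is hypothetical.

```python
import math


def acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag and the approximate 95% bound."""
    n = len(x)
    m = sum(x) / n
    d = [v - m for v in x]
    denom = sum(v * v for v in d)
    r = [sum(d[t] * d[t + k] for t in range(n - k)) / denom
         for k in range(1, max_lag + 1)]
    bound = 1.96 / math.sqrt(n)  # bars beyond +/- bound are flagged as significant
    return r, bound


r, bound = acf([1.2, 0.8, 1.5, 1.9, 1.1, 0.7, 1.4, 2.0, 1.3, 0.9], max_lag=3)
print([round(v, 3) for v in r], round(bound, 3))
```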
In SAS, use proc ARIMA with the statement identify var = y(1), which produces the first differences, and see if the first differences pass the autocorrelation test (producing high p-values on the chi-square test for zero autocorrelation) and also produce ACF and PACF plots that only have non-significant bars at lags greater than 0.
Good luck!
• asked a question related to Research Statistics
Question
Hello,
I am trying to analyse my questionnaire. It includes three components: students' experience, which has 15 items; students' feelings, which has 10 items; and students' background, which has 10 items. All these items are measured on a 4-point Likert scale. My independent variables are gender and student status.
I want to run an ANOVA on my data set, but I couldn't figure out how to treat the Likert scale in SPSS. Do I have to consider all these items as dependent variables?
Thank you
Tameem
I agree with Helena that you should start by using exploratory factor analysis on your multi-item scales to determine whether they are unidimensional (i.e., each measures one and only one thing). If that works, then use Cronbach's alpha to assess the reliability of the scales.
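Once a scale is found to be unidimensional, Cronbach's alpha and a per-respondent scale score (for example, the mean of the items, which could then serve as the dependent variable in an ANOVA) can be computed as sketched below. Item columns must list respondents in the same order; the function names and the responses are hypothetical.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of item columns over the same respondents."""
    k = len(items)
    item_variances = sum(statistics.variance(col) for col in items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    return k / (k - 1) * (1 - item_variances / statistics.variance(totals))


def scale_scores(items):
    """Mean item score per respondent (one value per person for the ANOVA)."""
    return [statistics.fmean(scores) for scores in zip(*items)]


# Hypothetical 4-point Likert responses: 3 items x 5 respondents
items = [[1, 2, 3, 4, 4], [2, 2, 3, 4, 3], [1, 3, 3, 4, 4]]
print(cronbach_alpha(items), scale_scores(items))
```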
There have been so many questions on RG about using these two methods that I have compiled a set of resources on this topic:
• asked a question related to Research Statistics
Question
Uncertainty in hydrological modelling is quantified by Monte Carlo simulation (MCS). MCS results are generally presented in the form of CDFs and histograms. How do histograms, CDFs and quantiles reflect uncertainty?
Hi Rajesh,
I am not sure that histograms can provide you with anything other than visual displays of your data. More importantly, the binning process is usually automated, so you cannot be certain that the visual display will look the same if the binning procedure is altered. Again, histograms are good visual depictions of the data, but they do not provide you with the sort of details you are looking for. You may need to examine the distribution of the data using standard statistical measures such as percentiles.
Ariel
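A sketch of the percentile summary mentioned above: the 5th, 50th and 95th percentiles of the Monte Carlo outputs give a median and a 90% uncertainty band, and they are the points where the empirical CDF crosses 0.05, 0.5 and 0.95. The percentile choices and the simulated outputs are assumptions for illustration.

```python
import random
import statistics


def uncertainty_band(outputs, lower=5, upper=95):
    """(lower percentile, median, upper percentile) of Monte Carlo outputs."""
    cuts = statistics.quantiles(outputs, n=100, method="inclusive")  # 99 cut points
    return cuts[lower - 1], cuts[49], cuts[upper - 1]


# Hypothetical MCS outputs, e.g. simulated peak discharge from 10,000 parameter sets
rng = random.Random(0)
outputs = [rng.lognormvariate(3.0, 0.4) for _ in range(10_000)]
low, median, high = uncertainty_band(outputs)
print(f"median {median:.1f}, 90% band {low:.1f} to {high:.1f}, width {high - low:.1f}")
```

The width of the band is one simple numerical measure of the uncertainty that the histogram or CDF shows visually.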
• asked a question related to Research Statistics
Question
Increased value of invertebrates - Mediating variable
In the following article, multiple methods are used to test mediation. Hope it helps.
Dardas, L., Ahmad, M. (2015). For Fathers Raising Children with Autism, Do Coping Strategies Mediate or Moderate the Relationship between Parenting Stress and Quality of Life? Research in Developmental Disabilities. 36, 620-629. http://dx.doi.org/10.1016/j.ridd.2014.10.047
• asked a question related to Research Statistics
Question
There is a large amount of literature in financial statistics on sums of a continuous random variable where the number of occurrences in the sum also is a random variable. A typical application in finance involves modeling the sum of financial losses per quarter in a bank. Both the size of the loss and the number of losses are random variables, so the cumulative distribution of the sum involves an infinite sum. Standard methods for approximating this distribution include recursion (e.g., Panjer 1981) and simulation (e.g., a Bayesian approach with simulation from the joint posterior distribution). I'm interested in applying this kind of model to certain psychological research problems, but I'm wondering whether anyone has come across such applications in psychology.
I eventually answered this question myself:
Smithson, M. & Shou, Y. (2014). Randomly stopped sums: Models and psychological applications. Frontiers in Psychology: Quantitative Psychology and Measurement, 1-11, doi: 10.3389/fpsyg.2014.01279.
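The simulation approach mentioned in the question can be sketched in a few lines. This minimal example assumes, purely for illustration, a Poisson number of events and exponentially distributed event sizes (a compound Poisson sum); the function names and parameter values are hypothetical. For this model the theory gives E[S] = lambda * E[X] and P(S = 0) = exp(-lambda), which the simulation should reproduce.

```python
import math
import random

def poisson(lam, rng):
    """Knuth's multiplication method; adequate for small lam."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_compound_sums(lam, severity_mean, n_sims, seed=0):
    """Each draw: N ~ Poisson(lam) events, each of size ~ Exponential(mean)."""
    rng = random.Random(seed)
    sums = []
    for _ in range(n_sims):
        n_events = poisson(lam, rng)
        sums.append(sum(rng.expovariate(1 / severity_mean) for _ in range(n_events)))
    return sums

sums = simulate_compound_sums(lam=3.0, severity_mean=2.0, n_sims=20_000)
print(sum(sums) / len(sums))                  # theory: E[S] = 3 * 2 = 6
print(sum(s == 0 for s in sums) / len(sums))  # theory: P(S = 0) = exp(-3), about 0.0498
```

Quantiles of `sums` then approximate the distribution of the randomly stopped sum without evaluating the infinite series directly.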
• asked a question related to Research Statistics
Question
I want to estimate the uncertainties of my data (theoretical versus experimental values). What is the best way to find the uncertainties of these data?
This paper is an example in serology, but I think that the principle is just the same in your case (observed and expected values):
Hope that will be useful
Best regards
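One simple way to summarise the disagreement between observed and expected values is sketched below; it is not necessarily the approach of the paper mentioned above. The mean residual estimates systematic bias, the residual SD estimates random scatter, and RMSE combines both. The data are hypothetical, and the 1.96 multiplier is a normal approximation (for a handful of points, a t quantile would be more appropriate).

```python
import math
from statistics import mean, stdev

def compare(observed, expected):
    """Summaries of the disagreement between measured and theoretical values."""
    resid = [o - e for o, e in zip(observed, expected)]
    n = len(resid)
    bias = mean(resid)                          # systematic offset
    sd = stdev(resid)                           # scatter of the residuals
    rmse = math.sqrt(sum(r * r for r in resid) / n)
    half_width = 1.96 * sd / math.sqrt(n)       # approx. 95% CI for the bias
    return bias, sd, rmse, (bias - half_width, bias + half_width)

# Hypothetical experimental vs theoretical values
observed = [10.2, 11.1, 9.8, 10.5]
expected = [10.0, 11.0, 10.0, 10.2]
print(compare(observed, expected))
```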
• asked a question related to Research Statistics
Question
I have R with the meta and metafor packages.
I know how to perform a meta-analysis of mean differences, but I also need to report the meta-analyzed means (and SDs) of the two comparison groups.
Does anyone know how to do this?
Have I understood correctly? You are being asked to present the "pooled mean" of each of the two groups, and the "pooled SD" of each of the two groups?
Why do you need to report that? I guess you can do it in the same way that you can get pooled estimates of anything, take the mean from each study, and the standard error (either from the SD and n, or from the confidence interval) and conduct meta-analysis on that. The "pooled SD" could possibly be derived from the within-study variance.
So I think it's easy enough to do, but I'm just not convinced you really need to do it. The mean and SD could be more prone to differences between the study populations chosen, and it's the difference between the means for each study that you need to look at and pool.
I'd be interested in other people's views. Maybe I'm missing something.
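The approach described above (take each study's mean, derive its standard error from SD and n, then pool) can be sketched as a fixed-effect, inverse-variance average. This is a minimal sketch with hypothetical function names and data; a random-effects version would add a between-study variance term. In R, `metamean()` in meta and `escalc(measure = "MN")` followed by `rma()` in metafor are, as far as I know, intended for pooling single-group means.

```python
import math

def pool_means(means, sds, ns):
    """Fixed-effect (inverse-variance) pooled mean and its standard error."""
    ses = [sd / math.sqrt(n) for sd, n in zip(sds, ns)]
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * m for w, m in zip(weights, means)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))

def pooled_sd(sds, ns):
    """Within-study pooled SD: sqrt(sum((n - 1) * sd^2) / sum(n - 1))."""
    num = sum((n - 1) * sd ** 2 for sd, n in zip(sds, ns))
    return math.sqrt(num / sum(n - 1 for n in ns))

# Hypothetical summaries for one comparison group across three studies
means, sds, ns = [10.0, 14.0, 12.0], [4.0, 4.0, 6.0], [16, 16, 36]
print(pool_means(means, sds, ns), pooled_sd(sds, ns))
```

Run once per group, this gives the "pooled mean" and "pooled SD" for each arm; as noted above, these are more sensitive to differences between study populations than the pooled mean difference is.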
• asked a question related to Research Statistics
Question
I am studying seasonal changes in the abundance of a fish species along a disturbance gradient. I sampled three locations in four seasons. My sampling sites at each location were very heterogeneous and the overall data were overdispersed. I am planning to analyze the data using a zero-inflated GLMM, with LOCATION as a fixed factor and sampling site as a random factor. Should I also consider SEASON as a random factor (due to probable autocorrelation), or just nest it within LOCATION?
Thank you. I am actually interested in differences among seasons but the data are highly correlated across seasons and I am not sure about using season as a fixed effect.
• asked a question related to Research Statistics
Question
Dear researchers,
This is a silly question but I have got to ask anyway.
How would you parameterise an asymmetric peak? Would it be meaningless if I were to fit it with multiple Gaussians? Any suggestions on how I should extract my data?
I have no experience
Thank you.
Ah, in that case, the integrated values are probably all you need. Fitted Gaussians will not really have individual significance. You can also get the average bond length from the integrations: it is the first moment of the distribution. Simply divide the integral of r·T(r) by the integral of T(r).
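The first-moment calculation can be done numerically with the trapezoid rule. In this minimal sketch the asymmetric peak is a hypothetical sum of two overlapping Gaussians (used only to generate test data, not as a fit); the analytic first moment of that curve is 1.575, so the numerical result can be checked.

```python
import math

def trapezoid(ys, xs):
    """Trapezoid-rule integral of ys over the grid xs."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2
               for i in range(len(xs) - 1))

def first_moment(r, T):
    """Average position: integral of r*T(r) divided by integral of T(r)."""
    rT = [ri * ti for ri, ti in zip(r, T)]
    return trapezoid(rT, r) / trapezoid(T, r)

# Hypothetical asymmetric peak sampled on 0..3 in steps of 0.001
r = [i * 0.001 for i in range(3001)]
T = [math.exp(-((x - 1.5) / 0.05) ** 2 / 2)
     + 0.3 * math.exp(-((x - 1.7) / 0.1) ** 2 / 2) for x in r]
print(first_moment(r, T))
```

With real data, `r` and `T` would simply be the measured grid and intensities over the peak's integration window.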
• asked a question related to Research Statistics
Question
The propensity score matching technique and its application in research methodology are still unknown to many researchers.
Propensity score (PS) matching (or other PS techniques such as weighting or subclassification) is part of a larger family of causal inference techniques that try to emulate the randomized controlled trial (RCT) by ensuring that participants and controls are comparable on baseline characteristics. The difference is that RCTs randomize participants, and thus the groups are assumed to be comparable on both observed and unobserved characteristics. When using PS techniques, participants can only be matched to controls on observed characteristics, and we assume that there is no residual confounding.
I am attaching links to several of my papers on the topic. They provide very specific examples of how these approaches are applied in different contexts.
I hope this helps
Ariel
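To show what the matching step itself looks like, here is a minimal sketch of greedy 1:1 nearest-neighbour matching without replacement, with a caliper. It assumes the propensity scores have already been estimated (typically by logistic regression on the observed covariates, not shown); the ids, scores, caliper of 0.05 and the highest-score-first ordering are all hypothetical choices.

```python
def match_nearest(treated_ps, control_ps, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    treated_ps, control_ps: dicts mapping unit id -> estimated propensity score.
    Returns (treated_id, control_id) pairs whose scores differ by <= caliper.
    """
    available = dict(control_ps)
    pairs = []
    # Process treated units from highest to lowest score (one common ordering)
    for t_id, t_ps in sorted(treated_ps.items(), key=lambda kv: -kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_ps))
        if abs(available[c_id] - t_ps) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]   # each control is used at most once
    return pairs

treated = {"t1": 0.62, "t2": 0.40, "t3": 0.91}
controls = {"c1": 0.38, "c2": 0.60, "c3": 0.65, "c4": 0.10}
print(match_nearest(treated, controls))
```

Here t3 stays unmatched because no control lies within the caliper, which is how matching discards treated units that have no comparable controls. After matching, baseline covariate balance should be checked in the matched sample.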
• asked a question related to Research Statistics
Question
I am using proc glimmix in SAS to fit a multilevel model for a multinomial outcome with unordered response categories. Proc glimmix requires that you specify the group= option in the random statement to obtain random effects for each outcome group. However, I suspect this prevents the correlations between random effects of different outcome groups from being estimated, even if the type=un option is specified.
Is there any way within the glimmix procedure to circumvent this issue? Thanks in advance!
Kumar
It is perfectly possible to have correlated random effects. The term "random variable" simply means "allowed to vary": a random variable can take on different values (according to a distribution), and the term does not imply independence.
• asked a question related to Research Statistics
Question
Have been given the former for a client in a psych review in a RAVLT table.
While you cannot get THE mean and SD, you can convert to a z-score (mean = 0 and SD = 1). From there, you can convert it to any scale you wish.
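A minimal sketch of that conversion, assuming the reported values are percentile ranks from a normally distributed norm sample (an assumption; check the RAVLT norms you were given). The z-score comes from the inverse standard normal CDF and is then rescaled to a T-score (mean 50, SD 10) and a standard score (mean 100, SD 15).

```python
from statistics import NormalDist

def percentile_to_scores(percentile):
    """Convert a percentile rank (strictly between 0 and 100) to z, T and standard scores."""
    z = NormalDist().inv_cdf(percentile / 100)
    return {"z": z, "T": 50 + 10 * z, "standard": 100 + 15 * z}

print(percentile_to_scores(84))
```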
• asked a question related to Research Statistics
Question
I want to compare the effects of three different treatments on animal weight over 14 weeks. What statistical test can I use to compare the treatment effects with each other?
I am not quite sure whether you also want to test the effect of week on animal weight. If you only want to test the difference between treatments, a repeated-measures ANOVA would be appropriate, with week as the within-subject (repeated) factor. And make sure to check the assumptions beforehand, namely normality of the data and especially homoscedasticity!
For further information, I guess this site could help you a lot:
With the results you can argue about the treatment effect on animal weight, if that is your research interest.
Hope that helped a bit? Feel free to ask further questions!
• asked a question related to Research Statistics
Question
...even if our design is ex post facto and non-experimental?
I have observed in many papers that researchers make recommendations for interventions (e.g., increasing employees' salaries to improve their performance) on the basis of strong positive relationships found in non-experimental studies.
Probably an example helps:
The fact that someone celebrates his 70th birthday is perfectly correlated with not dying in a car accident at the age of 20.
Obviously, neither does the 70th-birthday party cause the absence of an accident, nor does the absence of a fatal accident at the age of 20 cause the birthday party.
This example shows that
a) one needs a reasonable causative MODEL to think about a possible causation at all,
b) the time of events and the direction of the possible causation must be considered,
c) other possible confounders must be (logically or -better- experimentally) excluded.
Simple correlational studies often lack all three of these requirements. Often there is no concrete causative model proposed, it is impossible to see in what order the events happen, and there are usually more than a million possible confounders.
A clear conclusion about causation can only be derived from a planned experimental study where the suspected cause is controlled by the experimenter, everything else is kept constant, and the experiment is designed to read the response in time after the suspected cause could operate (according to a well described model of how this may happen - the causative model).
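Point c) can be illustrated with a small simulation. In this minimal sketch (all variables hypothetical), a confounder z drives both x and y, while x has no effect on y at all. The raw correlation between x and y is still substantial (0.5 in theory), and it vanishes once z is regressed out of both variables. The catch, and the reason the answer favours experiments, is that this adjustment only works for confounders that were actually measured.

```python
import random
from statistics import mean

def corr(u, v):
    """Pearson correlation coefficient."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

def residuals(v, z):
    """Residuals of v after a simple linear regression on z."""
    mz, mv = mean(z), mean(v)
    slope = (sum((a - mz) * (b - mv) for a, b in zip(z, v))
             / sum((a - mz) ** 2 for a in z))
    return [b - mv - slope * (a - mz) for a, b in zip(z, v)]

rng = random.Random(42)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]    # confounder
x = [zi + rng.gauss(0, 1) for zi in z]     # suspected "cause", no effect on y
y = [zi + rng.gauss(0, 1) for zi in z]     # outcome driven only by z

print(f"raw r(x, y)  = {corr(x, y):.2f}")
print(f"r(x, y | z)  = {corr(residuals(x, z), residuals(y, z)):.2f}")
```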
• asked a question related to Research Statistics
Question
I am trying to find an affordable minimum number of questionnaires that need to be completed by tenants of social housing buildings in order to get reliable and representative results. The questionnaire is about tenants' perceptions and their degree of interest in being part of the renovation project.
Giulia,
1) What's acceptable/published in your field. In the large-scale survey world, some people think of sample sizes of less than 1000 as "small," but much (maybe most) of the academic research out there is done with smaller samples.
2) Sampling method aside (and whether you have good population coverage, response rate, etc.), sample size specifically affects variance, standard errors, p-values, and confidence intervals (as all of these depend on n). The only way to get a concrete answer to your question is to plan your analysis and decide what effect size you are trying to detect, and under what criteria (e.g., alpha level). Once you do that, you can run a power analysis to see what sample size you need to detect that effect size under those assumptions.
3) Another similar approach that will give you a specific number is to plan the sample size around the confidence intervals that you want (i.e., how wide do you want them to be). Like power analysis, you put in your assumed point estimate and standard error of the estimate, and you can see how different n will affect your CI width (or, logically, you could define a CI width and get a different n). This is like power analysis but doesn't actually choose the sample size based on power of a statistical test, just confidence intervals.
There are a lot of other things to consider, as others have mentioned, such as whether your sample is truly random, the target population you are measuring, and the sampling frame you are using and its properties, but these really apply only to population surveys. However, on a practical note, if you use any kind of design (even a convenience sample) where you contact people or send them something to respond to, make sure you take your response rate into account (i.e., you won't get responses from 100% of the people you contact or ask to participate), so the n calculated from a power analysis needs to be inflated by that factor.
Here's a good set of references from AAPOR on sampling, response rates, and survey best practices http://www.aapor.org/AAPORKentico/Education-Resources/For-Researchers.aspx
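The three calculations described above (power-based n, CI-width-based n, and inflation for non-response) can be sketched with textbook normal-approximation formulas. This is a minimal sketch with hypothetical function names; p = 0.5 is the conservative default for a proportion, the finite population correction matters when the number of tenants is small, and an exact t-based power calculation gives a slightly larger n than the normal approximation used here.

```python
import math
from statistics import NormalDist

_std_normal = NormalDist()

def n_for_power(d, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample comparison of means
    (normal approximation), given a standardized effect size d."""
    z_a = _std_normal.inv_cdf(1 - alpha / 2)
    z_b = _std_normal.inv_cdf(power)
    return math.ceil(2 * (z_a + z_b) ** 2 / d ** 2)

def n_for_margin(margin, p=0.5, conf=0.95, population=None):
    """n so that the CI half-width for a proportion is about `margin`,
    with an optional finite population correction."""
    z = _std_normal.inv_cdf(1 - (1 - conf) / 2)
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

def inflate_for_response(n, response_rate):
    """Number of people to contact so that about n respond."""
    return math.ceil(n / response_rate)

# Hypothetical scenario: 500 households, +/-5 points at 95%, 50% response rate
needed = n_for_margin(0.05, population=500)
print(needed, inflate_for_response(needed, 0.5))
print(n_for_power(0.5))   # per-group n for a medium effect (d = 0.5)
```

If the inflated number exceeds the number of tenants available, that is a sign to accept a wider margin or to plan for a census of the buildings.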
• asked a question related to Research Statistics
Question
I am a Professor Emeritus of Psychiatry, and I am interested in studying why some people are very interested in tracking down biological parents or other family members they have never known, while others who could do so are not interested.
I have conceptualized a survey research project to examine these variables. I would like to collaborate with a research statistician who could work with me to design the data-collection form and analyze the data. I don't anticipate receiving a grant for this research, but whoever I work with would be second or third author (if we had two additional researchers). I hope we could publish the results and present them at appropriate meetings. I believe the results could be interesting and valuable. There is no funding. Contact me if you are interested and we could talk further.
Michael Blumenfield, M.D.

Hello Michael
The topic sounds interesting. I am an Assistant Professor at Statistics Department, University of Benghazi, Libya and am interested in helping you if I could.
Best Regards,
Tel: 00218924392190
• asked a question related to Research Statistics
Question
My questionnaire is composed of 49 items, none of which are normally distributed. Is it correct to do a CFA or EFA with non-normal data?
If you use SPSS, you only need to run the KMO test (sampling adequacy) and Bartlett's test of sphericity (to ensure moderate inter-correlations). The normality test can be overlooked.
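For readers who want to see what Bartlett's test computes, here is a minimal sketch of its statistic, chi2 = -(n - 1 - (2p + 5)/6) * ln|R|, with p(p - 1)/2 degrees of freedom, where R is the p x p correlation matrix of the items and n the number of respondents. The function names and the 2 x 2 example are hypothetical; the p-value would come from the chi-square distribution (e.g., `scipy.stats.chi2.sf`), and KMO, which needs the inverse of R, is not shown. A singular R (determinant 0) makes the statistic undefined.

```python
import math

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, det = len(a), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                      # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def bartlett_sphericity(R, n_obs):
    """Bartlett's chi-square statistic and df for a p x p correlation matrix R."""
    p = len(R)
    chi2 = -(n_obs - 1 - (2 * p + 5) / 6) * math.log(determinant(R))
    df = p * (p - 1) // 2
    return chi2, df

print(bartlett_sphericity([[1.0, 0.5], [0.5, 1.0]], n_obs=100))
```

An identity matrix (no inter-item correlation) gives a statistic of 0, which is the situation the test is designed to rule out before factoring.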
• asked a question related to Research Statistics
Question
I found that item-trait interaction statistics were Bonferroni-adjusted in several recent Rasch studies. This would influence my fit analysis considerably. Is there any reason to adjust, and is this an accepted procedure?
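The effect of the adjustment on a fit analysis can be seen in a short sketch. The standard Bonferroni correction divides the nominal alpha by the number of tests (here, one item-trait test per item), so fewer items are flagged as misfitting. The p-values below are hypothetical, and the number of tests used in a given Rasch study may differ.

```python
def bonferroni_flags(p_values, alpha=0.05):
    """Bonferroni-adjusted threshold and the indices of items below it."""
    threshold = alpha / len(p_values)
    flagged = [i for i, p in enumerate(p_values) if p < threshold]
    return threshold, flagged

# Hypothetical item-trait interaction p-values for 10 items
p = [0.001, 0.02, 0.04, 0.30, 0.004, 0.6, 0.01, 0.008, 0.5, 0.07]
unadjusted = [i for i, v in enumerate(p) if v < 0.05]
print(unadjusted)            # items flagged at the nominal 0.05 level
print(bonferroni_flags(p))   # threshold 0.05 / 10 and the items still flagged
```

The rationale for adjusting is that with many items, some will fall below 0.05 by chance alone; the cost is lower power to detect genuine misfit, which is why less conservative corrections such as Holm's are sometimes used instead.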