Analytical Statistics - Science topic
Explore the latest questions and answers in Analytical Statistics, and find Analytical Statistics experts.
Questions related to Analytical Statistics
Hi, I have 4 samples belonging to 2 groups, so 2 replicates per group. I am using edgeR for differential analysis and have found that many variables with sizeable logFC values are not significantly changed (FDR > 0.05). When I inspect these variables, they look like genuine differential candidates.
I think the problem is that the between-sample variance is large because I only have 2 replicates, so it is hard for these variables to pass the significance test in edgeR.
How can I properly reduce the sample variance? Which mathematical and statistical methods should I use?
Thanks for your attention!
I have a dummy variable as the possible mediator of a relationship in my model. Reading Baron and Kenny's (1986) steps, I see that in the second step you have to test the relationship between the independent variable and the mediator, using the latter as the dependent variable. However, you would not normally use OLS when the dependent variable is a dummy. Should I use a probit model in this case?
P(X ≥ k − 1) for X ~ Binomial(n − 1, p)
P(X ≥ k) for X ~ Binomial(n, p)

What is the process for extracting ASI microdata with STATA and SPSS?
I have microdata in STATA and SPSS formats and want to know about the process. Is there any tutorial on YouTube for ASI microdata?
Hi everyone,
Does anyone have a detailed SPSS (v. 29) guide on how to conduct Generalised Linear Mixed Models?
Thanks in advance!
"Dear Researchers,
In the context of the Brief COPE Inventory (B-COPE), which comprises 14 scales (each with two items) with responses ranging from 1 (“I haven't done this at all”) to 4 (“I have done this a lot”), I noticed that for statistical analysis, the questionnaire is often divided into adaptive and maladaptive coping subscales. The adaptive coping subscale is derived from a cumulative score of 8 scales, while the maladaptive coping subscale is based on the remaining 6 scales.
My question pertains to the methodological implications of this division: How do you ensure a fair and balanced comparison between adaptive and maladaptive coping strategies when they are based on an unequal number of scales? Specifically, I am interested in understanding the statistical rationale behind this approach and how it might influence the interpretation of a participant's coping strategies as more adaptive or maladaptive. Additionally, are there any considerations or adjustments made during the analysis to account for the discrepancy in the number of scales between these two subscales?
Thank you
Short Course: Statistics, Calibration Strategies and Data Processing for Analytical Measurements
Pittcon 2024, San Diego, CA, USA (Feb 24-28, 2024)
Time: Saturday, February 24, 2024, 8:30 AM to 5:00 PM (Full day course)
Short Course: SC-2561
Presenter: Dr. Nimal De Silva, Faculty Scientist, Geochemistry Laboratories, University of Ottawa, Ontario, Canada K1N 6N5
Email: ndesilva@uottawa.ca
Abstract:
Over the past few decades, instrumental analysis has come a long way in terms of sensitivity, efficiency, automation, and the use of sophisticated software for instrument control and data acquisition and processing. However, the full potential of such sophistication can only be realized with the user’s understanding of the fundamentals of method optimization, statistical concepts, calibration strategies and data processing, to tailor them to the specific analytical needs without blindly accepting what the instrument can provide. The objective of this course is to provide the necessary knowledge to strategically exploit the full potential of such capabilities and commonly available spreadsheet software. Topics to be covered include Analytical Statistics, Propagation of Errors, Signal Noise, Uncertainty and Dynamic Range, Linear and Non-linear Calibration, Weighted versus Un-Weighted Regression, Optimum Selection of Calibration Range and Standard Intervals, Gravimetric versus Volumetric Standards and their Preparation, Matrix effects, Signal Drift, Standard Addition, Internal Standards, Drift Correction, Matrix Matching, Selection from multiple responses, Use and Misuse of Dynamic Range, Evaluation and Visualization of Calibrations and Data from Large Data Sets of Multiple Analytes using EXCEL, etc. Although the demonstration data sets will be primarily selected from ICPES/MS and Chromatographic measurements, the concepts discussed will be applicable to any analytical technique, and scientific measurements in general.
Learning Objectives:
After this course, you will be familiar with:
- Statistical concepts, and errors relevant to analytical measurements and calibration.
- Pros and cons of different calibration strategies.
- Optimum selection of calibration type, standards, intervals, and accurate preparation of standards.
- Interferences, and various remedies.
- Efficient use of spreadsheets for post-processing of data, refining, evaluation, and validation.
Access to a personal laptop during the course would be helpful for participants, although internet access during the course is not necessary. However, some sample and worked-out spreadsheets and course material will need to be distributed (emailed) to the participants the day before the course.
Target Audience: Analytical Technicians, Chemists, Scientists, Laboratory Managers, Students
Register for Pittcon: https://pittcon.org/register
We measured three aspects (i.e. variables) of self-regulation. We have 2 groups and our sample size is ~30 in each group. We anticipate that three variables will each contribute unique variance to a self-regulation composite. How do we compare if there are group differences in the structure/weighting of the composite? What analysis should be conducted?
I have 3 papers suitable for inclusion in my systematic review looking at high versus low platelet to red cell ratio in TBI, and want advice as to whether I can combine their estimates of effect in a meta-analysis.
One RCT which provides an unadjusted odds ratio and adjusted odds ratio of 28-day mortality for two groups (one intervention (high ratio) and one control (low ratio), adjusted for differences in baseline characteristics).
One retrospective cohort study which provides absolute unadjusted 28-day mortality data for two groups (one exposed to high ratio, and another exposed to a low ratio). They have also performed a sophisticated propensity analysis to adjust for the few differences between the groups and multivariate cox regression to adjust for factors associated with mortality, and presented hazard ratios.
Finally, a post-hoc analysis of a RCT, which compares outcomes for participants grouped according to presence/absence of haemorrhagic shock (HS) and TBI. This generates 4 groups - neither HS nor TBI, HS only, TBI only and TBI + HS. I am interested in the latter two as they included patients with TBI. One group was exposed to a high ratio, whereas the other a lower ratio. The authors provided unadjusted mortality data for all groups, and they adjust for differences in admission characteristics, to generate odds ratio of 28-day mortality. However, they present these adjusted odds ratios of death at 28days for the HS only, TBI only and TBI + HS groups compared to the neither TBI nor HS group, not to each other.
I could analyse unadjusted mortality in a meta-analysis, but want to know whether I can instead combine all or some of the adjusted outcome measures I have described. Any help greatly appreciated.
Hi everyone,
I need to convert standard error (SE) into standard deviation (SD). The formula for that is
SD = SE × √n, i.e., SE times the square root of the sample size n.
By 'sample size', does that mean the total sample size or the sample sizes of the individual groups? For example, the intervention group has 40 participants while the control group has 39 (so the total sample size is 79). When calculating SD for the intervention group, do I use 40 or 79 as the sample size?
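To make the arithmetic concrete, here is the conversion written out with a purely hypothetical SE of 0.5 for the intervention group, once for each candidate n (only the choice of n differs):

```latex
\mathrm{SD} = \mathrm{SE}\times\sqrt{n}:\qquad
0.5\times\sqrt{40}\approx 3.16
\quad\text{vs.}\quad
0.5\times\sqrt{79}\approx 4.44
```

The choice of n clearly changes the result; whichever n is used should be the same n that was used to compute the SE being converted.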
Thank you!
Hi,
There is an article for which I want to know which statistical method was used: regression or Pearson correlation.
However, they don't say which one. They show the correlation coefficient and standard error.
Based on these two parameters, can I know if they use regression or Pearson correlation?
Hello everyone,
I am currently doing research on the impact of online reviews on consumer behavior. Unfortunately, statistics are not my strong point, and I have to test three hypotheses.
The hypotheses are as follows: H1: There is a connection between the level of reading online reviews and the formation of impulsive buying behavior in women.
H2: There is a relationship between the age of the respondents and susceptibility to the influence of online reviews when making a purchase decision.
H3: There is a relationship between respondents' income level and attitudes that online reviews strengthen the desire to buy.
Questions related to age, level of income and level of reading online reviews were set as ranks (e.g. 18-25 years; 26-35 years...; 1000-2000 Eur; 2001-3000 Eur; every day; once a week; once a month etc.), and the questions measuring attitudes and impulsive behavior were formed in the form of a Likert scale.
What statistical method should be used to test these hypotheses?
Merry Christmas everyone!
I used the Interpersonal Reactivity Index (IRI) subscales Empathic Concern (EC), Perspective Taking (PT) and Personal Distress (PD) in my study (N = 900). When I calculated Cronbach's alpha for each subscale, I got .71 for EC, .69 for PT and .39 for PD. The value for PD is very low. The analysis indicated that if I deleted one item, the alpha would increase to .53, which is still low but better than .39. However, as my study does not focus mainly on the psychometric properties of the IRI, what kind of arguments can I make to say the results are still valid? I did say findings (for the PD) should be taken with caution, but what else can I say?
I am measuring two continuous variables over time in four groups. Firstly, I want to determine if the two variables correlate in each group. I then want to determine if there is significant differences in these correlations between groups.
For context, one variable is weight, and one is a behaviour score. The groups are receiving various treatment and I want to test if weight change influences the behaviour score differently in each group.
I have found the R package rmcorr (Bakdash & Marusich, 2017) to calculate correlation coefficients for each group, but am struggling to determine how to correctly compare correlations between more than two groups. The diffcorr package allows comparing only two groups.
I came across this article describing a different method in SPSS:
However, I don't have access to SPSS so am wondering if anyone has any suggestions on how to do this analysis in r (or even Graphpad Prism).
Or I could use the diffcorr package to calculate differences for each combination of groups, but would I then need to apply a multiple-comparison correction?
Alternatively, Mohr & Marcon (2005) describe a different method using Spearman correlation that seems like it might be more relevant; however, I wonder why their method doesn't seem to have been used by other researchers. It also looks difficult to implement, so I'm unsure if it's the right choice.
Any advice would be much appreciated!
Hello, I currently have a set of categorical variables, coded as Variable A,B,C,etc... (Yes = 1, No = 0). I would like to create a new variable called severity. To create severity, I know I'll need to create a coding scheme like so:
if Variable A = 1 and all other variables = 0, then severity = 1.
if Variable B = 1 and all other variables = 0, then severity = 2.
So on, and so forth, until I have five categories for severity.
How would you suggest I write a syntax in SPSS for something like this?
Presently I am handling a highly positively skewed geochemical dataset. After several attempts, I have prepared a 3 parameter lognormal distribution (using natural log and additive constant c). The descriptive statistic parameters obtained are log-transformed mean (α) and standard deviation (β). The subsequent back-transformed mean and standard deviation (BTmean and BTsd) are based on the formula
BTmean = e^(α + β²/2) − c
BTsd = sqrt[ (BTmean)² · (e^(β²) − 1) ] − c
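For comparison, here is a minimal Python sketch of the textbook back-transform for a shifted (three-parameter) lognormal, using purely hypothetical values for α, β and c. In the standard result the additive constant shifts the mean but leaves the standard deviation unchanged, which differs slightly from the second formula above.

```python
import math

# Hypothetical parameters: Y = ln(X + c) ~ N(alpha, beta^2)
alpha, beta, c = 2.1, 0.8, 5.0

# Textbook back-transform for a shifted lognormal:
# the shift c affects the mean but not the standard deviation
bt_mean = math.exp(alpha + beta**2 / 2) - c
bt_sd = math.exp(alpha + beta**2 / 2) * math.sqrt(math.exp(beta**2) - 1)

print(f"BTmean = {bt_mean:.3f}, BTsd = {bt_sd:.3f}")
```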
However, someone has suggested using a Lagrange multiplier. I am not sure about:
1) Equation using the Lagrange Multiplier
2) How to derive the value of Lagrange multiplier in my case.
Kindly advise.
Regards
Hi everyone,
I am struggling a bit with data analysis.
If I have 2 separate groups, A and B.
And each group has 3 repeats, A1, A2, A3 and B1, B2, B3, for 10 time points.
How would I determine statistical significance between the 2 groups?
If I then added a third group, C, with 3 repeats C1,C2,C3 for each timepoint.
What statistical analysis would I use then?
Thanks in advance
I am hoping to see if there is a statistically significant difference between the number of trauma patients receiving a certain procedure between two time frames, but am unsure on what test I should be using.
Time frame 1: 474 trauma patients admitted, 7 received the procedure
Time frame 2: 365 trauma patients admitted, 9 received the procedure
I would be grateful for any advice and can provide more information as needed.
Many thanks!
In Hayes' model 58, the moderator enters both path 1 (independent variable → mediator) and path 2 (mediator → dependent variable). When the moderation effect on path 1 is rejected and only the moderation effect on path 2 is supported, how should the moderated mediation effect be interpreted?
Even in this case, the effects at −1 SD, the mean, and +1 SD are presented. If the bootstrapped LLCI and ULCI of all these effects do not include 0, can the result be interpreted as a moderated mediation effect?
We look forward to hearing from our seniors.
I need suggestions for groundwater assessment articles that used discriminant analysis in their study, as well as guidance on how to apply this analysis in R.
Reghais.A
Thanks
I'm doing a study comparing data from two orbital sensors, and in the study I'm basing mine on there is this normalization formula for the rasters: ((Bi <= 0) * 0) + ((Bi >= 10000) * 1) + ((Bi >= 0) & (Bi < 10000)) * Float(Bi/10000), where "Bi" means "band". Could someone who understands it explain this formula? Thank you very much.
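Reading the expression piecewise: values at or below 0 are set to 0, values at or above 10000 are set to 1, and everything in between is divided by 10000, giving a 0-1 scale (10000 is the usual integer scaling factor for surface-reflectance products, though that is an assumption about your data). A minimal numpy sketch of the same logic, with hypothetical band values:

```python
import numpy as np

# Hypothetical band values (e.g., reflectance stored as scaled integers)
Bi = np.array([-200, 0, 2500, 9999, 10000, 14000], dtype=float)

# Same piecewise logic as the map-algebra expression
normalized = ((Bi <= 0) * 0
              + (Bi >= 10000) * 1
              + ((Bi >= 0) & (Bi < 10000)) * (Bi / 10000.0))

print(normalized)  # [0.  0.  0.25  0.9999  1.  1.]
```

In effect it is equivalent to clipping the band to the range 0-10000 and dividing by 10000, i.e., np.clip(Bi, 0, 10000) / 10000.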
I have six compounds which I tested for antioxidant activity using the DPPH assay and also for anticancer activity against five cell lines, so I have two groups of data:
1. Antioxidant activity data
2. Anticancer activity (5 types of cancer cell line)
Each data consisted of 3 replications. Which correlation test is the most appropriate to determine whether there is a relationship between the two activities?
I would like a list of the residual (error) diagnostics for time series models, as well as a list of stationarity tests for time series data.
I am looking at gender equality in sports media. I have collected two screen time measures from TV coverage of a sport event - one time for male athletes and one time for female athletes.
I am looking for a statistical test to provide evidence that one gender is favoured. I assume I have to compare each gender's time against the EXPECTED time given a 50/50 split (i.e., (male time + female time) / 2), as this would be the time if neither gender were favoured.
My first thought was chi-square? But I'm not sure that works because there's really only one category. I am pregnant and so my brain is not working at the moment lol. I think the answer is really simple but I just can't think of anything.
If I want the annual average of the country's oil production for 2019 and I have 25 stations:
1- Should I take the sum of the 12 months for each station individually, so I get the annual sum for each station, and then divide by 25 to get the country annual average?
2- Or should I take the sum over the 25 stations for January, then February, etc., and then divide by 12 (the number of months) to get the annual average for the country?
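Writing the two procedures out (with x_{s,m} denoting the production of station s in month m) may help: both use the same double sum and differ only in the divisor, so they answer different questions, namely (1) the average annual production per station and (2) the average monthly total production of the country.

```latex
(1)\;\; \frac{1}{25}\sum_{s=1}^{25}\sum_{m=1}^{12} x_{s,m}
\qquad\qquad
(2)\;\; \frac{1}{12}\sum_{m=1}^{12}\sum_{s=1}^{25} x_{s,m}
```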
Hi, I have data from various patients; each patient has 7 values corresponding to different time points, and I would like to average across all patients in GraphPad to create an XY plot that represents the average of all patients for each X value, but I don't know how to do it. I have all the datasets and the graph for each individual patient in separate GraphPad documents. Is there a way to average all of them in GraphPad without arranging the values manually?
Thank you in advance
Hello everyone,
I need help understanding whether my two groups are paired or not.
I am collecting data from one group of cells. We have developed two different workflows (let's call them A, and B) for data analysis. We want to test whether these two workflows give the 'same' results for the same set of cells.
At the end, I obtain:
- Group 1 (contains the variable obtained with workflow A)
- Group 2 (contains the variable obtained with workflow B).
I have been considering the two groups as independent because the two workflows do not interfere with each other. However, the fact that both workflows operate on the same cells is throwing me off and I am wondering if these groups are actually paired.
Could you advise me on this and on what test is best to use?
The hypothesis for the test would be:
- the distribution of the variable is the same with workflow A and workflow B; and/or
- the median of the distribution from workflow A equals the one from workflow B
Thank you.
GN
Hi, I have this model with many parameters (variables) and I wonder if there is a statistical method to determine how big the influence of each variable is. Anyone has an idea? Thanks.
Is it possible to determine a regression equation using SmartPLS 2?
Howdy.
I am currently working with a set of samples divided into 3 consecutive phases (8 samples for phase 1, 9 samples for phase 2 and 6 samples for phase 3). My data are homoscedastic and not normally distributed. What test does SPSS (version 21) employ to analyse the pairwise comparisons after the Kruskal-Wallis test? Are the values I get (from the post hoc testing) the result of a Dunn's test? If so, how should I report them in my abstract? Something like "Subsequent pairwise comparisons with Dunn's test showed a significant increase between phase 1 and phase 2 (p < 0.05)", or should I also take into account the value in the first column (the one labelled "test statistic", which I highlighted in red in the attached image)?
Is it correct to use this kind of post hoc testing for my data or should I employ some other kind of test (Behrens-Fisher test or Steel Test – Nonparametric Multiple Comparisons), since I got a different number of samples for each phase?
Thank you

1. Is randomizing subjects to one of the four groups a must?
2. What statistical analysis can be used? Could these tests be affected if samples are not randomized? Which statistics is preferable?
As per the attachment: there are three sets of students, and each set is evaluated by a different judge, so the marks vary widely across the three sets. How can I give rational marks to all? I wish to make the highest and lowest marks of all three sets equal, with the other marks following proportionally. Is this possible?
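If the intent is exactly what is described (the top and bottom mark of every set mapped to the same maximum and minimum, with the remaining marks rescaled in proportion), a minimal min-max rescaling sketch in Python would look like the following; the marks and the 40-90 target band are purely hypothetical:

```python
def rescale(marks, new_min, new_max):
    """Linearly map marks so this set's lowest/highest mark become new_min/new_max."""
    lo, hi = min(marks), max(marks)
    return [new_min + (m - lo) * (new_max - new_min) / (hi - lo) for m in marks]

# Hypothetical marks awarded by three different judges
set1 = [55, 62, 70, 88]
set2 = [30, 45, 60, 75]
set3 = [65, 72, 80, 95]

for s in (set1, set2, set3):
    print(rescale(s, 40, 90))   # every set now spans exactly 40-90
```

Note that this only equalises the range; it does not account for differences in judge severity across the middle of the scale, so whether it is "rational" depends on what you are willing to assume.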
Hello, there is a dataset with several KPIs, each varying between (0, 1). What is the best analytical approach to split the data and define a line in two dimensions (or a plane in multi-dimensional space) based on the data's behaviour and practical assumptions/considerations (there are recommended ranges for each KPI, etc.)?
For instance in the attached screenshot, I want to flag the individuals/observations in Ae area for more investigation. I want to be able to apply the proposed approach in multi-dimensional space with several KPIs as well. Any thoughts would be appreciated.
Please, I am working on a pre- and post-survey for which I found the results shown in the attachments (test of normality and paired-sample t-test). Can you help me write the interpretation of these results so I can add it to my scientific article?
Thank you very much in advance.


Hello, in one of my projects I conducted a questionnaire on students' skills before the project (pre survey), and after the completion of the project I conducted a post-project survey.
I calculated the questionnaire results (the percentage increase in the level of each skill),
but I have no experience in interpreting these results.
Could you help me or point me to publications in this area?
Thank you.
I want to do a descriptive analysis using the World Values Survey dataset, which has N = 1200. However, even though I have searched a lot, I haven't found a methodology or tool to calculate the sample size I need to get meaningful comparisons when I cross variables. For example, I want to know how many observations I need in every category if I want to compare the social position attributed to the elderly across sex AND ethnic group. That is (to be even more concrete), the difference between black and indigenous women on my variable of interest. What if I have 150 observations for black women? Is that enough? How do I set the threshold?
Expressing my gratitude in advance,
Santiago.
When should household panel data be weighted to increase the representativeness of the data? Does this depend on the number of reporting households?
I am researching gender bias in sports media and have done a survey involving 8 sets of 4 images of athletes (4 male and 4 female sets), each followed by 3 questions. Participants had to select which image they thought best fit each of the 3 questions (so I ended up with 8 answers to each question).
I'm struggling to figure out how to analyse my data. I need to keep my data in terms of 'number of times this image was chosen', so I need it in whole numbers (image 1, 2, 3, 4), but everything I try gives me the mean answer from 1-4 across ALL the images for the question.
Questions I am trying to answer are:
Was a certain image chosen more often in the female athlete sets than the male athlete sets (and vice versa)?
Did male and female participants differ from each other in their responses? (Was one gender more likely to select one type of image than the other gender?)
Happy to answer follow up questions. I feel like the answer is simple but I havent done stat analysis in ages and I just cant think of anything.
- Hello, I am struggling with a problem. I can measure two ratios of three independent normal random variables with known, non-zero means and variances: Z1 = V1/V0, Z2 = V2/V0, with V0 ~ N(m0, s0), V1 ~ N(m1, s1), V2 ~ N(m2, s2). These are measurements of the speeds of a vehicle. Now I need to estimate the means and variances of these ratios. We can see this is a Cauchy distribution with no mean and variance, but it has analogues in the form of location and scale. Are there mathematical relations between mean and location, and between variance and scale? Can we approximate the Cauchy by a normal? I have heard that if we bound the estimated value we can obtain a mean and variance.
I have purchase data from supermarkets that include a variable for weighting. This weighting is supposed to represent how representative a household is.
I want to separate my data into two very unequal groups and aggregate the values into months. Can or should I use the weighting here?
Hello,
I am performing statistical analysis of my research data by comparing mean values using Tukey's HSD test. I obtained homogeneous groups labelled with both lowercase and uppercase letters, because of the large number of treatments in my study. Is this type of homogeneous grouping acceptable for publication in a journal?
Just getting a gauge from various sides of the community regarding which statistical analysis method is underrated.
Thank you.
Hi everyone.
I have a question about finding a cost function for a problem. I will ask the question in a simplified form first, then I will ask the main question. I'll be grateful if you could possibly help me with finding the answer to any or both of the questions.
1- What methods are there for finding the optimal weights for a cost function?
2- Suppose you want to find the optimal weights for a problem that you can't measure the output (e.g., death). In other words, you know the contributing factors to death but you don't know the weights and you don't know the output because you can't really test or simulate death. How can we find the optimal (or sub-optimal) weights of that cost function?
I know it's a strange question, but it has so many applications if you think of it.
Best wishes
Hello,
I replicated a study in which participants are asked to rate the importance of some user experience dimensions (such as efficiency, usefulness, perspicuity, etc.) for a product (0-7 rating).
I added some new dimensions, such as ease of use. How can I statistically determine whether an added dimension measures something new? It is obvious that the new dimension has a strong correlation with the other pragmatic dimensions. My question is how to show that the new dimension is worth including and is different from the current ones.
P.S. There is no list of items per dimension. Definition of dimensions are provided and participants just rate their importance.
Thank you
Hello,
I am a bit confused about appellation and calculation of validation parameters.
Am I right that Accuracy is a qualitative (not quantitative) parameter and is a combination of Trueness and Precision?
Am I right with the calculation of Trueness:
ref / x_avg,
where ref is the reference value and x_avg is the average of the values (but sometimes I read the median)?
And is Precision, which depends on systematic error, calculated as
s / x_avg,
where s is the standard deviation based on a sample (or the entire population?) and x_avg is the average of the values?
Thanks for every answer
Dear colleagues,
I have been helping analyse a sustainability project that compares % of biomass in composts.
As the design is 5x2 (4 replicates each; that's what I was given) with 15 predictors, I'm using PERMANOVA and will later analyse the power to see whether the analysis is valid.
However, the variables (chemical compounds and physical characteristics) have different units and quite different value ranges, so I need to standardize them (I'm using z-scores).
I have been looking for a while but can't find an answer to this question:
Should I apply the standardization by variable, meaning use each variable's own mean and standard deviation, or should I use the central point (the mean and standard deviation of the whole dataset applied to every measurement)?
They give me different results and I would like to be able to support the choice I will make.
Would love to hear some insights and references into that.
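To make the two options concrete, here is a small Python sketch with hypothetical data contrasting per-variable standardization with standardization using the grand mean and SD of the whole matrix. The first removes unit and scale differences between variables; the second preserves them, so variables measured on large scales would still dominate a distance-based analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 10 samples x 3 variables on very different scales
X = np.column_stack([
    rng.normal(7, 0.5, 10),    # e.g. pH
    rng.normal(300, 50, 10),   # e.g. conductivity
    rng.normal(2, 0.2, 10),    # e.g. bulk density
])

# Option 1: per-variable z-scores (each column gets mean 0, SD 1)
z_by_variable = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Option 2: one grand mean and SD applied to every entry
z_global = (X - X.mean()) / X.std(ddof=1)

print(z_by_variable.round(2))
print(z_global.round(2))
```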
All the best,
Erica
Imagine there is a surface with points randomly spread all over it. We know the surface area S and the number of points N, therefore we also know the point density p.
If I blindly draw a square/rectangle (area A) over such a surface, what is the probability that it will encompass at least one of those points?
P.s.: I need to solve this "puzzle" as part of a random-walk problem, where a "searcher" looks for targets in a 2D space. I'll use it to calculate the probability the searcher has of finding a target at each one of his steps.
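Assuming the N points are placed independently and uniformly over the surface, and the rectangle of area A lies entirely within it, this is a standard calculation: each point falls inside the rectangle with probability A/S, so

```latex
P(\text{at least one point in the rectangle})
  = 1 - \left(1 - \frac{A}{S}\right)^{N}
  \;\approx\; 1 - e^{-pA},
\qquad p = \frac{N}{S},
```

where the approximation holds when A/S is small and N is large (the Poisson limit often used in random-search models).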
Thank you!
I have a dataset of particulate concentration (A) and the corresponding emissions from cars (B), factories (C) and soil (D). I have 100 observations of A and the corresponding B, C and D. Let's say no factor other than B, C and D contributes to the particulate concentration (A). Correlation analysis shows that A has a linear relationship with B, an exponential relationship with C and a logarithmic relationship with D. I want to know which factor contributes most to the concentration of A (the predominant factor). I also want to know whether a model like the following equation can be built from the dataset I have:
A = m·B + n·exp(C) + p·log(D), where m, n and p are constants.
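Since the stated model is linear in the unknown constants m, n and p once the predictors are transformed, an ordinary least-squares fit on [B, exp(C), log(D)] is one way to estimate them and to compare their contributions. A minimal numpy sketch with simulated (hypothetical) data standing in for the 100 observations:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical observations standing in for the real measurements
B = rng.uniform(0, 10, 100)          # car emissions
C = rng.uniform(0, 3, 100)           # factory emissions
D = rng.uniform(1, 50, 100)          # soil contribution
A = 2.0 * B + 1.5 * np.exp(C) + 4.0 * np.log(D) + rng.normal(0, 1, 100)

# Transform predictors so A = m*B + n*exp(C) + p*log(D) is linear in m, n, p
X = np.column_stack([np.ones_like(B), B, np.exp(C), np.log(D)])
coef, *_ = np.linalg.lstsq(X, A, rcond=None)
print("intercept, m, n, p =", coef)

# Standardized coefficients give a rough sense of which term contributes most to A
T = X[:, 1:]
Tz = (T - T.mean(axis=0)) / T.std(axis=0)
Az = (A - A.mean()) / A.std()
beta_std, *_ = np.linalg.lstsq(np.column_stack([np.ones(len(Az)), Tz]), Az, rcond=None)
print("standardized coefficients:", beta_std[1:])
```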
Hi,
I am going to treat mice with a drug and want to see whether the intervention has any effect compared with sham controls in a mouse model. Do you have any ideas about a priori power calculation tools/methods used in animal intervention studies?
Many thanks,
Nirmal
Hello Everyone,
I need to compare two sets of sensor values to determine whether the two patterns are the same or not. The values are measured each day, and the number of data points per day is not the same,
e.g.: day 1 - the data series has 700 points,
day 2 - the data series has 1000 points.
Data are collected at the same time interval, but the sensor didn't capture everything, which is why the number of data points varies.
Similarity here refers to how close the day-1 pattern is to the day-2 pattern.
As these data points don't follow any standard distribution, I have applied multiple non-parametric tests such as Kruskal-Wallis and Mann-Whitney, but these tests aren't consistent. Can anyone recommend how to proceed, or what the best approach to this problem is?
I have attached the sample plot for two different dates.

I am conducting an analysis of a simplex-centroid mixture design, with 7 points and 2 repetitions of the central point, totalling 9 runs.
When I perform regression analysis, the only model that fits is the special cubic; however, the Total Error and Pure Error values are the same, so the lack-of-fit value is equal to 0.
I would like to know if the fit of the special cubic model is adequate for the responses obtained?
How can I know if I have generated an over-fit equation?
I have a very simple model where I measure the effect of tone of voice on purchase intention, moderated by brand alignment. I have a 2x2 between subjects model, with tone of voice levels being informal vs formal, and brand alignment levels being warm vs competent.
When I run a two-way ANOVA, I see that the main effect of tone of voice is not significant. However, when I run a one-way ANOVA, this effect is significant. Can someone explain why this is, and whether it would be incorrect to report the one-way ANOVA?
(This is for a bachelor's thesis.)
I consider three readers, circularly involved in pairwise Bland-Altman comparisons; therefore I obtained 3 Bland-Altman analyses with corresponding limits of agreement (and confidence intervals). To get more accurate estimates, I would like to average the three limits of agreement, but I am not sure this is correct. That is my first question. My second concern is about the confidence interval when averaging standard deviations that themselves have CIs. How do I compute a confidence interval for an averaged standard deviation (or limit of agreement)? Thanks for the help.
So I'm doing a meta-analysis, and I have a question about one of my studies.
This is the data for one group(lower intensity group):
pre intervention mean(SD): 274(70.5)
post intervention mean(SD): 286.7(73.9)
mean difference pre and post (95%CI): 12.7 (-27.1, 40.0)
This is the data for other group(higher intensity group)
pre intervention mean(SD): 267.2(61.3)
post intervention mean(SD): 291.3(63.7)
Mean difference pre and post(95%CI): 24.2(-12,63.7)
As you can see, the 95% CI for the pre-post difference is asymmetrical around the mean difference. What causes this?
I'm comparing the mean differences (between pre and post) of the higher-intensity group and the lower-intensity group using RevMan; how do I use these data in the meta-analysis?
Here's the link to the study: (table 2, 6mwd)
We have some experimental data on the mechanical strength of rock material. We compare these data with the estimated strength (calculated using several existing criteria) and also determine the error percentage for each criterion.
So I want to know:
what is the maximum percentage of error that is acceptable for rock mechanics purposes, especially when comparing experimental data with estimated values?
I'm working in a lab that is currently doing some research on 2 species of duckweed and I did a simple experiment to compare whether or not a certain way of cleaning jars has an effect on the growth of duckweed. The data is an exponential distribution since duckweed has exponential growth.
I've attached a picture of my data along with the equations of the trendlines.
I'm having trouble figuring out the best way to determine whether the differences between the data sets for each species are significant. How should I go about comparing them? (I know that the method labelled "washer" appears to give less growth, but I want to make sure the difference is statistically significant.)
I've been searching around the internet but I haven't really found anything that makes complete sense to me.

Hello everyone, I would like to ask whether the way the sample size of this study was calculated is valid or correct. It is a study to evaluate the effect of gargling with povidone-iodine among COVID-19 patients. The text says: "For this pilot study, we looked at Eggers et al. (2015), using Betadine gargle on MERS-CoV, which showed a significant reduction of viral titer by a factor of 4.3 log10 TCID50/mL, and we calculated a sample size of 5 per arm or 20 samples in total." From these data on the reduction of viral titer in a previous study on MERS-CoV, is it valid to calculate the sample size this way for a new study on COVID-19?
Hello, I need some help on what statistical test I should use for my data analysis. I have a Data set A, which is an array of 5000 numbers, all of which are zero, and a Data set B, which is an array of random continuous numbers that do not necessarily follow a normal distribution. Both data sets can be plotted on a histogram for visual aid. Data set A is my "ideal" and Data set B is my "measured" - I would like to compare the similarity of Data set B to Data set A (ideally it would be a single output figure such as a % similarity). I would then go on to test another Data set C (the same style of array as data set B - it does not have normal distribution and is continuous numbers) and compare its similarity % with Data set A. I would then be able to make a "ranking" on whether Data set B or Data set C was most similar to Data set A. Some of the considerations:
- The similarity value has to account for the shape (ie. the histogram of Data set B will rank with a higher similarity % the closer it is to a vertical straight line as shown in Data set A)
- The similarity value has to account for x axis distance on the histogram (ie. the further from zero the poorer the % similarity to data set A)
- The weighting of each has to be equal (ie. neither the shape or distance on the x axis is more important)
- Because the weighting is equal, if data set B was a straight line at -5, it should have the same % similarity to data set A if it had been a straight line at +5.
- the order of the values in array B does not matter
I'm essentially trying to rank data sets against the "ideal" data set A (but taking into account non normal distribution, histogram shape similarity, distance etc). I have no idea what test to apply that can give me a % similarity to the ideal under these conditions.
Thank you so so so much.
I have tested the effect of EO on 4 different bacteria and repeated the experiment 5 times. I am not sure whether I can use a t-test, as it usually compares two groups rather than four. If not, what can I use instead?
Hello,
So I am supposed to do a study to answer a statistical question regarding the association between two categorical variables, i.e., whether the two variables are dependent. I did a survey with 2 multiple-choice questions (one with 3 choices and the second with 4). When we hear about association and categorical data we usually go for chi-square, and this is what we were asked to compute. However, my sample is too small (N = 50), and as a result I got the contingency table in the attachment.
Because of the values I got, I cannot use chi-square, as about 50% of my cells have expected values below 5. While trying to find an answer, I saw someone stating that if I combined 2 columns I could obtain higher expected frequencies and as a result could use chi-square. I combined A and B and consequently had higher expected values in all cells. Will this affect the results of my hypothesis test, and what should I do in this case? Which test should I use? Thank you in advance.
NB: I want to do this by manual calculation, not SPSS. Kindly see the second reply for clarification.
Again thank you
Hi,
I'm studying the effect of joint (crack) set spacing and persistence on blasted rock size, so I have two independent categorical variables (labelled SP for spacing and PER for persistence) that each have 5 levels of measurement ranges. The dependent variable is the blasted rock size (Xc), i.e., I want to know how the spacing and persistence of the existing joints in a rock face affect the size of the blasted rocks. The measurement levels for spacing and persistence are listed below.
spacing levels:
SP1: less than 60mm
SP2: 60-200mm
SP3: 200-600mm
SP4:0.6-2m
SP5: more than 2m
persistence levels:
PER1: more than 20m
PER2: 10-20m
PER3: 3-10m
PER4: 1-3m
PER5: less than 1m
Spacing and Persistence were recoded as ranges since they were estimated and not measured individually as it'd take too much time to measure each one (1 set of joint may have at least 10 joints, some can reach 50 or more and the measurement are not exactly the same between joints belonging to the same set. Measurement was done manually on site)
Initially, I ran the regression with these two variables as categorical variables, but the problem is that the levels are not mutually exclusive: one rock slope can contain 2 or more crack sets, hence more than one level of spacing and persistence can be observed. As an example, rock face A consists of 3 crack sets:
Set 1 (quantity: 25) SP3 PER5
Set 2 (quantity: 30) SP4 PER6
set 3 (quantity: 56) SP2 PER3
As can be seen, 1 rock face contains 3 different levels of SP and PER.
Technically, these are ordinal variables, and as explained above, if I treat them as categorical I face the problem of non-mutually exclusive levels. Recently, I found out that ordinal variables can be treated as continuous, which seems to solve the problem of non-mutually exclusive levels that arises when I enter the variables as categorical. My main concern is to look at each variable as a whole, not by its levels, so this might be what I need.
My question is, is it correct if I assign the numerical value to the levels like this in order to treat the variables as continuous? 1 to 5, from lowest to highest.
Spacing:
1: less than 60mm
2: 60-200mm
3: 200-600mm
4:0.6-2m
5: more than 2m
persistence:
1: less than 1m
2: 1-3m
3: 3-10m
4: 10-20m
5: more than 20m
and then run regression as I would with the usual continuous variable? Plus, for prediction, once I get the equation, do I insert the value 1-5 as the X in the equation? I am still confused with the prediction step since even if I treat it as continuous I'd still have the problem with the presence of different levels of SP and PER. Or is there another way around this problem?
My 2nd question is: given the example data for rock face A, is it correct to repeat the data input according to the quantity, i.e., enter set 1's data 25 times, set 2's data 30 times and set 3's data 56 times?
I am very new to statistic and learning it on my own so I might be wrong with something in this field. Any answers, suggestion and advise are very much appreciated. Thank you in advance!
I have a dataset with one nominal independent variable with 10 different levels and one dichotomous dependent variable.
What would be the appropriate statistical test to compare the different levels of the IV?
Example:
Independent variable: "Favourite color". Different non-ranked levels: Yellow, green, orange, blue, red, purple, black, white, brown and pink.
Dependent variable: Dichotomous: Smoker: Yes (1) vs. No (0)
I am not interested in choosing a reference level (in this example a specific colour) since there is no solid way to decide which of these colours should be the reference.
The only idea I can come up with for statistical testing is chi square (Fishers) comparing each level of the IV to the combination of all the other levels. In other words creating dummy variables (but without a reference level) - e.g. "Yellow vs. not yellow" and then perform chi square. Next "green vs. not green" etc. till the end (with all the levels).
Is this an accepted way to compare the different levels of a nominal variable?
My results will then be something like shown here:
"People with favourite colour yellow smoke significantly more than others".
"People with favourite colours green, orange, blue, red, purple, black, white and brown does not smoke significantly more (or less) than others".
"People with favourite colour pink smoke significantly less than others".
This analysis is easy to perform but is it statistically sound?
Are there any better alternatives?
Or should I simply stick to descriptives without any statistical comparison?
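For what it is worth, the "one level vs. the rest" scheme described above can be run directly as a series of 2x2 Fisher tests with a multiplicity correction. A sketch in Python with entirely hypothetical counts (statsmodels is assumed to be available for the Holm adjustment):

```python
import numpy as np
from scipy.stats import fisher_exact
from statsmodels.stats.multitest import multipletests

colours = ["yellow", "green", "orange", "blue", "red",
           "purple", "black", "white", "brown", "pink"]
# Hypothetical smoker / non-smoker counts per favourite colour
smokers     = np.array([12,  8,  5,  9, 11,  6,  7, 10,  4,  2])
non_smokers = np.array([20, 25, 18, 30, 22, 19, 24, 28, 15, 33])

pvals = []
for i in range(len(colours)):
    # 2x2 table: this colour vs. all other colours combined
    table = [[smokers[i], non_smokers[i]],
             [smokers.sum() - smokers[i], non_smokers.sum() - non_smokers[i]]]
    _, p = fisher_exact(table)
    pvals.append(p)

# Holm correction for the 10 "one level vs. the rest" comparisons
reject, p_adj, _, _ = multipletests(pvals, method="holm")
for c, p, pa, r in zip(colours, pvals, p_adj, reject):
    print(f"{c:7s}  p = {p:.3f}  Holm-adjusted p = {pa:.3f}  significant: {r}")
```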
Thank you
Dear all,
I am processing nondiagnostic pottery shards. I analysed fragmentation and abrasion in the field.
I first divided my nondiagnostics into clay groups. After that for each clay group I sorted out the individual shards according to size categories using a grid (1x1cm, 2x2cm, 3x3cm etc) and I also analysed shards from each clay group according to 3 levels of abrasion.
Now, I want to calculate the level of fragmentation and abrasion of each clay group. Any suggestion on what would be the best way to do this?
Best wishes,
Uros
I have this set of data:
Treatment 1: 30.5; 34.5; 24.4
Treatment 2: 24.8; 20.8; 16.8
Treatment 3: 19.1; 21.4; 21.0
Treatment 4: 22.3; 26.1; 27.1
Treatment 5: 26.5; 31.2; 22.9
Analysis with SAS gave the following result:
ANOVA: p = 0.047 (significant)
Tukey test (alpha = 0.05):
Treatment 1: a
Treatment 2: a
Treatment 3: a
Treatment 4: a
Treatment 5: a
So which one is correct? How can I interpret this result?
What is the appropriate statistical analysis to show a correlation between the number of hours spent in online classes and its perceived effect on students' mental health? The number of hours was a single multiple-choice question with 4 choices, and the perceived effect on students' mental health is a 5-point Likert scale of likelihood (never, rarely, sometimes, often, always) across 10 different questions. I see people recommending Pearson or Spearman, and I am thoroughly confused since I have ten questions. How do I condense the ten questions into one variable?
Furthermore, I was initially planning to assess the effect of online learning on students' perceived mental health, but does that just mean reporting the mean and standard deviation? Is that the correct analysis for the hypothesis "online learning does not have an effect on students' mental health"?
I am checking the relative expression of 10 genes via qPCR between control and patient cell lines (iPSC vs. NSC vs. neurons). After analysis I have obtained:
a. 2^(−ΔCt) values for each cell line, which I have plotted as 3 grouped bar graphs for each line (multiple t-tests)
b. ΔΔCt
c. 2^(−ΔΔCt) (fold change?)
Please help me represent b and c in the most useful way; also, feedback on a (particularly the statistical analysis) would be greatly appreciated.
Thanks in advance.
error bars represent SEM.
I have an energy spectrum acquired from experimental data. After normalization, it can be used as a probability density function (PDF). I can construct a cumulative distribution function (CDF) on a given interval using its definition as the integral of the PDF. This integral simplifies to a sum because the PDF is given in discrete form. I want to generate random numbers from this CDF.
I used inverse transform sampling, replacing the CDF integral with a sum, and from there I follow the standard inverse transform sampling routine, solving it for the sum instead of an integral.
My sampling visually fits the experimental data, but I wonder whether this procedure is mathematically correct and how it could be proved.
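As a sanity check, here is a minimal sketch of discrete inverse transform sampling in Python (the "spectrum" is a hypothetical stand-in). The procedure is mathematically sound: if F is the discrete CDF and U ~ Uniform(0,1), then taking X to be the first bin whose cumulative sum reaches U gives P(X = x_i) = F(x_i) − F(x_{i−1}) = p_i, which is exactly the normalized spectrum.

```python
import numpy as np

# Hypothetical discrete energy spectrum: bin centres and measured counts
energies = np.linspace(0.0, 10.0, 200)
counts = np.exp(-energies / 3.0)          # stand-in for the experimental spectrum

pdf = counts / counts.sum()               # normalize to a discrete PDF
cdf = np.cumsum(pdf)                      # CDF as a cumulative sum
cdf[-1] = 1.0                             # guard against floating-point round-off

# Inverse transform sampling: draw u ~ U(0,1), take the first bin with CDF >= u
u = np.random.default_rng(0).uniform(size=10_000)
samples = energies[np.searchsorted(cdf, u)]

# A histogram of `samples` should reproduce the shape of `counts`
```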
Suppose that X1, X2 are random variables with given probability distributions fx1(x), fx2(x).
Let fy(x) = fy( fx1(x) , fx2(x) ) be a known probability distribution of "unknown" random variable Y. Is it possible to determine how the variable Y is related to X1 and X2?
After a literature review, the current draft is as follows:
Variables:
Independent = B (represented by I, M, O, S, C),
Moderating = W
Dependent = F
Control = CV
Hypotheses:
H1a. I has significant positive relationship with F.
H1b. M has significant positive relationship with F.
H1c. O has significant positive relationship with F.
H1d. S has significant positive relationship with F.
H1e. C has significant positive relationship with F.
H2. W has significant positive relationship with F.
H3a. W strengthens the positive relationship between I and F.
H3b. W strengthens the positive relationship between M and F.
H3c. W strengthens the positive relationship between O and F.
H3d. W strengthens the positive relationship between S and F.
H3e. W strengthens the negative relationship between C and F.
H4. B has a significant positive relationship with F.
H5. W strengthens the positive relationship between B and F.
Proposed Equations:
1) P_it = a + β1·I_it + β2·M_it + β3·O_it + β4·S_it + β5·C_it + β6·CV_it + ε_it [H1a-e]
2) P_it = a + β1·I_it + β2·M_it + β3·O_it + β4·S_it + β5·C_it + β6·W_it + β7·(I_it × W_it) + β8·(M_it × W_it) + β9·(O_it × W_it) + β10·(S_it × W_it) + β11·(C_it × W_it) + β12·CV_it + ε_it [H2 & H3a-e]
Question:
1) Are the proposed equations appropriate for testing the respective hypotheses?
2) Should an equation (e.g., a weighted score for B based on I, M, O, S, C) be formulated to test H4 and H5? Or is that unnecessary, so that H4 and H5 can be concluded as a whole from the individual estimation results for H1a-e, H2 and H3a-e?
Thanks for advice / sharing in advance.
This is so far the procedure I was trying upon and then I couldn't fix it
As per my understanding here some definitions:
- lexical frequencies, that is, the frequencies with which correspondences occur in a dictionary or, as here, in a word list;
- lexical frequency is the frequency with which the correspondence occurs when you count all and only the correspondences in a dictionary.
- text frequencies, that is, the frequencies with which correspondences occur in a large corpus.
- text frequency is the frequency with which a correspondence occurs when you count all the correspondences in a large set of pieces of continuous prose ...;
You will see that lexical frequency produces much lower counts than text frequency, because in lexical frequency each correspondence is counted only once per word in which it occurs, whereas text frequency counts each correspondence multiple times, depending on how often the words in which it appears occur.
When referring to the frequency of occurrence, two different frequencies are used: type and token. Type frequency counts a word once.
So I understand that probably lexical frequencies deal with types counting the words once and text frequencies deal with tokens counting the words multiple times in a corpus, therefore for the last, we need to take into account the word frequency in which those phonemes and graphemes occur.
So far I managed phoneme frequencies as it follows
Phoneme frequencies:
Lexical frequency is: (single count of a phoneme per word/total number of counted phonemes in the word list)*100= Lexical Frequency % of a specific phoneme in the word list.
Text frequency is similar, but then I fail when trying to add in the frequencies of the words in the word list: (all counts of a phoneme per word / total number of counted phonemes in the word list) * 100 versus (sum of the word frequencies of the targeted words that contain the phoneme / total sum of the frequencies of all the words in the list) = text frequency % of a specific phoneme in the word list.
Please help me find a formula for how to calculate the lexical frequency and the text frequency of phonemes and graphemes.
I wish to estimate the expression of two types of markers by immunohistochemistry in the biopsy specimens of a particular type of cancer, and correlate their expressions with clinical parameters and outcomes. For this type of cancer, we see approximately 400-500 patients every year at our centre, which is about 10% of all of our cancer patients.The crude rate of this cancer in India is also around 10%. The estimated prevalence of the two markers vary between 50-75% as per published studies. What should be my ideal sample size ?
Hello,
I have measured morphometric parameters of plants grown in vitro (height, root mass, etc.). I have one variable, thus two test groups - control and treatment group. I've made three independent biological replicates of the experiment, 30 plants per each biological replicate. In total there are 90 plants for control group and 90 plants for treatment group. I have done single factor ANOVA for each replicate and achieved high F numbers and very low p-values. My question is, is there a way, similar to ANOVA for repeated measurements, to analyse this data as three independent units, or should I just merge the 3 replicates into one data set?
Hello,
I have some doubts about the statistical model that I am using to analyze my data. I have two groups of residue-study data: group 1 with n = 7 and group 2 with n = 47. They are independent, and the studies are expensive and rare, so I couldn't increase the sample size by any means.
I tested the normality of both groups using SPSS and found them not to be normally distributed. I then transformed all the data using a square-root transformation, after which they fit the normal distribution (p = 0.2, i.e., more than 0.05).
Which test should I use, especially given that I am using the SPSS package?
thanks
I have two datasets. The first one has 20 patients. While changing the LBNP pressure for each patient (over a period of time), the physiological signals (ECG, blood pressure and SpO2) are recorded.
The second dataset has 30 patients, and again, over a period of time, the same signals are recorded.
In total, the first dataset has 400 samples, where each sample corresponds to an ECG, blood pressure and SpO2 reading, and for each sample there is an output (LBNP): 3 features and one output.
The second dataset is the same except that we have 600 samples. For each sample we change the LBNP and read the ECG, blood pressure and SpO2.
My question is how to compare two datasets so we know whether or not they are different from each other? Each dataset comes from a different clinical team. What are the statistical tests that can be used to do the comparison?
There are two groups in the study, group 1 and group 2. One of the groups received treatment and the other did not. When the mortality of the groups is compared, there appears to be no statistical difference. However, the expected mortality rate (calculated from the PRISM III score) in the first group (the treatment group) was significantly higher than in the other. I think the treatment was successful because it lowered a high expected mortality. However, I cannot find how to show this statistically, or how to correct for this baseline imbalance (in expected mortality) between the groups.
Thanks
Hello everyone,
I have some difficulties with the procedure in my empirical evaluation. My structural model consists of seven independent variables and one dependent variable and so far I have evaluated 150 German and 150 American data sets individually. However, since I want to check in which country the significant relationships between IV and DV are stronger, I am unsure how to proceed correctly.
Is it possible to compare just the significant path coefficients to find out whether the relationship between IV and DV in country A is stronger than in country B?
Or does this have to be evaluated by means of a multi-group analysis?
Thanks for your help,
Baxmauer
The original series is nonstationary as it has a clear increasing trend and its ACF plot gradually dampens. To make the series stationary, what optimum order of differencing (d) is needed?
Furthermore, what if the ACF and PACF plots of the differenced series do not cut off after a definite lag but have peaks at certain intermittent lags? How do I choose the optimum values of p and q in such a case?
I have seen that some researchers just compare the difference in R² between two models: one in which the variables of interest are included and one in which they are excluded. However, in my case this difference is small (0.05). Is there any method by which I can be sure (or at least have some support for the argument) that this change is not just due to luck or noise?
To illustrate my point I present you an hypothetical case with the following equation:
wage = C + 0.5·education + 0.3·rural_area
Where the variable "education" measures the number of years of education a person has and rural area is a dummy variable that takes the value of 1 if the person lives in the rural area and 0 if she lives in the urban area.
In this situation (and assuming no other relevant factors affecting wage), my questions are:
1) Is the 0.5 coefficient of education reflecting the difference between (1) the mean of the marginal return of an extra year of education on the wage of an urban worker and (2) the mean of the marginal return of an extra year of education of an rural worker?
a) If my reasoning is wrong, what would be the intuition of the mechanism of "holding constant"?
2) Mathematically, how is that just adding the rural variable works on "holding constant" the effect of living in a rural area on the relationship between education and wage?
Hi everyone,
I am planning to construct a Fama-French three-factor model for the period 1.1.1998-31.12.2015 for a portfolio of about 120 stocks. I have collected the monthly returns for each stock over the 36 months since its IPO. The process of fitting a Fama-French three-factor model for a single stock is very straightforward, as seen in this video: https://www.youtube.com/watch?v=b2bO23z7cwg
However, how should I proceed with a portfolio with returns that all have different starting dates (as each firms have a different IPO date)?
My thought was as follows:
- Calculate the average 1-month return, 2-month return, 3-month return, ..., 36-month return across all the stocks in the portfolio.
- Calculate the 1 month average, 2 month average, 3 month average, ….36 month average of the Rf, HML, SMB, Mkt-Rf
- Subtract 1 month average Rf from average 1 month return, repeat until the 36th month.
- Proceed with running the regression.
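A minimal sketch of this event-time procedure, assuming hypothetical CSV files in which each stock's monthly returns are already tagged with the number of months since its IPO and the factor values are matched to the corresponding calendar months:

```python
# Sketch of the event-time procedure described above (all file and column
# names are hypothetical). Each stock's returns are indexed by months since IPO.
import pandas as pd
import statsmodels.api as sm

# stock_returns.csv: columns = ticker, event_month (1..36), ret
stock = pd.read_csv("stock_returns.csv")
# factors.csv: columns = ticker, event_month, rf, mkt_rf, smb, hml
# (factor values matched to each stock's calendar month, then tagged with event_month)
factors = pd.read_csv("factors.csv")

# Average across stocks within each event month (portfolio return and factor averages)
port = stock.groupby("event_month")["ret"].mean()
fac = factors.groupby("event_month")[["rf", "mkt_rf", "smb", "hml"]].mean()

df = fac.join(port)
df["excess"] = df["ret"] - df["rf"]          # portfolio excess return per event month

X = sm.add_constant(df[["mkt_rf", "smb", "hml"]])
model = sm.OLS(df["excess"], X).fit()
print(model.summary())                       # alpha (const) and factor loadings
```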
Many papers, such as the one by Levis (The Performance of Private Equity-Backed IPOs), have used the Fama-French 3-factor model but do not explain the mechanics behind the process. Any help is greatly appreciated.
-Sebastian
Hello everyone,
I am trying to analyze statistically whether data from 3 thermometers differ significantly. At the moment, because of COVID-19, several control points have been set up at the company I work for. We have been using infrared thermometers to screen people for fever. However, we don't own a control thermometer with which we could easily calibrate our equipment, so we thought a statistical test would help, but at this point we are lost.
Normally, we would compare our data to our control thermometer and that would be it. Our other thermometers are allowed to differ by at most ±1 °C from their controls; we can't check that now.
What I have been doing is collecting 5 to 10 measurements from each thermometer, comparing them with an ANOVA, and then assessing the results (when needed) with Fisher's Least Significant Difference test.
I don't know whether this is right, because sometimes the data I collect do not seem to vary much (the mean difference is NEVER greater than ±1 °C), and yet the test concludes that they differ significantly.
What would be the right approach here? We don't want to work with the wrong kind of equipment or retire working thermometers without a solid reason; we just want to do what's best for our people.
Could you guys please help me?
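A minimal sketch of the one-way ANOVA described above, with made-up readings standing in for the real measurements:

```python
# Sketch: one-way ANOVA across the three thermometers (readings are hypothetical).
from scipy import stats

thermo_a = [36.4, 36.6, 36.5, 36.7, 36.5, 36.6]
thermo_b = [36.1, 36.2, 36.3, 36.2, 36.1, 36.3]
thermo_c = [36.5, 36.4, 36.6, 36.5, 36.6, 36.4]

f_stat, p_value = stats.f_oneway(thermo_a, thermo_b, thermo_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# Note: a small p-value only says the mean readings differ by more than chance;
# it does not say whether that difference exceeds the +/-1 degC tolerance that
# actually matters for this screening application.
```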
Hello,
I have some doubts about the statistical model I am using to analyze my data. I have two independent groups of residue-study data: group 1 with n = 7 and group 2 with n = 47. The studies are expensive and rare, so I could not increase the sample size by any means.
I tested the normality of both groups in SPSS and found them not normally distributed; I then applied a square-root transformation, after which the data fit the normal distribution (p = 0.2, greater than 0.05).
The data were then analyzed in SPSS with an independent t-test to compare the two means, and the p-value for unequal variances was used instead of the usual t-test (in other words, Welch's test was used); kindly have a look at the attached data.
The 95% CI was plotted with each group mean in a bar chart.
Do you agree that using a sample of 7 does not affect or bias the test result? And if one group has n = 5 and the other n = 16, is this still a valid method to use, or should I use another test for small samples?
I would appreciate it if someone could confirm my thinking.
Thanks
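A minimal sketch of the pipeline described above (normality check, square-root transform, Welch's t-test), with made-up residue values standing in for the real data:

```python
# Sketch of the pipeline described above: normality check, sqrt transform,
# Welch's (unequal-variance) t-test. The residue values below are invented.
import numpy as np
from scipy import stats

group1 = np.array([0.12, 0.35, 0.08, 0.50, 0.22, 0.41, 0.18])          # n = 7
group2 = np.random.default_rng(1).gamma(2.0, 0.15, size=47)            # n = 47, stand-in data

g1, g2 = np.sqrt(group1), np.sqrt(group2)       # square-root transform

print("Shapiro p (group 1):", stats.shapiro(g1).pvalue)
print("Shapiro p (group 2):", stats.shapiro(g2).pvalue)

t_stat, p_value = stats.ttest_ind(g1, g2, equal_var=False)   # Welch's t-test
print(f"Welch t = {t_stat:.2f}, p = {p_value:.4f}")
```

If the normality of the smaller group remains doubtful, the Mann-Whitney U test (scipy.stats.mannwhitneyu) is a common nonparametric alternative.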
I've created a playlist on YouTube that helps researchers in analysing their survey data. Please suggest any statistical test that you think is useful and not present in the list to work on it and prepare it. You can subscribe to the channel to watch future videos.
This is how I interpreted it:
Results of a binary logistic regression analysis to assess the effect of demographic factors such as farmers' education, farmland size, location of the farmland in the catchment, and land use type on the likelihood that a farmer would adopt an SWC measure (coded as Yes) or not (coded as No) on his/her farm are presented in the table below. The full model containing all predictors was statistically significant, χ²(df = 4, n = 73, Yes = 56, No = 17) = 9.723, p = 0.045, indicating that the model was able to distinguish between farmers who are likely to adopt SWC measures on their farm and those who are not. The overall success rate of the model was 66%. Based on the odds ratios, farmers with formal education were 13 times more likely than those with no formal education to implement SWC measures on their farm.
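A minimal sketch of how such a model could be fitted and the odds ratios and classification success rate extracted; the file and column names below are hypothetical placeholders, not the actual study variables.

```python
# Sketch: binary logistic regression for SWC adoption with odds ratios.
# Column names (adopt, education, farm_size, location, land_use) are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("farmers.csv")                     # hypothetical data, one row per farmer

model = smf.logit("adopt ~ education + farm_size + location + land_use", data=df).fit()

odds_ratios = pd.DataFrame({
    "OR": np.exp(model.params),
    "CI_low": np.exp(model.conf_int()[0]),
    "CI_high": np.exp(model.conf_int()[1]),
})
print(model.summary())
print(odds_ratios)

# Overall success (classification) rate at the 0.5 cut-off
pred = (model.predict(df) >= 0.5).astype(int)
print("success rate:", (pred == df["adopt"]).mean())
```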

In my study I want to compare a regression coefficient between two groups. The coefficients were negative for both groups. Based on my understanding, a negative sign indicates an inverse relationship between X and Y but says nothing about the magnitude of the relationship. If the coefficient for group A = -0.5 and for group B = -0.6, does this mean that the coefficient for A is larger than that for B?
When I calculate the difference between the coefficients of the two groups, Stata treats the coefficient for group A as greater than that for B.
Is this correct? What is the appropriate way to test the difference between the coefficients?
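Numerically -0.5 is indeed larger than -0.6 (which is why Stata orders them that way), but in absolute magnitude the group B relationship is the stronger of the two. One common way to test whether the two slopes differ, sketched below with hypothetical column names, is to pool the groups and test the interaction term:

```python
# Sketch: testing whether the slope of X on Y differs between two groups
# via an interaction term. Column names (y, x, group) are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("pooled.csv")        # both groups stacked, 'group' coded 0/1

model = smf.ols("y ~ x * group", data=df).fit()
print(model.summary())
# The coefficient on 'x:group' estimates the difference between the two groups'
# slopes (e.g. -0.6 - (-0.5) = -0.1); its p-value tests whether that difference
# is statistically significant.
```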
I have a data set as follows:
0.65, 0.86, 1, 1, 1, 1, 1, 1, 1, 1. When I draw a box-and-whisker plot in Excel 2016, the whisker on the lower side does not appear. Instead, two data points representing 0.86 and 0.65 are shown below the Q1 value of the box. I am unable to figure out the reason for this.
As per my understanding, there should have been a whisker at the minimum value, i.e. 0.65, connecting it to the lower end (Q1) of the box.
Kindly help.
Regards
Sanchit S Agarwal
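The behaviour can be reproduced by computing the quartile fences: with eight of the ten values equal to 1, the IQR is at or near zero, so the 1.5×IQR outlier rule used by Excel's box-and-whisker chart flags 0.86 and 0.65 as outlier points, and the lower whisker collapses onto the box. A minimal check (Excel's exclusive quartile method gives slightly different quartiles, but the same conclusion):

```python
# Sketch: reproduce the whisker logic with the 1.5*IQR outlier rule.
import numpy as np

data = np.array([0.65, 0.86, 1, 1, 1, 1, 1, 1, 1, 1])

q1, q3 = np.percentile(data, [25, 75])    # here both are 1, so IQR = 0
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = data[(data < lower_fence) | (data > upper_fence)]
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("fences:", lower_fence, upper_fence)
print("points plotted separately:", outliers)    # 0.65 and 0.86
```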
I am facing a problem when I try to calculate the HR (hazard ratio) from two different survival curves. Here is the problem: in the first plot the experimental group's curve is closer to the placebo group's curve than in the second plot, even though the first plot's HR is smaller than the second plot's. I wonder what the possible reasons are. Can you help me solve this? Thanks.
Dear Researchers,
I badly need assistance from a statistician to interpret my data, with references. The analysed results are attached.
IVs - Idealized influence, inspirational motivation, intellectual stimulation, individualized consideration
DV - Employee green behaviour
1. What is the relationship between the dimensions of environmental transformational leadership (ETL) exhibited by managers and the green behaviour of field officers of ABC Plantations?
2. What is the most influential dimension of ETL on the green behaviour of field officers of ABC Plantations?
3. What is the relationship between ETL and the green behaviour of field officers of ABC Plantations?
4. What strategies are recommended to increase the level of employee green behaviour in ABC Plantations?
Given the time limitations, I considered only the total population of field officers in ABC Plantations (85, of whom 81 responded), and data were collected through self-administered questionnaires.
Thank you
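On question 2, a minimal sketch under the assumption that the four ETL dimensions and green behaviour are available as scale scores: regress green behaviour on the z-standardized dimensions so that the coefficients are directly comparable. All file and column names below are hypothetical.

```python
# Sketch: multiple regression of employee green behaviour on the four ETL
# dimensions, with variables standardized so the betas are comparable.
# File and column names are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("etl_survey.csv")   # hypothetical file, one row per respondent

dims = ["idealized_influence", "inspirational_motivation",
        "intellectual_stimulation", "individualized_consideration"]
for c in dims + ["green_behaviour"]:
    df[c] = (df[c] - df[c].mean()) / df[c].std()        # z-standardize

model = smf.ols("green_behaviour ~ " + " + ".join(dims), data=df).fit()
print(model.summary())   # the largest standardized coefficient points to the
                         # most influential dimension, under the model's assumptions
```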
Hi all,
I'm interested in comparing the ratios of yes/ no results between 2 strains:
           yes   no
strain a    10   20
strain b    50   50
For each experiment, I have 3 biological replicates (same strains, same protocol, different days).
I was wondering whether I can use the Cochran–Mantel–Haenszel test for repeated tests of independence to answer whether there is a difference between the yes/no proportions of strains a and b?
If so, what is the best way to graph it?
Thanks!
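A minimal sketch using statsmodels' StratifiedTable, which provides a Mantel–Haenszel-type test of a common odds ratio across strata, with one 2×2 table per biological replicate; the counts for replicates 2 and 3 below are placeholders.

```python
# Sketch: Cochran-Mantel-Haenszel test across biological replicates.
# Each replicate contributes one 2x2 table [[yes_a, no_a], [yes_b, no_b]];
# the counts for replicates 2 and 3 are invented placeholders.
import numpy as np
from statsmodels.stats.contingency_tables import StratifiedTable

rep1 = np.array([[10, 20], [50, 50]])
rep2 = np.array([[12, 18], [45, 55]])   # hypothetical
rep3 = np.array([[ 8, 22], [52, 48]])   # hypothetical

st = StratifiedTable([rep1, rep2, rep3])
print(st.test_null_odds())        # CMH-type chi-square test of a common odds ratio = 1
print("pooled odds ratio:", st.oddsratio_pooled)
```

For the graph, one simple option is a bar or dot plot of the per-replicate "yes" proportions for the two strains, so the stratification stays visible.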
I am from a social science background and looking for material that sums up data imputation methods in a single reading. The articles I have found describe the commonly used methods, but I would like to know how each method is actually carried out. Thanks!
I tried using structural equation modeling to analyze data from a cross-sectional study. The dependent variable is categorical (dichotomous), and I have 8 latent variables in my model (independent variables measured by scales) and 2 observed independent variables. The model fit results are: CFI/TLI = 0.745/0.727, RMSEA = 0.046, number of free parameters = 162.
I also tried to modify my model based on the modification indices, but there was still no improvement in model fit.
Do you have any suggestions for dealing with this poorly fitting model? The software I used for the analysis is Mplus version 7.4.
Thank you for giving any comments!
Reading Wooldridge's book on introductory econometrics, I see that the F test allows us to check whether, in a group of coefficients, at least one is statistically significant. However, in my model one of the variables of the group I want to test is already individually significant (by the t-test). So in that case I expect that, whichever other variables I include in the group, as long as it contains the one that is already individually significant, the F test will also be significant. Is there any useful way to apply the F test in this case?
I am writing a non-parametric/parametric statistical analysis paper on three independent data sets (Human Development Index, Gini Index, US Aid) for 10 countries, observed annually over the last 10 years. I want to find out whether the Gini index can be described as a predictor of a country's human development, and whether US aid affects this.
I want to know which tests I should conduct to find an inference for my data.
I have two models; the second is the first plus additional control variables. The coefficient of a variable present in both models decreases in the second model, and I want to know whether this difference is statistically significant. For this to be true, is it necessary that the confidence interval of the variable in each model does not contain the point estimate from the other model? Or is the condition that the two confidence intervals do not overlap at all?
I have reported life satisfaction as my dependent variable and many independent variables of different kinds. One of them is the area in which the individual lives (urban/rural) and another is access to publicly provided water service. When the area variable is included in the model, the water variable is not significant. However, when the area variable is excluded, the water-service variable becomes significant at the 95% confidence level. The two variables are moderately and negatively correlated (r = -0.45).
What possible explanations do you see for this phenomenon?
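This pattern is consistent with the two dummies sharing explanatory variance (rural areas typically having less access to piped water), so that either can partly proxy for the other. A quick diagnostic, sketched below with hypothetical column names, is to look at their correlation and variance inflation factors:

```python
# Sketch: check how much the urban/rural dummy and the water-access dummy
# overlap, via variance inflation factors (file and column names hypothetical).
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("survey.csv")   # hypothetical data set

X = sm.add_constant(df[["urban", "water_access", "income", "age"]])
vifs = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])],
    index=X.columns[1:],
)
print(vifs)   # large VIFs indicate the predictors carry largely shared information
```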
All I can find is for public health data, which are different, so I need a suggestion closer to my case.
Dear Scientists,
Greetings
Please, could anyone give me an alternative for analysing data generated from an augmented block design layout?
The following well-known software packages are not working. Does anyone know the reasons? I urgently need your help!
Here are the packages/links:
Indian Agricultural Research Institute, New Delhi
•Statistical Package for Augmented Designs (SPAD)
•SAS macro called augment.sas
CIMMYT – SAS macro called UNREPLICATE
•Developed in 2000 – uses some older SAS syntax
Thanks in advance for your help
Regards
I'm looking for a free and user-friendly tool. I'm familiar with Python (and a bit of R).
Thank you
There are statistical tests to compare two Pearson correlation coefficients (e.g., via Fisher's r-to-z transformation). I want to know whether there is any statistical test to compare two concordance correlation coefficients, so that they can be compared and one can claim that one is "stronger" than the other.
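For two Pearson correlations from independent samples, the Fisher r-to-z comparison mentioned above looks like the sketch below (the coefficients and sample sizes are placeholders). Whether the same normal approximation behaves well for concordance correlation coefficients is less clear; bootstrapping the difference between the two CCCs is one alternative.

```python
# Sketch: Fisher r-to-z test for the difference between two independent
# correlation coefficients (values below are placeholders).
from math import atanh, sqrt
from scipy import stats

r1, n1 = 0.85, 40     # hypothetical coefficient and sample size, method/sample 1
r2, n2 = 0.72, 35     # hypothetical coefficient and sample size, method/sample 2

z = (atanh(r1) - atanh(r2)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
p = 2 * (1 - stats.norm.cdf(abs(z)))
print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```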