Normal Distribution - Science topic
Continuous frequency distribution of infinite range. Its properties are as follows: 1, continuous, symmetrical distribution with both tails extending to infinity; 2, arithmetic mean, mode, and median identical; and 3, shape completely determined by the mean and standard deviation.
Questions related to Normal Distribution
I am conducting a panel data regression for research on the economic growth of a few countries. In real life, it is hard to find data that are normally distributed, and most of the control variables are correlated with each other in one country or another.
However, the regression diagnostics are satisfactory: the residuals are normally distributed, and there is no serial correlation or heteroscedasticity. Even the CUSUM and CUSUMSQ tests show that the model is stable.
In such a case, are the diagnostic tests enough to justify that the results of the regression model are reliable and valid even when data are not normally distributed and there exists correlation among them?
Thank you in advance for your responses.
I have a non-normally distributed variable (income). Although I tried to transform it to normality, its skewness and kurtosis values are still very high and it has many outliers. I can't delete the outliers because they reflect the nature of the income variable, so I didn't delete a single one (N=9918, and I am not sure it is acceptable to delete 200 or 300 of them). I have read that if the residuals are normally distributed after running OLS, it is acceptable to use the OLS results, but I couldn't find an academic source or strong reference for this.
When I have normally distributed residuals, can I use the OLS results even if the variable has outliers and high skewness and kurtosis values? If this is acceptable, can you suggest an academic resource that I can cite to support it?
Thank you in advance.
I am trying to model the NGN/USD exchange rate using some linear forecasting methods. After a Box-Cox transformation, the data still failed to be normal. I checked for possible outliers and found none. What can be done to make the transformed data normal?
Which test should I use for a data set that is not normally distributed (n=10)? I originally wanted to use a t-test to compare two groups with each other.
I have 667 participants in my sample, and the outcome is continuous. I tested the normality, and the data on the histogram are bell-shaped, but the test results show that they are not normally distributed.
1- What is the cause of the discrepancy between the chart and the test results?
2- Can I still perform linear regression analysis on this data?
My data were not normally distributed even after transformation, so I used the Kruskal-Wallis test and then ran a Bonferroni post hoc test. I am a little mixed up about the difference between Dunn's test, the Bonferroni test, and the Dunn-Bonferroni test.
I have a quick question about deciding whether a dataset is normally distributed. I have checked that the skewness and kurtosis values lie between -1 and 1. However, when I checked these data using normality tests (Kolmogorov-Smirnov and Shapiro-Wilk), they gave significant values, indicating that the data were not normally distributed. I tried transforming the data and encountered the same situation: kurtosis and skewness values between -1 and 1, but normality tests giving significant values. What do you decide in this case? Are the data normally distributed?
(data contains more than 2000 participants).
Thanks in advance,
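For readers following the question above, here is a minimal sketch of how sample skewness and excess kurtosis can be computed by hand. It assumes the plain moment-based definitions; statistical packages usually apply small-sample bias corrections, so their output may differ slightly. The simulated sample (2000 draws, mean 50, SD 10) is purely illustrative and is not the poster's data.

```python
import random
from statistics import fmean

def central_moment(xs, k):
    """k-th central moment, dividing by n (no bias correction)."""
    m = fmean(xs)
    return sum((x - m) ** k for x in xs) / len(xs)

def skewness(xs):
    """Moment-based skewness: m3 / m2^1.5."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def excess_kurtosis(xs):
    """Moment-based excess kurtosis: m4 / m2^2 - 3 (0 for a normal curve)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0

# Illustrative data only: 2000 simulated normal values.
random.seed(0)
sample = [random.gauss(50.0, 10.0) for _ in range(2000)]
print(f"skewness        = {skewness(sample):.3f}")
print(f"excess kurtosis = {excess_kurtosis(sample):.3f}")
```

Descriptive thresholds such as "between -1 and 1" do not depend on sample size in the way a significance test does, which is one reason the two kinds of check can disagree in a sample of this size.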
I have one three-digit integer and a normal distribution with known parameters (expectation and standard deviation). How do I calculate the corresponding value of that normal distribution for this integer?
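One reading of this question is that the poster wants the z-score, the density, or the cumulative probability at the given integer. A small sketch using Python's standard library follows. The numbers (mean 500, standard deviation 100, observed value 640) are made up for illustration and are not from the question.

```python
from statistics import NormalDist

# Hypothetical parameters and observation, chosen only for illustration.
mu, sigma = 500.0, 100.0
x = 640

dist = NormalDist(mu, sigma)

z = (x - mu) / sigma      # standardized value (z-score)
density = dist.pdf(x)     # height of the normal curve at x
cumulative = dist.cdf(x)  # P(X <= x)

print(f"z = {z:.2f}, pdf = {density:.6f}, cdf = {cumulative:.4f}")
```

Which of the three quantities is "the" value depends on what the poster needs it for; the question does not say.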
In my research, I have four categories in my questionnaire, each category comprises a set of items. My data proved not normally distributed, so a non-parametric statistical procedure will be used to determine the relationship among variables, i.e., the Spearman rank-order correlation is chosen. My question is: what do I choose from the function list in the Compute Variables option on SPSS, so I create new representative-to-sets-of-items variables to use when I carry out Spearman correlation? Is it the overall mean or median for each set of items?
Can I run the ARDL method even if my residuals do not follow a normal distribution? I have already taken the log of the variables, and they follow a normal distribution, but not when I estimate the ARDL model.
In my research question there are two independent variables and one dependent variable. The first independent variable has two levels (gender), the second independent variable has three levels, and the dependent variable has two levels, which are normally distributed. Can I use two-way ANOVA?
I am planning to analyze my data through SEM. However, my data are not normally distributed and some variables are heavily negatively skewed. I have tried transforming my data using log, square root, and inverse transformations, but normality tests still indicate a non-normal distribution. What else can I do to arrive at a normal distribution? Note that I have tried to run the data through AMOS and I keep getting very large numbers for the CMIN/DF fit indicator. I have a sample of over 500 cases.
I have normally distributed data. Before conducting CFA, is it necessary to remove outliers?
First, I removed two of the outliers and conducted CFA: my CMIN/df value was <3, but one of the factor loadings was .25. Then, I conducted CFA without removing any outliers: my CMIN/df value was slightly over 3, but the factor loading I mentioned became .29. Other values (GFI, CFI, TLI, RMSEA) did not change much between the two calculations.
I have an empirical dataset with a mix of scale variables and dichotomous variables. I am seeking to test the relation between the scale variables and dichotomous, and the relation between two dichotomous variables.
Can I perform a linear regression for the scale and dichotomous variables? I have heard that dichotomous variables are not normally distributed, and that normality is an essential assumption for linear regression.
And can I perform Chi-square to test between two dichotomous variables? I got suggested Spearman as well, but those two tests give the same results. Is the sampling supposed to be random?
I have studied the effect of hyperglycemia on subarachnoid hemorrhage (preclinic) and I used PET and MRI for this purpose. I am now comparing the data between hyperglycemic and normoglycemic animals on days 1 and 3 (I measured the volumes of damage). For doing the statistics, I am using GraphPad prism. I put all my data in columns, I performed a normality test and they do not follow a normal distribution so the parametric analysis is automatically dismissed. Then, I performed a Kruskal Wallis analysis but, here is my concern:
As I am interested in assessing the difference between hyperglycemia and normoglycemia (differences on day 1 and differences on day 3), I thought that maybe a Mann-Whitney test choosing the variables two by two would be correct as well (I have two groups, not four). What do you think? Is it correct?
Thanks in advance!!
In the use of Expectation Maximization, which is one of the methods used in missing data analysis, is it necessary for the variable containing missing data to be normally distributed?
My dependent variable is categorical and independent variables are ratio.
However, dependent and independent variables are not normally distributed and have outliers.
What suitable correlation test can be used?
One way ANOVA has multiple assumptions. Amongst them, what do these two mean?
-There should be homogeneity of variances
-Dependent variable should be normally distributed for each category of the independent variable
-what does 95% confidence interval in one-way ANOVA mean? Does this mean F statistic is 95% likely to be true compared to population mean?
-For a significant F statistic, I should use Dunnett's post hoc test to compare the means of groups to the control mean. Tukey's post hoc test would be used if I want to compare the means of each group to each other. In simple terms, what are the Bonferroni and Sidak post hoc tests for?
I'm analysing data coming from a QoL scale where lower scores indicate better quality of life.
As the data in the pre and post intervention reviews is not normally distributed, I've used a Wilcoxon Signed Ranks Test for 2 related samples.
The sample size is not tiny but not very large either (n=51); there is a clear difference in the rankings, with the sum of ranks for negative ranks (which in this case indicate improved QoL) being higher than the sum for positive ranks (598.50 vs 262.50), Z=-2.188 and the p value =.029.
However there is no change in the median values (12 at pre and post).
Would it make sense to report only on the ranks and not the median? Or to report the median change by ranks (negative, positive, ties)? I'm getting a tad confused.
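As a side note on the numbers in this question, the reported rank sums can be turned into an approximate Z statistic with the usual large-sample normal approximation. The sketch below assumes no tie correction and no continuity correction, so it will not reproduce the software's reported Z exactly. The function name `wilcoxon_z_from_rank_sums` is just an illustrative helper, not part of any package.

```python
from math import sqrt
from statistics import NormalDist

def wilcoxon_z_from_rank_sums(sum_positive, sum_negative):
    """Normal approximation for the Wilcoxon signed-rank statistic.

    Assumes no tie correction and no continuity correction.
    Returns (number of non-zero differences, z, two-sided p).
    """
    total = sum_positive + sum_negative        # equals n(n+1)/2
    n = (sqrt(1.0 + 8.0 * total) - 1.0) / 2.0  # solve n(n+1)/2 = total
    mean_w = n * (n + 1) / 4.0
    sd_w = sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    w = min(sum_positive, sum_negative)        # smaller rank sum
    z = (w - mean_w) / sd_w
    p = 2.0 * NormalDist().cdf(z)              # z <= 0 by construction
    return n, z, p

# Rank sums as reported in the question: positive 262.50, negative 598.50.
n_used, z, p = wilcoxon_z_from_rank_sums(262.5, 598.5)
print(f"non-zero pairs = {n_used:.0f}, z = {z:.3f}, p = {p:.3f}")
```

The rank sums add up to 861, which corresponds to 41 non-zero differences, so about 10 of the 51 pairs appear to be ties that the test drops. The statistic is driven by the ranked differences of the remaining pairs rather than by the two medians.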
I have 4 independent variables, 1 mediator, and 1 dependent. The sample size is 214
1- After Conducting the normality test by Kolmogorov–Smirnov, and Shapiro–Wilk test, I found the following :
A- Two independent variables were non-normally distributed according to the Kolmogorov-Smirnov test, and two were normally distributed.
B- However, all were normally distributed according to the Shapiro-Wilk test.
C- The skewness, kurtosis, and histograms clearly indicated normal distributions.
2- After conducting a multicollinearity test, I found that the independent variables were not significantly correlated.
3- I conducted a correlation test and found that one of the variables that was non-normal according to the Kolmogorov-Smirnov test was not significantly correlated with the mediator or with the dependent variable.
Because of that, I will not consider it in further analysis steps.
Thus I now have three independent variables, one of which is non-normally distributed according to Kolmogorov-Smirnov (but normal according to Shapiro-Wilk, skewness, kurtosis, and the histogram).
4- Now, I am going to use Baron and Kenny's steps for mediation analysis with Preacher and Hayes (2004) bootstrap method.
The questions are as follows:
1-if my approach is right or not?
2- If I have to conduct CFA or not? (The questionnaire was adopted after translation, content validation, and reliability test)
3- If there are missing steps in my analysis approach?
I hope you can help me decide whether my statistical choice is good.
I have samples from diagnosed AD patients distributed in three groups (Ctrl, mild and severe groups) with n=10 for each of the groups. I have performed WB for each of those 10 samples and assessed the expression levels of different proteins. I have normalised those proteins to a loading control and then I have normalised the mild group and severe group to the CTRL group.
The data obtained does not seem to follow a normal distribution and there is a great variability between patients.
Considering that, and that my n is below 30 samples, I decided to run a non-parametric test (Kruskal-Wallis H test) followed by Dunn's correction for multiple comparisons when the p value was <0.05.
I am unsure whether I should have chosen an ANOVA to compare the means of the groups even though the data do not follow a normal distribution. I have considered doing a log transformation (the graph looks much better because I had a high SEM), which apparently helps make the data normally distributed, and then applying a parametric test (ANOVA), but I am still in doubt.
Any suggestions would be helpful. I have found very different opinions in previous similar discussions, but none of them helped me decide which test to choose.
Hi RG community,
I have a dataset which has a non-normal distribution and I want to perform linear regression on it (using Graphpad Prism 9.3.1). Here is the process I follow:
1. First determine normality of the data set using the D'Agostino-Pearson omnibus normality test.
2. Find correlation coefficient (r): Pearson Correlation if data has normal distribution or Spearman Correlation if data has non-normal distribution.
3. Then I perform linear regression in Graphpad Prism and find out the fit equation and the goodness of fit (R-squared or R2).
Usually, for normally distributed data, R-squared numerically turns out to be the square of the correlation coefficient r (the Pearson correlation coefficient),
i.e., R-squared = r^2.
However, this is not true for non-normally distributed data. (Is this normal?)
Hence, I just wanted to clarify a few silly things (I haven't really worked much with non-normally distributed data):
1. Is it OK for R-squared not to equal r^2 for non-normally distributed data?
2. Is it ok to perform linear regression on non-normally distributed data as is or does it need some modification before linear regression can be performed?
I appreciate any help regarding this topic.
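One possible explanation for the mismatch described above, offered as a sketch rather than a diagnosis: in a least-squares fit with a single predictor, R-squared equals the squared Pearson r as an algebraic identity, whatever the distribution of the data. If step 2 of the process switches to the Spearman coefficient, that coefficient will generally not square to the regression R-squared. The data below are invented (a skewed, monotone relationship), and all function names are illustrative helpers.

```python
from statistics import fmean

def pearson_r(x, y):
    """Pearson product-moment correlation."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ols_r_squared(x, y):
    """R-squared of the least-squares line y = intercept + slope * x."""
    mx, my = fmean(x), fmean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot

def ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Invented, strongly skewed but monotone data.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [v ** 3 for v in x]

print("Pearson r^2  :", pearson_r(x, y) ** 2)
print("OLS R-squared:", ols_r_squared(x, y))
print("Spearman rho :", spearman_rho(x, y))
```

Here the first two printed values agree, while the Spearman coefficient differs because it only measures monotone association.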
Is normal distribution of data necessary in methods used to complete missing data such as Markov Chain Monte Carlo (MCMC) , Expectation Maximization (EM), k-NN Algorithm, Multiple Imputation (MI)?
I have 2 groups that I'm measuring effect of intervention. There is a large variation in SD for the intervention and non intervention. Intervention SD is 74.24 and non-Intervention SD is 63. How this can reflect the Intervention as true predictor vs Non-Intervention? Do I focus on normality? (There is normal distribution between the two) or something else? How to discuss as a limitation? Thanks
The two samples each show a significant Shapiro-Wilk p-value when I apply the test to them separately. When I run the t-test, my stats software (JASP) lets me check assumptions, and it says that normality is not violated. How can that be? Is it about the distribution of the mean differences?
Thanks and have a nice day!
Hello. I am testing a model fit in AMOS. The data follow a normal distribution and meet the assumptions of linearity, multicollinearity, and homoscedasticity. I have used maximum likelihood (ML).
The problem is that no information about significance or confidence intervals is provided for the standardized weights, the correlations, and the R-squared values.
I have seen that this information can be accessed using ML together with bootstrapping, also the confidence intervals. Can ML be used together with bootstrapping when the data follows a normal distribution?
Hello all! As part of my master's thesis, I did an experiment. I now have 5 groups with n=45 participants each. When I look at the data for the manipulation checks, they are not normally distributed. In theory, however, an ANOVA needs normally distributed data. I know that ANOVA is a robust instrument and that I don't have to worry about it with my group size. But now to my question: do I in fact need normally distributed data at all for a manipulation check measure, or is it in the nature of the question that the data are skewed? E.g., if I want to know whether a gain manipulation worked, do I not want the data to be skewed either to the left or to the right, depending on the scale?
Would be great if somebody could give me feedback on that!
I am in the process of adapting the questionnaire for my research. The data on one of the scales has a strong deviation from the normal distribution - 64% of the answers have the lowest possible value (the sample consisted of 282 people). The rest of the responses were also distributed towards the low values. There were 3 items on the scale. A Likert scale of 1 to 7 was used for responses.
However, it seems to me that the construct being measured implies such a result in this sample.
Can you tell me whether such a scale can be considered usable? Or is it too heavily skewed? Are there any formal criteria for this?
I've already done a CFA, and I've checked the test-retest reliability and convergent validity (the results are acceptable).
Thank you very much in advance for your answers.
I run a two way random effects model and in residual diagnostics, I found that the residuals are not normally distributed. What should I do now?
If I need to find the correlation between a dependent variable that is not normally distributed and two independent variables, one of which is normally distributed and the other not, which test is preferred: Pearson or Spearman?
Note: All my variables are scale, not ordinal.
Suppose, for example, I want to compare BMI between two groups. When I use the Shapiro-Wilk test to check the normality of BMI in each group, one group is normally distributed and the other is not. So which test should be used to compare their means: the t-test or Mann-Whitney?
Suppose, for example, I want to compare BMI between two groups.
When I use the Shapiro-Wilk test to check the normality of BMI in each group,
one group is normally distributed
and the other is not.
Which test should I use: the t-test or Mann-Whitney?
I'm developing a set of firm-level risk and uncertainty indices. I ran different distribution functions, for instance the PDF and CDF. In my case, most of the indices are normally distributed. Do you think that they should ideally be normally distributed? Thanks in advance for your response.
I hope you can assist me. To complete my multiple regression analysis with a sample of 80, I examined my data for kurtosis and skewness as part of my normality analysis. Two of the variables showed skewness and kurtosis z-scores in the region of -2.20, which is somewhat beyond the -1.96 cutoff.
My questions are:
1. Should I conduct a log 10 transformation to correct a small skewness, or should I simply report it in the results section?
2. If the response is that I must log 10 the data, should I then utilise the modified log 10 data to conduct my regression analysis (instead of my original data without the log 10)? I believe this will not affect the total outcome.
3. In addition, I performed a log 10 transformation, and although the variable looked normally distributed, the Shapiro-Wilk and Kolmogorov-Smirnov tests gave p-values below 0.05. Should I report this, or am I required to apply a non-parametric test to continue?
Thank you beforehand for your assistance.
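For context on the z-scores mentioned in this question, a skewness z-score is commonly obtained by dividing the sample skewness by its standard error. The sketch below uses the standard-error formula commonly reported by statistical packages for the bias-corrected skewness; the skewness value of -0.59 is invented for illustration, and only the sample size of 80 comes from the question.

```python
from math import sqrt

def se_skewness(n):
    """Standard error of sample skewness for sample size n (n > 2)."""
    return sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def skewness_z(skew, n):
    """z-score for skewness: the statistic divided by its standard error."""
    return skew / se_skewness(n)

n = 80                        # sample size from the question
hypothetical_skew = -0.59     # made-up skewness value, for illustration only

se = se_skewness(n)
z_skew = skewness_z(hypothetical_skew, n)
print(f"SE = {se:.3f}, z = {z_skew:.2f}, beyond +/-1.96: {abs(z_skew) > 1.96}")
```

Because the standard error shrinks as n grows, the same skewness value produces a larger z-score in a larger sample, which is worth keeping in mind when applying a fixed 1.96 cutoff.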
I found that some people tend to use Welch's t-test for analysis even if the results are not normally distributed. Should I use Welch's t-test or the Mann-Whitney U test? The sample size is slightly different between the 2 groups, at around 30 in each group. The results from one group are not normally distributed.
I would like to analyze the p-value for 7 groups. The sample size is slightly different among the groups. The sample size is around 30 in each group. The results from the 3 groups are not normally distributed. Could I still use ANOVA for analysis?
I have designed a framework containing 3 variables, 2 dependent and 1 independent. The constructs are reflective type. I need to test direct relationship as well as the possibility of existence of mediation and moderation relationship. There are two control variables as well. The sample size is 183 and the data is normally distributed. I am facing difficulty in deciding whether to use CB-SEM or PLS-SEM.
My study sample size is 355, and after I performed the Mahalanobis test, 7 of the 355 cases were identified as outliers. The initial reliability scores of the sub-scales ranged from 0.7 to 0.9, but after I removed these 7 outliers the reliability of all sub-scales dropped drastically, to below 0.6 and even 0.5. Does this mean I cannot remove these outliers because they are naturally generated? (My data set is not normally distributed.)
I have a questionnaire with 20 questions. The average scores of questions 1, 6, 7, 9 (say FACTOR-1) and 2, 3, 5, 8, 10 (say FACTOR-2) are taken. The two factors are not normally distributed.
Which non-parametric techniques or mixed-effects models are possible in this situation?
Waiting for your response
My design is a pre-post control group design. Due to the small sample size (16+16) and the absence of normality in the pre-test scores, I performed a Wilcoxon signed-rank test to test the differences between pre- and post-test scores for both groups. However, I found that the post-test data are normally distributed for both groups. My question is:
In this case, in order to test the difference between groups, should I perform a Mann-Whitney U test or an independent-samples t-test?
Thanks in advance.
I have one environmental risk score for psychosis (ERS) value for each participant in my database. The scores vary from -4.5 to 10. And the data is not normally distributed.
I would like to ask whether there is a proper way to transform these scores into positive values.
After transforming these values into positive values, is there any suggestion to have a normal distribution of the data? I thought after having the positive values, I could do a log transformation to have a more normal distribution. However, I am open to all your suggestions.
I really appreciate any help you can provide.
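The shift-then-log idea described in this question can be sketched as follows. The scores are invented but span the range given in the question (-4.5 to 10), and the choice of making the minimum equal to 1 is an arbitrary assumption; a different offset changes the shape of the transformed data, so it should be chosen and reported deliberately.

```python
from math import log

def shift_to_positive(values, target_min=1.0):
    """Add a constant so that the smallest value equals target_min."""
    shift = target_min - min(values)
    return [v + shift for v in values], shift

# Invented scores covering the range mentioned in the question.
scores = [-4.5, -1.2, 0.0, 2.3, 10.0]

shifted, shift = shift_to_positive(scores)
logged = [log(v) for v in shifted]  # natural log; only valid for values > 0

print("shift added:", shift)
print("shifted    :", shifted)
print("log values :", [round(v, 3) for v in logged])
```

A log transformation compresses the upper tail, so it mainly helps with right-skewed data; whether it helps here depends on the direction of the skew, which the question does not state.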
I am currently writing my thesis on the effects of prolonged handling on the behavior and physiology of mice, by comparing a handled and non-handled group. I have worked out the duration spent per behaviors observed from four handled individuals and four non-handled individuals. I discovered that the data values from certain individuals of each group are not normally distributed and data values from other individuals are. Can anyone give me advice on which statistical test I can use to determine if there is significance between the duration spent per behavior from all of the individuals between both groups? I'd also like to test for significance between duration spent per behavior from only handled individuals and non-handled individuals. Hope this makes sense.
I'm not very clued up on statistics.
Let X be a random variable that follows a distribution, say S, with parameters a, b, and c. Knowing or assuming that a, b, and c are independent of one another, which of the following is reasonable?
a) Is it okay to find the joint Jeffreys prior as the product of three Jeffreys priors (for a, b, and c)?
b) Is it better to use the square root of the determinant of the 3x3 Fisher information matrix?
Even for the normal distribution with two unknown parameters, the joint Jeffreys prior based on the product of two Jeffreys priors (for mu and sigma) yields a slightly different prior from the one based on the determinant of the 2x2 Fisher information matrix.
My study design is quasi-experimental. Basically, I am trying to explore the effect of an intervention on two groups of participants (control vs experiment) over two times (pre- and post-test), and the effect of this intervention on three dependent variables (self-efficacy, motivation and self-regulation). All three DVs were measured using surveys with 5- and 6-point Likert scales, so the data here are ordinal. However, the skewness and kurtosis fall within the normal and acceptable range, i.e., ±2. The data are approximately normally distributed, as indicated by the small range of skewness and kurtosis values. My sample size is 124. Some scholars suggested using repeated measures ANOVA in my case, but how do I carry out ANOVAs when my DVs are ordinal? I still believe I should be running the Wilcoxon signed-rank test instead.
I have 6 faecal samples divided into 3 groups. The Shannon indices are 2.20 and 2.91 for group 1, 2.52 and 2.78 for group 2, and 2.28 and 3.30 for group 3. The Simpson indices are 0.798 and 0.878 for group 1, 0.841 and 0.851 for group 2, and 0.910 and 0.799 for group 3. My question is: which would be the best statistical analysis to determine whether there is a significant difference in the species present in each group? I've seen web pages suggesting a one-way ANOVA, but that works for normally distributed data. The Shapiro-Wilk test shows that the data are not normally distributed.
In my research, I hypothesize that a variable X (among others) will have an inverted U-shaped relationship with Y.
To make the residuals normally distributed I am using inverse transformation on Y.
Then, I perform a linear regression including X, the squared term for X, and other variables. Both X and X squared are significant, and the coefficient of X squared is positive, implying a U-shaped relationship.
Below is the scatterplot of X and Y (inverse). When I report this, do I show the scatterplot of X and the inverse of Y (the variable after transformation) and then conclude that if the inverse has a U-shaped effect, the untransformed variable will have an inverted U-shaped effect? Would that be legitimate?
Or do I show a scatterplot of X and Y?
My dataset is not very big (between 100 and 200 observations), and the residuals are not normally distributed. So:
1. Is there any other statistical method similar to multiple linear regression but suitable for this case?
2. If not, what can be the solution?
I am running an experiment to compare the difference between the control and intervention groups with 5 and 4 sample sizes. From the literature, the smallest sample size that an independent sample t-test can be used is 4. However, one of the major assumptions of the t-test is that the data from each group must be normally distributed. Normality assumption cannot be established on a small sample due to inadequate estimation of the dispersion of the data.
Can I continue using the independent t-test, or should I consider a non-parametric test? I had run 70% of the experiment before the normality assumption came to mind.
If not, which non-parametric test could be used?
My null hypothesis is that "there is no significant relationship between socioeconomic characteristics of the victims and their perception of treatment by police." Perception of treatment by police consists of three statements and is measured using a 5-point Likert scale. I opted to perform ANOVA and tested normality, but the data failed the normality check. Please suggest a suitable test to find an association when the data are not normally distributed.
One dependent variable (continuous) ~ two continuous and two categorical (nominal) independent variables
I'm looking for the best method for prediction on a data collection with more than 100 sites. None of the continuous variables is normally distributed.
I have a sample, and in my study we are going to compare 2 or 3 samples.
If I am going to compare 2 groups and one of the samples has a normal distribution while the other doesn't, should I run a parametric or a nonparametric test?
The same for 3 samples: if 2 samples have a normal distribution and 1 doesn't, or if 1 of the 3 samples doesn't have a normal distribution and the others do, should I run a parametric or a nonparametric test?
For my data analysis, I used the Kruskal-Wallis test because there is no variance homogeneity and no normal distribution. As a post hoc test, I used the Games-Howell test with Tukey's p-value. Can I now calculate the effect size, which JASP does not show me for the individual post hoc tests, simply by hand from my descriptive data?
Any help with what is probably not a very complex question would be appreciated; I am unsure which statistical test to use. I am examining the relationship between a categorical variable (tree species) with 5 levels (5 different species) and two categorical explanatory variables, size (small, medium and large) and orientation (N, S, E, W). The data do not have a normal distribution. What test should I use? A multinomial regression?
I would like to run a Monte Carlo Simulation for stocks. Since stock returns are not normally distributed I am wondering, what is the best distribution function?
What about Weibull, Fréchet, Gumbel, Rossi, etc.?
My biggest concern is incorporating the difference between the median and the mean. My data are:
Mean: 10,67% Median: 14,77% Standard deviation: 17,41%
Determining intervals for the common language effect size (CLES), probability of superiority (PS), Area Under the Curve (AUC) or Exceedance Probability (EP) is possible via multiple methods (Ruscio and Mullen, 2012). However, is this also possible via Fisher's z transformation? For simplicity I will call the “effect size” EP.
If we make the following assumptions: we have a (real) value that can range between -1 and 1, and the error distribution is (approximately) normal (also invoking the CLT), then we would be able to obtain intervals via Fisher's z transformation (I think???).
The rationale is: EP does not range from -1 to 1, but from 0 to 1, with 0.5 representing the null. However, it would be possible to transform the EP to a value between -1 and 1 by assuming a “directionality”: < 0.5 is negative and >= 0.5 is positive. Then,
EPz = 0.5*ln[ (1+EPd)/(1-EPd) ] = atanh(EPd)
SE = √[ 1/(n1-3) + 1/(n2-3) ]
Lower = EPz - 1.96*SE
Upper = EPz + 1.96*SE
Transformation back to the original scale (EP) would be possible for both positive and negative values, applied to each limit z (Lower or Upper):
EP = [ (exp(2*z)-1)/(exp(2*z)+1) ]/2 + 0.5 = tanh(z)/2 + 0.5
EP = [ 1 + (exp(2*z)-1)/(exp(2*z)+1) ] / 2 = [ 1 + tanh(z) ]/2
However, when comparing the analytical intervals to Monte Carlo (MC) simulations, the intervals are much broader when using a smaller sample size. Although the extreme intervals, either upper when EP < 0.5 or lower when EP < 0.5 Below is an example from Ruscio and Mullen (2012) where n1 and n2 are both 15, and another example with the same mu and sd where nx and ny are both 150. The intervals by Ruscio and Mullen (2012) are also much smaller. The question is: why are these intervals broader? Is the rationale completely wrong, did I make a mistake, or is what I am doing simply impossible? I know there are other ways of obtaining the intervals, but using Fisher's z transformation would make it rather “elegant”.
Thank you in advance for your time!
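To make the proposed procedure concrete, here is a sketch of the interval as the question appears to intend it. It rests on several assumptions that are not confirmed by the question: the directional value is taken as EPd = 2*EP - 1, the transform is the standard atanh, and the standard error uses n - 3 in each denominator, following the poster's proposal rather than any established result for EP. The function name `ep_interval_fisher_z` is hypothetical.

```python
from math import atanh, sqrt, tanh

def ep_interval_fisher_z(ep, n1, n2, z_crit=1.96):
    """Sketch of a Fisher-z style interval for an exceedance probability.

    Assumptions (not from an established method): EPd = 2*EP - 1, and
    SE = sqrt(1/(n1-3) + 1/(n2-3)) as proposed in the question.
    Requires 0 < ep < 1 and n1, n2 > 3.
    """
    ep_d = 2.0 * ep - 1.0                  # map (0, 1) onto (-1, 1)
    ep_z = atanh(ep_d)                     # Fisher z transformation
    se = sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    lower_z = ep_z - z_crit * se
    upper_z = ep_z + z_crit * se
    # Back-transform each limit to the EP scale: (1 + tanh(z)) / 2
    return (1.0 + tanh(lower_z)) / 2.0, (1.0 + tanh(upper_z)) / 2.0

# Sample sizes taken from the two scenarios mentioned in the question.
small = ep_interval_fisher_z(0.5, 15, 15)
large = ep_interval_fisher_z(0.5, 150, 150)
print("n = 15 per group :", small)
print("n = 150 per group:", large)
```

Running the sketch for both sample sizes makes it easy to compare the analytical widths against simulated intervals, which is the comparison the question describes.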
I have two species of birds, for each species I measured 4 replicates of an enzyme for each individual. When I ran the tests for normality it came out that my data are not normally distributed so I cannot use the repeated measures ANOVA to test for differences between species. Transformations didn't work for my data. Which non parametric test should I use?
The data collected is on time to appearance of different developmental stages, of fish eggs in concentrations of toxicant. Five concentrations of toxicant and eight egg developmental stages were monitored (total of 40 cases). The data was ranked with the following results: the distribution of 15 variables were normally distributed (Shapiro-Wilk test) both before and after ranking using SPSS version 25 (37.5%). Twenty-one variables became normally distributed (52.5%), while two cases were not affected. I read somewhere that data normalized by ranking can be analyzed using parametric methods. Does this apply here? Thanks.
Effect size for repeated measures of not normally distributed data
The data I collected for my research yielded a non-normal distribution.
I aim to test a hypothetical model using SEM, and AMOS is said to be better for confirmatory research. However, I don't want an inflated model (since the data are not normally distributed).
Accordingly, I have the following questions:
1. Is SmartPLS a good fit for conducting SEM and path analyses, and is that more accurate than Amos for the data that are not normally distributed?
2. Moreover, is it better to use VB-SEM?
I asked the second question because VB-SEM is said to be more flexible regarding non-normality.
I sincerely thank the researchers who will answer these questions.
I am trying to reproduce results from this paper: Janich, P., Toufighi, K., Solanas, G., Luis, N. M., Minkwitz, S., Serrano, L., Lehner, B., & Benitah, S. A. (2013). Human Epidermal Stem Cell Function Is Regulated by Circadian Oscillations. Cell Stem Cell, 13(6), 745–753.
Here is the difficulty I ran into:
The author performed microarray experiments to measure gene expression in multiple overlapping time windows.
Take time window 1 as an example: say there are 100 genes whose expression is measured at 5 h, 10 h, 15 h, and 20 h.
Then the author applied a quadratic regression model "expression = a(time.point)^2 + b(time.point) + c" to determine whether these genes change periodically within each time window (time.point can be 5, 10, 15, 20 in this example). If the coefficient "a" < 0 and the p-value for the coefficient "a" < 0.05, the gene is identified as a "peak gene"; if the coefficient "a" > 0 and the p-value for the coefficient "a" < 0.05, it is labelled a "trough gene". The problem is that the author calculated the p-value with two methods. The first is based on the t distribution, i.e. pvalue = Pr(>|t|), and the R code would be:
summary(lm(formula = expression ~ poly(time.point, 2, raw = TRUE)))$coefficients[3, 'Pr(>|t|)']
The other is based on the normal distribution, i.e.
pvalue = pnorm(q = -abs(t.score), lower.tail = TRUE) * 2
(That means that if |t statistic| > 1.96, the p-value is guaranteed to be < 0.05.)
The author chose the latter one as the final pvalue. But is it right to do so in this situation?
From what I learned, the t distribution should be better when the population standard deviation is unknown and the sample size is < 30 (for each regression model, there are only 4 observations). Since the different p-values calculated by these two methods could greatly affect the final result and conclusion, could someone give me a detailed explanation? Any help would be appreciated!
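For a quadratic fit to four time points, the residual degrees of freedom are 4 - 3 = 1, so the reference t distribution has 1 degree of freedom. That is the Cauchy distribution, which has a closed-form CDF. Below is a minimal Python sketch, using only the standard library, of how far the two p-values can diverge in that setting. The t-scores are made up for illustration and are not taken from the paper.

```python
import math
from statistics import NormalDist


def p_two_sided_normal(t_score: float) -> float:
    """Two-sided p-value treating the t-score as a standard normal z-score."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_score)))


def p_two_sided_t_df1(t_score: float) -> float:
    """Two-sided p-value under a t distribution with 1 degree of freedom.

    With 4 observations and 3 fitted coefficients, df = 4 - 3 = 1, and the
    t distribution with df = 1 is the Cauchy distribution, whose CDF is
    0.5 + atan(t) / pi.
    """
    return 1.0 - 2.0 * math.atan(abs(t_score)) / math.pi


# Hypothetical t-scores for the quadratic coefficient "a".
for t_score in (1.96, 5.0, 12.706):
    print(t_score, p_two_sided_normal(t_score), p_two_sided_t_df1(t_score))
```

Under these assumptions, a t-score of 1.96 gives a p-value of about 0.05 from the normal distribution but about 0.30 from the t distribution with 1 degree of freedom; the t-based p-value only reaches 0.05 near |t| = 12.7.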
I am investigating the impact of sex on inspiratory muscle training as a recovery modality from long COVID. I have 30 participants (15 female), and I take 4 physiological readings and 12 psychosocial health readings, each taken pre- and post-intervention. I was going to use a mixed-measures ANOVA, but on testing for normality I found that all my pre values are normally distributed while all of my post values are not. What test should I run?
I have a question regarding the ddCT method of qPCR.
Various threads have pointed out that all statistics should be done with dCT or ddCT because dCT follows a normal distribution.
I understand that dCT in single cells follows a normal distribution, since gene expression in single cells often follows a log-normal distribution, as shown in various papers (e.g., Bengtsson et al., Genome Research (2005)).
However, if mRNA is extracted from a large number of cells (e.g., more than 1.0x10^6 cells), then the mRNA solution is the sum of mRNA extracted from a cell population whose gene expression follows a log-normal distribution. What happens in this case?
Does this mean that the amount of mRNA of interest between samples follows a normal distribution, according to the central limit theorem?
In this case, can dCT follow a normal distribution?
I would appreciate your response.
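The central limit theorem part of this question can be explored with a small simulation. The Python sketch below rests on several assumptions that are not in the question: per-cell expression is log-normal with arbitrary parameters (mu = 0, sigma = 1 on the natural-log scale), a bulk sample is simply the sum over cells, and Ct is proportional to -log2 of the total amount. Technical noise from extraction and amplification is ignored, and the cell and sample counts are kept small for speed.

```python
import math
import random
from statistics import mean, pstdev


def skewness(values):
    """Sample skewness (third standardized moment); about 0 for symmetric data."""
    m, s = mean(values), pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)


rng = random.Random(0)  # fixed seed so the run is repeatable
n_cells, n_samples = 1000, 300  # illustrative sizes, far below 1e6 cells

# Single-cell expression: strongly right-skewed log-normal values.
single_cells = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]

# Bulk samples: total expression summed over n_cells cells per sample.
bulk_totals = [
    sum(rng.lognormvariate(0.0, 1.0) for _ in range(n_cells))
    for _ in range(n_samples)
]

# Ct-like quantity: -log2 of the bulk total (assumed, up to a constant offset).
ct_like = [-math.log2(total) for total in bulk_totals]

print("skewness, single cells:", skewness(single_cells))
print("skewness, bulk totals: ", skewness(bulk_totals))
print("skewness, -log2(total):", skewness(ct_like))
```

In a run like this, the single-cell values are heavily right-skewed, while both the bulk totals and their -log2 transform come out close to symmetric, because the relative spread of a sum over many cells is small. Variation between real samples may be dominated by biological and technical sources that this sketch does not model.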
The majority of our questions used a Likert scale (from 5, very frequent, to 1, never), and we used a pretest-posttest methodology. To compare the pretest and the posttest, we wanted to use a paired-samples t-test. However, this is a parametric test that assumes the data are normally distributed.
I have also read in the work of Norman, G. (2010) that parametric statistics can still be used with Likert data even with non-normal distributions.
What would be the best option here? Should we proceed in using the paired sample t-test, or go for Wilcoxon tests since the data is not normal? Thank you for answering in advance.
Norman, G. (2010). Likert scales, levels of measurement and the “laws” of statistics. Advances in health sciences education, 15(5), 625-632.
I'm trying to figure out what this means:
"the distribution of score for all malignant cells was fitted to a normal distribution and a threshold of p < 0.001 was used for distinguishing cycling cells."
I fitted my data to a normal distribution using R:
data <- fitdist(data, "norm")
However, I have no idea what p < 0.001 means.
According to the paper's figure, the score threshold should be around 0.8.
However, p < 0.001 would correspond to a score of about 3 to 4.
I don't get it.
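One possible reading of the quoted sentence, sketched in Python with the standard library: the p-value is an upper-tail probability under the fitted normal distribution, so the cutoff is the score whose tail probability equals 0.001. That cutoff is expressed in the score's own units; the value of 3 to 4 is its distance from the mean in standard deviations. The scores below are invented for illustration, and the one-sided interpretation is an assumption that the quote alone does not confirm.

```python
from statistics import NormalDist

# Hypothetical per-cell scores; replace with the real score vector.
scores = [0.10, 0.25, 0.05, 0.30, 0.15, 0.20, 0.40, 0.12, 0.18, 0.22]

# Fit a normal distribution using the sample mean and sample standard deviation.
fitted = NormalDist.from_samples(scores)

# Score above which the fitted upper-tail probability is 0.001 (assumed one-sided).
threshold = fitted.inv_cdf(1 - 0.001)

# The same cutoff expressed in standard deviations above the fitted mean.
z = (threshold - fitted.mean) / fitted.stdev

print("threshold in score units:", threshold)
print("threshold as a z-score:  ", z)
```

Whatever the data, the z-score of this cutoff is about 3.09, while the cutoff in score units depends on the fitted mean and standard deviation. A threshold near 0.8 in the paper's figure would be consistent with this reading if the fitted mean plus about 3.09 fitted standard deviations is close to 0.8.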
I am aware that a high degree of normality in the data is desirable when maximum likelihood (ML) is chosen as the extraction method in EFA and that the constraint of normality is less important if principal axis factoring (PAF) is used as the method of extraction.
However, we have a couple of items in which the data are highly skewed to the left (i.e., there are very few responses at the low end of the response continuum). Does that put the validity of our EFAs at risk even if we use PAF?
This is a salient issue in some current research I'm involved in because the two items are among a very small number of items that we would like, if possible, to load on one of our anticipated factors.