Science topic

# Advanced Statistics - Science topic

Explore the latest questions and answers in Advanced Statistics, and find Advanced Statistics experts.

Questions related to Advanced Statistics

I'm doing a germination assay of 6 Arabidopsis mutants under 3 different ABA concentrations in solid medium. I have 4 batches. Each batch has 2 plates for each mutant, 3 for the wild type, and each plate contains 8-13 seeds. Some seeds and plates are lost to contamination, so I don't have the same sample size for each mutant in each batch; in some cases a mutant is no longer present in a batch. I've recorded the germination rate per mutant after a week and expressed it as a percentage. I'm using R. How can I best analyse the data to test whether the mutations affect the germination rate in the presence of ABA?

I've two main questions:

1. Do I consider each seed as a biological replicate with a categorical outcome (germinated/not germinated), or each plate as a replicate with a numerical outcome (% germination)?

2. Should I compare each mutant against the wild type within a treatment, each treatment against the others within a genotype, or both?
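Whichever unit is chosen, the seed-level (binary) view can be illustrated with a simple two-proportion comparison. A minimal sketch (in Python for neutrality; the counts are hypothetical, and in practice a binomial GLMM in R with batch and plate as random effects would be the fuller analysis):

```python
import math

def chi2_2x2(germ_a, total_a, germ_b, total_b):
    """Pearson chi-square test (1 df) comparing two germination proportions."""
    table = [[germ_a, total_a - germ_a], [germ_b, total_b - germ_b]]
    n = total_a + total_b
    col = [germ_a + germ_b, n - (germ_a + germ_b)]
    row = [total_a, total_b]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            exp = row[i] * col[j] / n       # expected count under independence
            stat += (table[i][j] - exp) ** 2 / exp
    p = math.erfc(math.sqrt(stat / 2))      # survival function of chi2 with 1 df
    return stat, p

# hypothetical counts: mutant 12/40 seeds germinated vs wild type 30/45
stat, p = chi2_2x2(12, 40, 30, 45)
```

Aggregating seeds this way ignores the plate and batch structure, which is exactly the variability a mixed model would recover.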

I'm trying to construct a binary logistic regression model. The first model includes 4 predictor variables, and its intercept is not statistically significant. Meanwhile, in the second model, I exclude one variable from the first model and the intercept is significant.

The consideration I am weighing here is:

The pseudo R² of the first model is higher, i.e. the first model explains the data better than the second.

Any suggestion which model should I use?

Could you please elaborate on the specific differences between scale development and index development (based on formative measurement) in the context of management research? Is it essential to use only the pre-defined or pre-tested scales to develop an index, such as brand equity index, brand relationship quality index? Suggest some relevant references.

**I am currently writing a research proposal for my thesis, and I wanted to know: is it possible to use two different econometric methods to arrive at the findings?**

Dear all,

I have a question about a mediation hypothesis interpretation.

We have a model in which the direct effect of X on Y is significant, and its standardized estimate is greater than the indirect effect estimate (X -> M -> Y), which is significant too.

As far as I can understand, it should be a partial mediation, but should the indirect effect estimate be larger than the direct effect estimate to assess a partial mediation effect?

Or is the significance of the indirect effect sufficient to assess the mediation?

Thanks in advance,

Marco

300 participants in my study viewed 66 different moral photos and had to make a binary choice (yes/no) in response to each. There were 3 moral photo categories (22 positive images, 22 neutral images and 22 negative images). I am running a multilevel logistic regression (we manipulated two other aspects of the images) and have found unnaturally high odds ratios (see below). We have no missing values. Could anyone please help me understand what the results below might mean? I understand I need to approach them with extreme caution, so any advice would be highly appreciated.

Yes choice, morally negative compared to morally positive: *OR* = 441.11; 95% CI [271.07, 717.81]; *p* < .001

Yes choice, morally neutral compared to morally positive: *OR* = 0.94; 95% CI [0.47, 1.87]; *p* = 0.86

It should be noted that when I plot the data, very few participants chose yes in response to the neutral and positive images. Almost all yes responses were given in response to the negative images.

Hi Folks,

I am working on a meta-analysis and I am trying to convert data into effect sizes (Cohen's *d*) to provide a robust synthesis of the evidence. All the studies used a one-group pre-post design, and the outcome variables were assessed before and after participation in an intervention.

Although the majority of the studies included in this meta-analysis reported either the effect sizes (Cohen's *d*) or the mean changes, a few of them reported the median changes. I am wondering if there is a way to calculate effect sizes from these median changes. For example, the values reported in one paper are:

Pre Median (IQR) = 280.5 (254.5 - 312.5)

Post Median (IQR) = 291.0 (263.5 - 321.0)

Is there any way I can convert these values into Cohen's *d*? Thank you very much for your help.
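One hedged possibility, assuming the outcomes are roughly normal within groups: the IQR of a normal distribution spans about 1.349 standard deviations, so SD ≈ IQR / 1.349 (as in the Wan et al. 2014 conversion formulas), and a standardized difference can be built from the medians. A sketch (note it ignores the pre-post correlation that a properly paired *d* would need):

```python
def d_from_median_iqr(pre_med, pre_q1, pre_q3, post_med, post_q1, post_q3):
    """Approximate standardized mean difference from medians and IQRs,
    assuming approximate normality (SD ~= IQR / 1.349)."""
    sd_pre = (pre_q3 - pre_q1) / 1.349
    sd_post = (post_q3 - post_q1) / 1.349
    sd_pooled = ((sd_pre ** 2 + sd_post ** 2) / 2) ** 0.5  # simple average of variances
    return (post_med - pre_med) / sd_pooled

# the values reported in the paper above
d = d_from_median_iqr(280.5, 254.5, 312.5, 291.0, 263.5, 321.0)
```

With the values above this gives d ≈ 0.25; a skewed outcome would make the normality assumption, and hence the conversion, questionable.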

Hi All, I was wondering what statistical test do I use for this example. Comparing participants' ratings of a person's (1) competence and (2) employability, based on the person's (1) level of education and (2) gender.

So there are two IVs:

(1) The person's level of Education [3 levels].

(2) The person's Gender [2 genders].

So there is a total of 6 conditions presented to the participants [ 3 levels of education x 2 genders]. However, each participant is only presented with 4 conditions; meaning, there is a mixture of between-participants and within-participants used in the study.

There are two DVs:

(1) Participants' rating of the person's Competence.

(2) Participants' rating of the person's Employability.

I was thinking the statistical test would be MANOVA, but want to confirm.

Also, if the participants used in the study are a mixture of between-participants, and within-participants, how can MANOVA work in this case?

Any advice or insight on the above would be really appreciated. Thank you.

I am using an ARDL model; however, I am having some difficulties interpreting the results. I found that there is cointegration in the long run. I have provided pictures below.

I have long-term rainfall data and have calculated Mann-Kendall test statistics using the XLSTAT trial version (an add-in for MS Excel). There are options for an asymptotic test and a continuity correction in the XLSTAT drop-down menu.

- What do the terms **"asymptotic"** and **"continuity correction"** mean?
- When and under what circumstances should we apply them?
- Is there any assumption on time series before applying it?
- What are the advantages and limitations of these two processes?
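"Asymptotic" here means the p-value comes from a normal approximation to the Mann-Kendall S statistic (reasonable for long series such as rainfall records); the continuity correction shrinks |S| by 1 before standardizing, because S is discrete. A minimal sketch (illustrative data; ties ignored for brevity):

```python
import math

def mann_kendall_z(x, continuity=True):
    """Asymptotic Mann-Kendall Z: S / sqrt(Var(S)), optionally with the
    +/-1 continuity correction applied to S (ties ignored for brevity)."""
    n = len(x)
    s = sum((x[j] > x[i]) - (x[j] < x[i])
            for i in range(n) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if continuity:
        s = s - 1 if s > 0 else (s + 1 if s < 0 else 0)
    return s / math.sqrt(var_s)

series = [3, 5, 4, 7, 8, 6, 9, 11, 10, 12]      # illustrative upward trend
z_cc = mann_kendall_z(series, continuity=True)   # slightly smaller |Z|
z_raw = mann_kendall_z(series, continuity=False)
```

The correction matters mainly for short series; for 100+ years of data the two Z values are nearly identical.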

In confirmatory factor analysis (CFA) in Stata, the first observed variable is constrained by default (beta coefficient = 1, mean of latent variable = constant).

I don't understand why, because other software packages report beta coefficients for all observed variables.

So, I have two questions.

1- Which variable should be constrained in confirmatory factor analysis in Stata?

2- Is it possible to have a model without a constrained variable like other software packages?

I am working on two SNPs on the same gene, and I tested some biochemical parameters for 150 patients with hypothyroidism. I want to see if a certain haplotype has an impact on these biochemical parameters. How can I statistically calculate the haplotypes and their association with these parameters?

Do serial correlation, autocorrelation and seasonality mean the same thing, or are they different terms? If they differ, what are the exact differences with respect to statistical hydrology? And what are the different statistical tests to determine (quantify) the serial correlation, autocorrelation and seasonality of a time series?
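Serial correlation and autocorrelation are generally used as synonyms: the correlation of a series with a lagged copy of itself. Seasonality is a periodic pattern, which shows up as autocorrelation peaking at the seasonal lag (e.g. lag 12 for monthly data). A minimal sketch of the shared building block, the lag-k sample autocorrelation (data illustrative):

```python
def autocorr(x, lag):
    """Sample autocorrelation at a given lag ('serial correlation' of a
    series with its own past); lag-1 is what trend tests are sensitive to."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var

x = [2, 4, 6, 8, 10, 12, 14, 16]   # strongly trending series
r1 = autocorr(x, 1)                # high positive lag-1 autocorrelation
```

Standard tests built on this quantity include the Ljung-Box and Durbin-Watson tests for serial correlation; seasonality is usually assessed from the full ACF or a seasonality test rather than a single lag.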

I want to draw a graph of predicted probabilities vs observed probabilities. For the predicted probabilities I use the R code below. Is this code correct?

Could anyone tell me how I can get the observed probabilities and draw a graph of predicted vs observed probability?

analysis10 <- glm(Response ~ Strain + Temp + Time + Conc.Log10
                  + Strain:Conc.Log10 + Temp:Time,
                  family = binomial(link = logit), data = df)

predicted_probs <- data.frame(probs = predict(analysis10, type = "response"))

I have attached the data file.
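For the observed side, one common recipe is to bin the cases by predicted probability and take the observed event fraction per bin, then plot observed against mean predicted per bin (a calibration plot). A minimal language-agnostic sketch of that binning step (function and data are illustrative, not from the attached file):

```python
def calibration_points(pred, outcome, n_bins=5):
    """Group cases into equal-size bins by predicted probability and return
    (mean predicted, observed event fraction) per bin."""
    pairs = sorted(zip(pred, outcome))
    size = len(pairs) // n_bins
    points = []
    for b in range(n_bins):
        chunk = pairs[b * size:(b + 1) * size] if b < n_bins - 1 else pairs[b * size:]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_rate = sum(y for _, y in chunk) / len(chunk)   # observed probability
        points.append((mean_pred, obs_rate))
    return points

pred = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
outcome = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
pts = calibration_points(pred, outcome, n_bins=5)
```

A well-calibrated model puts these points close to the diagonal; in R the same logic can be applied by cutting `predicted_probs$probs` into bins and averaging the observed response per bin.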

Four homogeneity tests, namely the Standard Normal Homogeneity Test (SNHT), Buishand Range (BR) test, Pettitt test and Von Neumann Ratio (VNR) test, are applied for finding the break-point. SNHT, BR and Pettitt give the timestamp at which the break occurs, whereas VNR measures the amount of inhomogeneity. Multiple papers have made the claim that *"SNHT finds the break point at the beginning and end of the series, whereas the BR & Pettitt tests find the break point at the middle of the series."*

Is there any mathematical proof behind that claim? Is there any peer-reviewed work (journal article) which has proved the claim, or any paper which has cross-checked it?

Say I have 100 years of data: does the **start of the time series** mean the first 10 years, the first 15 years, or the first 20 years? How does one come to a conclusion?

Hi

I have a huge dataset for which I'd like to assess the independence of two categorical variables (x,y) given a third categorical variable (z).

My assumption: I have to run the independence test for each unique "z", and if even one of these tests rejects the null hypothesis (independence), it is rejected for the whole dataset.

Results: I have done Chi-Sq, Chi with Yates correction, Monte Carlo and Fisher.

- Chi-Sq is not a good method for my data due to sparse contingency table

- Yates and Monte carlo show the rejection of null hypothesis

- For Fisher, all the p values are equal to 1

1) I would like to know if there is something I'm missing or not.

2) I have already discarded the "z"s that have DOF = 0. If I keep them how could I interpret the independence?

3) Why does Fisher's test result in p-values of 1 all the time?

4) Any suggestion?

#### Apply Fisher exact test

fish <- fisher.test(cont_table, workspace = 6e8, simulate.p.value = TRUE)

#### Apply Chi^2 method

chi_cor <- chisq.test(cont_table, correct = TRUE) ### Yates correction of the Chi^2

chi <- chisq.test(cont_table, correct = FALSE)

chi_monte <- chisq.test(cont_table, simulate.p.value = TRUE, B = 3000)

- In non-parametric statistics, the **Theil–Sen estimator** is a method for robustly fitting a line to sample points in the plane (simple linear regression) by choosing the median of the slopes of all lines through pairs of points. *Many journals have applied the Sen slope to find the magnitude and direction of a trend.*
- It has also been called **Sen's slope estimator**, **slope selection**, the **single median method**, the **Kendall robust line-fit method**, and the **Kendall–Theil robust line**.
- The major advantage of the Theil–Sen slope is that the estimator can be computed efficiently and is insensitive to outliers. It can be significantly more accurate than non-robust simple linear regression (least squares) for skewed and heteroskedastic data, and competes well against least squares even for normally distributed data in terms of statistical power.
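The definition above, the median of all pairwise slopes, is short enough to sketch directly (illustrative data, with one gross outlier to show the robustness):

```python
import statistics

def theil_sen_slope(t, y):
    """Theil-Sen estimator: median of slopes over all pairs of points."""
    slopes = [(y[j] - y[i]) / (t[j] - t[i])
              for i in range(len(t))
              for j in range(i + 1, len(t)) if t[j] != t[i]]
    return statistics.median(slopes)

t = [0, 1, 2, 3, 4, 5]
y = [1.0, 3.1, 5.0, 6.9, 9.2, 100.0]   # one gross outlier at the end
slope = theil_sen_slope(t, y)           # stays close to the underlying slope of ~2
```

An ordinary least-squares fit to the same points would be dragged far above 2 by the single outlier; the median of pairwise slopes is barely moved.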

*My questions are: are there any disadvantages/shortcomings of Sen's slope? Are there any assumptions on the time series before applying it? Is there any improved version of this method? Since the method was introduced in 1968, does there exist any literature where the power of the Sen slope is compared with other non-parametric estimators? What inference can be made by applying the Sen slope to a hydrologic time series? And what about the performance of the Sen slope when applied to an autocorrelated time series such as rainfall or temperature?*

My question concerns the problem of calculating odds ratios in logistic regression analysis when the input variables are on different scales (i.e. 0.01-0.1, 0-1, 0-1000). Although the coefficients of the logistic regression look fine, the odds ratio values are, in some cases, enormous (see example below).

In the example there were no outlier values in each input variables.

As a general rule, should we normalize all input variables before the analysis to obtain reliable OR values?

Sincerely

Mateusz Soliński

Dear RG members, how can I find R packages and lists specific to health and medical research? Furthermore, could you point me to online sources or guidelines for easy-to-follow statistical analysis using R for medical research and its data visualization?

I need to run an aligned rank transform (ART) ANOVA and Tukey's HSD for the interactions among the treatments, but my dataset has a few NAs due to experimental errors.

When I run :

anova(model<- art(X ~ Y, data = d.f))

I get the error:

Error in (function (object) :

Aligned Rank Transform cannot be performed when fixed effects have missing data (NAs).

Manually shifting the values is not an option because each row is a sample, and doing so would keep the NAs, simply in the wrong samples.

Hello,

I am hoping that someone who is well versed in statistics can help me with my analysis and design. I am investigating the torque produced via stimulation of different quadriceps muscles. I have two groups (INJ & CON), three muscles (VM, RF, VL), and three timepoints (Pre, Post, 48H), at which torque is measured at two different frequencies (20 & 80 Hz). In addition to the torque, we also want to look at the relative change from baseline for immediately Post and 48H in order to remove some of the baseline variability between muscles or subjects. A ratio of 1.0 indicates the same torque values Post and Pre. This is a complex design, so I have a few questions.

If I want to use repeated measures ANOVA, I first have to test for normality. When I run the normality test on the raw data in SPSS, I have one condition that fails and others that are close (p < 0.1). When I run the ratios I also have a condition that fails normality. Does this mean I now have to do a non-parametric test for each? If so, which one? I am having a difficult time finding a non-parametric test that can account for all my independent variables. Friedman's test is repeated measures, but it is not going to be able to account for group/frequency/muscle differences like an ANOVA would.

Is repeated measures ANOVA robust enough to account for this? If so, should I set this up as a four-way repeated measures ANOVA? It seems like I am really increasing my risk of Type I error. I could separate it by frequency (20 and 80 Hz), because it is established that a higher frequency produces higher torque, but as you can tell I have a lot of uncertainties in the design. I apologize if I am leaving out vital information needed to get answers. Please let me know and I can elaborate further.

Thank you,

Chris

I have 18 rainfall time series. On calculating the variance, I found there was an appreciable change in the value of the variance from one rainfall station to another. Parametric statistical tests are sensitive to variance; does this mean we need to apply robust statistical tests instead of parametric tests?

I carried out the Kruskal-Wallis H test in SPSS to do a pairwise comparison of three groups. I got some positive and negative values in the Test Statistic and Std. Test Statistic columns. I can state the results based on the p-value, but I don't know what the values in the Test Statistic and Std. Test Statistic columns indicate, or why some values are positive and some are negative. I need some explanation please. Thanks in advance.

Hi -

I am looking for a way to quantify annual temporal variation in the intensity of space use (per pixel across a reserve) as a single value. I was originally looking into using the coefficient of variation (CV); however, the CV does not appropriately quantify the intensity of utilization. For example, areas with constant high utilization and constant low utilization will both have a value of 0.

I will have yearly intensity of utilization values for each pixel within a park, where 5 is the highest possible utilization and 0 is no utilization. So for example:

         2017  2018  2019
pixel 1     5     5     4
pixel 2     3     1     5
pixel 3     1     0     1

I'm looking for one single value per pixel that can quantify the temporal variation whilst still accounting for the total intensity of utilization. Is there something similar to the CV that will also be able to account for the magnitude of utilization per pixel?

Thanks in advance for your help,

Emma

I am currently performing undergraduate research in forensics and I am comparing two types of width measurements (the widths of land and groove impressions on fired bullets), one taken by an automated system and the other performed by my associate manually using a comparison microscope. We are trying to see if the automated method is a more suitable replacement for the manual method. We were recommended to perform a simple linear regression (ordinary least squares) however when it comes to actually interpreting the results we had some slight trouble.

According to p. 218 of Howard Seltman's *Experimental Design and Analysis*: "sometimes it is reasonable to choose a different null hypothesis for β1. For example, if x is some gold standard for a particular measurement, i.e., a best-quality measurement often involving great expense, and y is some cheaper substitute, then the obvious null hypothesis is β1 = 1 with alternative β1 ≠ 1. For example, if x is percent body fat measured using the cumbersome whole body immersion method, and Y is percent body fat measured using a formula based on a couple of skin fold thickness measurements, then we expect either a slope of 1, indicating equivalence of measurements (on average), or we expect a different slope". In comparison to normal linear regression, where β1 = 0 is usually tested, I was just wondering how you actually test the hypothesis proposed by Seltman: do we test it the same way you would test the hypotheses of a normal linear regression (finding t values, p values, etc.)? Or is there a different approach?
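Yes: the test uses the same t machinery, only the null value changes, t = (b̂1 − 1)/SE(b̂1) instead of b̂1/SE(b̂1), compared against the t distribution with n − 2 df. A minimal sketch (hypothetical data; a normal approximation replaces the t distribution to stay dependency-free, so p-values are slightly anti-conservative for small n):

```python
import math

def slope_test_against_one(x, y):
    """OLS slope, its standard error, and a test of H0: beta1 = 1
    (same t machinery as the usual beta1 = 0 test, just a shifted null)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    b1 = sum((x[i] - mx) * (y[i] - my) for i in range(n)) / sxx
    b0 = my - b1 * mx
    rss = sum((y[i] - b0 - b1 * x[i]) ** 2 for i in range(n))
    se = math.sqrt(rss / (n - 2) / sxx)
    t = (b1 - 1.0) / se                       # the only change vs. the usual test
    p = math.erfc(abs(t) / math.sqrt(2))      # two-sided, normal approximation
    return b1, se, t, p

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # hypothetical automated widths
y = [1.1, 2.0, 3.1, 3.9, 5.2, 5.9]   # hypothetical manual widths
b1, se, t, p = slope_test_against_one(x, y)
```

For method comparison, a Bland-Altman analysis is also worth considering, since a slope of 1 alone does not rule out a constant offset between the two methods.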

I am also open to suggestions as to what other tests could be performed

A quick thank you in advance for those who take the time to help!

Reference:

Other than R, with which other software/apps can I easily obtain volcano plots for gene expression data?

*Which is the correct order in data-processing of a rainfall time series:* **homogeneity testing followed by outlier detection & treatment**, or **outlier detection & treatment followed by homogeneity testing?**

- I have monthly rainfall data for 113 years. I am planning to run four homogeneity tests: the Buishand range test (BRT), Standard normal homogeneity test (SNHT), Von Neumann ratio (VNR) and Pettitt test.
- **Which is the appropriate method for identifying outliers in a non-normal distribution?**
- Should the descriptive statistics (DS) and exploratory data analysis (EDA) be conducted before or after treating the outliers, or should a comparison be made in the EDA & DS before and after treating the outliers?

Conventionally, four homogeneity tests, namely the Standard Normal Homogeneity Test (SNHT), Buishand Range (BR) test, Pettitt test and Von Neumann Ratio (VNR) test, are applied for finding the break-point. SNHT, BR and Pettitt give the timestamp at which the break occurs, whereas VNR measures the amount of inhomogeneity. SNHT finds the break point at the beginning and end of the series, whereas the BR & Pettitt tests find the break point at the middle of the series. How does one come to a common conclusion (the break point of the time series) if the three tests give different timestamps?

I want to check the homogeneity of a rainfall time series. I want to apply the following techniques. **Is there any R package available on CRAN for running the following tests?**

- The von Neumann Test
- Cumulative Deviations Test
- Bayesian Test
- Dunnett Test
- Bartlett Test
- Hartley Test
- Link-Wallace Test
- Tukey Test for Multiple Comparisons

In the XLSTAT software, four homogeneity tests are present; is there any other software where all of the homogeneity tests are present?

I want to do a trend analysis of temperature and precipitation datasets that have several continuous ups and downs (see the figure). For example, more than 10 breaks can be seen in my data sets. Is it advisable to do piecewise linear regression analysis in such cases? To overcome the limitations of such parametric analysis, I have done nonparametric trend analysis such as Mann-Kendall.

I want to develop a hybrid SARIMA-GARCH model for forecasting monthly rainfall data. The data are split into 80% for training and 20% for testing. I initially fit a SARIMA model for rainfall and found that the residuals of the SARIMA model are heteroscedastic in nature. To capture the information left in the SARIMA residuals, a GARCH model of order (p = 1, q = 1) is applied to the residual part. But when the data are forecasted I get a constant value. I tried applying different model orders for GARCH, and I still get a constant value. I have attached my code; could you kindly help me resolve it? Where have I made a mistake in the code, or does some other CRAN package have to be used?

library("tseries")
library("forecast")
library("fGarch")

# Set the working directory
setwd("C:/Users/Desktop")

# Import the data
data <- read.table("data.txt")

# Convert the data set into a time series
datats <- ts(data, frequency = 12, start = c(1982, 4))

# Plot the data set
plot.ts(datats)

# Test for stationarity
adf.test(datats)

# Difference the series
diffdatats <- diff(datats, differences = 1)

# Obtain the ACF plot
datatsacf <- acf(datats, lag.max = 12)

# Obtain the PACF plot
datapacf <- pacf(datats, lag.max = 12)

# Find the order of the ARIMA model
auto.arima(diffdatats)

# Fit the ARIMA model
datatsarima <- arima(diffdatats, order = c(1, 0, 1), include.mean = TRUE)

# Forecast using the ARIMA model
forearimadatats <- forecast(datatsarima, h = 12)

# Plot the forecast
plot(forearimadatats)

# Obtain the residuals
residualarima <- resid(datatsarima)

# Test for heteroscedasticity
archTest(residualarima, lag = 12)

# Fit the ARMA(1,1)-GARCH(1,1) model (orders matching the ARIMA fit above)
garchdatats <- garchFit(formula = ~ arma(1, 1) + garch(1, 1), data = datats,
                        cond.dist = "norm", include.mean = TRUE, trace = TRUE,
                        algorithm = "nlminb")

# Forecast using the ARIMA-GARCH model
forecastgarch <- predict(garchdatats, n.ahead = 12, trace = FALSE,
                         mse = "uncond", plot = FALSE)

# Plot the forecast
plot.ts(forecastgarch)

While running a system GMM using xtabond2 in Stata, I came across the following error. The error is strange to me because this is not the first time I have run a system GMM using xtabond2. The screenshot is attached.

I tried using **mata: mata set matafavor space, perm** to favour space over speed, but I keep getting the same error repeatedly.

Thanks as I await your response.

Hello Stata users. Please help.

When running Cronbach's Alpha test for internal consistency...

I have some missing values in the data set coded as 999.

Are they included in the calculations, or dismissed by the Stata software by default?

In other words, do I have to select some option in Stata before running the Cronbach's alpha calculation so that the software will dismiss the missing values?

Could anybody clarify? Many thanks in advance.

Hello, In one of the projects, I conducted a questionnaire for the skills of students before the project (PRE survey), and after the completion of the project, I conducted a post-project survey.

I calculated the questionnaire results (the percentage increase in the level of each skill), but I have no experience in interpreting these results.

Could you help me, or provide me with publications in this area?

Thank you.

Grubbs's test and Dixon's test are widely applied in the field of hydrology to detect outliers, but the drawback of these statistical tests is that they require the dataset to be approximately normally distributed. I have rainfall data for 113 years, and the dataset is non-normally distributed. What are the statistical tests for finding outliers in non-normally distributed datasets, and what values should we substitute in place of the outliers?
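One widely used distribution-light option is the modified z-score based on the median absolute deviation (MAD), with the usual |score| > 3.5 flag (Iglewicz & Hoaglin). A minimal sketch (rainfall values hypothetical):

```python
import statistics

def mad_outliers(x, threshold=3.5):
    """Flag outliers via the modified z-score (median absolute deviation),
    which does not assume normality the way Grubbs's/Dixon's tests do."""
    med = statistics.median(x)
    mad = statistics.median(abs(v - med) for v in x)
    # 0.6745 rescales the MAD to be comparable to an SD for normal data
    return [v for v in x if abs(0.6745 * (v - med) / mad) > threshold]

rain = [110, 95, 102, 130, 88, 99, 105, 620]   # hypothetical annual totals
outliers = mad_outliers(rain)                  # only the extreme value is flagged
```

As for replacement, winsorizing flagged values to a high percentile (rather than substituting a fixed value or deleting them) is a common, defensible choice, though for suspected recording errors correction against station metadata is preferable.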

Dear colleagues, I've tried to construct a recurrent neural network using a learning sample of size 25, and I would like to get 178 columns in the output (there are 25 columns and 178 lines in the learning sample), but I can use predict only for a single item:

pred <- predict(fit, inputs[-train[106,]]), so I need to change the numbers in train to get a column with the forecast.

sslog<-as.ts(read.csv("k.csv"))

mi<-sslog

shift <- 25

S <- c()

for (i in 1:(length(mi)-shift+1))

{

s <- mi[i:(i+shift-1)]

S <- rbind(S,s)

}

train<-S

y<-as.data.frame(S, row.names=FALSE)

x1<-Lag(y,k=1)

x2<-Lag(y,k=2)

x3<-Lag(y,k=3)

x4<-Lag(y,k=4)

x5<-Lag(y,k=5)

x6<-Lag(y,k=6)

x7<-Lag(y,k=7)

x8<-Lag(y,k=8)

x9<-Lag(y,k=9)

x10<-Lag(y,k=10)

x11<-Lag(y,k=11)

x12<-Lag(y,k=12)

slog<-cbind(y,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12)

slog<-slog[-(1:12),]

inputs<-slog[,2:13]

outputs<-slog[,1]

fit<-elman(inputs[train[106,]],

outputs[train[106,]],

size=c(3,2),

learnFuncParams=c(0.2),

maxit=40)

#plotIterativeError(fit)

y<-as.vector(outputs[-train[106,]])

#plot(y,type="l")

pred<-predict(fit,inputs[-train[106,]])

a<-pred

print (a)

df <- data.frame(a)

Could you please tell me how it is possible to construct the data frame so as to get all 178 columns in the output, not only a single column?

Thanks a lot for your help

I have monthly rainfall data from 1901-2013 for 29 stations covering the entire state of Kerala. I took the first 80% of the data for training the model and the remaining 20% for validating it. I developed a SARIMA model for forecasting monthly rainfall. The reviewer has asked: **What is the scientific basis for forecasting rainfall over a point location (station) over a longer time scale (a month)?** What was the reviewer trying to convey with this question?

Hello everyone,

Could you recommend papers, books or websites about mathematical foundations of artificial intelligence?

Thank you for your attention and valuable support.

Regards,

Cecilia-Irene Loeza-Mejía

Hi all, and happy new year; please advise me on my question. I recently conducted a 2 (Group) x 4 (Condition) between-within repeated measures ANOVA on my data. The results indicated that the main effect of Group and the main effect of Condition were significant, and the interaction was not. However, when I looked at the post hoc LSD for the interaction, the Condition effect was significant in one group but not in the other. Please let me know: why, when the interaction isn't significant, does the post hoc LSD for each group indicate different comparisons? Which should I report: the main effect of Condition across the two groups, or the comparisons for each group separately?

Context:

A systematic review with data synthesis is being conducted. The data synthesis aims to assess whether there are correlations between changes in two pairs of variables (e.g., disability and pain).

From each included study, only mean and SD (no more data available) have been extracted in different follow-up periods (1-month, 3-months, 6-months, 12-months).

Doubt/question:

I would like to run a correlation analysis on two pairs of Hedges' g effect size values to assess the correlation between the variables' changes. What I would like to know is whether it is possible to run a Spearman correlation between Hedges' g values weighted by sample size and number of follow-ups. And if yes, how could it be reported or justified?

I have already calculated repeated Hedges' g values for the same group at different follow-up times (1 month, 3 months, 6 months, 12 months) for different variables (i.e., variable1 and variable2).

An example of the Hedges' g matrix is:

                  Var1   Var2    n
g1 (3 months)     -0.5   -0.9   12
g1 (6 months)     -0.4   -1.6   12
g2 (3 months)     +0.1   +0.0   12
g3 (1 month)      -0.7   -0.3   40
g3 (3 months)     -0.6   -0.3   40
g3 (6 months)     -0.8   -0.4   40
g3 (12 months)    -1.0   -0.5   40
g4 (1 month)      -0.7   -0.2   40
g4 (3 months)     -0.5   -0.3   40
g4 (6 months)     -0.4   -0.3   40
g4 (12 months)    -0.6   -0.2   40

- If I run a Spearman correlation only with the Hedges' g values (without taking the sample size into account), it would be wrong because the sample size is ignored.

- If I run a Spearman correlation creating n cases (12 with the first pair of values, 12 with the second, 12 with the third, 40 with the fourth, 40 with the fifth, etc.), it would be wrong because for the same sample/group I am creating more cases than participants.

- Then, my question is: if I weight the Hedges' g values according to sample size and number of measurements (6 cases for the first pair, 6 for the second, 12 for the third, 10 for the fourth, 10 for the fifth, etc.) and then calculate the Spearman correlation, would that be correct?

Thanks in advance for your time.
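For the third option, the mechanical part, a weighted correlation, is straightforward: compute a Pearson correlation with case weights (for a weighted Spearman, rank the values first). Whether n-weighting is statistically defensible here is a separate question, since repeated g values from the same study are not independent. A sketch using the matrix above:

```python
def weighted_pearson(x, y, w):
    """Pearson correlation with case weights (e.g. study sample sizes)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / (vx * vy) ** 0.5

# Hedges' g values for Var1 and Var2 from the matrix above, weighted by n
g_var1 = [-0.5, -0.4, 0.1, -0.7, -0.6, -0.8, -1.0, -0.7, -0.5, -0.4, -0.6]
g_var2 = [-0.9, -1.6, 0.0, -0.3, -0.3, -0.4, -0.5, -0.2, -0.3, -0.3, -0.2]
n      = [12, 12, 12, 40, 40, 40, 40, 40, 40, 40, 40]
r_w = weighted_pearson(g_var1, g_var2, n)
```

A multilevel meta-analytic model (g values nested within studies) would be the more rigorous way to respect the dependence between follow-ups, and is probably what a reviewer would expect as justification.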

I built a predictive machine learning model that generates the probability of default over the next 2 years for all the companies in a specific country. For training the algorithms I used financial data for all these companies, as well as their NACE codes (domains of activity), and I'm wondering whether I would get a better model if I somehow segmented the B2B population into segments and ran distinct models on those segments.

Hope you can advise!

Lots of thanks in advance!

What is the method for comparing performance on two different cognitive tests (that measure different cognitive functions) for the same group or different groups?

Two cognitive tests are inherently different from each other and often have different parameters.

It will be helpful if anyone can direct me to some useful references.

Thank you

I'm using the ENRICH Marital Satisfaction scale (15 items) in my thesis, along with scales for other variables.

Please guide me regarding the correlation and regression analyses. Out of the raw scores of the EMS and IDS and the percentile (PCT) scores of both subscales, which scores should be used in the analyses? Also, as we are using percentile scores, do I have to calculate percentile scores for the other variables, or should I use the raw scores?

Hi everyone! I have a statistical problem that is puzzling me. I have a very nested paradigm and I don't know exactly what analysis to employ to test my hypothesis. Here's the situation.

I have three experiments differing in one slight change (**Exp 1, Exp 2, and Exp 3**). Each subject could only participate in one experiment. Each experiment involves 3 lists of within-subjects trials (**List A, B, and C**); namely, the participants assigned to Exp 1 were presented with all three lists. Each list presented three subsets of within-subjects trials (let's call these subsets **LEVEL**: **I, II, and III**).

The dependent variable is the response time (RT) and, strangely enough, it is normally distributed (Kolmogorov–Smirnov test's *p* = .26).

My hypothesis is that, no matter the experiment and the list, the effect of this last within-subjects variable (i.e., **LEVEL**) is significant. In terms of the attached image, the effect of **LEVEL** (I-II-III) is significant net of the effect of Experiment and List.

*Crucial info*:

- the trials are made of the exact same stimuli with just a subtle variation among the LEVELS I, II, and III; therefore, they are comparable in terms of length, quality, and every other aspect.

- the lists are made to avoid that the same subject could be presented with the same trial in two different forms.

The main problem is that it is not clear to me how to conceptualize the LIST variable, in that it is on the one hand a between-subjects variable (different subjects are presented with different lists), but on the other hand, it is a within-subject variable, in that subjects from different experiments are presented with the same list.

For the moment, here are the solutions I've tried:

1 - Generalized Linear Mixed Model (GLMM): EXP, LIST, and LEVEL as fixed effects, and participants as a random effect. In this case, the problem is that the estimated covariance matrix of the random effects (G matrix) is not positive definite. I hypothesize that this happens because the GLMM expects every subject to go through all the experiments and lists. Unfortunately, this is not the case, due to the nested design.

2 – Generalized Linear Model (GLM). Same family of model, but without the random effect of participants' variability. In this case, the analysis runs smoothly, but I have some doubts about the interpretation of the *p* values of the fixed effects, which appear to be massively skewed: EXP *p* = 1, LIST *p* = 1, LEVEL *p* < .0001. I'm a newbie with these models, so I don't know whether this is a normal circumstance. Is it?

3 – Three-way mixed ANOVA with EXP and LIST as between-subjects factors, and LEVEL as the within-subjects variable with three levels (I, II, and III). In this case too, the analysis runs smoothly. Nevertheless, together with a good effect of the LEVEL variable (*F* = 15.07, *p* < .001, η^{2} = .04), I also found an effect of LIST (*F* = 3.87, *p* = .022, η^{2} = .02) and no LEVEL x LIST interaction (*p* = .17). The result seems satisfying to me, but is this analysis solid enough to claim that the effect of LEVEL is by no means affected by the effect of LIST?

Ideally, I would have preferred a covariation perspective (such as ANCOVA or MANCOVA), in which the test allows an assessment of the main effect of the between-subjects variables net of the effects of the covariates. Nevertheless, in my case the classic (M)ANCOVA variables pattern is reversed: “my covariates” are categorical and between-subjects (i.e., EXP and LIST), so I cannot use them as covariates; and my factor is in fact a within-subject one.

To sum up, my final questions are:

- Is the three-way mixed ANOVA good enough to claim what I need to claim?

- Is there a way to use categorical between-subjects variables as "covariates"? Perhaps a moderation analysis with a non-significant role of the moderator(s)?

- do you propose any other better ways to analyze this paradigm?

I hope I have been clear enough, but I remain at your total disposal for any clarification.

Best,

Alessandro

P.S.: I've run a nested repeated-measures ANOVA, wherein LIST is nested within EXP and LEVEL remains the within-subjects variable. The results are similar, but the between-subjects nested effect of LIST within EXP is significant (*p* = .007, η^{2} = .06). Yet the question of whether I can claim what I need to claim remains.

Hi,

I have 2 categorical and 1 continuous predictor (3 predictors in total), and 1 continuous dependent variable. The 2 categorical variables have 3 and 2 levels, respectively. I have dummy coded only the variable with 3 levels, and directly assigned 0 and 1 to the variable with 2 levels (my understanding is that if a categorical variable has only 2 levels, further dummy coding is not necessary; is that right?).

In this case, how do I do and interpret the assumption tests of multicollinearity, linearity and homoscedasticity for multiple linear regression in SPSS?

Thank you!

What is the method to compare the performances on two different cognitive tests (that measure different cognitive functions) of the same or different group(s)?

As two cognitive tests are inherently different from each other and often have different parameters.

It will be helpful if anyone can direct me to some useful references.

Thank you.
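One common approach, since the two tests are on different scales, is to standardize each test against its own sample (or published norms) and compare the resulting z-scores. A minimal stdlib sketch with made-up scores (the test names and numbers are purely illustrative):

```python
from statistics import mean, stdev

def zscores(values):
    """Standardize scores so tests on different scales become comparable."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# hypothetical raw scores of the same group on two different cognitive tests
memory = [12, 15, 9, 14, 11, 13]
attention = [88, 95, 70, 91, 82, 85]

z_mem, z_att = zscores(memory), zscores(attention)
# per-person discrepancy between the two tests, in SD units
discrepancy = [a - b for a, b in zip(z_mem, z_att)]
```

After standardization, the usual machinery applies: a paired test on the z-scores for the same group, or an independent test for different groups. If the tests have published norms, standardizing against the norms rather than the sample is generally preferable.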

I need to present some data in the form of a radar graph, and I don't want to use Excel, so I need another software option.

What is the best way of making this kind of chart so that it is academically acceptable?

Thank you for your help and suggestions.
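If Python is acceptable, matplotlib's polar axes produce publication-quality radar charts (journals typically want figures at 300 dpi or more). A minimal sketch with made-up labels and values:

```python
import math
import matplotlib
matplotlib.use("Agg")  # headless backend: saves to file, no display needed
import matplotlib.pyplot as plt

# hypothetical data: one series over five criteria
labels = ["A", "B", "C", "D", "E"]
values = [4, 3, 5, 2, 4]

# one angle per axis, then repeat the first point to close the polygon
angles = [2 * math.pi * i / len(labels) for i in range(len(labels))]
angles += angles[:1]
values_closed = values + values[:1]

fig, ax = plt.subplots(subplot_kw={"polar": True})
ax.plot(angles, values_closed)
ax.fill(angles, values_closed, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(labels)
fig.savefig("radar.png", dpi=300)
```

R (the `fmsb` package) and Origin also make acceptable radar charts; the matplotlib route just gives you full control over fonts and export settings.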

I have heard some academics argue that the t-test can only be used for hypothesis testing, and that it is too weak a tool for analysing a specific objective in academic research. For example, is the t-test an appropriate analytical tool to determine the effect of credit on farm output?

Dear colleagues

Could anyone suggest code (MATLAB or R) for fitting ARIMA and GARCH models to real (not simulated) data?

Thank you so much

Hello Researchers,

I was working to check the validity of a mathematical equation. In doing so, I experimentally obtained a large data set (>50,000 values). As the mathematical equation gives only a single value, I was wondering whether there is any way to compare that data set to the equation. Based on that comparison, I would like to assign a constant that, when applied (+, -, x, /) to the equation, yields values consistent with the experimental data set. There could be more than one constant, as the range of the data set is quite large compared to the values obtained empirically from the equation.
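If the goal is a single correction constant, least squares gives it directly: the best additive constant is the mean residual, and the best multiplicative constant (against a single predicted value) is the mean of the data divided by the prediction. A stdlib sketch with made-up numbers standing in for the real data set:

```python
from statistics import mean

# hypothetical: the equation predicts one value, experiments scatter around it
predicted = 12.5
experimental = [14.1, 13.8, 15.0, 14.6, 13.9]  # stand-in for >50,000 values

# least-squares additive constant c in:  predicted + c ~ data
c_add = mean(x - predicted for x in experimental)

# least-squares multiplicative constant k in:  k * predicted ~ data
k_mul = mean(experimental) / predicted
```

Whether an additive or multiplicative constant is more defensible can be judged from residual plots against the measured conditions; if the data span several orders of magnitude, fitting on a log scale (which turns the multiplicative case into an additive one) is often the better choice.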

I cannot find sources which give a thorough explanation of PCA and how to assign principal components 1 and 2, including their computation. For example, my study will explore the polyphenol profiles of a certain plant from different geographical areas; I will also test their antioxidant activity and analyze the data using a biochemometric approach. Which variables should be included in the principal components? I will also integrate data from this PCA to construct my OPLS-DA.

Suppose:

Landing Page A with 1300 leads is achieving a 3% conversion rate
Landing Page B with 1500 leads is achieving a 10% conversion rate

First, I want to see if I have achieved enough samples (Leads or conversions) to have a statistically valid test.

Second, I want to confirm that conversion rate for Landing Page B is statistically significant, so better than the obtained for Landing Page A.

How do I determine that the sample of leads and conversions obtained are statistically representative? What would be the minimum sample to get the same success in the conversion ratio?
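With the counts implied above (3% of 1300 ≈ 39 conversions, 10% of 1500 = 150), a standard two-proportion z-test answers the significance part. A stdlib sketch:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns z and the two-sided p-value."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)            # pooled proportion under H0
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))

# Page B: 150/1500 (10%) vs Page A: 39/1300 (3%)
z, p = two_proportion_ztest(150, 1500, 39, 1300)
print(z, p)  # z well above 5, p far below 0.05
```

For the minimum-sample question, the cleanest route is a power analysis for two proportions (e.g. in G*Power, or statsmodels' `NormalIndPower`): specify the two conversion rates, alpha, and the desired power, and it returns the n per page needed to detect that difference.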

I am currently doing a PCA on microbial data. After running a Parallel Analysis to determine the number of factors to retain from the PCA, the answer is 12. Since my idea is to save the factor scores and use them as independent variables for a GLM together with other variables, I was wondering:

- Should I definitely save the factor scores of all 12 factors (which would become too many variables) or I can save only a few of them (e.g., the first 3 which together explain a 50% of the variance) for the GLM?
- If I can save a lower number, should I re-run the PCA retaining only that lower number (e.g. 3) or just use the factor scores already obtained when retaining the 12 ones?

Thank you all for your time and help!
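One useful property here: for an unrotated PCA, the scores of the first k components are identical whether you retain k or all 12, so there is no need to re-run the analysis; just keep the score columns you want. A quick check with random data (scikit-learn; the random matrix is a stand-in for your microbial data):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))  # stand-in: samples x microbial variables

# scores of the first 3 components are the same either way
scores_12 = PCA(n_components=12, svd_solver="full").fit_transform(X)
scores_3 = PCA(n_components=3, svd_solver="full").fit_transform(X)
print(np.allclose(scores_12[:, :3], scores_3))
```

Two caveats: this equivalence no longer holds if the solution is rotated (e.g. varimax), where scores depend on how many factors enter the rotation; and whether 3 components explaining ~50% of the variance suffice for the GLM is a substantive judgment, since the discarded components may still carry variance relevant to your response variable.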

Hi all,

My research project is based on meta-analysis. I have the empirical mean, sample size, and standard error calculated for two groups: Group 1 has 20 studies and Group 2 has 6 studies. I have already calculated the pooled weighted mean, SE, and CI for each group. I would like to know how to calculate the statistical significance of the difference between the two groups from these pooled weighted values.

Is it possible to assess statistical significance based on the confidence intervals of the two groups? If yes, what type of statistical test do I need to perform to calculate the P value?

Thanks much
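With a pooled mean and SE per group, the usual approach is a z-test on the difference (this is essentially a two-level subgroup test in meta-analysis). A stdlib sketch with made-up pooled values standing in for yours:

```python
from math import sqrt
from statistics import NormalDist

def compare_pooled_means(m1, se1, m2, se2):
    """z-test on the difference of two independent pooled estimates."""
    z = (m1 - m2) / sqrt(se1 ** 2 + se2 ** 2)
    return z, 2 * (1 - NormalDist().cdf(abs(z)))

# hypothetical pooled weighted means and SEs for the two groups of studies
z, p = compare_pooled_means(5.2, 0.4, 3.9, 0.6)
print(z, p)
```

On the CI question: non-overlapping 95% CIs do imply a significant difference, but overlapping CIs do not rule one out, so the explicit z-test above is the safer route. Dedicated meta-analysis software (e.g. the `metafor` package in R) performs the same subgroup comparison with more diagnostics.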

Hi everybody,
I have a question about calculating sample size with software. There are various programs, such as G*Power and PASS (NCSS), in this area. Which one is better? Can anyone guide me? For example, how can I work with PASS (NCSS)?
Thanks

Can you kindly suggest the best statistical test to compare the yield of a protein from a bacterial culture carried out at different pH values? Is one-way ANOVA a suitable method?
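If yield is measured on independent replicate cultures at each pH and the usual assumptions hold (normality of residuals, equal variances), one-way ANOVA is the standard choice, followed by a post-hoc test such as Tukey HSD to see which pH levels differ. A minimal SciPy sketch with made-up yields:

```python
from scipy import stats

# hypothetical protein yields (mg/L) from replicate cultures at three pH values
ph_5 = [12.1, 11.8, 12.5, 12.0]
ph_7 = [15.2, 14.9, 15.6, 15.1]
ph_9 = [13.0, 13.4, 12.8, 13.1]

f, p = stats.f_oneway(ph_5, ph_7, ph_9)
print(f, p)
```

If variances differ markedly between pH groups, Welch's ANOVA or the Kruskal-Wallis test are the usual fallbacks.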

Hello,

I am performing statistical analysis of my research data by comparing mean values using the Tukey HSD test. I obtained homogeneous groups labelled with both lowercase and capital letters, because of the large number of treatments in my study. Is this type of homogeneous grouping acceptable for publication in a journal?

Hi everyone.

I have a question about finding a cost function for a problem. I will ask the question in a simplified form first, then I will ask the main question. I'll be grateful if you could help me with either or both of the questions.

1- What methods are there for finding the optimal weights for a cost function?

2- Suppose you want to find the optimal weights for a problem where you can't measure the output (e.g., death). In other words, you know the contributing factors to death, but you don't know the weights, and you can't observe the output because you can't really test or simulate death. How can we find the optimal (or sub-optimal) weights of that cost function?

I know it's a strange question, but it has so many applications if you think of it.

Best wishes

Dear All,

I have two time series data sets that I would like to correlate. One data set is the deposits, by month, for a list of different accounts. The other is the balances, by month, for the same list of accounts. In essence, I have two matrices whose correlation I want to understand without having to strip out each account separately. Furthermore, I want to cross-section the data into different segments.

This is being done with the goal of being able to forecast account balances in the future by looking at usage behavior (assuming there is a lag relationship).

How do I build an intermediate matrix of the correlations? Is there a way to do it in Python or R-Studio? Is there a way to do it in excel?

Thanks

Ryan
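In pandas this is one call: `DataFrame.corrwith` correlates the matching columns of two frames, giving one correlation per account without looping, and `shift()` adds the lag. A sketch with simulated data (the layout, months as rows and accounts as columns, is an assumption about your matrices):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
accounts = ["acct_1", "acct_2", "acct_3"]

# stand-ins for the real matrices: rows = months, columns = accounts
deposits = pd.DataFrame(rng.normal(size=(24, 3)), columns=accounts)
balances = deposits.cumsum() + rng.normal(scale=0.1, size=(24, 3))

same_month = deposits.corrwith(balances)     # one r per account
lag1 = deposits.shift(1).corrwith(balances)  # deposits leading by 1 month
print(same_month)
```

Segmentation then becomes a matter of selecting column subsets (or joining account metadata and grouping) before calling `corrwith`. For actual lag-aware forecasting, cross-correlation functions or a VAR model (statsmodels) are the natural next step; Excel has no direct equivalent of the column-wise call.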

Dear Researchers, Modellers, and Mathematicians,

As we know, in mathematics, computer science, and physics, a **deterministic system** is a system in which no randomness is involved in the development of future states of the system. A deterministic model will thus always produce the same output from a given starting condition or initial state. **In this regard, I am looking forward to having examples of daily-life events which are deterministic.** Thank you!

Sincerely,

Aman Srivastava

I aim to allocate subjects to four different experimental groups by means of Permuted Block Randomization, in order to get equal group sizes.

This, according to Suresh (2011, J Hum Reprod Sci), can result in groups that are not comparable with respect to important covariates. In other words, there may be significant differences between treatments with respect to subject covariates, e.g. age, gender, education.

I want to achieve groups that are comparable with respect to these covariates. This is normally achieved with stratified randomization, which itself seems to be a type of block randomization with blocks being not treatment groups but covariate categories, e.g. low income and high income.

Is a combination of both approaches possible and practically feasible? If there are, e.g. 5 experimental groups and 3 covariates, each with 3 categories, randomization that aims to achieve groups both balanced with respect to covariates and equal in size might be complicated.

Is it possible to perform Permuted Block Randomization to treatments for each "covariate-group", e.g. for low income, and high income groups separately, in order to achieve this goal?

Thanks in advance for answers and help.
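Yes: running permuted blocks separately inside each stratum is exactly stratified permuted-block randomization, which is standard practice in trials. With 3 covariates of 3 categories you get 27 strata, workable as long as each stratum still fills whole blocks. A stdlib sketch (stratum names and sizes are made up):

```python
import random
from collections import Counter

def permuted_block_randomization(n, treatments, block_size, seed=None):
    """Assignment sequence built from shuffled blocks, so group sizes stay
    balanced after every completed block."""
    assert block_size % len(treatments) == 0
    rng = random.Random(seed)
    seq = []
    while len(seq) < n:
        block = treatments * (block_size // len(treatments))
        rng.shuffle(block)   # permute within the block
        seq.extend(block)
    return seq[:n]

# Stratified version: run the procedure independently inside each stratum
strata_sizes = {"low_income": 12, "high_income": 16}  # hypothetical counts
assignments = {
    stratum: permuted_block_randomization(n, ["A", "B", "C", "D"], 4, seed=i)
    for i, (stratum, n) in enumerate(strata_sizes.items())
}
print(assignments)
```

When the number of strata grows faster than recruitment (many strata, few subjects each), minimization in the style of Pocock and Simon is the usual alternative, since it balances covariates adaptively without requiring full blocks per stratum.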

Hi there,

in SPSS I can perform a PCA on my dataset, which does not have a positive definite correlation matrix, since I have more variables (45) than cases (n = 31).

The results seem quite interesting; however, since my correlation matrix, and therefore all the criteria for appropriateness (anti-image matrix, MSA, etc.), are not available, am I allowed to perform such an analysis?

Or are the results of the PCA automatically nonsense? I can identify a common theme in each of the loaded factors and their items.

Thanks and best Greetings from Aachen, Germany

Alexander Kwiatkowski

Hello,

I have a simple control-experimental research design with a pre-post exam and 12 persons in each group.

What is the appropriate way to extract the effect size? (What is the right formula: Cohen's d, eta squared, omega squared, or something else?)
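For a control-experimental pre-post design, one common choice is Cohen's d computed on the gain scores (post minus pre) of the two groups; eta squared and omega squared are more natural when the result comes from an ANOVA. A stdlib sketch with made-up scores:

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(a, b):
    """Cohen's d with pooled SD for two independent groups."""
    na, nb = len(a), len(b)
    sp = sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
              / (na + nb - 2))
    return (mean(a) - mean(b)) / sp

# made-up (pre, post) pairs; the effect size is computed on the gain scores
exp_gain = [post - pre for pre, post in [(10, 14), (12, 15), (9, 14)]]
ctl_gain = [post - pre for pre, post in [(11, 12), (10, 11), (12, 12)]]
d = cohens_d(exp_gain, ctl_gain)
print(d)
```

With only 12 per group, the small-sample corrected version (Hedges' g, i.e. d multiplied by approximately 1 - 3/(4(n1 + n2) - 9)) is usually the one to report.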

I performed a logistic regression to ascertain the effects of academics’ role at the institution, years of teaching, qualification, and type of HEIs on the likelihood that participants are ready to teach IR topics to accounting students. Below are my results.

Is it normal to have a significance of more than .05 in the Hosmer-Lemeshow test while having non-significant independent variables? How do you interpret such a scenario? My sample size is 50, with 4 independent variables (research suggests a size of 10 per independent variable).

I have 210 respondents who completed a 15-item true/false/I don't know questionnaire. What is the best way to analyse that data and determine each respondent's final score out of 15?

SPSS Coding: 1 = true, 2 = false, 3 = I don't know.

Each item/question is a unique variable.

The original measure indicates correct responses are allocated a score of 1, incorrect and “I don’t know” a score of 0, for a maximum of 15/15.

I have the correct answers, both as true/false and scored in the ways indicated above; however, I'm not sure how to check each respondent against the correct answers.

Would I try to match the existing answers to the correct answers and determine a result from that? Is there a fast way of doing it that doesn't involve individually inputting each answer? I feel like I'm missing something ridiculously obvious.

Thanks in advance!