Statistical Computing - Science topic
Questions related to Statistical Computing
I need a topic for a scientific research paper, with a hypothesis and statistical computation.
Hey,
I want to calculate the standard deviation for each substituent of two molecules using Excel, and then calculate the average of all the values (not their S.D.).
For the S.D. I used STDEV.P, and for the average I used AVERAGE. Is this the right way? Or should I use STDEV.S? Or should I calculate the range (largest minus smallest) instead of the average?
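For what it's worth, the distinction is easy to check outside Excel; a minimal sketch in R (values hypothetical):
x <- c(1.2, 1.5, 1.1, 1.8)           # hypothetical substituent values
sd(x)                                # sample SD, divides by n-1 (Excel's STDEV.S)
sqrt(mean((x - mean(x))^2))          # population SD, divides by n (Excel's STDEV.P)
As a rule of thumb, STDEV.S is appropriate when the values are a sample from a larger population, and STDEV.P when they are the entire population of interest.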
I am conducting a study and had two people vet the results for me. Person A used SPSS 25, while Person B and I used SPSS 28. We all got the same frequency-table results (same central tendencies, range, standard deviation, case count) and the same ordinal regression results, but when we ran the bivariate correlation the SPSS 25 results were different: it gave significant results for 4 of the 6 variables, while SPSS 28 gave non-significant results for 5 of the 6. We double-checked the data tables and variable types, and everything is identical. We ran the data in Stata and the correlation results were very close or identical to the SPSS 25 results. We went over the steps to check for variation and there is none. Does anyone have any idea what could cause this? Any recommendations on which results to go with if we cannot find a way to get the data to match?
I have six kinds of compounds, which I tested for antioxidant activity using the DPPH assay and for anticancer activity on five types of cell lines, so I have two groups of data:
1. Antioxidant activity data
2. Anticancer activity (5 types of cancer cell line)
Each data set consists of 3 replicates. Which correlation test is most appropriate to determine whether there is a relationship between the two activities?
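With only six compounds, a rank-based test is a reasonably safe choice; a hedged sketch in R, assuming you correlate the mean activity per compound (all numbers hypothetical):
antiox <- c(55, 62, 48, 71, 66, 59)       # mean DPPH activity per compound
anticancer <- c(40, 52, 35, 68, 60, 50)   # mean activity against one cell line
cor.test(antiox, anticancer, method = "spearman")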
Dear RG experts,
A statistical study includes a number of tests, some of which are well known, while others are controversial. For me, applying such questionable procedures to real-world problems is a major issue; I learn all of this in order to solve real-life problems.
Please help me with the application of the runs test.
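A minimal sketch of the Wald-Wolfowitz runs test in R, assuming a binary sequence (tseries::runs.test expects a two-level factor):
library(tseries)
x <- factor(c(1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0))  # hypothetical binary sequence
runs.test(x)                                         # tests randomness of the ordering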
What type of design was used and how was statistical computation and graphing achieved? Was a software package used?
I found that most equations, such as d′ and A′, are meant for balanced designs of AA and AB pairs.
I am sincerely writing to ask for help with a method or equation for calculating sensitivity with unequal weights. Thanks a million.
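Not an answer for the unequal-weight case, but as a baseline, the standard equal-variance d′ is just the difference of z-transformed hit and false-alarm rates; a sketch in R (counts hypothetical):
hits <- 45; misses <- 5            # hypothetical response counts
fas <- 12; crs <- 38
qnorm(hits / (hits + misses)) - qnorm(fas / (fas + crs))   # d-prime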
Dear All:
I am wondering if someone has R code (R functions) to run the test procedures described in the paper "Estimation and Comparison of Lognormal Parameters in the Presence of Censored Data" by Stavros Pouloukas, Journal of Statistical Computation & Simulation, Vol. 74, No. 3, March 2004, pp. 157–169.
I can send a copy of the paper if necessary.
with many thanks
abou
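Not the test procedures from that specific paper, but as a starting point, the survival package fits lognormal parameters under right censoring; a minimal sketch with simulated data:
library(survival)
set.seed(1)
d <- data.frame(time = rlnorm(50, meanlog = 1, sdlog = 0.5),
                status = rbinom(50, 1, 0.7))        # 1 = observed, 0 = censored
fit <- survreg(Surv(time, status) ~ 1, data = d, dist = "lognormal")
summary(fit)   # intercept estimates meanlog; Log(scale) relates to sdlog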
Hello!
I have successfully developed and implemented ANFIS in R with the help of the frbs package. The one thing remaining is to visualize the ANFIS network.
Currently, due to COVID constraints, I don't have access to Matlab while working from home, so I was wondering whether there is any way to do this in R.
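As far as I know, frbs has no built-in network diagram, but parts of the fitted object can be visualized directly; a hedged sketch, assuming anfis_model is the object returned by frbs.learn:
library(frbs)
plotMF(anfis_model)    # plots the membership functions of each input variable
anfis_model$rule       # inspect the learned rule base directly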
Which is the best book for understanding statistical analysis tools for the social sciences?
Hello Friends!
I have been searching for the best book for understanding and applying statistical analysis tools in the social sciences. I am new to this field, so please recommend some good books on the topic.
Thanks
Prediction bounds matter because, compared to a confidence interval, they account for the error of the regression model itself and not only the sampling error.
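A quick illustration in R with a built-in data set: the prediction interval includes the model's residual error on top of the sampling error of the fitted mean, so it is always wider.
fit <- lm(dist ~ speed, data = cars)
new <- data.frame(speed = 15)
predict(fit, new, interval = "confidence")   # uncertainty of the mean response
predict(fit, new, interval = "prediction")   # adds residual error; wider bounds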
A linear mixed model was fitted using the lmer function of the lme4 package in R. Does anyone have an idea of how to extract the studentized conditional residuals for individual data points?
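One possible route (an assumption that it fits your setup): the HLMdiag package computes per-observation residual diagnostics for lmer fits.
library(HLMdiag)
res <- hlm_resid(fit, standardize = TRUE)   # fit is the lmer model object
head(res)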
DDoS attacks can be detected with statistical, soft-computing, knowledge-based, and data-mining/machine-learning-based methods. These methods have proved efficient at detecting attacks, but they lag behind in automatic capabilities. Also, these DDoS detection methods are localized, standalone systems that predict DDoS attacks from data traffic rather than detecting them on the spot.
I have a list of phi and psi angles derived from a number of PDB files based on some criteria, and I have plotted a scatter plot of them using matplotlib. But I want to show the favoured, allowed, and generously allowed regions in different shades of colour in the background. For better understanding, I am providing the scatter plot I have already made (D_torsionAngle.png) and an example of the plot I want (1hmp.png).


We used SPSS to conduct a linear mixed model analysis of our data. How do we report our findings in APA format? If you can direct us to a source that explains how to format our results, we would greatly appreciate it. Thank you.
It is known that the FPE gives the time evolution of the probability density function of a stochastic differential equation.
I could not find any reference that relates the PDF obtained from the FPE to the trajectories of the SDE.
For instance, suppose the solution of the FPE corresponding to an SDE converges to pdf = \delta_{x_0} asymptotically in time.
Does that mean all trajectories of the SDE converge to x_0 asymptotically in time?
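For reference, for the Itô SDE $dX_t = a(X_t)\,dt + b(X_t)\,dW_t$ the FPE reads
$$\frac{\partial p(x,t)}{\partial t} = -\frac{\partial}{\partial x}\big[a(x)\,p(x,t)\big] + \frac{1}{2}\,\frac{\partial^2}{\partial x^2}\big[b^2(x)\,p(x,t)\big].$$
A sketch of the reasoning behind the question: convergence of $p(\cdot,t)$ to $\delta_{x_0}$ is convergence in distribution, and because the limit is deterministic it upgrades to convergence in probability to $x_0$; it does not by itself guarantee that (almost) every individual trajectory converges to $x_0$.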
I am doing research on colon cancer and want to compare survival under two different treatments in retrospective data. I want to use propensity score matching to adjust for bias, specifically the inverse propensity score weighting method. I calculated the inverse propensity score weights from the propensity scores and was able to get an odds ratio by using this weight variable in SPSS. But when I try to compute the KM curve, I get the following error: "No statistics are computed because nonpositive or fractional case weights were found". Any suggestions on how to proceed?
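If a cross-check outside SPSS helps: R's survfit accepts fractional case weights directly; a hedged sketch (variable names hypothetical):
library(survival)
km <- survfit(Surv(time, event) ~ treatment, data = d, weights = ipw)
plot(km, col = 1:2)   # weighted Kaplan-Meier curves per treatment arm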
Production Engineering is an engineering field that deals with the problems of production operations, with emphasis on the production of goods and services. Operational Research (OR) is the field responsible for solving real problems in decision-making situations using mathematical models. OR is an applied science focused on solving real problems that draws on knowledge from disciplines such as mathematics, statistics, and computing to improve the rationality of decision-making processes.
Operational Research (OR) is responsible for solving real problems through mathematical and statistical models. How have we used OR in our research?
How should we describe the process of analyzing statistical data carried out with the help of Business Intelligence in Big Data database systems?
What are the main models of statistical data analysis carried out with computerized Business Intelligence tools, as used for multi-criteria analysis of large data sets processed in the cloud within Big Data database systems?
Please reply
Dear Colleagues and Friends from RG
Some of the currently developing aspects and determinants of the applications of data processing technologies in Big Data database systems are described in the following publications:
The issues of the use of information contained in Big Data database systems for the purposes of conducting Business Intelligence analyses are described in the publications:
I invite you to discussion and cooperation.
Best wishes

Big Data database systems can significantly facilitate the analytical processes of advanced processing and testing of large data sets for the needs of statistical surveys.
The current technological revolution, known as Industry 4.0, is driven by the development of the following advanced information-processing technologies: Big Data database technologies, cloud computing, machine learning, the Internet of Things, artificial intelligence, Business Intelligence, and other advanced data mining technologies. All of these advanced data processing and analysis technologies can significantly change and facilitate the analysis of large statistical data sets in the future.
Do you agree with my opinion on this matter?
In view of the above, I am asking you the following question:
Will analytics based on data processing in Big Data database systems facilitate the analysis of statistical data?
Please reply
I invite you to discussion and scientific cooperation
Best wishes

Hello everyone :)
I am currently conducting a comprehensive meta-analysis on customer loyalty, with a huge number of articles that use SEM to evaluate the strength of the relationships between the variables I am interested in (satisfaction, loyalty, trust, etc.).
I saw that for most meta-analyses, the effect-size metric is r. But since all my articles of interest use SEM, I can only report the beta coefficients, t-values, and p-values. Is it okay to use these kinds of metrics to conduct a meta-analysis?
I saw an article by Peterson (2005) explaining how to transform a beta into an r coefficient for articles where r is not available. This is a start, but it does not give me a comprehensive method for conducting a meta-analysis with only SEM articles (what metrics should I code? what statistics should I compute? etc.).
My question is then: is it possible to conduct a meta-analysis with articles using SEM? If yes, do you have references explaining how to code the metrics and compute the statistics for the meta-analysis?
Thanks in advance for your help ! :)
Kathleen Desveaud
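For the conversion itself, a sketch of the Peterson and Brown (2005) approximation (they recommend it only when |beta| <= .5):
beta_to_r <- function(beta) 0.98 * beta + 0.05 * (beta >= 0)   # lambda = 1 if beta >= 0, else 0
beta_to_r(c(0.30, -0.20))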
I have 40 observations and 9 items. When I try to run factor analysis in SPSS, I get an error message: "There are fewer than two cases, at least one of the variables has zero variance, there is only one variable in the analysis, or correlation coefficients could not be computed for all pairs of variables. No further statistics will be computed."
Is the problem an inadequate sample size?
If I increase the sample size, will that be a solution?
Thanks
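The message usually points at a zero-variance item or missing correlations rather than only the sample size; a quick diagnostic sketch in R (data frame name hypothetical):
apply(items, 2, var, na.rm = TRUE)          # any items with zero variance?
cor(items, use = "pairwise.complete.obs")   # any NA correlation cells?
That said, 40 cases for 9 items is also on the low side for a stable factor solution.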
Can the histon computed from a histogram refer to both the upper and lower approximations in rough entropy?
What does the Sen's slope value indicate when performing the Mann-Kendall trend test in XLSTAT? Can anyone explain this value using the XLSTAT tutorial example?
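Sen's slope is the median of all pairwise slopes (x_j - x_i)/(j - i), i.e., a robust estimate of the trend magnitude per unit time that accompanies the Mann-Kendall test; a sketch in R with the trend package, which computes the same quantity as XLSTAT:
library(trend)
x <- ts(c(3.1, 3.4, 2.9, 3.8, 4.0, 4.4, 4.1, 4.7))   # made-up annual series
mk.test(x)       # Mann-Kendall trend test
sens.slope(x)    # Sen's slope estimate with confidence interval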
I've gathered some binary data and my observations look like this:
trt1 trt2 trt3 trt4
p1 1 1 0 1
p2 1 1 1 1
p3 1 1 1 1
p4 1 1 1 1
p5 1 1 1 1
p6 1 1 1 1
p7 1 1 0 1
p8 1 1 0 0
p9 1 1 1 1
When I tried to calculate the phi coefficient between these 4 columns using SPSS, I ran into a problem: the software wouldn't calculate this coefficient for the first column, saying "No statistics are computed because trt1 is a constant."
Can anyone help by suggesting another way to calculate some sort of correlation coefficient, or by solving this?
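Phi is undefined for trt1 and trt2 because they have zero variance (all 1s), so no software can compute it; a quick check in R, where phi for binary variables equals the Pearson correlation:
m <- cbind(trt3 = c(0, 1, 1, 1, 1, 1, 0, 0, 1),
           trt4 = c(1, 1, 1, 1, 1, 1, 1, 0, 1))
cor(m)   # phi coefficient between the non-constant columns
Constant columns can only be reported descriptively (e.g., 9/9 successes), not correlated.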
I ran test-retest reliability using unweighted kappa for my questionnaire, and for 2 of my items the result is "no statistics are computed because ... is a constant". How do I interpret this, and can I keep these items in my questionnaire? Is there an article to support this?
The MKT Z statistic was computed for the annual mean pollutant concentration to see whether the concentration was increasing or decreasing over a 20-year period. A linear regression of the annual mean concentration was computed for the same data (ppb/year). The linear regression slope is negative, and the MKT Z values are large and negative. How do I write up the comparison between the two?
Thanks
From our statistical analyses with R and Genstat, the output means differ for the same values and treatments. For instance, the means of A, B, C, and D from Genstat are 32.65, 4.69, 18.23, and 48.96 respectively, while for the same variables R gives 34.17, 4.13, 18.23, and 42.68. What could be the reason for the disparity in the means?
I'm trying to match gene IDs from one database with the GO IDs in another database. The second database is longer than the first one. Why do I get this error? Error in 1:longitud[j] : NA/NaN argument
Here is the script
d1 <- as.matrix(datos)

# Count, for each ID in 'datos', how many rows of 'datos1' match it
longitud1 <- numeric(0)
for (i in 1:length(datos$Datos1)) {
  longitud1[i] <- length(which(datos1$Cod1 == d1[i]))
}

# Keep only IDs with at least one match. The original line referenced
# 'longitud' before it existed ('which(longitud == 0)'), which is the
# likely source of the NA/NaN error; it also lost track of which IDs survived
keep <- which(longitud1 > 0)
longitud <- longitud1[keep]

i <- 1
mat1 <- matrix(0, sum(longitud), 2)
for (j in 1:length(longitud)) {
  filas <- which(datos1$Cod1 == d1[keep[j]])   # rows matching the j-th kept ID
  for (k in 1:longitud[j]) {
    mat1[i, ] <- as.matrix(datos1[filas[k], ])
    i <- i + 1
  }
}
I have generated a number of data sets that follow specific distributions. They represent different layers of a certain system, but they should be correlated with a predetermined correlation coefficient without affecting their statistical distributions.
Hence, I am thinking of rearranging every two subsequent data sets to impose their corresponding correlation parameter. Any ideas on how to do that, especially in R, would be appreciated.
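One possibility is an Iman-Conover-style reordering: generate correlated normals with the target correlation and reorder each existing sample to follow their ranks, which imposes the (rank) correlation without touching the marginals. A sketch in R:
library(MASS)
set.seed(1)
x <- rgamma(1000, shape = 2)       # placeholder for your layer-1 sample
y <- rweibull(1000, shape = 1.5)   # placeholder for your layer-2 sample
rho <- 0.7                         # target (Spearman-type) correlation
z <- mvrnorm(1000, mu = c(0, 0), Sigma = matrix(c(1, rho, rho, 1), 2))
x_new <- sort(x)[rank(z[, 1])]     # reorder; marginal distributions unchanged
y_new <- sort(y)[rank(z[, 2])]
cor(x_new, y_new, method = "spearman")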
Are there any statistical tests (parametric or non-parametric) that can be applied to test the goodness of fit of a candidate probability distribution (other than the normal) estimated from autocorrelated data?
I need to compare the average runtime of two different C++ algorithms. The question is: how many times do I need to repeat the experiment in order to estimate the average runtime of each algorithm? 10 times? 100? Can you point me to a paper on this issue? Thanks in advance!
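There is no universal answer, but a common approach is to run a pilot, then choose n so that the confidence-interval half-width on the mean runtime falls below a tolerance you pick; a sketch in R (my_algorithm and the tolerance are hypothetical):
pilot <- replicate(30, system.time(my_algorithm())["elapsed"])
E <- 0.01                                    # desired half-width in seconds
n <- ceiling((qnorm(0.975) * sd(pilot) / E)^2)
n                                            # repetitions needed for a 95% CI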
I taught an introductory statistics class, and one of the subtopics involved confidence intervals (CIs). I decided to focus my examples on the meaning and applications of CIs. For example, students were involved in obtaining the confidence interval for a population mean from a random sample of data that they themselves collected. I discovered from students' work that interpreting the meaning of a confidence interval was truly challenging for many. How have others approached this topic in introductory statistics?
Dear all, I would like to run a spatial autocorrelation analysis on my data in R (or other software such as Minitab, PAST, or Python). My data comprise 100 1 m² plots, each paired with a control plot 1 m away from the treatment. In all plots I measured plant cover, and I want to measure species co-occurrence in each plot. All plots are georeferenced with latitude and longitude in degrees, minutes, and seconds. I want to know whether there is autocorrelation in my sampling. Can someone help me?
Best wishes,
Jhonny
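A minimal sketch for a global test of spatial autocorrelation with ape's Moran.I, assuming the coordinates have first been converted from degrees-minutes-seconds to decimal degrees (column names hypothetical):
library(ape)
d <- as.matrix(dist(cbind(plots$lon, plots$lat)))  # pairwise plot distances
w <- 1 / d                                         # inverse-distance weights
diag(w) <- 0
Moran.I(plots$cover, w)                            # test on plant cover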
Hi all,
I am working with the R package 'quint' to test for qualitative treatment-subgroup interactions (personalized medicine). Every time I analyze my data there are some warnings which I cannot handle. All the warnings are of the same sort. The main problem is that I do not understand what the warning means exactly:
Warning messages:
1: In computeD(tmat[kk, 1], tmat[kk, 2], tmat[kk, 3], tmat[kk, 4], :
value out of range in 'gammafn'
Is anybody well versed in this R package, or is it perhaps a general warning that you have encountered in another context? I do not know what 'gammafn' is or why its value is out of range.
I appreciate any comments and ideas!
Best,
Brian
Independent component analysis (ICA) is a statistical and computational technique for revealing hidden factors that underlie sets of random variables, measurements, or signals.
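A self-contained illustration with the fastICA package: mix two independent sources and recover them.
library(fastICA)
set.seed(1)
s <- cbind(sin((1:1000) / 20), runif(1000))  # two independent source signals
a <- matrix(c(0.5, 0.8, 0.6, 0.4), 2, 2)     # mixing matrix
x <- s %*% a                                 # observed mixed signals
ica <- fastICA(x, n.comp = 2)
head(ica$S)                                  # estimated independent components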
I need to perform an a priori power analysis for:
1) a MANOVA with 1 IV with 3 levels and 4 DVs
2) a MANOVA with 2 IVs (one with 2 levels, one with 3 levels) and 4 DVs
(only global effects, not interactions)
Can anyone help me calculate the sample size for an effect size of .5 and p = .05?
Thank you!
I am looking for a script to calculate average wind direction. I am wondering if someone has it already.
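In case it is useful, the usual approach is the circular (vector) mean, since an arithmetic mean is wrong for angles (350° and 10° should average to 0°, not 180°); a short R sketch:
mean_wind_dir <- function(deg) {
  rad <- deg * pi / 180
  ang <- atan2(mean(sin(rad)), mean(cos(rad))) * 180 / pi
  (ang + 360) %% 360                # wrap into [0, 360)
}
mean_wind_dir(c(350, 10))           # returns 0, not 180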
My experimental design has two factors (the dosage of a drug x the gender of the animal subject). The subjects were fed the drug for a long period of time, and we divided the experimental period into three intervals. I randomly chose some of my subjects for data collection at the end of each interval.
However, it turns out that the genders of these subjects are not balanced across the dosage treatments (because it was hard to tell gender from appearance). Some treatments even have no replication (I picked 8 animals per dosage treatment, but only one was female). I wonder how to run an ANOVA on this.
I've searched a relevant website as below--
I'm not sure whether it suits my situation, and I'm figuring out how to make some modifications as I calculate my data.
Any advice and suggestions will be greatly appreciated.
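A hedged sketch of one common route in R for unbalanced data: fit the full factorial and use Type II sums of squares; if empty cells make the interaction inestimable, you may have to drop it (variable names hypothetical).
library(car)
fit <- lm(response ~ dose * sex, data = d)
Anova(fit, type = 2)                   # ANOVA table robust to imbalance
fit_main <- lm(response ~ dose + sex, data = d)   # fallback if cells are empty
Anova(fit_main, type = 2)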
Hi. I am studying for my master's degree and am currently working on my dissertation.
I have some questions. I think I will use ordinal regression; however, I am struggling with how to report this type of regression. I used ordinal data as the dependent variable and scale data as the independent variables. In SPSS, I entered all independent variables as covariates, and I am not sure what exactly I should report. I have seen many examples on websites, but most of them use ordinal variables as independent variables.
Can anyone help me or explain how to report this type of regression, or point me to a textbook or journal article that explains it?
Thank you.
I'm getting an error: "Error in if (colnames(tm.class)[j] == "fixed") tm.final[i, j] = 0 : missing value where TRUE/FALSE needed". All I tried to do was a simple snk.test(lm(values ~ factor1*factor2)), and the estimates function keeps returning this error. I'm not sure what tm.class is, and I have no idea why the column names seem to be NA for whatever the estimates function is testing.
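If snk.test comes from the GAD package, my guess (an assumption, not a certainty) is that the factors were never declared fixed or random, which its estimates machinery requires; a sketch:
library(GAD)
f1 <- as.fixed(factor1)      # declare each factor's status before fitting
f2 <- as.random(factor2)
model <- lm(values ~ f1 + f2 + f1:f2)
gad(model)                   # ANOVA using the declared structure
Then retry snk.test on this model object.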
Dear SAS users,
I want to perform Barnard's exact test for a 2x2 table, which is described as an option in the EXACT statement of PROC FREQ. While I can use all the other options of the EXACT statement, the BARNARD option seems to be unavailable.
I use SAS 9.3 for Windows. Is anybody aware of this problem?
I know about the controversies on the use of Barnard's test (or alternatives) as compared to Fisher's exact test and try to avoid a discussion on that issue here. I have to use Barnard's test for one project.
Thank you in advance!
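If the option turns out to be unavailable in your SAS/STAT release, one workaround is the R 'Barnard' package, which takes the four cell counts of the 2x2 table:
library(Barnard)
barnard.test(10, 5, 3, 12)   # hypothetical counts for the four cells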
Hi,
Can anyone suggest suitable software, besides SPSS, to run multinomial and mixed logit (statistical) models?
Thanks
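Two common R routes, as a sketch (data sets and column names hypothetical):
library(nnet)
fit_mn <- multinom(choice ~ income + age, data = d)        # multinomial logit
library(mlogit)
dm <- mlogit.data(d_long, choice = "choice", shape = "long", alt.var = "alt")
fit_mx <- mlogit(choice ~ price | income, data = dm,
                 rpar = c(price = "n"))                    # mixed logit, random price coefficient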
I used GDP data to find structural breaks in the series, i.e., any significant change in the pattern of the series. I used the 'breakpoints' command and found some breakpoints.
My question is: how does it work? Is it based on a regression of the series on itself?
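Assuming this is strucchange::breakpoints, it implements the Bai-Perron approach: the series is split into segments, a separate regression is fitted to each, and the break dates minimizing the residual sum of squares are found by dynamic programming, with the number of breaks typically chosen by BIC. A sketch:
library(strucchange)
bp <- breakpoints(gdp ~ 1)   # pure mean shifts; gdp is a ts object (hypothetical)
summary(bp)                  # RSS and BIC for each number of breaks
plot(bp)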
I'm trying to run the additive macro (for additive hazards models) written by Alicia Howell and John Klein, but it takes a very long time to run. I'm using SAS 9.3. I always break it off after an hour without getting any output. I followed all the steps in Alicia Howell's paper. It does not show any errors; I just don't get any output. I'm not sure whether to leave it running overnight.
I am working on prediction of hatchability in chickens using egg quality traits. I need statistical software applications for efficient data analysis. Please list out some good applications for genetic and breeding studies in chicken.
When I run the Monte Carlo simulation, it shows a warning: "Warning from spectre during Monte Carlo analysis `mc1'.
mc1: Attempt to run Monte Carlo analysis with process and mismatch variations, but no process variations were specified in statistics block." How can one specify the statistics block for a Monte Carlo simulation?

Hi everybody,
I have a question about predict (raster package) with a gam model.
This is my script:
pred.data <- brick (sst, par)
gamCrus <- gam(logCrus~s(sst)+s(par), family=gaussian(), data=zoo.data)
predCrus <- predict(pred.data, gamCrus)
I have an error message:
> predCrus <- predict(pred.data, gamCrus)
Error in model.frame.default(ff, data = newdata, na.action = na.act) :
object is not a matrix
In addition: Warning messages:
1: In predict.gam(model, blockvals, ...) :
not all required variables have been supplied in newdata!
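The warning usually means the layer names of the brick do not match the variable names the gam was fitted with; naming the layers typically fixes it (a guess based on the message, not a certainty):
names(pred.data) <- c("sst", "par")   # must match the terms used in gamCrus
predCrus <- predict(pred.data, gamCrus)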
spatial analysis, spatial econometrics, spatial statistics, computer or statistics
I usually fit curves in Matlab using the "fminsearch" function, which is a really useful and powerful function.
As I am currently using R more than Matlab, I wonder whether that kind of function or script exists in R.
It would be perfect if you could provide an example in both Matlab and R.
Regards
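R's base optim() defaults to the same Nelder-Mead simplex algorithm as fminsearch; a small curve-fitting sketch (model and data made up):
sse <- function(p, x, y) sum((y - p[1] * exp(p[2] * x))^2)  # objective to minimize
set.seed(1)
x <- seq(0, 1, length.out = 50)
y <- 2 * exp(1.5 * x) + rnorm(50, sd = 0.1)
fit <- optim(c(1, 1), sse, x = x, y = y)   # Nelder-Mead by default
fit$par                                    # should be near c(2, 1.5)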
I am trying to resolve a problem with count data. At the beginning, I fitted a Poisson regression model; however, I got underdispersion in my model.
I then tried to use a restricted generalized Poisson regression model, but I have a problem with the SAS code. Can anyone propose a suitable SAS procedure for this case?
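Not SAS, but if an R cross-check helps: glmmTMB's generalized Poisson family accommodates underdispersion (model terms hypothetical):
library(glmmTMB)
fit <- glmmTMB(count ~ x1 + x2, family = genpois(), data = d)
summary(fit)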
I have my own function and need to generate 10,000 random values from it, just as h=randn(N,1)+i*rand(N,1) generates N random values for Rayleigh fading. My function takes three input arguments whose values differ from one scenario to another. As I understand it, I can fix the three values, evaluate the function, and obtain one value; the output values of my function are complex. Should I compute the variance and mean of my function's values and then loop over that range to get 10,000 values in an array? Please tell me whether this is the right way or what I should do, and how to write the code in Matlab.
What are the applications of the different types/theories of entropy, and which entropy method is best in image processing?
A metric can be used to evaluate performance
I need to evaluate the security of a sequence.
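For what it's worth, the most common baseline is the Shannon entropy of the symbol (or pixel-intensity) distribution; a minimal R sketch:
shannon_entropy <- function(x) {
  p <- table(x) / length(x)        # empirical symbol probabilities
  -sum(p * log2(p))                # entropy in bits
}
img <- sample(0:255, 1e4, replace = TRUE)   # hypothetical 8-bit pixel values
shannon_entropy(img)                        # near 8 bits for uniform data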
Out-of-equilibrium thermodynamics often studies changes in a structure over a long time scale. Could first-order transitions give information about out-of-equilibrium evolving structures? I mean: the time scale in a first-order transition tends to zero (for example, the liquid-solid transition), but the systems involved are often simpler. I would suggest that scientists study first-order transitions as if they were out of equilibrium (for the duration of the transition).
Suppose I am looking at millions of data sets each with millions of data points and I need to capture details about each of those distributions with as much accuracy as possible. Histograms are a concise way to capture information about the distributions so that one can construct a CDF or calculate approximate quantiles at a later time from the stored histograms, and they can efficiently be calculated over many computers in parallel for large data sets.
What statistical methods best capture the information loss for a given set of histogram breakpoints for a given empirical distribution?
For example, suppose I have the data set 1,1,1,1,1,9,9,9,9,9. Histogram 1 uses breakpoints 0,5,10 and Histogram 2 uses breakpoints 0,2,4,6,8,10
So histogram 1 looks like :
[0,5] : 5
[5,10]: 5
Histogram 2 looks like:
[0,2]: 5
[2,4]: 0
[4,6]: 0
[6,8]: 0
[8,10]: 5
Clearly, Histogram 1 has more information loss than Histogram 2, since the bimodal nature of the underlying distribution is lost with the unfortunate breakpoints chosen in Histogram 1, while the breakpoints in Histogram 2 reveal it.
Since I don't know whether the underlying distribution is normal, I am currently using a worst-case metric which essentially generates the worst possible distributions that could be represented by the same histogram and takes the Kolmogorov-Smirnov statistic (or just the maximum distance between the two CDFs approximated from the histograms, as represented by the yellow boxes in the right-most column of the attached plots).
Do any statistical software packages calculate KS or information-loss metrics directly from histograms? Are there other methods besides KS which capture this information loss? I couldn't find anything for R on CRAN.
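I am not aware of a packaged solution either, but a simpler proxy than the worst-case construction described above is easy to compute by hand; a sketch in R using Histogram 1 from the example, comparing the empirical CDF with the piecewise-linear CDF implied by the histogram:
x <- c(rep(1, 5), rep(9, 5))
breaks <- c(0, 5, 10)
h <- hist(x, breaks = breaks, plot = FALSE)
cdf_hist <- approxfun(h$breaks, c(0, cumsum(h$counts)) / length(x))
grid <- sort(unique(c(x, breaks)))
max(abs(ecdf(x)(grid) - cdf_hist(grid)))   # KS-style information-loss proxy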

I am looking for an explicit formula that weighs the predictions from each database into a combined one.
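Since the target is not fully specified, one standard choice (an assumption on my part) is inverse-variance weighting, which is optimal when the per-database predictions are unbiased with known variances:
# combined = sum(pred_i / var_i) / sum(1 / var_i)
combine <- function(pred, var) sum(pred / var) / sum(1 / var)
combine(pred = c(2.1, 1.9, 2.4), var = c(0.04, 0.09, 0.16))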