- Anna Bartkowiak added an answer: What does this warning message tell me (in R package quint) and what is 'gammafn'?
I am working with the R package 'quint' to test for qualitative treatment-subgroup interactions (personalized medicine). Every time I analyze my data I get warnings that I cannot resolve. All the warnings are of the same kind. The main problem is that I do not understand what the warning actually means:
1: In computeD(tmat[kk, 1], tmat[kk, 2], tmat[kk, 3], tmat[kk, 4], :
value out of range in 'gammafn'
Is anybody well versed in this R package, or is this perhaps a general warning that you have encountered in another context? I do not know what 'gammafn' could be or why the value is out of range.
I appreciate any comments and ideas!
Warnings, especially when they appear often, should not be disregarded. The results produced by the software may be meaningless. Brian, are you sure that your computed results make sense?
First question: what is the role of the gamma function in the applied statistical model? Judging from the information that a treatment-subgroup interaction is tested, I guess that an F-test is calculated, and the probability density function of F (and of chi-square) uses gamma functions. Extreme arguments to these functions indicate that something is wrong with the data. Maybe you have some empty classes in the assumed linear model, maybe too few degrees of freedom, or maybe the (non-zero) data have zero variance? Maybe there are too many parameters to estimate and too few data?
In any case, before the results can be considered trustworthy, the reason for the warnings should be clarified.
Sorry, this does not sound optimistic.
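To see what "out of range" can mean for a gamma function, here is a minimal sketch in Python (not R, and not the quint code). As far as I know, 'gammafn' is the internal routine behind R's gamma(); in double precision, gamma(x) overflows for x a little above 171 and is undefined at zero and the negative integers. The helper name gamma_or_none is made up for this illustration.

```python
import math

def gamma_or_none(x):
    """Return gamma(x), or None when the value is undefined or too large for a double."""
    try:
        return math.gamma(x)
    except (OverflowError, ValueError):
        # OverflowError: result exceeds double range; ValueError: pole at 0, -1, -2, ...
        return None

print(gamma_or_none(10))    # 9! = 362880.0
print(gamma_or_none(171))   # still representable (roughly 7e306)
print(gamma_or_none(172))   # None: overflow
print(gamma_or_none(0))     # None: pole of the gamma function
print(math.lgamma(172))     # the log-gamma function stays finite here
```

If the arguments passed to the gamma function in your analysis grow with sample size or degrees of freedom, an overflow of this kind is one possible (unconfirmed) source of the warning; code that works on the log scale avoids it.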
- klaus krippendorff added an answer: Which measure of inter-rater agreement is appropriate with diverse, multiple raters?
I want to calculate and quote a measure of agreement between several raters who rate a number of subjects into one of three categories. The individual raters are not identified and are, in general, different for each subject. The number of ratings per subject varies between subjects from 2 to 6.
In the literature I have found Cohen's kappa, Fleiss' kappa and a measure 'AC1' proposed by Gwet. So far, I think that Fleiss' measure is the most appropriate, although he derives it assuming that the number of ratings per subject is the same for all subjects.
The Gwet measure AC1 is supposed to deal with the apparent 'paradox' of low agreement values despite a large percentage agreement. Unfortunately, I do not understand the derivation of this measure.
I would be grateful for any comments and suggestions, particularly on the appropriateness of Fleiss' kappa and the soundness of Gwet's AC1.
- Ciaran O'Boyle asked a question: What is your experience of the POSAS score?
I've been trying to use the POSAS score for evaluation of burn scar reconstruction. However, I find the Patient's part difficult to use. The patients all seem to become a bit embarrassed in trying to put scores on their symptoms. I'm left with the impression that they are all under-scoring their problems. This might be ok statistically, since if everything is under-scored, then trends will still be valid, but has anyone else experienced this? More to the point, has anyone overcome this problem?
- Patrick S Malone added an answer: Is there a way in statistics to consider multiple variables as one?
For example, to illustrate the development of ICT, there are six different indicators.
How can I treat these six indicators as one single variable in order to use it in further statistical tests?
Is there a way?
If anyone with knowledge of statistics could help me, that would be great.
Elahe, there are several approaches, but three are the most common for this situation. Which one you choose depends on your goal in the analysis, the nature of your data, and what you believe to be the relations among the variables.
Since you mention ICD-10, for indicators of a diagnosis, the simplest thing is to sum (or average) them and use that as your variable.
A somewhat more complex version of the same thing is principal component analysis, as has been suggested. Technically speaking, a PCA will give you the "first eigenvector" of the correlations among your variables. Practically speaking, it gives you a weighted sum that represents the greatest possible amount of information from your data that one score can. However, if your indicators are binary, like presence/absence, PCA doesn't always perform well. The other caution is that it optimizes for your dataset and won't necessarily generalize well.
The third is factor analysis, variations of which can accommodate binary indicators, but I think that's getting more complex than you want.
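The first approach (sum or average) can be sketched in a few lines of Python. This is only an illustration under assumptions: the data are invented, the function name composite_scores is hypothetical, and each indicator is z-scored first so that indicators on different scales contribute equally (an extra step, not something prescribed above).

```python
from statistics import mean, pstdev

def composite_scores(rows):
    """Average the z-scored indicators for each case.

    rows: one list per case, each holding the same indicators in the same order.
    An indicator with zero variance would cause a division by zero.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]
    return [
        mean((x - m) / s for x, m, s in zip(row, means, sds))
        for row in rows
    ]

# Invented example: 4 cases measured on 3 indicators with very different scales
indicators = [
    [10, 200, 3.0],
    [20, 400, 5.0],
    [30, 600, 7.0],
    [40, 800, 9.0],
]
scores = composite_scores(indicators)
print(scores)
```

The resulting single score per case can then be used like any other variable. PCA replaces the equal weights used here with weights estimated from the data.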
- Daniel Wright added an answer: Which test is used to find a significant difference in a non-Gaussian data set?
First, plot the data with a quantile-quantile plot against normality (so in R, qqnorm(x); qqline(x)). This makes deviations easier to see than a histogram does.
The Shapiro-Wilk test is often used, see
Other packages will have similar abilities (and in R there are more packages/functions if your needs are more specific).
Sometimes people estimate skewness and kurtosis and see if these differ from their values under normality. However, these are only two characteristics of the distribution, and estimating their standard errors is problematic (see attached).
- Thanyani Rambane asked a question: When doing seasonal adjustments how do you remove residuals from the aggregate of components that have no residuals?
- Thawhidul Kabir added an answer: What is the best approach for treating "Don't know" responses when re-scaling a continuous variable?
To illustrate this, let me use an example. I have the question "How much do you trust your President?" The answer choices are 1-Not at all, 2-Just a little, 3-Somewhat, 4-A lot, 5-Don't know. When re-scaling, one analyst might recode this as 1-Not at all, 2-Just a little, 3-Don't know, 4-Somewhat, 5-A lot. Another analyst might use 1-Not at all/Don't know, 2-Just a little, 3-Somewhat, 4-A lot. Every analyst has a distinct way of doing this, but there must be a clear line that we can all follow. Personally, I always treat "Don't know" choices as system missing; however, I lose a lot of cases that way. The bottom line is: what is the best way of treating "Don't know" responses that is statistically sound and does not violate any statistical principle?
I am afraid that your statement "Every analyst has a distinct way of doing this" is not true. In research there is no such thing as 'my own way of doing this and that', because the methods are already defined scientifically and commonly accepted. Your question belongs to the 'Measurement and Scaling' part of quantitative research. Let me take your example for further discussion. You want to know 'how much a person trusts something'. The 'how much' indicates that this is an issue of measurement, but what exactly you measure depends on your objective. Here your objective is to measure 'how much trust', which is a kind of belief that can only be identified and measured through a person's opinion, not by counting in centimeters, liters or kilograms. To measure such things, researchers use 'interval scales'. Developing a scale depends on the 'scaling technique', which can be of two types: (a) comparative and (b) non-comparative. In your example the non-comparative scaling technique applies, because your objective is to measure 'how much someone trusts a president'. If you wanted to know 'how much someone trusts one of the presidents among A, B and C', comparative scaling techniques would apply. There are different types of comparative and non-comparative scales. In your example you have specifically described the 'summated scale', because it is a collection of related questions that measure underlying constructs such as trust, confidence or satisfaction. For a detailed answer to your question, see Paul Spector's book "Summated Rating Scale Construction: An Introduction (Quantitative Applications in the Social Sciences)".
- David Boansi added an answer: Is it better to use the "mean" or "median" in describing the central value/tendency of a given population/sample?
I want to know which of the two, "mean" or "median", appropriately reflects the central value of a given sample/population.
Thanks so much for the opinions shared. Your points are well noted and very much appreciated.
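A small numeric illustration of why the choice matters, as a sketch in Python with invented numbers: one extreme value pulls the mean far from the bulk of the data, while the median barely moves.

```python
from statistics import mean, median

# Invented sample: six similar values and one extreme value
values = [21, 23, 24, 25, 26, 27, 250]

sample_mean = mean(values)
sample_median = median(values)

print(sample_mean)    # pulled upwards by the single value of 250
print(sample_median)  # 25, the middle value of the sorted sample
```

For roughly symmetric data the two are close and the mean is usually reported; for skewed data or data with outliers the median often describes a "typical" value better.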
- Johannes Ullrich added an answer: Is there any difference in the effect size calculation (Cohen's d) for within groups and between groups?
Do you know if I can use the same formula (Cohen's d) for the calculation of within groups effect size (pre-post treatment of one group) and between groups effect size (post-treatment of two groups)?
The formula that I know is proposed by Durlak (2009, How to select, calculate and interpret effect sizes).
Thank you! =)
Fixed the link above:
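On the formula question itself, here is a hedged sketch in Python of two commonly used variants, with invented data. The between-groups version divides the mean difference by the pooled standard deviation; the within-group version shown here (often written d_z) divides the mean pre-post change by the standard deviation of the change scores. I have not checked which variant Durlak (2009) recommends for pre-post designs, so treat the function names and the choice of d_z as assumptions.

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d_between(group_a, group_b):
    """Mean difference (b minus a) divided by the pooled SD of two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(group_b) - mean(group_a)) / sqrt(pooled_var)

def cohens_d_within(pre, post):
    """Mean of the paired differences divided by the SD of the differences (d_z)."""
    diffs = [after - before for before, after in zip(pre, post)]
    return mean(diffs) / stdev(diffs)

# Invented scores for five participants
pre = [10, 12, 11, 14, 13]
post = [12, 13, 13, 17, 15]

d_between = cohens_d_between(pre, post)  # treats the two columns as independent groups
d_within = cohens_d_within(pre, post)    # uses the pairing
print(d_between, d_within)
```

With the same numbers the two formulas give clearly different values, because the paired version depends on how strongly pre and post scores are correlated. So the formulas are not interchangeable, and a report should say which one was used.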
- Trevor J. Willis added an answer: Has anyone used PERMANOVA as a univariate test?
I have seen papers using PERMANOVA as a univariate test (for example for comparing richness among sites or for comparing a species density among seasons). Is it possible? To me this does not sound correct, since Anderson (2001) describes PERMANOVA as 'A new method for non-parametric multivariate analysis of variance'.
I believe that PERMANOVA is essentially a multivariate test, am I correct?
PERMDISP tests the heterogeneity of dispersions among groups. This is conceptually similar, but not quite the same thing as heterogeneity of variances. Dispersions can vary both in extent and 'shape' of the multivariate cloud. The short answer is: not really. As per my earlier answer, you should not be using PERMANOVA for univariate comparisons in the first place, and if concerned about normal distributions, you should be looking at models with appropriate distributions rather than trying to shoe-horn non-normal or heterogeneous data into Gaussian models. Try log-linear or negative binomial models for count data, for example.
I should mention that CAP does not provide information on data 'spread'. As a constrained ordination method, it looks for the 'best' means of separating a priori specified groups, and as such does not allow any assessment of either overall or between-group variability. Anderson & Willis (2003) specifically recommend using unconstrained methods (such as PCO or MDS) for this.
Hope this helps,
- Suman De added an answer: How to interpret factor scores from Exploratory Factor Analysis?
I've conducted different factor extraction methods using a considerably small dataset (low-level features extracted from image content). The problem is with the interpretation of the factor scores obtained, which range from negative to positive values of unknown minimum/maximum. I read some handbooks, but they usually focus on how to conduct factor analysis and very rarely discuss how to interpret the output.
I have conducted a factor analysis to extract latent factors leading to gender inequality in the workplace. I extracted 5 factors from 20 variables. Now I want to rank all the variables by their contribution to gender inequality. I have ranked the variables by factor score; is this a correct method for achieving my objective of ranking?
Please suggest ways to do the ranking.
- Rand R Wilcox added an answer: How can we run any regression if the error term is explicitly assumed to be non-normally distributed?
Please look in the link below, bottom of p. 8, where W is specified in several ways, not necessarily distributed normally.
There is an extensive literature dealing with your question that is summarized in
Wilcox (2012). Introduction to Robust Estimation and Hypothesis Testing. Elsevier.
- Anne Renaud added an answer: Which correlation coefficient is better to use: Spearman or Pearson?
I'm performing a correlational study of two time series in order to identify positive or negative correlations between them. Which correlation coefficient is better to use: Spearman or Pearson?
Dear Walter. Maybe you should search for a method that takes into account the fact that you have time series. Pearson (based on the values) and Spearman (based on the ranks) do not take the specific features of time series into account.
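To make the point about time series concrete, here is a minimal sketch in Python with invented data. Two series that share an upward trend correlate positively on their raw values under both Pearson and Spearman, although their period-to-period changes move in opposite directions. Correlating first differences is one common way to remove a trend; it is my illustration here, not a method named in the answer, and the helper names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def differences(series):
    """First differences: change from one time point to the next."""
    return [b - a for a, b in zip(series, series[1:])]

# Invented series: both trend upwards, but their short-term changes are opposed
x = [1, 3, 2, 5, 4, 7, 6, 9, 8, 11]
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]

print(pearson(x, y), spearman(x, y))            # both clearly positive
print(pearson(differences(x), differences(y)))  # strongly negative
```

So the Pearson versus Spearman choice matters less here than deciding whether the shared trend is the relationship of interest or something to be removed first.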
- Karen Steadman added an answer: Do I need to weight my questionnaire data?
We have commissioned a survey company to undertake a telephone survey for us (n=500). The sample has been recruited from people who have taken part in a large national survey (Health Survey England) which is conducted every year with a different randomly selected population sample. We are speaking specifically to people with more than one long-term health condition.
The questionnaire data has been collected and the survey company are planning to provide us with the dataset next week. They have asked us if we would like them to weight the data (for an additional fee). I am not an expert on weighting large datasets, and am unsure whether this is necessary. Was wondering if anyone had any thoughts on the matter, guidance on what I should be considering in this decision, or indeed could point me in the direction of any resources which might help me make this decision.
We would like the data to be broadly generalisable to this population. The sample has been recruited from 3 different years worth of HSE participants so we could get the numbers. I am assuming that they were more difficult to recruit from the earlier years as people will have changed phone numbers, become more ill, etc. I will need to check this assumption with the survey company.
Would really appreciate any advice. We are a charity and don't really have resources for additional expenditure, but I don't want to jeopardise the quality of the data.
Thank you all. I think I am going to go with weights, as I know there will be bias in the sample and I would like to generalise. Going to see if a uni colleague will help me to avoid the costs! Fingers crossed....
- Emilio José Chaves added an answer: Is there any relationship or equation to find the total width of a Gaussian curve when the full width at half maximum is known?
If the premise of symmetry around the mean holds for positive values, then the width should equal twice the mean. So I can neither understand nor apply Gaussian curves. It would help if contributors explained their answers with a numeric example.
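A numeric example, sketched in Python. For a Gaussian with standard deviation sigma, the full width at half maximum is FWHM = 2*sqrt(2*ln 2)*sigma, about 2.355*sigma. Strictly, a Gaussian has no finite total width because its tails never reach zero, so a "total width" needs a convention, such as the width at 1% of the peak height or a span of plus or minus 3 sigma. Both conventions are my choices for illustration, and the function names are hypothetical.

```python
import math

# FWHM = 2 * sqrt(2 * ln 2) * sigma, approximately 2.3548 * sigma
FWHM_FACTOR = 2 * math.sqrt(2 * math.log(2))

def sigma_from_fwhm(fwhm):
    """Standard deviation of a Gaussian with the given full width at half maximum."""
    return fwhm / FWHM_FACTOR

def width_at_fraction(fwhm, fraction):
    """Full width of the curve at the given fraction (between 0 and 1) of the peak height."""
    sigma = sigma_from_fwhm(fwhm)
    return 2 * sigma * math.sqrt(2 * math.log(1 / fraction))

fwhm = 10.0
sigma = sigma_from_fwhm(fwhm)
print(sigma)                          # about 4.25
print(width_at_fraction(fwhm, 0.5))   # recovers the FWHM itself
print(width_at_fraction(fwhm, 0.01))  # width at 1% of the peak, about 25.8
print(6 * sigma)                      # plus/minus 3 sigma span, about 25.5
```

Note that the width depends only on sigma, not on where the curve is centred, so it is unrelated to "twice the mean".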
- Thom S Baguley added an answer: What is the best non-parametric method for determining modality and potential mixture in a non-normal single-variable distribution?
I am an archaeologist studying bedrock mortars (the holes in boulders and rock outcrops that Native Americans would use to grind foodstuffs) in a large study area. I’m exploring whether the depth of an individual bedrock mortar is an indicator of a specific kind of food that was meant to be ground in it. More specifically, whether a bounded range of depths is correlated with a specific type of food (e.g. mortars between 0.25 – 5.5 cm deep were mostly used to grind acorn). Unfortunately, I did not have the ability to test for the presence of food residues in the mortars, so I only have a single variable- depth- to work with. Instead of performing regressions or similar multivariate tests to correlate specific foods with specific depth ranges, I am looking for patterns in the depth variable that suggest preferred ranges. I assume these ranges will be evident as modes in the distribution curve, and potentially as mixed distributions.
I measured the depth (continuous interval) of 699 bedrock mortars from my study area, and can assume that is essentially the entire population. When I plot the values, the resultant distribution curve is highly non-normal (right skewed, leptokurtic?, a long tail and lots of outliers on the right, p-value of <0.01 for Kolmogorov-Smirnov and Shapiro-Wilk tests). There is an ever present mode on the left side of the curve, which is definitely meaningful, but there are a couple of potential modes on the right side of the curve that are less obvious. I have “bump hunted” by binning the data differently in the histograms, but am hoping for a more statistically powerful way to identify modes. The values do not normalize when doing a log base 10 or square root transformation. Is there a test or method I can use to identify statistically significant (probable?) modes in the distribution of this single variable? Is there an effective non-parametric test for mixed distributions in this circumstance?
Any advice is greatly appreciated. I have attached a spreadsheet of the values below.
KDE in R can be as easy as:
... though there are many more options and packages that give further options.
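For readers who want to see what a kernel density estimate does under the hood, here is a minimal sketch in plain Python rather than R. It places a Gaussian kernel on every observation, evaluates the summed density on a grid and reports the local maxima as candidate modes. The depth values, the bandwidth of 1.0 and the function names are all invented for illustration; they are not the mortar data from the question.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)

    return density

def find_modes(data, bandwidth, grid_size=400):
    """Locate local maxima of the density estimate on an evenly spaced grid."""
    low = min(data) - 3 * bandwidth
    high = max(data) + 3 * bandwidth
    step = (high - low) / (grid_size - 1)
    grid = [low + i * step for i in range(grid_size)]
    density = gaussian_kde(data, bandwidth)
    values = [density(x) for x in grid]
    return [
        grid[i]
        for i in range(1, grid_size - 1)
        if values[i] > values[i - 1] and values[i] >= values[i + 1]
    ]

# Invented depths (cm): a shallow cluster and a deeper cluster
depths = [2.0, 2.5, 3.0, 3.0, 3.5, 4.0, 11.0, 11.5, 12.0, 12.5, 13.0]
modes = find_modes(depths, bandwidth=1.0)
print(modes)
```

The number of modes found depends heavily on the bandwidth: a small bandwidth produces many spurious bumps, a large one merges real modes. It is worth checking how stable the modes are across a range of bandwidths before interpreting them.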
- Shah Limon added an answer: Can I create an FMEA from the line items of a Pareto analysis?
I have created a Pareto analysis of the top contributors to error. With the 80/20 list of line items, can I develop an FMEA to understand the root cause?
I think from the Pareto analysis you can get the major contributors among the failure modes. FMEA is a rigorous process, and the historical failure modes from the Pareto analysis can help to enhance the FMEA. For a complete FMEA, besides these major failure modes, it is necessary to cover operating conditions, the failure process and its propagation, and many other cross-functional activities.
- André François Plante added an answer: Can anyone direct me to this paper by Noether (1963) A note on the Kolmogorov-Smirnov statistic in the discrete case?
The complete reference is Noether GE: A note on the Kolmogorov-Smirnov statistic in the discrete case. Metrika 7:115-116, 1963.
Basically I'm looking for citations that prove that the Kolmogorov-Smirnov test for discrete variables is (way) too sensitive, and P-values are extremely small. Many thanks for your help.
Have you tried Google Scholar?
- Hamid R Tabatabaee added an answer: We have done univariate analysis in SPSS with no problem in getting the odds ratio, but an E+ error shows up in the multivariate analysis. How can we do it without the error?
We have done univariate analysis in SPSS and had no problem getting the odds ratio, but an E+ error is shown in the multivariate analysis. Please give suggestions on how to run the multivariate analysis without the error.
You must conduct it via logistic regression (binary logistic regression) analysis.
Of course, you must first analyse each variable with univariate analysis, one by one, and select the variables that have P less than 0.25 for the multivariate analysis.
- Watheq J. Al-Mudhafar added an answer: How do I do a two-stage multiple imputation?
I am trying to do multiple imputation in two steps. I have successfully completed the first step (imputing the independent variable).
Now, using the imputed data set I created in the first step, I would like to impute the covariates I am using. I am using Stata and get this error:
"mi impute: imputations exist
40 imputations found; use replace to replace existing and/or add() to add new imputations. Before adding new
imputations, verify that the same imputation model is specified."
I appreciate any tips! Thank you!
I would use the methods included in the attached paper for missing data imputation.
They are easy to implement in R for imputing multiple variables.
I can guide you about that if you prefer to use R.
- Daniel Wright added an answer: Is there any recommended test other than Sobel test in measuring mediating effect?
Sobel test in mediating effect
For a conceptually different approach, see http://imai.princeton.edu/projects/mechanisms.html
- Ismail Maakip added an answer: When I use AIC (Akaike information criterion) to find the model of best fit, do I need to consider p-values?
Using the AIC method, I extracted the parameters that best explain the variability in my dependent variable. My question is: when I want to publish my results, do I need to state the p-value that the AIC method gave me or the p-values that the regression model calculated?
I think Animesh Ghose has answered the question. The model with the lowest AIC value is considered the best model. In addition, the Akaike weights and the evidence ratio should be calculated.
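The Akaike weights and evidence ratio mentioned above follow directly from the AIC values. Below is a minimal Python sketch with invented AIC values: each model's weight is exp(-delta/2) normalised to sum to one, where delta is its AIC minus the lowest AIC, and the evidence ratio between two models is the ratio of their weights. The function name is hypothetical.

```python
import math

def akaike_weights(aic_values):
    """Convert AIC values into Akaike weights that sum to 1."""
    best = min(aic_values)
    relative_likelihoods = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative_likelihoods)
    return [value / total for value in relative_likelihoods]

# Invented AIC values for three candidate models
aics = [100.0, 102.0, 110.0]
weights = akaike_weights(aics)
evidence_ratio = weights[0] / weights[1]  # best model versus the runner-up

print(weights)
print(evidence_ratio)
```

In this invented example the best model is only about 2.7 times as well supported as the second one, which is why reporting the weights is more informative than reporting the lowest AIC alone.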
- Nida Rizvi added an answer: Can I use the fractional rank method to transform my data to a normal distribution?
I have used log and square root transformations in order to normalise my data set, but it didn't work. Can I use the fractional rank method to get a normalised data set?
Thanks all for your valuable answers.
- Gavin F Revie added an answer: Why am I getting a low ICC even though agreement is objectively high?
I am assisting a student with a measure of agreement for test-retest data using an ICC (2, 1) for absolute agreement. The data are ordinal on a 5-point scale, and according to the book by Streiner et al. (2015), the ICC is superior in these situations to (for example) kappa or weighted kappa.
We've encountered a situation where despite nearly all scores being identical between test 1 and test 2, we have an ICC of .000. As far as I can tell this is because there is almost no variation in the scores. For both test 1 and test 2, nearly all of the answers given are "2". I've queried with the student whether this suggests that there was something wrong with the question in the first instance, since if you ask a question and everyone answers identically, was it really worth asking in the first place?
However what I'd like to know is WHY the ICC is 0 despite nearly all responses agreeing. My understanding is that ICC is a version of correlation adjusted for the fact that the different variables are measuring the same thing. Correlation looks at associations between two variables. If when plotted on a scatter plot scores on both variables are, with very few exceptions, all "2", then it is not possible to model a relationship between the two variables. They're not associated with one another, they're functionally identical.
Is this correct? Thank you for any help you can provide.
That's the sort of answers I was expecting, thanks. After thinking about it for a while longer I also realized that the ICC measures reliability, with agreement being secondary to this. These two things are to an extent independent. In the case of the data we have here we have high agreement but poor reliability. If you're not doing an ICC for "absolute" agreement it is also possible to have the reverse be true - a situation where you have high reliability but low total agreement.
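The effect is easy to reproduce. Below is a sketch in Python that computes ICC(2,1) (two-way random effects, absolute agreement, single measurement) from the usual ANOVA mean squares, applied to invented test-retest data in which 18 of 20 people answer "2" both times. The data and the function name are assumptions for illustration, not the student's data.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    ratings: one row per subject, each row holding that subject's k scores.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)               # between-subject mean square
    ms_cols = ss_cols / (k - 1)               # between-occasion mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Invented data: 20 subjects, almost everyone scores 2 at both test and retest
low_variance = [[3, 2], [2, 3]] + [[2, 2] for _ in range(18)]
agreement = sum(1 for a, b in low_variance if a == b) / len(low_variance)
icc_low = icc_2_1(low_variance)

# Contrast: perfect agreement with scores spread over the whole scale
spread = [[s, s] for s in (1, 2, 3, 4, 5)] * 4
icc_spread = icc_2_1(spread)

print(agreement, icc_low, icc_spread)
```

With 90% exact agreement the ICC here is slightly below zero, because the between-subject mean square is no larger than the error mean square: there is no between-subject variance for the measure to reproduce. The second data set shows that the same formula returns 1 when subjects differ and the scores agree.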
- Vitalii Gryga added an answer: Any information about statistics on GERD (Gross expenditure on research and development) and number of researchers broken down by field of science?
Dear colleagues, hello! My question is primarily for statisticians (statisticians of science, technology and innovation).
Do you have any information about statistics on GERD (Gross expenditure on research and development) and the number of researchers broken down by field of science?
In the UNESCO Institute for Statistics (UIS), the OECD.Stat database and Eurostat, I've found such statistics for some countries (Croatia, Hungary, Portugal, Russia, Argentina etc.), but data for e.g. Germany, Japan, US, UK, Netherlands, France etc. exist only for one year or do not exist at all.
In OECD.Stat, for example, there are more or less complete data (i.e. coverage of developed countries) on “Other national R-D expenditure by field of science and by source of funds” (but this is only a quite small part of total GERD).
I need data for “Gross domestic expenditure on R-D by sector of performance and field of science” (for total intramural sectors) and “R-D personnel by sector of employment and field of science” (for all sectors). Here the data are very fragmentary for most developed countries. The situation is the same in UNESCO UIS and Eurostat, since these databases exchange data with each other.
Therefore, do you have any information on “Gross domestic expenditure on R-D by sector of performance and field of science” and “R-D personnel by sector of employment and field of science” for developed countries?
Is this data available on national statistical service databases?
Many thanks in advance for your answers, links and shared files!
Have you checked national statistical offices? Sometimes that is the only way to find data; however, such data may not be fully comparable.
- Mudassar Sayyed added an answer: I categorized respondents into two groups on the basis of the mean of a percentage scale. Is this a correct approach?
I have grouped web searchers as efficient or inefficient on the basis of their experience of finding information online. I have responses from 281 volunteers to the question:
You find what you are searching for ______ of times.
options are: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
The mean is approximately 60%. I formed two groups as:
Group A % > 60
Group B % <= 60
Is this a statistically valid approach?
Thank you sir for adding this new dimension.
- Saleh Salehizadeh asked a question: How has this high-dimensional test statistic been obtained?
Hi! I was reading this article by Feng and Sun, but I was wondering how the statistic was obtained. Any idea?
- A.K. Singh added an answer: Can I include predictor variables that are correlated to the exposure variable in Negative binomial regression?
Hello, I have a question regarding negative binomial (NB) regression. I am not sure if I can include predictor variables that are correlated with the exposure variable (say time)? I'm concerned that the predictor variable (VIF 3.8) is correlated to the exposure variable (VIF 9.8). I have carried out NB regression despite the collinearity and the results are significant. The overall likelihood ratio test is 6.377, df = 1 and sig = 0.012. Can I include the predictor variable in this case?
I agree with Jochen and also with Andrey: if your model is an explanatory model, you do need to address multicollinearity, but if it is a predictive model, then it does not matter.
- Sagar Bachhav added an answer: How can I calculate Cmax, Tmax, AUC, T1/2 in PK/PD studies?
Please find the data below:
Time (min): 0, 30, 60, 180, 300
Concentration of curcumin (µg): 0.00, 7.66, 84.27, 40.87, 7.84 respectively
Agree with both Konstantin and Jun.
Considering that you are quite new to PK concepts, I suggest a simple non-compartmental analysis. Compartmental analysis needs some understanding and software. Try the trial version of the Basica or Kinetica software.
Other software, such as Phoenix WinNonlin and GastroPlus, is paid and expensive.
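For the five data points in the question, a simple non-compartmental calculation can also be done by hand or in a few lines of code. The Python sketch below takes Cmax and Tmax directly from the observations, computes AUC from 0 to the last time point with the linear trapezoidal rule, and estimates the terminal half-life from the slope of ln(concentration) over the last two points. Using only two points for the terminal slope is an assumption forced by the small data set (normally three or more would be fitted), and the function name is hypothetical.

```python
import math

times = [0, 30, 60, 180, 300]               # minutes, from the question
conc = [0.00, 7.66, 84.27, 40.87, 7.84]     # curcumin concentration, units as given

def non_compartmental(times, conc):
    """Return Cmax, Tmax, AUC(0 to last) and a two-point terminal half-life."""
    cmax = max(conc)
    tmax = times[conc.index(cmax)]

    # Linear trapezoidal rule between consecutive sampling times
    auc = sum(
        (t2 - t1) * (c1 + c2) / 2
        for t1, t2, c1, c2 in zip(times, times[1:], conc, conc[1:])
    )

    # Terminal elimination rate from the last two points (log-linear decline assumed)
    lambda_z = math.log(conc[-2] / conc[-1]) / (times[-1] - times[-2])
    half_life = math.log(2) / lambda_z
    return cmax, tmax, auc, half_life

cmax, tmax, auc, half_life = non_compartmental(times, conc)
print(cmax, tmax)   # 84.27 at 60 min
print(auc)          # about 11924.85 (concentration unit x min)
print(half_life)    # roughly 50 min
```

AUC extrapolated to infinity would add the last concentration divided by lambda_z; with so few points in the elimination phase, that extrapolation and the half-life should be treated as rough estimates.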
- Manar Shaheen added an answer: How do I test normality for two independent groups?
I am doing comparison between two independent groups; patient group of 24 subjects and control group of 34 subjects. My questions are:
1) Is testing normality relative? I mean, is it sufficient to visually check the histograms and Q-Q plots and decide whether my variable data show normality or not? What are skewness and kurtosis? I wonder which conclusive tests I can rely on to test normality.
2) If one group shows normality while the other does not, does that mean that my data are not normally distributed and I have to use non-parametric testing?
3) I read about other assumptions for using parametric tests even when the variable data of both groups show a normal distribution. For example, I read about variance homogeneity. How can I check this? I am using SPSS for my data analysis.
Thank you for all your replies.
You suggested that I fit the two means and look at the distribution of the residuals. Did you mean combining both groups and plotting them as one group? How do I check the distribution of the residuals?
Thank you again.
Statistical theory and its application.