• Mohamed Mihilar Shamil added an answer:
    Is it possible to compute a covariance matrix with unequal sample sizes?

    I'm not sure if this question is correct, but is there a way to construct a covariance matrix for two vectors that have different lengths? If so, how?

    And would it have a size of (m+n)×(m+n) (assuming the two vectors are of length m and n)?

    (25×5) × (5×25)
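
    To make the dimension question concrete: a covariance matrix is (number of variables) × (number of variables), so for two variables it is 2 × 2 regardless of the sample size, and the observations must be paired, which is why equal lengths are required. A minimal NumPy sketch with hypothetical data:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=25)      # first variable, m = 25 observations
    y = rng.normal(size=25)      # second variable, same length

    print(np.cov(x, y).shape)    # (2, 2): one 2x2 covariance matrix

    z = rng.normal(size=5)       # a vector of different length
    try:
        np.cov(x, z)             # fails: the observations cannot be paired
    except ValueError as err:
        print("unequal lengths:", err)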

    Mohamed Mihilar Shamil

    Sampath, look at it like this: what you have is two groups of data of unequal sizes, and what you are trying to do is establish an association between these two groups. In statistics, group comparisons are mainly handled through t statistics, ANOVA, etc.; you can also look into weighted techniques, ANCOVA, etc.

    Another concern: does service quality cause MO, or does MO cause service quality?

  • Reza Daryabeygi Khotbesara added an answer:
    How can I get the group-specific univariate F results for the intervention condition in a 2 by 2 MANOVA?

    Even though I modify the syntax to "/EMMEANS=TABLES(Group*Time) COMPARE(Time)", I don't get the group-specific F results.

    Reza Daryabeygi Khotbesara

    For sure, here you are (complete SPSS syntax):

    GLM F_pastB F_pastB_new s_FAT_Int s_Fat_Int_new s_FAT_att s_Fat_att_new s_FAT_sn s_Fat_SN_new s_FAT_pbc s_Fat_PBC_new F_Prisk_a F_Prisk_a_new F_Prisk_b F_Prisk_b_new s_FAT_Plan s_Fat_Plan_new BY Group
    /WSFACTOR=Time 2 Polynomial
    /MEASURE=Bhav Int Att Sn Pbc Risk1 Risk2 Plan
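
    Outside SPSS, one way to see the Time effect separately within each group is to run a repeated-measures ANOVA on each group's subset. A minimal sketch with statsmodels, assuming long-format data with hypothetical columns id, group, time and score (not the variable names from the syntax above):

    import pandas as pd
    from statsmodels.stats.anova import AnovaRM

    df = pd.read_csv("data_long.csv")        # hypothetical long-format file

    for name, sub in df.groupby("group"):
        # univariate F for Time within this group only
        res = AnovaRM(sub, depvar="score", subject="id", within=["time"]).fit()
        print(f"Group {name}:")
        print(res.anova_table)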

  • Leandro Candido asked a question:
    Experimental vs analytical: how to approach the statistics?

    I have a set of experimental data (EXP) which I have fitted with two analytical models (AN1 & AN2).

    In order to estimate the precision and accuracy of both analytical models, I can study the statistics of the ratios EXP/AN1 and EXP/AN2, or of AN1/EXP and AN2/EXP.

    The point is that the statistics of these two sets of ratios do not coincide.

    I see that many researchers adopt the first approach, while I would instinctively go for the second, because it lets me compare two different analytical models by normalizing them with respect to the same experimental variable.

    Is there anybody who can help me out with this?
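
    One reason the two conventions cannot give the same summary statistics is that the mean of a reciprocal is not the reciprocal of the mean. A small numeric sketch with hypothetical values; note that working on log-ratios makes the two choices symmetric, so the comparison no longer depends on which ratio is picked:

    import numpy as np

    exp = np.array([10.0, 12.0, 9.5, 11.0])   # hypothetical measurements
    an1 = np.array([ 9.0, 13.0, 10.5, 10.0])  # hypothetical model predictions

    print((exp / an1).mean())                 # mean of EXP/AN1
    print(1.0 / (an1 / exp).mean())           # not the same number

    # On a log scale the two conventions differ only in sign.
    print(np.log(exp / an1).mean(), -np.log(an1 / exp).mean())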


  • A. A. Zisman added an answer:
    How precise is the calculation of GND density using EBSD?

    Using EBSD, and based on the local average misorientation, it is possible to calculate the density of Geometrically Necessary Dislocations (GNDs).

    But there is a huge debate on the accuracy of such results.

    How accurate and reliable is it to calculate the local density of GNDs using EBSD?

    By the way, are the dislocations we see under TEM GNDs, or do they also include Statistically Stored Dislocations (SSDs)?
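
    For readers unfamiliar with how the EBSD-based number is usually obtained, one commonly quoted estimate converts the kernel average misorientation into a GND density via rho ≈ alpha·theta/(b·u), with theta the local misorientation in radians, u the step size, b the Burgers vector and alpha a geometry factor of order 2-3. A rough sketch with hypothetical numbers (the prefactor and the exact relation vary between papers):

    import numpy as np

    theta = np.deg2rad(0.5)   # local (kernel average) misorientation, rad
    u     = 100e-9            # EBSD step size, m
    b     = 0.25e-9           # Burgers vector magnitude, m
    alpha = 2.0               # geometry factor, typically 2-3

    rho_gnd = alpha * theta / (b * u)
    print(f"{rho_gnd:.2e} m^-2")   # ~1e14-1e15 m^-2 with these numbers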

    A. A. Zisman

    I completely agree with Gert; moreover, the virtual partitioning of actual dislocations into GNDs and SSDs by means of EBSD seems deceptive rather than informative. Indeed, we are unable to indicate in a TEM image which particular defect is redundant (neutral) and which is geometrically necessary (charged). If we ask what use a formal GND density may still have, it is clear that strain hardening is due rather to the actual (total) density, which is normally close to that of the SSDs. What remains, then? Apparently, the only meaning of GNDs lies in real effects such as plastically induced misorientations (boundaries) and stress sources. However, these effects are easier and more pertinent to treat (and measure) directly, without intermediate terms that are not always reliable. As to the excellent accuracy of HR-EBSD mentioned by Julian, in my opinion this performance exceeds the practical meaning of GND density. For more reasoning, please have a look at a long discussion (starting from a short question) on my RG page, in which Gert and Julian also actively participated.

  • Sanjay Garg added an answer:
    What are some examples of high-dimensional data?
    To support my thesis, I need to know more about this topic.
    Sanjay Garg

    Earth observation / geographical data is a classical example of high-dimensional data, and also a classical example of spatio-temporal data. In this scenario the data may have 100+ dimensions, e.g. longitude, latitude, temperature, pressure, rainfall, humidity, altitude, time, season, soil type, and many more (if the data are in vector format). If the data are in raster format, i.e. images, they can always be considered multidimensional data, as answered earlier by Measy G.

    We can help you in a much better way if we know a little more about your work.

  • Ahmed A.A. Khalifa added an answer:
    How do I interpret wavelet analysis?

    Hi All,

    I am doing multivariate wavelet analysis of ecological time series in R. I am having a hard time interpreting the graphical output I am getting. Please find attached one of the graphs from my analysis; it plots the results of the wavelet analysis of two of the time series. I have the following questions regarding the analysis:

    1. I know what statistical power means, but is the "power" in the attached plot the same as statistical power? (See the sketch after this list.)

    2. What can we conclude about contours that reach 95% confidence but have low power?

    3. Is there any way to quantify the differences between two time series? I am clustering time series and getting distances from 100 to 300 between different time series. I don't know how to interpret these distances. How much distance is required to consider two time series different?

    4. How do I interpret the different periods on the plot?

    5. Should we completely ignore the regions on the plot that are outside the 95% contours but in phase?
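
    On question 1: the "power" in a wavelet plot is the squared magnitude of the wavelet coefficients (how much variance sits at a given period and time), not statistical power in the hypothesis-testing sense; the significance contours play the hypothesis-testing role. A minimal sketch (in Python with PyWavelets rather than the R packages, hypothetical signal):

    import numpy as np
    import pywt

    np.random.seed(0)
    t = np.arange(0, 512)
    signal = np.sin(2 * np.pi * t / 32) + 0.5 * np.random.randn(t.size)

    scales = np.arange(1, 128)
    coefs, freqs = pywt.cwt(signal, scales, "morl", sampling_period=1.0)

    power = np.abs(coefs) ** 2          # this is the "power" shown in such plots
    print(power.shape)                  # (n_scales, n_times)
    print(freqs[power.mean(axis=1).argmax()])  # dominant frequency, close to 1/32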



    Ahmed A.A. Khalifa

    There is an easy article that handles this; the link is

    Good luck

  • Jochen Wilhelm added an answer:
    Any advice on Post-hoc power analysis for ROC/AUC analyses?

    Hi all,

    I was wondering if anyone could advise me about any calculator or formula to retrospectively calculate the power of a study that uses ROC/AUC analyses? Unfortunately, in the information I have seen thus far, the calculators are for a priori analyses.

    I understand some researchers advise against post-hoc power analyses, but I am interested in any calculators/formulas people may be aware of nonetheless.

    Many thanks

    Jochen Wilhelm

    I wonder how one can say what "underpowered" is in (basic) research. The concept of "power" is linked to the specification of a "minimum (relevant) effect size", for which one can weigh the expected wins and losses of the consequences (actions) that follow from correctly "treating" or wrongly "ignoring" such effects. I don't see a research setting where this could be done at all (it can, however, work well in economic quality-control settings, for insurance companies, credit institutions, and the like).
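
    For completeness, and with the caveat above in mind: the usual normal-approximation route uses the Hanley-McNeil (1982) standard error of an AUC. A hedged sketch with hypothetical numbers, plugging the observed AUC in as the assumed effect (which is exactly the "post-hoc" practice being warned against):

    from math import sqrt
    from scipy.stats import norm

    auc, n_pos, n_neg, alpha = 0.75, 40, 60, 0.05   # hypothetical inputs

    def hanley_mcneil_se(a, n1, n2):
        q1 = a / (2 - a)
        q2 = 2 * a * a / (1 + a)
        return sqrt((a * (1 - a) + (n1 - 1) * (q1 - a * a)
                     + (n2 - 1) * (q2 - a * a)) / (n1 * n2))

    se0 = hanley_mcneil_se(0.5, n_pos, n_neg)   # SE under H0: AUC = 0.5
    se1 = hanley_mcneil_se(auc, n_pos, n_neg)   # SE at the observed AUC
    z_crit = norm.ppf(1 - alpha / 2)
    power = norm.cdf((auc - 0.5 - z_crit * se0) / se1)   # approximate power
    print(round(power, 3))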

  • Andres Redchuk added an answer:
    Do you use any software to teach statistics?

    There are arguments that we should teach statistics not only using manual calculation but also using software such as SPSS, Minitab, Stats, etc. Most teachers or lecturers prefer to use manual calculation. What is your expert opinion on this?

    Andres Redchuk

    In engineering courses, it is important to use software in class and to solve real-life problems (big, real-life-sized problems).

  • Kristina Eirich added an answer:
    Which statistical package or software application is easiest to use by non-statisticians?
    Can you recommend a simple-to-use statistical software for a non-statistician?
    Kristina Eirich

    By the way, there is one more interesting website about SPSS: video tutorials, a user forum, SPSS news, everything you need. Take a look.

  • Jochen Wilhelm added an answer:
    What is the best introductory book on medical statistics for beginners?
    A text that explains the concepts without much math and formulae? Beginners in medical research in clinical trials/epidemiology often need a basic book on medical statistics that is appropriate for self study.
    Jochen Wilhelm

    I would not recommend the WHO book. I had a look at the chapters about data analysis (Chap. 8 ff.) and they contain the "usual" misconceptions and a mingle-mangle of hypothesis and significance tests.

  • Kazeem Adepoju added an answer:
    Statistical advice on analyzing the trend of cases over time?

    Dear colleagues,

    I'm working on an article concerning poisoning and want to analyze the trend of cases over time, but I don't know what test applies to it in SPSS or how to tackle it.

    Attached are an example of what I want to do and the data available to me.

    I would be very thankful if anyone is willing to guide me on this.

    Thanks in anticipation!



    Kazeem Adepoju

    Zainudin is right
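
    Since the thread never names a concrete test: for yearly case counts, two common options are a simple test for monotonic trend (Kendall's tau over time, the core of the Mann-Kendall test) and a Poisson regression of counts on year. A minimal sketch with hypothetical counts:

    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import kendalltau

    years  = np.arange(2005, 2015)
    counts = np.array([12, 15, 14, 18, 21, 19, 25, 24, 28, 30])  # hypothetical

    # 1) Monotonic trend: tau > 0 with a small p suggests an increasing trend.
    tau, p = kendalltau(years, counts)
    print(tau, p)

    # 2) Poisson regression: exp(coefficient on year) is the multiplicative
    #    change in expected cases per year.
    X = sm.add_constant(years - years.min())
    fit = sm.GLM(counts, X, family=sm.families.Poisson()).fit()
    print(np.exp(fit.params[1]))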

  • Manuel Morales added an answer:
    What is a binomial distribution?
    I wanted to understand it with some real world examples rather than a definition.
    Manuel Morales

    Although my invitation for research contributions is initially focused on grade-school children, I invite my colleagues here at RG to feel free to participate as well (see link).

    Besides RG, can you recommend open-source resources for students to use to conduct their research?
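
    To make the definition concrete: if each of n independent trials succeeds with probability p (a patient responds to treatment, a part is defective, a coin lands heads), the number of successes follows a binomial distribution. A minimal sketch with hypothetical numbers:

    from scipy.stats import binom

    n, p = 20, 0.3                   # e.g. 20 patients, 30% response rate
    print(binom.pmf(6, n, p))        # P(exactly 6 responders)
    print(binom.cdf(3, n, p))        # P(at most 3 responders)
    print(binom.mean(n, p), binom.std(n, p))   # expected count and spread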

  • Patrick S Malone added an answer:
    Why is the correlation coefficient in SPSS lower when constructs are treated as sums of separate items, compared with the Mplus output when they are latent?

    In SPSS I summed a number of items to create two totals, e.g. A and B. The correlation between them was .700. When I used the same items as indicators of latent factors (A and B again) in Mplus (using the BY command), the correlation in the output increased to .850. Could someone explain that, please? Thank you.

    Patrick S Malone

    Coming back to Daniel's point, for most common estimation methods, the latent variable will use loadings that optimize reproducing the original variance/covariance matrix. The sum-score is equivalent to assuming equal loadings (and equal residual variances) for the items. It is not surprising that the results differ.
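
    A numeric way to see the same point: the latent correlation is, roughly, the sum-score correlation corrected for unreliability (attenuation), so it is expected to be larger. A sketch with hypothetical reliabilities (not values from the poster's data):

    from math import sqrt

    r_sum = 0.700     # observed correlation between the two sum scores
    rel_a = 0.85      # hypothetical reliability of scale A
    rel_b = 0.80      # hypothetical reliability of scale B

    r_latent = r_sum / sqrt(rel_a * rel_b)   # classical disattenuation
    print(round(r_latent, 3))                # ~0.85 with these made-up values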

  • Sergio Dominguez Lara added an answer:
    In SEM, why is the CFI affected by the matrix used (polychoric or covariance matrix)?

    I performed the CFA on the same data but with two different approaches (polychoric matrix and covariance matrix), separately. Why do the results indicate that the CFI from the analysis conducted with the polychoric matrix is greater than the CFI from the analysis conducted with the covariance matrix, even though the data are the same?

    Sergio Dominguez Lara

    Thanks for the answers!

  • M. Srivastava added an answer:
    How do I find an empirical formula for given experimental data with 6 independent variables and one dependent variable?
    I believe I can't do it using Excel, because it only offers linear regression, not multiple nonlinear regression.
    I have 7 non-dimensional parameters, one of which is dependent. I have experimental data for these parameters, and I want a formula to estimate the dependent variable for any further data/experiments.
    M. Srivastava

    You can proceed with the following steps (a code sketch follows below):

    1. Make a matrix plot of the 7 × 7 variable pairs.
    2. Check for high correlations among the predictors, if any.
    3. Fit a multiple regression model (Y = a0 + a1X1 + a2X2 + ... + a6X6).
    4. Revise the equation by introducing quadratic terms.
    5. This may lead to a working predictive model.
    6. If there is a systematic shift in any of the variables, modify the model accordingly.

    All the best.
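
    A minimal sketch of steps 3-4 with statsmodels; the file name and the column names x1..x6 and y are hypothetical:

    import pandas as pd
    import statsmodels.api as sm

    df = pd.read_csv("experiment.csv")
    predictors = ["x1", "x2", "x3", "x4", "x5", "x6"]

    # Step 3: multiple linear regression  Y = a0 + a1*X1 + ... + a6*X6
    X = sm.add_constant(df[predictors])
    linear = sm.OLS(df["y"], X).fit()

    # Step 4: add quadratic terms and compare the fit
    for c in predictors:
        df[c + "_sq"] = df[c] ** 2
    X2 = sm.add_constant(df[predictors + [c + "_sq" for c in predictors]])
    quadratic = sm.OLS(df["y"], X2).fit()

    print(linear.rsquared_adj, quadratic.rsquared_adj)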

  • Kyriaki Kostoglou added an answer:
    Comparing two correlation coefficients (Kendall's Tau) from control and treatment group
    How can I perform a hypothesis test on the difference in Kendall's tau between an independent control group and a treatment/disease group? Can I use Fisher's z transform as for Pearson's r (i.e., Steiger's z-test), or do I have to bootstrap, and if so, how?
    Kyriaki Kostoglou

    Thanks! I was looking for that too. In terms of Spearman correlation,
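
    On the "how to bootstrap" part of the question, a minimal sketch with hypothetical data: resample each group's (x, y) pairs with replacement, recompute tau in each group, and look at the percentile interval of the difference.

    import numpy as np
    from scipy.stats import kendalltau

    rng = np.random.default_rng(1)
    x_ctrl, y_ctrl = rng.normal(size=(2, 40))         # hypothetical control group
    x_trt = rng.normal(size=40)
    y_trt = 0.5 * x_trt + rng.normal(size=40)         # hypothetical treatment group

    def boot_tau_diff(xc, yc, xt, yt, n_boot=5000):
        diffs = np.empty(n_boot)
        for i in range(n_boot):
            ic = rng.integers(0, len(xc), len(xc))    # resample pairs
            it = rng.integers(0, len(xt), len(xt))
            diffs[i] = (kendalltau(xt[it], yt[it])[0]
                        - kendalltau(xc[ic], yc[ic])[0])
        return np.percentile(diffs, [2.5, 97.5])      # 95% percentile CI

    print(boot_tau_diff(x_ctrl, y_ctrl, x_trt, y_trt))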

  • Khalid Hassan added an answer:
    How to interpret the result of a Kruskal-Wallis test revealing p<0.05, but with a p>0.05 between two groups?
    I used the Kruskal-Wallis test to analyze 4 groups in SPSS 19. It reported p < 0.05, but there was no significant difference between any pair of groups. What does this mean? Did the group affect the results or not? How can I deal with this? Thank you.
    Khalid Hassan

    Dear Dr. Fras Rashad,
    Dunn's test is used after the Kruskal-Wallis test when the groups are of unequal size; for equal group sizes you should use the Student-Newman-Keuls (SNK) procedure for your multiple comparisons.
    Good luck.
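
    For the mechanics, a minimal sketch with hypothetical data. Dunn's test itself is available in the scikit-posthocs package; the sketch below falls back on Bonferroni-corrected pairwise Mann-Whitney tests as a simple alternative, and it can easily reproduce the situation in the question (significant omnibus test, no significant adjusted pairwise comparison):

    from itertools import combinations
    import numpy as np
    from scipy.stats import kruskal, mannwhitneyu

    rng = np.random.default_rng(2)
    groups = {f"G{i}": rng.normal(loc=i * 0.3, size=15) for i in range(1, 5)}

    h, p = kruskal(*groups.values())
    print("Kruskal-Wallis:", h, p)

    pairs = list(combinations(groups, 2))
    for a, b in pairs:
        u, p_pair = mannwhitneyu(groups[a], groups[b], alternative="two-sided")
        print(a, b, min(p_pair * len(pairs), 1.0))   # Bonferroni-adjusted p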

  • Kamal Rullah added an answer:
    How do I rank ligands based on docking scores against different proteins?

    Suppose I dock 10 ligands to 5 different proteins. My goal is to find the ligand with the best combination effect: I need my ligand to have the maximum docking score against, say, Protein 1 and lower docking scores against Proteins 2, 3, 4 and 5.

    What techniques can be used to predict the best possible ligand?

    Thanks in advance

    Kamal Rullah

    It would be difficult to compare docking energies across different proteins in order to identify a new, better ligand.

  • Lenka Schnaubert added an answer:
    Do I adjust the degrees of freedom in a Helmert contrast if there are inhomogeneous variances across the sample?

    Hi all,

    I conducted an experimental study with a 3-group between-subjects design, where the independent variable is gradually increased between the groups. That is why I want to use Helmert contrasts to compare level 1 vs. 2 & 3 (contrast 1) and 2 vs. 3 (contrast 2).
    If I follow the logic of orthogonal contrasts, the degrees of freedom should not vary between the contrasts (for each contrast, the full sample is used to calculate the df, even though the second contrast itself does not include all groups / the full sample).

    So far so good. Now my problem: I have inhomogeneous variances across the full (3-group) sample. Do I adjust the df for the contrasts?
    SPSS (if I define the contrasts manually) does give me t-values etc. and also adjusted df values, BUT they differ considerably between contrast 1 (1 vs. rest) and contrast 2 (2 vs. 3). Can anyone explain?

    Thanks in advance


    Lenka Schnaubert

    Hi, thanks Jochen for this illustration! This gives me some background on what I saw in my data. So practically the loss in df does not make a lot of difference (not with my sample sizes anyway), and my results support the notion that whichever way I handle this, the results stay put - which is a good thing in my book! From a theoretical viewpoint, I will go with the adjusted values, since the variances are inhomogeneous. Going beyond my original problem, I would still be interested in whether Helmert contrasts and the corresponding t-tests for inhomogeneous variances are statistically identical when we adjust the variances and df for both anyway (while they do differ for homogeneous variances, since we then use pooled variances / df - even though this difference might be marginal, as in my case). This goes beyond my original question, but now I am curious and will need to follow up on this.

  • E. Nihal Ercan added an answer:
    How do I include baseline error in data standard deviation?

    I am looking at a signal from an exogenous protein that also includes the signal from the endogenous homologue. To get a rough idea of the signal from my exogenous protein, I subtract an averaged endogenous background signal. This endogenous background signal comes from five experiments and has a certain variation. If I do, for example, six experiments in which I get a combined signal from both the endogenous and exogenous proteins, and from each of these experiments subtract the averaged endogenous background signal, I get six datasets for my exogenous protein and can plot them with, for example, SD or SEM. But what about the error I already know to be present in my averaged endogenous background signal; how do I take this into account?

    Thank you,


    E. Nihal Ercan

    I agree with Dr Perumal.
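
    One standard way to carry the background uncertainty along is error propagation for a difference of independent means: the variance of (combined signal - averaged background) is the sum of the two variances, so the combined SEM is sqrt(SEM_signal^2 + SEM_background^2). A minimal sketch with hypothetical measurements:

    import numpy as np

    combined   = np.array([5.1, 5.6, 4.9, 5.3, 5.8, 5.2])   # endo + exo, n = 6
    background = np.array([1.0, 1.2, 0.9, 1.1, 1.05])       # endo only, n = 5

    exo_mean = combined.mean() - background.mean()
    sem_comb = combined.std(ddof=1) / np.sqrt(combined.size)
    sem_bkg  = background.std(ddof=1) / np.sqrt(background.size)

    sem_exo = np.sqrt(sem_comb**2 + sem_bkg**2)   # propagated uncertainty
    print(exo_mean, sem_exo)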

  • Venkatesan Perumal added an answer:
    Can McNemar's test be done when the cell value b or c is 0 in the 2x2 contingency table?

    I am comparing the knowledge and skills of staff before and after education. I assumed McNemar's test would be the most appropriate test, but I was told that I cannot use it because one of my cell values in the 2x2 contingency table is 0. Is this true?

    Venkatesan Perumal

    If both discordant cells are 0 (b = c = 0), the continuity-corrected formula (|b - c| - 1)^2 / (b + c) gives an infinite chi-square value, and the uncorrected formula (b - c)^2 / (b + c) gives 0/0, which is indeterminate. Hence, in both situations it is not feasible to apply McNemar's test.
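
    If the concern is a zero (or very small) discordant cell, the exact (binomial) form of McNemar's test is usually used instead of the chi-square approximation. A minimal sketch with statsmodels and a hypothetical 2x2 table in which b = 0:

    from statsmodels.stats.contingency_tables import mcnemar

    # rows: before (+, -); columns: after (+, -); b = 0, c = 7
    table = [[30, 0],
             [7, 13]]

    result = mcnemar(table, exact=True)   # exact binomial version
    print(result.statistic, result.pvalue)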

  • Amer Al-Jawabreh added an answer:
    How do I statistically compare the sensitivity and specificity or LR+ and LR- of three diagnostic tests performed on the same patients?

    In my experiment I am subjecting 50 patients to a definitive diagnostic test and three further tests. How do I statistically test the hypothesis that the sensitivity and specificity, or the likelihood ratios, are similar for the three diagnostic tests I have used?

    Amer Al-Jawabreh

    You may use Fleiss' kappa statistic for multiple tests with a dichotomous outcome (+, -).
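
    A minimal sketch of that suggestion, with hypothetical results (rows are patients, columns are the three tests, entries are 0/1 for negative/positive). Note that kappa quantifies agreement among the tests; it does not directly test equality of sensitivity or specificity against the reference test:

    import numpy as np
    from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

    rng = np.random.default_rng(3)
    results = rng.integers(0, 2, size=(50, 3))   # 50 patients x 3 tests, hypothetical

    table, _ = aggregate_raters(results)         # counts of 0s and 1s per patient
    print(fleiss_kappa(table))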

  • Renzo Bianchi added an answer:
    How do I analyse self-assessment manikin (SAM) data?

    I asked participants from two different cultures (the IV) to answer the SAM scale after watching a short video. My objective is to investigate whether there is any significant difference between these two cultures on each of the 3 dimensions (Pleasure, Arousal, Dominance).

    My question is whether I should treat the data as non-parametric and therefore run a Mann-Whitney test, or do something different?

    Thank you in advance
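
    If the non-parametric route is chosen, each SAM dimension can be compared between the two cultures with a Mann-Whitney U test, with a correction for testing three dimensions. A minimal sketch with hypothetical ratings:

    import numpy as np
    from scipy.stats import mannwhitneyu

    rng = np.random.default_rng(4)
    dims = ["Pleasure", "Arousal", "Dominance"]
    culture_a = {d: rng.integers(1, 10, size=30) for d in dims}   # 9-point SAM
    culture_b = {d: rng.integers(1, 10, size=30) for d in dims}

    for d in dims:
        u, p = mannwhitneyu(culture_a[d], culture_b[d], alternative="two-sided")
        print(d, u, min(p * len(dims), 1.0))   # Bonferroni-adjusted p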

  • Nicholas Almond added an answer:
    How do cognitive psychologists view behavior analysis?

    I am currently researching within a field called behavior analysis, which is essentially modern behaviorism. Researchers in this field tend to emphasize different methodologies, such as single-case designs, and often avoid statistical methods.

    In terms of psychology, behavior analysts are not interested in cognitive phenomena. This is not because they reject the existence of private events, but because they argue that cognitive events cannot be observed; only their behavioral outcomes can.

    There are several papers that address how behavior analysis sees cognitive psychology. They often refer to the misuse of hypothetical constructs and unnecessary group designs. However, I was wondering if there are papers discussing behavior analysis from the cognitive psychologist's point of view?

    Most psychology textbooks refer to behaviorism as dead, often with reference to Chomsky's critique of Skinner. According to behavior analysts, Chomsky's critique is flawed, but in mainstream psychology, behavior analysis remains a minority subdiscipline.

    So, I was wondering if there are any good articles discussing cognitive/internal/private events and behavior analysis/behaviorism that are written from a cognitive psychologist's point of view? There are plenty of such articles in behavior analysis journals, but I am wondering whether the issue of cognition vs. externally observed behavior has been discussed elsewhere, from a cognitive viewpoint?

    I guess what I am asking is: what papers from cognitive psychology exist that address why behavior analysis is obsolete and why internal, private events are perfectly acceptable to investigate?

    Nicholas Almond

    My point is: why do behaviourists not care about what is going on in the brain? The thing I personally don't like about behaviourism is that it can never be proved wrong, because behaviourists will just come out with another argument for why the behaviour has changed. I am not saying that cognitive psychology or cognitive neuropsychology is always correct, but at least we accept that there is something going on in the brain which is worth investigating. We also accept that evolution, genetics and neurochemistry can affect behaviour.

    The question of why some people with a positive mood (and attitude) survive cancer longer than people with a negative attitude is something worth investigating, don't you think? How we can make CP treatment more effective using cognitive techniques and different drugs is worth investigating, is it not? The reason why CBT is more effective than simple BT is worth investigating? These three examples might affect over one third of the world's population, but behaviourists will not investigate them because they involve cognition... Please tell me that is unethical?!

    It is totally fine to show that you can make a dog salivate at the ring of a bell, but how relevant is this to the problems we are facing now, like dementia, mental health problems in children and wars over religion?

  • Celine Bourdon added an answer:
    How to convert coefficients of Log-Transformed variables to Odds-Ratio in Logistic Regression?
    I'm using some log-transformed variables within my logistic regression model. After fitting the model, I would like to convert its coefficients to odds ratios. Does anyone know how to do this? I'm not sure whether it is simply a double exponentiation of the coefficient, like
    odds ratio = e^(e^(beta hat)) - 1
    Celine Bourdon

    Loved this thread! It helped me get some pointers (years after its posting). Thanks guys!
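
    For readers landing here later, the short version (assuming the predictor entered the model as its natural log): exp(beta) is the odds ratio for a one-unit increase in ln(x), i.e. for multiplying x by e ≈ 2.72, and the odds ratio for a k-fold change in x is exp(beta·ln k) = k^beta. A minimal worked sketch with a hypothetical coefficient:

    import numpy as np

    beta_hat = 0.40   # hypothetical fitted coefficient on ln(x)

    or_per_e_fold   = np.exp(beta_hat)                 # x multiplied by e
    or_per_doubling = np.exp(beta_hat * np.log(2))     # same as 2 ** beta_hat
    or_per_10pct    = np.exp(beta_hat * np.log(1.10))  # x increased by 10%

    print(or_per_e_fold, or_per_doubling, or_per_10pct)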

  • Frederik Schenk added an answer:
    I have precipitation data for 40 stations over 10 years; some have more than 50% missing data. Which stations can be used, given the missing data?

    I want to apply trend analysis to these data (Mann-Kendall test). Can I use all the stations regardless of the percentage of missing data, or should I eliminate some of them?

    Another question for the experts on the Mann-Kendall test: how can I define the variables of the equations to finally calculate Z (the standard normal test statistic) and then determine the trend of the data?

    Frederik Schenk

    The severity of the problem is location dependent. Is the rain rather convective (rather local-scale) or is it large-scale advective? In any case, you should compare stations with missing data to other stations nearby. It also makes a difference whether you look at (sub-)daily or monthly-to-seasonal sums. So could you explain a bit more what you are looking at, and which region?
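
    On the second question (how Z is defined in the Mann-Kendall test): S is the sum of the signs of all pairwise differences, Var(S) = n(n-1)(2n+5)/18 in the absence of ties, and Z is S standardized with a continuity correction. A minimal sketch ignoring ties and serial correlation, with a hypothetical series:

    import numpy as np
    from scipy.stats import norm

    def mann_kendall_z(x):
        x = np.asarray(x, dtype=float)
        n = len(x)
        # S: sum of signs of all pairwise differences x[j] - x[i], i < j
        s = sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
        var_s = n * (n - 1) * (2 * n + 5) / 18.0
        if s > 0:
            z = (s - 1) / np.sqrt(var_s)
        elif s < 0:
            z = (s + 1) / np.sqrt(var_s)
        else:
            z = 0.0
        return z, 2 * (1 - norm.cdf(abs(z)))   # Z and two-sided p-value

    annual_precip = [610, 580, 630, 655, 640, 700, 690, 720, 705, 740]  # hypothetical
    print(mann_kendall_z(annual_precip))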

  • Miebaka Emmanuel Ikiriko added an answer:
    How can I run an ANOVA using the general ANOVA in GenStat for multi-environment trials? Can anyone explain the steps to use?
    Running a general ANOVA for a multi-environment trial using GenStat.
    Miebaka Emmanuel Ikiriko

    I would recommend DSAASAT to you; it is easy to use and interpret.

  • Augusto Teoi added an answer:
    How to interpret factor scores from Exploratory Factor Analysis?
    I've tried different factor extraction methods on a fairly small dataset (low-level features extracted from image content). The problem is with the interpretation of the factor scores obtained, which range from negative to positive values with an unknown minimum/maximum. I have read some handbooks, but they usually focus on how to conduct factor analysis and very rarely discuss how to interpret the output.
    Augusto Teoi

     Dear Giuliani, thanks for the prompt support! I'll check that out!

  • René Schlegelmilch added an answer:
    How do I combine two outcome measures from two different questionnaires into one variable?

    I developed a questionnaire to measure childhood socio-economic position, and another questionnaire to measure adulthood socio-economic position. Both questionnaires were developed in the same sample and during the same time period.

    Now I want to use both questionnaires to measure a new outcome variable: socio-economic transition (the change in socio-economic position from childhood to adulthood).

    What kind of statistical measures should I use to assess the comparability of the two questionnaires (correlation?)?

    René Schlegelmilch

    She says that the same people answered both questionnaires in the same period.

    She wants to derive a new measure: socio-economic transition. I assume this means: the change from child- to adulthood.

    So first, because both questionnaires are (necessarily) composed of different items, you cannot directly compare them in terms of 'score changes'. Second, 'socio-economic' status is a relative construct, meaning that your socio-economic position is always relative to the population you live in. Thus the question becomes: what is the change in socio-economic status relative to the 'current' population?

    So the answer is:

    Normalize/standardize the scores from both questionnaires to the same mean and the same standard deviation (like in an IQ test, M = 100, SD = 15, for example). Afterwards you can compute differences and see to what extent (relative to the population, of course) the socio-economic status changes. (I guess you want to see whether there are driving third factors...)
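
    A minimal sketch of that suggestion; the file name and the column names child_score and adult_score are hypothetical:

    import pandas as pd

    df = pd.read_csv("sep_data.csv")

    z_child = (df["child_score"] - df["child_score"].mean()) / df["child_score"].std()
    z_adult = (df["adult_score"] - df["adult_score"].mean()) / df["adult_score"].std()

    df["sep_transition"] = z_adult - z_child   # > 0: relative upward mobility
    print(df["sep_transition"].describe())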



About Statistics

Statistical theory and its application.

Topic followers (66,825)