- Huda Abdullah added an answer: What is the probability density function of θ̂?
Let T be a random variable with a Gamma distribution.
What is the probability density function of θ̂, where
θ̂ = v1 / v2,
v1 = a0*T/(n-1) + a1*T²/((n-1)(n-2)) + a2*T³/((n-1)(n-2)(n-3)),
v2 = a0 + a1*T/(n-1) + a2*T²/((n-1)(n-2)), and a0, a1, a2 are constants.
Many thanks in advance
With Best Regards
Thanks a lot for your valuable comments.
I'm sorry, there is a mistake!
I have changed c1 and c2 to n and θ (i.e., T ~ Gamma(n, θ)), and a0, a1, a2 are constants.
Thank you again.
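Since a closed form for this density may be hard to obtain, a Monte Carlo sketch can approximate it numerically. The values of n, θ and the constants a0, a1, a2 below are purely illustrative, and `theta_hat_samples` is a hypothetical helper name:

```python
import numpy as np

def theta_hat_samples(n=10, theta=2.0, a=(1.0, 0.5, 0.25), size=100_000, seed=0):
    """Draw T ~ Gamma(shape=n, scale=theta) and compute theta_hat = v1 / v2."""
    rng = np.random.default_rng(seed)
    a0, a1, a2 = a
    T = rng.gamma(shape=n, scale=theta, size=size)
    v1 = (a0 * T / (n - 1)
          + a1 * T**2 / ((n - 1) * (n - 2))
          + a2 * T**3 / ((n - 1) * (n - 2) * (n - 3)))
    v2 = a0 + a1 * T / (n - 1) + a2 * T**2 / ((n - 1) * (n - 2))
    return v1 / v2

samples = theta_hat_samples()
# A histogram of `samples` approximates the density of theta_hat, e.g.
# np.histogram(samples, bins=200, density=True) gives (pdf values, bin edges).
```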
- Daniel Courgeau added an answer: Can Fisher’s controversial idea of fiducial inference, in the 20th century, be accepted by the statistical community in the 21st century?
Fisher introduced the concept of fiducial inference in his paper on Inverse probability (1930), as a new mode of reasoning from observation to the hypothetical causes without any a priori probability. Unfortunately, as Zabell said in 1992: “Unlike Fisher’s many original and important contributions to statistical methodology and theory, it had never gained widespread acceptance, despite the importance that Fisher himself attached to the idea. Instead, it was the subject of a long, bitter and acrimonious debate within the statistical community, and while Fisher’s impassioned advocacy gave it viability during his own lifetime, it quickly exited the theoretical mainstream after his death”.
However, during the 20th century, Fraser (1961, 1968) proposed a structural approach which follows the fiducial argument closely but avoids some of its complications. Similarly, Dempster (1963) proposed direct probability statements, which may be considered fiducial statements, and he believed that Fisher’s arguments can be made more consistent through modification into a direct probability argument. And Efron, in his 1998 lecture on Fisher, said about the fiducial distribution: “Maybe Fisher’s biggest blunder will become a big hit in the 21st century!”
And it was mainly during the 21st century that the statistical community began to recognise its importance. In his 2009 paper, Hannig extended Fisher’s fiducial argument and obtained a generalised fiducial recipe that greatly expands the applicability of fiducial ideas. In their 2013 paper, Xie and Singh proposed a confidence distribution function to estimate a parameter in frequentist inference in the style of a Bayesian posterior. They said that this approach may provide a potential conciliation point for the Bayesian-fiducial-frequentist controversies of the past.
I have already discussed these points with some other researchers, and I think a more general discussion would be of interest to ResearchGate members.
Dempster, A.P. (1963). On direct probabilities. Journal of the Royal Statistical Society. Series B, 25 (1), 100-110.
Fisher, R.A. (1930). Inverse probability. Proceedings of the Cambridge Philosophical Society, 26, 528-535.
Fraser, D. (1961). The fiducial method and invariance. Biometrika, 48, 261-280.
Fraser, D. (1968). The structure of inference. John Wiley & Sons, New York-London-Sydney.
Hannig, J. (2009). On generalized fiducial inference. Statistica Sinica, 19, 491-544.
Xie, M., Singh, K. (2013). Confidence distribution, the frequentist distribution estimator of a parameter: A review. International Statistical Review, 81 (1), 3-77.
Zabell, S.L. (1992). R.A. Fisher and the fiducial argument. Statistical Science, 7 (3), 369-387.
Sorry for this delayed answer to your very interesting contribution; the delay was due to some urgent work I had to finish and to my wish to read the papers you cited before answering you.
First, I am very happy to have the contribution of a metrologist to this important debate. Second, I see many convergent points between our thoughts.
In the paper by Guthrie et al., I am interested in their use of the term “paradigm” in statistical thinking. These paradigms are taken in a sense different from Kuhn’s (1962), for which Masterman (1970) identified 21 different meanings, but, I think, in the sense of Granger (1994), who addresses the following question: how does one move from the experienced phenomena to the scientific object? For him, “the complex life experience grasped in sensitive things has become the object of mechanics and physics, for example, when the idea was conceived of reducing it to an abstract model, initially comprising only spatiality, time and resistance to motion.” And he recognizes that the content of this object is not explicitly and broadly defined at the outset. I took such a definition for the different paradigms in demography and probability (Courgeau, 2012), and it seems to me that Guthrie et al. take a similar definition for their statistical paradigms.
Similarly, in your paper with De Bièvre you said that “Also in science, ‘diversity’ is not always synonym of ‘confusion’, a popular term used to contrast it, rather is an invaluable additional resource leading to a better understanding”. This recalls another sentence by Granger: “True, the human fact can indeed be scientifically understood only through multiple angles of vision, but on condition that we discover the controllable operation that uses these angles to recreate the fact stereoscopically”. This is also consistent with Guthrie et al.’s conclusion: “The existence of different paradigms for uncertainty assessment that do not always agree might be seen as an unfortunate complication by some. However, we feel it is better seen as an indication of further opportunity. It is only by continually working together to appreciate the features of different paradigms that we will arrive at methods for uncertainty assessment that meet all of our scientific and economic needs”.
Courgeau, D. (2012). Probability and social science. Springer.
Granger, G.-G. (1994). Formes, opérations, objets. Librairie Philosophique Vrin.
Kuhn, T. (1962). The structure of scientific revolutions. The University of Chicago Press.
Masterman, M. (1970). The nature of a paradigm. In Lakatos and Musgrave, eds., Criticism and the growth of knowledge, Cambridge University Press.
- Martina Chantal de Knegt added an answer: Calculating a weighted kappa for multiple raters?
I have a dataset comprised of risk scores from four different healthcare providers. The risk scores are indicative of a risk category of low, medium, high or extreme. I've been able to calculate agreement between the four risk scorers (in the category assigned) based on Fleiss' kappa, but unsurprisingly it's come out very low; in fact, I got a negative kappa value. I've looked back at the data and there are many cases where, for example, three of the scorers have said 'extreme' and one has said 'high'. Under an unweighted kappa this counts as disagreement, but of course the categories are adjacent, so while it's not agreement it's an awful lot better than, say, two scorers saying 'extreme' and two saying 'low', where the categories are not adjacent.
I understand the basic principles of weighted kappa and I think this is the approach I need to take, but I'm struggling a little with weighted kappa given that there are multiple raters. Does anyone have any experience with this, and advice on how best to tackle it?
I can recommend this paper
which recommends that you calculate intra-class correlation for ordinal data to determine the interrater reliability:
"The intra-class correlation (ICC) is one of the most commonly-used statistics for assessing IRR for ordinal, interval, and ratio variables. ICCs are suitable for studies with two or more coders, and may be used when all subjects in a study are rated by multiple coders, or when only a subset of subjects is rated by multiple coders and the rest are rated by one coder."
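As a sketch of the ICC approach recommended in that quote, here is a minimal two-way random-effects ICC(2,1) in Python, with the ordinal categories coded as numbers (0 = low ... 3 = extreme). The function name is hypothetical, and a vetted package should be preferred for a real analysis:

```python
import numpy as np

def icc2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    `ratings` is an (n_subjects, k_raters) array of numeric scores."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)   # per-subject means
    col_means = ratings.mean(axis=0)   # per-rater means
    ss_rows = k * np.sum((row_means - grand) ** 2)
    ss_cols = n * np.sum((col_means - grand) ** 2)
    ss_err = np.sum((ratings - grand) ** 2) - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Perfect agreement gives 1.0, and near-miss ratings (adjacent categories) are penalized far less than gross disagreements, which is exactly the behaviour the question asks for.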
- Patricio Herrera added an answer: Are there two kinds of cohort analyses?
I am confused by the two kinds of cohort analyses, a discovery cohort and a validation cohort, shown in Figure 2 e-h in the attached file. According to the ROC curves, both analyses tell us that exosome concentration is superior to exosome size for detecting malignancy. Why do the authors have to demonstrate the ROC analyses twice in different ways? What is the difference between a discovery and a validation cohort?
Hello Go J Yoshida, as far as I know, your problem is somewhat different from how you are looking at it. Unless things have changed a lot, you will find nice insights after reading the article I am suggesting to you:
Sackett DL, Haynes RB “The architecture of diagnostic research”. BMJ 2002; 324:539-41.
I hope this will help you.
- S. Vannitsem added an answer: Which is the most suitable stationarity test available (KPSS, ADF, PP test...)?
I've got some wind speed measurements and I would like to find out which test is the most reasonable one for checking whether the data are stationary. Your help is highly appreciated.
I am not aware of the most recent progress in this area, but when I was working on climate change in time series I used tests based on the cumulative sum (Pettitt's test, able to detect one change, and Lombard's test, able to detect multiple changes; see references in Vannitsem and Nicolis, 1991, in Contributions to Atmospheric Physics). I also used the Mann-Kendall test, likewise able to detect one change (whatever its nature). More recent developments of these techniques were made by Olivier Mestre and colleagues at Météo-France. You can also find nice applications of the Mann-Kendall test in a paper by Reinhard Bohm (https://www.researchgate.net/profile/Wolfgang_Schoener2/publication/229879931_Regional_temperature_variability_in_the_European_Alps_17601998_from_homogenized_instrumental_time_series/links/5440ed370cf251bced6149f5.pdf)
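As an illustration of the cumulative-sum approach mentioned above, Pettitt's single-change-point test fits in a few lines of Python. This is a sketch using the standard large-sample p-value approximation; use a vetted implementation for real work:

```python
import numpy as np

def pettitt_test(x):
    """Pettitt's rank-based test for a single change point.
    Returns (index of the most likely change point, approximate p-value)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # U_t = sum over i <= t, j > t of sign(x_i - x_j)
    signs = np.sign(x[:, None] - x[None, :])
    u = np.array([signs[: t + 1, t + 1:].sum() for t in range(n - 1)])
    k = np.abs(u).max()
    t_hat = int(np.abs(u).argmax())
    p = 2.0 * np.exp(-6.0 * k**2 / (n**3 + n**2))  # large-sample approximation
    return t_hat, min(p, 1.0)
```

For example, a series with a clear mean shift halfway through yields the change point at the shift and a tiny p-value.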
- Najibullah Hassanzoy added an answer: What will be the mathematical form of the Johansen cointegration test with a level shift and its corresponding trace statistic?
I have applied the Johansen et al. (2000) cointegration test with a level shift, but the paper is too complicated for me to fully understand at this stage. I want to know: once a level shift is considered, what is the mathematical form (model) of Johansen's cointegration test and its corresponding trace statistic? If you understand the paper, please leave the equations in the comments.
Thanks in advance.
Dear Prof. Peters,
Thanks for sharing this very helpful guide.
- Sajid Hussain added an answer: Any advice on the estimation of SUR with heteroskedasticity and serial correlation?
I am estimating a SUR (KLEM) model using cross-sectional data. The SUR model assumes serial independence and homoskedasticity. I tested these assumptions with lmhreg3 and lmareg3 in Stata after sureg (Breusch-Pagan LM test = 2617.1928; Harvey LM test = 2171.8203). Now please suggest an efficient estimator. Should I proceed with iterated sureg, or do I need to correct these issues first? Someone told me that serial independence is not mandatory in cross-sectional data; if this is true, please share a reference. Waiting for the reply desperately.
I read a paper by Michael Creel and Montserrat Farell (1996) which relaxes these assumptions for time series data and names the estimator Quasi-FGLS. Can I use this estimator in the cross-sectional case? If yes, then how do I do it in Stata? The paper by Michael Creel and Montserrat Farell is attached.
Thanks, brother. Can you please give me a reference for this?
- Godfrey Tumwesigye added an answer: How to interpret factor scores from Exploratory Factor Analysis?
I've conducted different factor extraction methods using a fairly small dataset (low-level features extracted from image content). The problem is with the interpretation of the factor scores obtained, which range from negative to positive values with unknown minimum/maximum. I have read some handbooks, but they usually highlight how to conduct factor analysis and very rarely discuss how to interpret the output.
Yes, Allan. Discarding depends on the relationships among the factors. I have done EFA of job attitudes including job satisfaction, organisational commitment and turnover intentions. Consistent with theory, whereas the factor loadings on commitment and job satisfaction are positive, those on turnover intentions are negative. The negativity is explained by theory, so I can't discard items measuring TOI.
- Adolph Delgado added an answer: Mauchly's test of sphericity - Why no results?
I'm analyzing my current research data for potential violations of statistical assumptions. The particular analysis/design in question is a 2 x 2 between-within (i.e., mixed design, repeated measures + between measure, etc.) factorial ANOVA. The problem I'm having (or maybe it isn't one?) is when I check for violations of the sphericity assumption. For whatever reason, whenever I run Mauchly's test of sphericity in SPSS, it gives me a Mauchly's W of 1.000, df of 0, and no Sig. value at all. Why would that be?
Is it a sample size thing (my N = 34)?
Could it be due to largely unequal group sizes (n = 30 and n = 4)? (Note: I am well aware of the problems that my unequal group sizes cause with regard to other assumptions and the reliability of any F statistic derived; I'm working on that problem but wanted to get an idea of how my data were looking.)
Maybe this means sphericity is as violated as it can be? The confusing thing is that I get the same result if I check sphericity with just the within measures in my control group of 30 participants. Any ideas why this might be? Please let me know if additional information or context is needed.
- Klaus Gadow added an answer: Where can I find websites to get free scientific publications?
Please, I need links like http://gen.lib.rus.ec/ or http://www.freefullpdf.com/. I'm from Bolivia and sometimes it is too expensive to buy scientific papers, and usually it is not just one or three. Also, students should be able to get access without needing an account for which you have to be endorsed by an institution (as on ResearchGate). Also publications in other fields such as art, music, etc. Thank you for your answers.
Look for Open Access Journals, like this one
all papers are free to download.
- Xiao Xianfeng added an answer: Should I use the standard deviation or the standard error of the mean?
Is the choice between these down to personal preference, or is one favoured over the other in the scientific field?
The standard error of the sample mean depends on both the standard deviation and the sample size, by the simple relation SE = SD/√(sample size).
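That relation is easy to verify numerically. A minimal sketch with simulated data (the distribution parameters are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
sample = rng.normal(loc=10.0, scale=2.0, size=100)

sd = sample.std(ddof=1)          # sample standard deviation: spread of the data
se = sd / np.sqrt(sample.size)   # standard error of the mean: precision of the mean
```

Report the SD when describing variability in the data themselves, and the SE when describing the uncertainty of the estimated mean.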
I hope the connections will be helpful.
- Fateh Mebarek-Oudina added an answer: How have you or others used the model-based classical ratio estimator (CRE)?
In Chapter 4 of Lohr, S.L. (2010), Sampling: Design and Analysis, 2nd ed., Brooks/Cole, she explains the design-based and model-based CRE. On pages 158-160 of Cochran, W.G. (1977), Sampling Techniques, 3rd ed., John Wiley & Sons, he also briefly discusses the model-based CRE, amidst his discussion of the design-based CRE. The model-based format can further be found in other contexts in econometrics books. The attached links provide more information and usage. Cochran, page 160, noted that (under what seem mild conditions) it could be "hard to beat."
Where and how have you used the model-based classical ratio estimator, or might you have seen it used?
Important question in econometrics.
- James Mullins added an answer: Any advice on Multiple Point Statistics?
Can I get the workflow of Multiple Point Geostatistical Simulation of Facies in Petrel?
If people are still wondering: Petrel (at least the new version) does indeed have a range of geostatistical modelling techniques, including one for MPS. RMS will also run it. I definitely recommend most of the material that comes out of the Stanford Petroleum Engineering department. The only problem is that the literature does not actually tell you how to model in Petrel, only the statistical concepts behind the technique and the various modelling algorithms. Also, as far as I am aware, Petrel will generate the TIs and run MPS for you, but without telling you exactly how it achieves the end result. Has anyone else encountered this and/or know a way around it?
- Sharad S Malavade added an answer: How to test multicollinearity in logistic regression?
I want to check multicollinearity in a logistic regression model, with all independent variables expressed as dichotomous.
Given that I cannot use the VIF, is the correlation matrix the only possible procedure? If so, what is the threshold for a correlation to be unacceptable? 0.60? 0.70?...
I am also testing for multicollinearity with logistic regression. I have all outcomes and predictors as categorical variables. I am using a method described by Paul Allison in his book Logistic Regression Using the SAS System. He also gives SAS code that you can adapt for your use. As Adrian mentioned in his post, this method applies weights. The interpretation is then exactly like in linear regression.
- Debra Sharon Ferdinand added an answer: What are the best data mining tools for health care data?
We have moved into the era of "big data", and tools that have traditionally been applied to other industries are now being considered in health care. Data sources include claims data, survey data, data derived from biometric monitoring, etc.
Given the wealth of data that is being made available, what are the best data-mining tools to apply to these data? This means not only the type of algorithms (e.g., regression trees), but also the best software available for conducting these analyses.
Data mining is an area I am interested in but have not studied or researched, so I looked up what is available on ResearchGate:
- Larisa Alagić-Džambić added an answer: How can an accuracy profile integrate the three validation parameters of linearity, trueness and precision?
Validation of an analytical method following the total error approach uses the accuracy profile. This tool is considered by certain researchers as a statistic which integrates several validation parameters: trueness, precision and linearity.
Could someone explain this point in more detail?
Accuracy can be considered part of linearity and precision.
- Geoffrey Chin-hung Chu added an answer: Is there any software available for circular statistics?
I am looking for software for analyzing the orientations of astigmatism in different treatment groups.
Thank you! The e-version is available in our library.
- Subhash Chandra added an answer: Why do we measure 30 pollen grains per species?
Good evening, I'm doing my thesis in palynology and I was told that we always measure 30 pollen grains per species, but I can't find the reason for this sample size anywhere. Can someone help me with this question? Thank you.
Interesting discussion. The key criterion for choosing a sample size n (= 30 here) should be the inherent variability in the population. In many situations, as probably here, this knowledge is not available or is difficult to get. The rule of 30 is then often used for the reasons already highlighted in the previous posts.
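To make the variability criterion concrete, the usual normal-approximation sample-size calculation can be sketched as follows. The SD and margin values are purely illustrative, not palynological facts, and `required_n` is a hypothetical helper name:

```python
import math

def required_n(sd, margin, z=1.96):
    """Sample size needed so the 95% CI half-width on the mean is <= margin,
    given a prior guess of the population standard deviation:
    n = (z * sd / margin)^2, rounded up."""
    return math.ceil((z * sd / margin) ** 2)

# e.g. if grain diameters had sd ~ 2.5 um and we wanted the mean within +/- 1 um:
n = required_n(2.5, 1.0)
```

So "30" is only right when the underlying variability happens to make it so; with a larger SD or a tighter margin the required n grows quadratically.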
- Haitham Hmoud Alshibly added an answer: Which numpy syntax can be used to select specific elements in a numpy array?
How can I select some specific elements from a numpy array?
Say I have imported numpy as np
y = np.random.uniform(0,6, 20)
I then want to select all elements of y satisfying y <= 1. Thanks in advance.
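For reference, boolean mask indexing does exactly this kind of selection (a minimal sketch):

```python
import numpy as np

y = np.random.uniform(0, 6, 20)

# Boolean mask indexing: keep only the elements satisfying the condition.
selected = y[y <= 1]

# np.where gives the *indices* of those elements instead of the values.
indices = np.where(y <= 1)[0]
```

`y <= 1` evaluates to a boolean array of the same shape as `y`, and indexing with it returns just the matching values.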
Thank you for bringing this matter to our attention! It is still a great help and a real pleasure to read your posts.
- George Stoica added an answer: Do we need a new definition of fractals for big data? Or must fractals be based on power laws?
So far, definitions of fractals are mainly from a mathematical point of view, for the purpose of generating fractal sets or patterns, either strictly or statistically; see the illustrations below (Figure 1 for strict fractals, Figure 2 for statistical fractals, and Fig. 4 for fractals emerging from big data):
Big data are likely to show fractal structure because of the underlying heterogeneity and diversity. I re-defined a fractal as a set or pattern in which the scaling pattern of far more small things than large ones recurs multiple times, or at least twice with the ht-index being 3. I show below how geographic forms or patterns generated from Twitter geolocation data bear the same scaling property as the generative fractal snowflake.
Jiang B. and Yin J. (2014), Ht-index for quantifying the fractal or scaling structure of geographic features, Annals of the Association of American Geographers, 104(3), 530–541, Preprint: http://arxiv.org/ftp/arxiv/papers/1305/1305.0883.pdf
Jiang B. (2015), Head/tail breaks for visualization of city structure and dynamics, Cities, 43, 69-77, Preprint: http://arxiv.org/ftp/arxiv/papers/1501/1501.03046.pdf
The new definition of fractals enables us to see the fractals that emerge from big data. The answer to the question seems obvious: yes, we need the new definition. But some of my colleagues argued that the newly defined fractals are not fractals anymore, because they do not follow power laws.
I understand, so a new definition is in order. The metric aspect is less important.
- Jochen Wilhelm added an answer: Is it possible to statistically compare two values if we do not have information on replicates?
For example, is it possible to check the statistical significance of the difference between two separately published LC50 values for a particular chemical in two separate species? To elaborate, let's take the example of Cu. Suppose Cu has an LC50 of 2 mg/L in species A and 4 mg/L in species B. Can we compare these to say which species is more sensitive to Cu?
"Justification: Statistics starts at two numbers up." - I do not agree here. Statistics starts where we must consider uncertainty, and this has nothing to do with the number of values or sample sizes (n) one has. You are correct that some methods or calculations do not work when n=1, but this is not the whole of statistics. Meaningful and constructive statistics is possible even for n=1. Comparing 2 and 4 (to pick up your example) while assuming stochastic processes resulting in these observations is at the very heart of statistics and not just algebra. It is algebra to calculate the difference, but statistics provides tools to estimate the uncertainties associated with these observations and with the difference.
To make the example with your values: given these are counts from a Poisson process, the log-likelihood contour for the model E(y) = exp(b0 + b1*x) with y ~ Pois(lambda) is attached below. You can see the profile and select likelihood or confidence ellipses. Don't tell me that this is "just algebra". There is a way to infer reasonable (and unreasonable) model parameters based on the data, and this involves the core of inferential statistics, which is, in my opinion, way beyond "just algebra".
This example may or may not be sensible (it may simply not be reasonable to assume a non-overdispersed Poisson process, who knows?), but at least it shows that even for n=1 a lot of statistics and inference can be done!
I would even be more provocative and say that statistics does not need data at all. Statistics can be based only on sensible assumptions, from which sensible conclusions can still be derived. Statistics is about thinking and knowledge. Data just adjust this, so to say, to empirical findings, so that we can use empirical data to modify our knowledge.
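To illustrate that comparing single counts is real statistics, here is one classical sketch (assuming, purely for illustration, that the two values are single Poisson counts): conditional on the total n = x1 + x2, x1 is Binomial(n, 1/2) under the hypothesis of equal rates, which yields an exact two-sided p-value:

```python
from math import comb

def poisson_two_sample_p(x1, x2):
    """Exact conditional test of H0: lambda1 == lambda2 for two single
    Poisson counts x1, x2. Under H0, x1 | (x1 + x2 = n) ~ Binomial(n, 0.5);
    the two-sided p-value sums the probabilities of all outcomes no more
    likely than the observed one."""
    n = x1 + x2
    pmf = [comb(n, k) * 0.5**n for k in range(n + 1)]
    observed = pmf[x1]
    return sum(p for p in pmf if p <= observed + 1e-12)
```

For the 2-vs-4 example the p-value is about 0.69, confirming the point above: with n=1 per group we can still quantify, honestly, how little the data tell us.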
- Allen G Harbaugh added an answer: How do I calculate statistical significance for ChIP fold enrichment?
I have some mean amount of DNA from my ChIP assay (A), and some mean amount from the IgG (B).
Call fold enrichment C. C=A/B
I have standard deviations for both A and B. However, if I calculate the standard deviation for C according to basic propagation-of-error rules for sample quotients, my standard deviation becomes impossibly large.
I'm looking for a different way to calculate a standard deviation, standard error, relative error or confidence interval for fold enrichment (C). I've heard that jackknife resampling or bootstrapping a CI may be my best bet, but both of those methods seem silly given that I have only 3 technical replicates (n).
Many thanks for your help!
I would suggest the following Taylor series approximation for the variance of C:
Numerator = E(A^2)(E(B))^2 - 2*E(A*B)*E(A)*E(B) + E(B^2)(E(A))^2
Denominator = [E(B)]^4
where E(x) = the average for the variable x.
Then use a VERY conservative critical value: alpha=0.05 with df=1: t_(alpha/2) = 12.71 (that seems really high, but with df = 3 - 2, you don't have a lot of wiggle room)
then you could produce a CI as:
E(A/B) ± t_(alpha/2) * sqrt(numer/denom)
This won't be the most elegant solution, but it would be mildly defensible with a very small sample size.
Hope this helps.
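The formulas above can be wrapped in a few lines of Python (a sketch; `ratio_ci` is a hypothetical helper name, and the sample averages stand in for the expectations E(.)):

```python
import numpy as np

def ratio_ci(a, b, t_crit=12.71):
    """Taylor-series (delta method) CI for C = E(A)/E(B), using the
    numerator/denominator formula above; t_crit = 12.71 is the very
    conservative df = 1 critical value suggested for 3 replicates."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    ea, eb = a.mean(), b.mean()
    num = ((a**2).mean() * eb**2
           - 2 * (a * b).mean() * ea * eb
           + (b**2).mean() * ea**2)
    se = np.sqrt(num) / eb**2   # sqrt(numerator) / E(B)^2
    c = ea / eb
    return c - t_crit * se, c + t_crit * se
```

Note that when A and B are exactly proportional the numerator collapses to zero, so the interval degenerates to the point estimate, as it should.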
- Sarjinder Singh added an answer: How many bootstraps can I perform on a sample of N elements?
I am using bootstrapping analysis for a set of data that I obtained from Monte Carlo simulations.
Bootstrapping (statistics) allows random sampling with replacement from the original data set that I obtain from a Monte Carlo simulation. Thanks!
See "Sufficient bootstrapping" by Singh and Sedory. Efron and Tibshirani missed the point that there is a theory of distinct units in with-replacement sampling schemes. It was developed in the 1950s, whereas the bootstrap was developed in 1979. The problem is that people do not look at the literature and go with their own minds!
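On the counting question itself: a sample of N elements admits C(2N-1, N) distinct resamples, far more than anyone enumerates, so in practice one simply draws a fixed number of resamples (commonly 1,000 to 10,000). A minimal percentile-CI sketch (the data are illustrative):

```python
import numpy as np

def bootstrap_means(data, n_boot=2000, seed=0):
    """Draw n_boot resamples (with replacement, same size as `data`)
    and return the mean of each resample."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, float)
    idx = rng.integers(0, len(data), size=(n_boot, len(data)))
    return data[idx].mean(axis=1)

means = bootstrap_means(np.arange(10.0))
ci = np.percentile(means, [2.5, 97.5])   # simple percentile 95% CI for the mean
```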
- Ahmad Bazzi added an answer: Antenna calibration for DoA estimation in the presence of multipath?
Assume I have a uniform linear array of 3 antennas; distance uncertainties and other imperfections might perturb the steering vector away from the true one. Thus, DoA estimation using ML or subspace techniques would fail.
I would like to know if it is possible to calibrate when the number of received signals is more than 3 (due to severe multipath).
Thank you in advance.
Dear Mr. Shahriar,
Thank you for this,
Could you please share a document on this?
- Glory Enaruvbe added an answer: What steps are involved in the statistical analysis and variogram modelling of soil samples collected at three layers using gstat in R?
What are the steps involved in the statistical analysis and variogram modelling of soils using gstat? I have collected samples at three layers and want to examine the spatial variation of 18 soil parameters, determine the semivariogram model for each so as to quantify their spatial variability, and predict their values at unsampled locations using kriging in R.
I am having issues with database preparation, as values have to be repeated for each parameter at each point in a CSV file. Kindly suggest the most efficient solution to this issue.
Thank you, Prof. Myers. Your answers are always very insightful and thought-provoking. I sincerely appreciate your response.
- Chitra Baniya added an answer: What is the difference between random (probability) sampling and simple random sampling?
Sampling procedure.
This is a nice and informative discussion, even for a non-statistician like me.
- André François Plante added an answer: What is the meaning of the negative coefficient of kurtosis obtained for my specific AFM sample?
The kurtosis moment is the fourth moment of the profile amplitude probability function and corresponds to a measure of surface sharpness. How, then, can it take a negative value?
Table is attached with this question.
I am using R. A. Fisher's definition, not Karl Pearson's.
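The two definitions are easy to compare numerically. In Fisher's convention 3 is subtracted from the fourth standardized moment, so a Gaussian profile scores 0 and a flat-topped (e.g. uniform) profile goes negative; in Pearson's convention the value is always positive. A sketch, not AFM-specific:

```python
import numpy as np

def kurtosis(x, fisher=True):
    """Fourth standardized moment of x. Fisher's definition subtracts 3
    (normal -> 0, flat-topped profiles -> negative); Pearson's does not
    (normal -> 3, always positive)."""
    x = np.asarray(x, float)
    d = x - x.mean()
    k = np.mean(d**4) / np.mean(d**2) ** 2
    return k - 3.0 if fisher else k

# A uniform profile has Pearson kurtosis 9/5 = 1.8, i.e. Fisher kurtosis -1.2:
u = np.linspace(-1, 1, 10001)
```

So a negative coefficient simply means the amplitude distribution is flatter-topped than a Gaussian; it is an artifact of the convention, not of the sample.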
- M. A. Aghajani added an answer: Free software for curve fitting or finding the best-fit equation
We are using TableCurve2D for fitting our data. The problem with this software is that it is Windows-based and commercial. We need free software equivalent to TableCurve2D (i.e., with similar functions) which can be run in command mode.
I would highly appreciate it if someone could suggest free software which takes my data, fits it against a large number of equations by regression or non-regression methods, and finally gives me the equation that fits my data best.
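As a free, scriptable partial substitute, candidate models can be fitted and compared with plain NumPy. This is a sketch, the data below are made up, and for a fair "best fit" a penalized criterion such as AIC should be used rather than raw SSE (which always favours more parameters):

```python
import numpy as np

# Illustrative data only; replace with your own measurements.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.2, 5.9, 8.1, 9.8])

# Fit several candidate polynomial models and record their residual SSE.
candidates = {}
for degree in (1, 2, 3):
    coeffs = np.polyfit(x, y, degree)
    sse = float(np.sum((y - np.polyval(coeffs, x)) ** 2))
    candidates[f"poly{degree}"] = (coeffs, sse)
# Compare models with a penalized criterion (e.g. AIC) before declaring
# a winner; raw SSE alone will always pick the highest-degree polynomial.
```

Non-polynomial families (exponential, logarithmic, power laws) can be handled the same way after transforming the data, which covers a good part of what TableCurve2D's equation library automates.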
SigmaPlot 13 is now working well. Its model library is very full, and it is possible to add and edit models.
- Patrick A Green added an answer: Alternatives to Fisher's exact test for more than 2 groups?
I am doing a chi-square test on a 3x3 contingency table. However, there are some cells with expected value < 5. I know Fisher's exact test is used for 2x2 tables only. Is there any alternative test in my case? Thanks.
You can do it in SPSS: go to Crosstabs, then Exact, then click the Exact box, and you get the Fisher's exact result in the statistics box.
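For those without SPSS, the same idea, an exact-style independence test for an RxC table with small expected counts, can be approximated by Monte Carlo. This is a sketch of a margin-preserving permutation test, not SPSS's exact algorithm:

```python
import numpy as np

def montecarlo_rxc_test(table, n_sim=20000, seed=0):
    """Monte Carlo approximation to an exact RxC independence test.
    Simulates tables with the same row/column margins by permuting
    column labels and compares chi-square statistics to the observed one."""
    rng = np.random.default_rng(seed)
    table = np.asarray(table, float)
    rows, cols = table.sum(axis=1), table.sum(axis=0)
    n = int(table.sum())
    expected = np.outer(rows, cols) / n

    def chi2(t):
        return np.sum((t - expected) ** 2 / expected)

    obs = chi2(table)
    # Expand the table into individual observations, then shuffle column labels.
    row_labels = np.repeat(np.arange(len(rows)), rows.astype(int))
    col_labels = np.repeat(np.arange(len(cols)), cols.astype(int))
    count = 0
    for _ in range(n_sim):
        sim = np.zeros_like(table)
        np.add.at(sim, (row_labels, rng.permutation(col_labels)), 1.0)
        count += chi2(sim) >= obs - 1e-12
    return count / n_sim
```

A strongly associated table gives a tiny p-value, while a perfectly independent one gives a large p, matching the behaviour expected of an exact test.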