Questions related to Statistical Inference
In 2007 I did an Internet search for others using cutoff sampling, and found a number of examples, noted at the first link below. However, it was not clear that many used regressor data to estimate model-based variance. Even if a cutoff sample has nearly complete 'coverage' for a given attribute, it is best to estimate the remainder and have some measure of accuracy. Coverage could change. (Some definitions are found at the second link.)
Please provide any examples of work in this area that may be of interest to researchers.
According to Fisher, “… probability and likelihood are quantities of an entirely different nature.” Edwards stated, “… this [likelihood] function in no sense gives rise to a statistical distribution.” According to Edwards, the likelihood function supplies a natural order of preference among the possibilities under consideration. Consequently, the mode of a likelihood function corresponds to the most preferred parameter value for a given dataset. Therefore, Edwards’ Method of Support, or the method of maximum likelihood, is a likelihood-based inference procedure that uses only the mode for point estimation of unknown parameters; it does not use the entire curve of the likelihood function. In contrast, probability-based inference, whether frequentist or Bayesian, requires the entire curve of the probability density function.
The Bayes Theorem in continuous form combines the likelihood function and the prior distribution (PDF) to form the posterior distribution (PDF). That is,
posterior PDF ∝ likelihood function × prior PDF (1)
In the absence of prior information, a flat prior should be used according to Jaynes’ maximum entropy principle. Equation (1) reduces to:
posterior PDF = standardized likelihood function (2)
However, “… probability and likelihood are quantities of an entirely different nature” and “… this [likelihood] function in no sense gives rise to a statistical distribution”. Thus, Eq. (2) is invalid.
In fact, Eq. (1) is not the original Bayes Theorem in continuous form. It is called the "reformulated" Bayes Theorem by some authors in measurement science. According to Box and Tiao, the original Bayes Theorem in continuous form is merely a statement of conditional probability distribution, similar to the Bayes Theorem in discrete form. Furthermore, Eq. (1) violates “the principle of self-consistent operation”. In my opinion, likelihood functions should not be mixed with probability density functions for statistical inference. A likelihood function is a distorted mirror of its probability density function counterpart; its use in Bayes Theorem may be the root cause of biased or incorrect inferences of the traditional Bayesian method. I hope this discussion gets people thinking about this fundamental issue in Bayesian approaches.
Edwards, A.W.F. (1992), Likelihood (expanded edition), Johns Hopkins University Press, Baltimore.
Fisher, R.A. (1921), On the ‘probable error’ of a coefficient of correlation deduced from a small sample, Metron, I, part 4, 3-32.
Huang, H. (2022), A new modified Bayesian method for measurement uncertainty analysis and the unification of frequentist and Bayesian inference, Journal of Probability and Statistical Science, 20(1), 52-79. https://journals.uregina.ca/jpss/article/view/515
Box, G.E.P. and Tiao, G.C. (1992), Bayesian Inference in Statistical Analysis, Wiley, New York.
Due to growing concerns about the replication crisis in the scientific community in recent years, many scientists and statisticians have proposed abandoning the concept of statistical significance and the null hypothesis significance testing procedure (NHSTP). For example, the international journal Basic and Applied Social Psychology (BASP) has officially banned the NHSTP (p-values, t-values, and F-values) and confidence intervals since 2015. Cumming proposed the ‘New Statistics’, which mainly involves (1) abandoning the NHSTP, and (2) estimating effect sizes (ES).
The t-test, especially the two-sample t-test, is the most commonly used NHSTP. Therefore, abandoning the NHSTP means abandoning the two-sample t-test. In my opinion, the two-sample t-test can be misleading; it may not provide a valid solution to practical problems. To understand this, consider a well-known example originally given in a textbook by Roberts. Two manufacturers, denoted by A and B, are suppliers of a component. We are concerned with the lifetime of the component and want to choose the manufacturer that affords the longer lifetime. Manufacturer A supplies 9 units for a lifetime test; manufacturer B supplies 4 units. The test data give sample means of 42 and 50 hours, and sample standard deviations of 7.48 and 6.87 hours, for the units of manufacturers A and B respectively. Roberts discussed this example with a two-tailed t-test and concluded that, at the 90% level, the samples afford no significant evidence in favor of either manufacturer over the other. Jaynes discussed this example with a Bayesian analysis. He argued that our common sense tells us immediately, without any calculation, that the test data constitute fairly substantial (although not overwhelming) evidence in favor of manufacturer B.
For this example, in order to choose between the two manufacturers, what we really care about is: (1) how likely is it that the lifetime of manufacturer B’s components (individual units) is greater than the lifetime of manufacturer A’s components? and (2) on average, by how much is the lifetime of manufacturer B’s components greater than the lifetime of manufacturer A’s components? However, according to Roberts’ two-sample t-test, the difference between the two manufacturers’ components is labeled “insignificant”. This label does not answer these two questions. Moreover, the true meaning of the p-value associated with Roberts’ t-test is unclear.
I recently revisited this example. I calculated the exceedance probability (EP), i.e. the probability that the lifetime of manufacturer B’s components (individual units) is greater than the lifetime of manufacturer A’s components. The result is EP(XB>XA) = 77.8%. In other words, the lifetime of manufacturer B’s components is greater than the lifetime of manufacturer A’s components at odds of 3.5:1. I also calculated the relative mean effect size (RMES). The result is RMES = 17.79%. That is, the mean lifetime of manufacturer B’s components is greater than the mean lifetime of manufacturer A’s components by 17.79%. Based on the values of the EP and RMES, we should prefer manufacturer B. In my opinion, the meaning of the exceedance probability is clear and unambiguous; even a person not trained in statistics can understand it. The exceedance probability analysis, in conjunction with the relative mean effect size, provides a valid solution to this example.
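For readers who want to reproduce the flavor of this analysis, here is a rough sketch that simply plugs the sample moments into a normal approximation; it is not Huang's actual method, so it lands near, but not exactly at, the figures reported above.

```python
from math import erf, sqrt

# Summary statistics from Roberts' example (given above)
mean_a, sd_a = 42.0, 7.48   # manufacturer A, n = 9
mean_b, sd_b = 50.0, 6.87   # manufacturer B, n = 4

# Plug-in normal approximation to P(X_B > X_A) for individual units,
# treating the sample moments as the true ones. Huang (2022) uses a
# fuller analysis and reports 77.8%; this crude version gives ~78.5%.
z = (mean_b - mean_a) / sqrt(sd_a**2 + sd_b**2)
ep = 0.5 * (1 + erf(z / sqrt(2)))

# A naive relative mean effect size (the paper's RMES definition may
# differ; it reports 17.79%, while this simple ratio gives ~19%).
rmes = (mean_b - mean_a) / mean_a
print(round(ep, 3), round(rmes, 3))
```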
Trafimow, D. and Marks, M. (2015), Editorial, Basic and Applied Social Psychology, 37, 1-2.
Cumming, G. (2014), The New Statistics, Psychological Science, 25(1). DOI: 10.1177/0956797613504966
Roberts, N.A. (1964), Mathematical Methods in Reliability Engineering, McGraw-Hill, New York.
Jaynes, E.T. (1976), Confidence intervals vs Bayesian intervals, in Foundations of Probability Theory, Statistical Inference and Statistical Theories of Science, eds. Harper and Hooker, Vol. II, 175-257, D. Reidel, Dordrecht, Holland.
Huang, H. (2022), Exceedance probability analysis: a practical and effective alternative to t-tests, Journal of Probability and Statistical Science, 20(1), 80-97. https://journals.uregina.ca/jpss/article/view/513
At the US Energy Information Administration (EIA), for various establishment surveys, Official Statistics have been generated using model-based ratio estimation, particularly the model-based classical ratio estimator. Other uses of ratios have been considered at the EIA and elsewhere as well. Please see
At the bottom of page 19 there, it says "... on page 104 of Brewer (2002) [Ken Brewer's book on combining design-based and model-based inferences, published under Arnold], he states that 'The classical ratio estimator … is a very simple case of a cosmetically calibrated estimator.'"
Here I would like to hear of any and all uses made of design-based or model-based ratio or regression estimation, including calibration, for any sample surveys, but especially establishment surveys used for official statistics.
Examples of the use of design-based methods, model-based methods, and model-assisted design-based methods are all invited. (How much actual use is the GREG getting, for example?) This is just to see what applications are being made. It may be a good repository of such information for future reference.
Thank you. - Cheers.
I am working with a distribution in which the support of x depends on the scale parameter of the distribution. When I obtain the Fisher information of the MLE, it exists and takes a constant value. So, in order to find the asymptotic variance of the parameter, can I take the inverse of the Fisher information matrix even though the Cramér–Rao regularity conditions are violated, and will the asymptotic normality of the MLE still hold?
Please suggest how I can proceed to find a confidence interval for the parameter.
Dear Colleagues ... Greetings
I would like to compare some robust regression methods based on the bootstrap technique. The comparison will use Monte Carlo simulation (the regression coefficients are known), so I wonder how I can bootstrap the coefficient of determination and the MSE.
Thanks in advance
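One common recipe, sketched below under assumed simulation settings (the coefficients, sample size, and plain OLS fit are all hypothetical stand-ins), is the pairs bootstrap: resample (x, y) rows with replacement, refit, and record R² and MSE each time; the robust fitting method of interest can be swapped in for the OLS used here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte-Carlo setup: known coefficients (hypothetical), one generated sample.
beta = np.array([1.0, 2.0])
n = 100
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = X @ beta + rng.normal(size=n)

def fit_stats(X, y):
    """OLS fit; returns (R^2, MSE). Swap in a robust fit for robust methods."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return 1 - resid.var() / y.var(), (resid**2).mean()

# Nonparametric pairs bootstrap: resample rows with replacement, refit,
# and collect R^2 and MSE each time.
B = 1000
stats = np.empty((B, 2))
for b_idx in range(B):
    idx = rng.integers(0, n, size=n)
    stats[b_idx] = fit_stats(X[idx], y[idx])

# Bootstrap means and standard errors of R^2 and MSE
print(stats.mean(axis=0), stats.std(axis=0))
```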
Nowadays there is increasing debate among researchers around the globe regarding significant p-values in statistical testing. Many have concluded that instead of the p-value we can report point and interval estimates (OR, RR and CI), whether significant or not.
I recently read an article in Nature which concludes that scientific inference is more important than statistical inference. What, then, is the importance of statistical testing in research if it cannot tell us what reality is, and merely circles around old, established scientific relationships? Is this a failure of statistics? What would be the way forward for researchers in choosing the proper statistical test?
The link of an article which I read in Nature is here
Thanking you all in advance for your reply.
What are currently the most frequently used measures of frequency, statistical inferences and/or health indicators in studies on outpatient production / health workforce?
I measured call rates for manatee vocalizations and have a range of 0-11 calls/min. The data set for all observations is not normally distributed; however, I want to report call rates for individual group sizes (i.e., call rates for one manatee, groups of two, etc.). Is it more appropriate to use the median and interquartile range to describe each set, or would the mean and standard deviation be acceptable? I do not have a large standard deviation when looking at individual group sizes (e.g., the call rate for groups of 6 animals was 6.28 calls/min with a standard deviation of 4.21).
In general terms, Bayesian estimation provides better results than MLE. Is there any situation where maximum likelihood estimation (MLE) gives better results than Bayesian estimation methods?
I have a panel dataset of N productive units over T periods of time. I estimate a model by maximum likelihood, with clustered standard errors given the non-independent nature of the sample, which is the normal case in panel datasets.
Next, I want to test for the best specification among several nested models. Initially, I was using the likelihood ratio (LR) test in Stata with the "force" option. I came to realize that this was wrong, because one of the assumptions of the test is that the observations are independent, which is not my case.
My question is: is there another test for comparing several nested models? Is there any modification of the LR test for the clustered-errors case?
Many thanks in advance to anyone who can shed light in this regard.
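One standard alternative in this situation is a Wald test of the nested restrictions computed with a cluster-robust ("sandwich") covariance matrix. The sketch below is purely illustrative: a synthetic panel and a plain OLS fit stand in for the actual ML model, and the test is of a single zero restriction.

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(0)

# Panel of N units over T periods with a unit-level random effect,
# so observations are correlated within units (clusters).
N, T = 50, 10
g = np.repeat(np.arange(N), T)
x1 = rng.normal(size=N * T)
x2 = rng.normal(size=N * T)          # irrelevant regressor under H0
u = np.repeat(rng.normal(size=N), T)
y = 1.0 + 2.0 * x1 + u + rng.normal(size=N * T)

X = np.column_stack([np.ones(N * T), x1, x2])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Cluster-robust sandwich covariance: sum score outer products by cluster.
meat = np.zeros((3, 3))
for c in range(N):
    s = X[g == c].T @ resid[g == c]
    meat += np.outer(s, s)
V = XtX_inv @ meat @ XtX_inv

# Wald test of the nested restriction beta_2 = 0 using the robust variance;
# for one restriction the statistic is chi-square with 1 df.
w = beta[2] ** 2 / V[2, 2]
p_value = erfc(sqrt(w / 2))          # chi2(1) survival function
print(round(w, 3), round(p_value, 3))
```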
I am struggling with statistics for price comparison.
I would like to check if the mean market price of a given product A differs over two consecutive time periods, namely from December till January and from February till March.
My H_0 would be that means are equal and H_1 that from Feb till March the mean is lower.
For this I have all necessary data as time series, sampled at the same frequency.
I thought of using the paired t-test, yet the price distribution is not normal (extremely low p-value on the Shapiro-Wilk test).
I also suspect that the two samples cannot be treated as independent, as my intuition is that the price in February depends on the price in January.
Do you know any test that would fit here? Given the nature of the problem?
Thanks in advance
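One option that drops the normality assumption is a nonparametric paired test. The sketch below (with entirely made-up prices) uses the simple one-sided sign test on paired daily differences; note that, like the t-test, it does not address the serial-dependence concern.

```python
import math
import random

random.seed(1)
# Hypothetical daily prices, paired by sampling day:
# p1 = Dec-Jan period, p2 = Feb-Mar period (slightly lower on average here).
p1 = [100 + random.gauss(0, 5) for _ in range(60)]
p2 = [x - 2 + random.gauss(0, 5) for x in p1]

# One-sided sign test of H0: median difference = 0 vs H1: Feb-Mar lower.
diffs = [a - b for a, b in zip(p1, p2) if a != b]   # drop exact ties
n = len(diffs)
s = sum(d > 0 for d in diffs)        # days where the earlier price was higher
# Under H0 the sign count is Binomial(n, 1/2); upper-tail p-value:
p_value = sum(math.comb(n, k) for k in range(s, n + 1)) / 2**n
print(n, s, round(p_value, 4))
```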
Could you please help me by explaining, in detail, how to generate samples from the bivariate exponential distribution (Downton's bivariate exponential distribution)? I would be grateful if you could provide related references and sources. Thanks in advance.
I wish you all the best
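One commonly cited construction of Downton's bivariate exponential is geometric compounding: both coordinates are sums of a shared geometric number of independent exponential shocks. The sketch below uses that construction (worth verifying against Downton's 1970 paper); the marginals come out Exp(λ1) and Exp(λ2) with correlation ρ.

```python
import numpy as np

rng = np.random.default_rng(42)

def downton_bve(n, lam1, lam2, rho, rng):
    """Sample from Downton's bivariate exponential via geometric compounding.

    Draw N ~ Geometric(1 - rho) on {1, 2, ...}; then X is the sum of N iid
    Exp(rate = lam1 / (1 - rho)) variables and Y the sum of N iid
    Exp(rate = lam2 / (1 - rho)) variables, sharing the same N.
    A geometric sum of exponentials is again exponential, so the marginals
    are Exp(lam1) and Exp(lam2), and the shared N induces correlation rho.
    """
    N = rng.geometric(1 - rho, size=n)
    x = np.array([rng.exponential(scale=(1 - rho) / lam1, size=k).sum() for k in N])
    y = np.array([rng.exponential(scale=(1 - rho) / lam2, size=k).sum() for k in N])
    return x, y

x, y = downton_bve(100_000, lam1=1.0, lam2=2.0, rho=0.5, rng=rng)
# Sanity check: means near 1/lam1 = 1.0 and 1/lam2 = 0.5, correlation near 0.5
print(x.mean(), y.mean(), np.corrcoef(x, y)[0, 1])
```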
Are there reasons to use a biased estimator for a parameter when there is an unbiased estimator for it?
As I recall, I saw an estimator which combined a ratio estimator and a product estimator using one independent variable, x, where the same x was used in each part. Here is my concern: a ratio estimator is based on a positive correlation of x with y; a product estimator is based on a negative correlation of x with y. (See page 186 in Cochran, W.G. (1977), Sampling Techniques, 3rd ed., John Wiley & Sons.) So how can the same x variable be both positively and negatively correlated with y at the same time?
Can anyone explain how this is supposed to work?
Thank you for your comments.
In statistics, there are formulas for the sample size needed to estimate parameters of a population (mean, proportion, …) with a given confidence interval (see the link). But if we add bootstrapping to such a process, is there any exact formula? I notice there are books and discussions saying that the sample size cannot be too small, but I wonder: is there any direct formula to calculate the sample size?
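For reference, the standard closed-form (non-bootstrap) sample-size formulas mentioned above, n = (z·σ/E)² for a mean and n = z²·p(1−p)/E² for a proportion, can be computed directly; the planning values below are hypothetical.

```python
from math import ceil

z = 1.96          # two-sided 95% confidence
sigma = 15.0      # assumed population SD (hypothetical planning value)
E = 2.0           # desired margin of error for the mean

# Sample size to estimate a mean to within +/- E
n_mean = ceil((z * sigma / E) ** 2)

# Sample size for a proportion, using the most conservative p = 0.5
# and a 3-percentage-point margin of error
p = 0.5
n_prop = ceil(z**2 * p * (1 - p) / 0.03**2)

print(n_mean, n_prop)
```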
I have seen several references to "impure heteroscedasticity" online, meaning heteroscedasticity caused by omitted-variable bias. However, I once saw an Internet reference, as I recall, describing a phenomenon where data that should be modeled separately are modeled together, causing an appearance of increased heteroscedasticity. I think there was a YouTube video. That seems like another example of "impure" heteroscedasticity to me. Think of a simple linear regression, say with zero intercept, where the slope, b, for one group/subpopulation/population is slightly larger than another's, but the two populations are erroneously modeled together with a compromise b. The increase in the variance of y for larger values of x would be at least partially due to this modeling problem. (I'm not sure that "model specification error" covers this case, where one model is used instead of the two - or more - models needed.)
I have not found that reference online again. Has anyone seen it?
I am interested in any reference to heteroscedasticity mimicry. I'd like to include such a reference in the background/introduction to a paper on analysis of heteroscedasticity which, in contrast, is only from the error structure for an appropriate model, with attention to unequal 'size' members of a population. This would then delineate what my paper is about, in contrast to 'heteroscedasticity' caused by other factors.
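The phenomenon described above is easy to simulate: two homoscedastic groups with slightly different zero-intercept slopes, pooled into a single fit, produce residual spread that grows with x even though neither group's errors do.

```python
import numpy as np

rng = np.random.default_rng(7)

# Two subpopulations, each homoscedastic, with slightly different slopes
n = 500
x = rng.uniform(1, 10, size=2 * n)
slope = np.r_[np.full(n, 1.0), np.full(n, 1.3)]
y = slope * x + rng.normal(0, 1, size=2 * n)

# Pooled zero-intercept fit: a compromise slope b between 1.0 and 1.3
b = (x * y).sum() / (x * x).sum()
resid = y - b * x

# Residual spread grows with x, mimicking heteroscedasticity:
lo = resid[x < 4].std()
hi = resid[x > 7].std()
print(round(b, 3), round(lo, 3), round(hi, 3))
```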
I know that there are different tests, like Cronbach's alpha, the Kuder-Richardson formulas, etc., but I am not sure about choosing the right test. The objectives of my research are to evaluate the techniques currently used in investment decisions (capital budgeting) in Pakistan, and to evaluate the impact of different factors affecting capital budgeting (i.e., age of the CFO, CEO, industry type, education of the CFO, experience of the CFO, etc.) on the selection of a particular capital budgeting technique.
I am preparing a review on the subject of Design of Experiment.
The standard literature [1,2] describes the one-factor-at-a-time method and factorial experiments without any mathematical proof.
I understand the reasoning behind these methods, but I would like a citation to a mathematical proof that I can refer to.
Any help will be highly appreciated
[1] Box, G.E.P., Hunter, J.S. and Hunter, W.G. (2005), Statistics for Experimenters: Design, Innovation, and Discovery, 2nd ed., Wiley-Interscience, New York.
[2] Jobson, J.D. (2012), Applied Multivariate Data Analysis: Volume I and II, Springer Science & Business Media.
The Bayesian approach offers enormous advantages over the classical frequentist method when there is genuine a priori information on the parameters of the model to be estimated. For the problem at hand, is there a priori information?
My opinion is that it is impossible based on a finite random sample. If anyone disagrees, would you kindly propose an estimator for the case of the family of distributions given in the attached PDF file? I am interested in both: (a) the construction of such an estimator (if possible); (b) the reason why it is impossible to construct such an estimator (if impossible).
Thank you for consideration!
My DV is measured two times per observations, once pre- and once post-treatment (interval scale). The IV has multiple levels and is categorical.
I am interested in how the different levels of the IV differ in their effect on changes from pre- to post-treatment.
1. An OLS regression of the IV with change scores (post minus pre measurements) as the DV.
2. An OLS regression of the IV and the pre-treatment measurement of the DV, with the post-treatment measurement as the DV.
3. An OLS regression of the IV interacted with the pre-treatment measurement of the DV, with the post-treatment measurement as the DV.
What are the up- and downsides of these three approaches? Do they only differ with respect to the hypotheses they test, or are there any technical/statistical cons to some of these approaches? Especially concerning the second possibility, I came across criticisms because of endogeneity, i.e., correlation of unobserved effects in the error term with the first-round measurements of the DV as the IV. Also, the second and third possibilities, in my specific case, have very high R-squared due to the inclusion of pre-treatment measurements, compared to the first possibility. I feel that this comes at some cost, though.
Thanks in advance
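For concreteness, the three specifications can be written out side by side on simulated data (all numbers below are hypothetical; the data-generating process is just for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated pre/post data: 3-level categorical IV, interval-scale DV
n = 300
level = rng.integers(0, 3, size=n)
pre = rng.normal(50, 10, size=n)
effect = np.array([0.0, 2.0, 5.0])[level]      # hypothetical treatment effects
post = 0.8 * pre + effect + rng.normal(0, 5, size=n)

# Dummy-coded design: intercept + indicators for levels 2 and 3
D = np.column_stack([np.ones(n), level == 1, level == 2]).astype(float)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# (1) Change-score model: (post - pre) ~ IV
b1 = ols(D, post - pre)
# (2) ANCOVA-style: post ~ IV + pre
b2 = ols(np.column_stack([D, pre]), post)
# (3) With interaction: post ~ IV * pre
b3 = ols(np.column_stack([D, pre, D[:, 1] * pre, D[:, 2] * pre]), post)
print(b1, b2, b3)
```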
So, I'm using gene ontology enrichment analysis on a list of proteins, specifically for cellular components.
The tool is present on this site: http://geneontology.org/
Now, my question is: if I want to compare the different sets, is it better to look at the p-values or at the fold enrichment? Bear in mind that all of these sets have passed the "significance" threshold.
I also noticed that the bigger the set, the more significant the result - I suppose because you have a higher "n". So I suppose looking at fold enrichment is more informative? I'm not sure.
"Survey" is a very broad term, having widely different meanings to a variety of people, and applies well where many may not fully realize, or perhaps even consider, that their scientific data may constitute a survey, so please interpret this question broadly across disciplines.
It is to the rigorous, scientific principles of survey/mathematical statistics that this particular question is addressed, especially in the use of continuous data. Applications include official statistics, such as energy industry data, soil science, forestry, mining, and related uses in agriculture, econometrics, biostatistics, etc.
Good references would include
Cochran, W.G. (1977), Sampling Techniques, 3rd ed., John Wiley & Sons.
Lohr, S.L. (2010), Sampling: Design and Analysis, 2nd ed., Brooks/Cole.
Särndal, C.-E., Swensson, B. and Wretman, J. (1992), Model Assisted Survey Sampling, Springer-Verlag.
For any scientific data collection, one should consider the overall impact of all types of errors when determining the best methods for sampling and estimation of aggregate measures and measures of their uncertainty. Some historical considerations are given in Ken Brewer's Waksberg Award article:
Brewer, K.R.W. (2014), “Three controversies in the history of survey sampling,” Survey Methodology,
(December 2013/January 2014), Vol 39, No 2, pp. 249-262. Statistics Canada, Catalogue No. 12-001-X.
In practice, however, it seems that often only certain aspects are emphasized, and others virtually ignored. A common example is that variance may be considered, but bias not-so-much. Or even more common, sampling error may be measured with great attention, but nonsampling error such as measurement and frame errors may be given short shrift. Measures for one concept of accuracy may incidentally capture partial information on another, but a balanced thought of the cumulative impact of all areas on the uncertainty of any aggregate measure, such as a total, and overall measure of that uncertainty, may not get enough attention/thought.
What are your thoughts on promoting a balanced attention to TSE?
I'm looking at the National Diet and Nutrition Survey data. I'm interested in studying the association between some variables. My question is: although the survey is a two-stage cluster survey, can I use the unweighted data, since I'm not interested in the population parameters as much as in the relations between variables? Must I use complex survey analysis, or can I use normal analysis procedures? Again, I'm just interested in the relations between some variables. The data set has several weighting variables to make results representative of the whole UK population.
I am grading exams in an introductory cognition course, and I am again annoyed by answers that I consider simplified to the point of being misleading, or just plain misleading - and then finding that this is mostly a fair interpretation of the book.
Bottom-up versus top-down processing is presented as an important debate with the supposedly radical solution that both might happen. Students come away with a notion that it is possible to have pure top-down processing, and it is an exceptional student who notices that this would be hallucination entirely detached from reality.
Categorical perception is presented as evidence for speech perception being special, with a two sentence disclaimer that most students miss. It is not presented as explainable by Bayesian cue combination involving remembered prototypes, but then the concept of perception as statistical inference doesn't come up anywhere in the book. The next piece of evidence for speech perception being special is the McGurk effect, but there is no attempt to explain that as top-down input from multisensory cue combination feeding back on categorical perception.
Heuristics in reasoning are presented as simply a fact of life. Concepts such as computational and opportunity costs don't get a look in.
The general approach is to present a lot of mostly disconnected facts, heavy on the history of discovery in the field. Last time I checked, there was not a lot of difference between introductory textbooks. They occasionally selected different examples, or covered topics to different degrees, but I haven't found anything that abandons the history of science for science, and that tries to present a more coherent and integrated view. Does anyone know of a book that does?
After reading papers on power analysis (e.g. Colegrave & Ruxton 2003 in Behavioral Ecology; Goodman & Berlin 1994 in Annals of Internal Medicine) I got the following impressions:
a) Ex-ante power analysis is informative as long as the experiment has not yet been conducted. After the experiment has been conducted and the statistical analysis done, ex-ante power analysis is no more informative than p-values.
b) Reporting confidence intervals (e.g., for estimated regression coefficients) in the case of insignificant coefficients provides information about the reliability of the results. By that I mean that the confidence intervals can inform about what many researchers want to know when results are not significant (i.e., p-values are higher than some magic threshold): how reliable, considering the Type II error, the results are. By 'reliable' I mean that some treatment effects are within the confidence interval and are therefore likely to represent the true treatment effect.
Are both my impressions correct? Specifically, is it valid to say something like "We conducted a power analysis prior to the experiment. We report confidence intervals on our estimated regression coefficients in order for the reader to assess how 'strong' our support for the null hypothesis is (instead of ex-post power analysis, which was demanded by some scientists)"
Thanks in advance for the answers.
Cochran (1953, 1963, 1977), Sampling Techniques, is generally considered a classic: very well and compactly written, and very useful, but from an era when probability-of-selection/design-based methods were used virtually exclusively. Even so, Cochran did use modeling in his book (my copy is the 3rd ed., 1977) to explain variance under cluster sampling in section 9.5, and to compare some estimation techniques for population totals, when one uses clusters of unequal size, in section 9A.5. He used "g" as a coefficient of heteroscedasticity. He also showed, in section 6.7, when the classical ratio estimator is a best linear unbiased estimator (BLUE), giving the model in equation 6.19 and saying when this is "...hard to beat." -- Ken Brewer noted that if you count the uses of the word "model" or "models" in Cochran's Sampling Techniques, first edition, 1953, there are substantially more instances in the second half (22) than in the first (1). In section 3.3, "The appearance of relevant textbooks," of his Waksberg Award article linked in the references below, Ken Brewer says that "... I had the distinct impression that the more [Cochran] wrote, the more he was at ease in using population models."
Cochran, W.G. (1953), Sampling Techniques, 1st ed., Oxford, England: John Wiley
Cochran, W.G. (1977), Sampling Techniques, 3rd ed., John Wiley & Sons.
Brewer, K.R.W. (2014), “Three controversies in the history of survey sampling,” Survey Methodology,
(December 2013/January 2014), Vol 39, No 2, pp. 249-262. Statistics Canada, Catalogue No. 12-001-X.
What other survey sampling text of that period made use of models?
I have one bulk soil sample which I will extract by two different methods. Both methods have around 6-7 steps before the purified protein is ready for mass spectrometry identification of the different proteins extracted by each method. The bulk sample is split in two, A and B. Sample A is subjected to a boiling treatment and some sonication, and is then split into 5 parts for overnight protein precipitation. The five subsamples are cleaned and precipitated with methanol, etc. Three of these samples are separated on an SDS-PAGE gel; each lane is in-gel digested and run through a mass spectrometer.
The second sample, B, is treated differently in two of the processes and is then split into 8-9 parts for the overnight protein precipitation using a different chemistry. The precipitated protein is centrifuged the next day, cleaned with solvent, and prepared for mass spectrometry. Three different samples from B are then run on an SDS-PAGE gel, in-gel digested, and run on a mass spectrometer, giving MS and MS/MS profiles.
I want to compare the groups of proteins I have extracted by the two extraction methods, A and B, and make inferences from the differences - perhaps to say that method A extracts proteins which belong more to membrane-type protein families than method B. To do this I think I need triplicate observations so that I can make some statistical inferences. Can I call these triplicate observations biological replicates rather than technical replicates?
This question relates to one I asked previously (https://www.researchgate.net/post/How_to_deal_with_large_numbers_of_replicate_data_points_per_subject). I have performed a generalized estimating equations model (binary logistic regression) with SPSS on categorical repeated measures data. As factors my model has condition (3 levels) and cell diameter (3 levels, cells grouped by diameter), plus their interaction. In my results, the main effect of diameter and the interaction term are significant, and looking at pairwise comparisons I can see exactly which groups differ from which as the effect only concerns some pairs, which is the expected result.
My question concerns reporting: I would like to use odds ratios in addition to p-values, since they give a better idea of which differences are really notable, but I can’t find a way to get the odds ratio for specific pairwise comparisons only. The parameter estimates include a B for each main effect and interaction, which when exponentiated gives an odds ratio, but I’m not sure whether and how it is possible to extract the odds ratio for specific pairs (since each of my factors has 3 levels). Or can it be calculated using any of the information in the pairwise comparisons EM means output (it provides the mean difference and standard error for each contrast, but since I have categorical data these seem inapplicable anyway)?
Thank you very much for any help!
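One route that may help: with dummy coding, the odds ratio between two non-reference levels is the exponentiated difference of their coefficients, and its confidence interval follows from the variance of that contrast. A sketch with made-up estimates (check this against your SPSS output, whose parameterization may differ):

```python
from math import exp, sqrt

# Hypothetical parameter estimates (log-odds scale) for a 3-level factor,
# dummy-coded with level 1 as reference: B2, B3, and their (co)variances.
b2, b3 = 0.90, 1.40
var_b2, var_b3, cov_b23 = 0.04, 0.05, 0.01

# OR for level 3 vs the reference is exp(B3); for level 3 vs level 2 it is
# the exponentiated difference of the two coefficients.
or_3_vs_1 = exp(b3)
or_3_vs_2 = exp(b3 - b2)

# 95% CI for the level-3-vs-level-2 contrast via the variance of a difference:
se = sqrt(var_b2 + var_b3 - 2 * cov_b23)
lo, hi = exp(b3 - b2 - 1.96 * se), exp(b3 - b2 + 1.96 * se)
print(round(or_3_vs_2, 3), round(lo, 3), round(hi, 3))
```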
Suppose that we have observed data (x_1, x_2, …, x_n) that may or may not fit a particular distribution better than some given distributions. In many scholarly articles, researchers multiply or divide (i.e., scale) the given data by a small or large value, say m. The scaled data are then used to show that a particular distribution fits better than the given distributions, based on criteria like the negative log-likelihood, AIC, BIC, KS test, etc. Further, often citing computational convenience, statistical inferences such as maximum likelihood estimates, asymptotic confidence intervals, and Bayes, Bayes credible, and HPD interval estimates are obtained for the scaled data instead of the original data. It may happen that after scaling the data a particular distribution fits better or worse than the others, and this may depend on the value of m.
I would like to thank all the concerned persons in advance for their views. One may also provide Scholarly articles relevant to this discussion.
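One concrete point worth noting: for a family closed under scaling, scaling the data shifts the maximized log-likelihood by exactly the Jacobian term −n·log m, so likelihood-based criteria remain comparable only if all candidate models are fit to the same (scaled or unscaled) data; for families not closed under scaling, the fit can genuinely change, which is the concern raised above. A small check for the normal case:

```python
import numpy as np

rng = np.random.default_rng(5)

def normal_loglik_at_mle(data):
    """Maximized normal log-likelihood: -n/2 * (log(2*pi*sigma_hat^2) + 1)."""
    n = data.size
    s2 = data.var()                      # MLE of the variance
    return -0.5 * n * (np.log(2 * np.pi * s2) + 1)

x = rng.normal(10.0, 2.0, size=200)
m = 100.0                                # scaling factor

ll = normal_loglik_at_mle(x)
ll_scaled = normal_loglik_at_mle(m * x)

# The Jacobian of y = m*x shifts the log-likelihood by exactly -n*log(m)
print(ll, ll_scaled, ll - x.size * np.log(m))
```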
NO.     S1      d2      S2      d2      T1      d2      T2      d2
1       14.65   0.69    23.67   0.02    12.76   2.72    23.85   0.19
2       16.97   9.94    26.98   11.97   18.04   13.19   23.71   0.09
3       10.54   10.74   22.67   0.72    12.75   2.75    25.45   4.16
4       16.23   5.82    20.12   11.56   12.73   2.82    21.65   3.10
5       13.19   0.39    25.84   5.38    14.31   0.01    20.15   10.63
6       11.32   6.23    21.84   2.82    15.86   2.11    25.65   5.02
Total   82.90   33.82   141.12  32.48   86.45   23.59   140.46  23.19
Mean    13.82           23.52           14.41           23.41

Potency (Units/ml) = Potency Ratio × Dilution Factor × Std. Dilution.
I am interested in historical references such as the examples below.
This question is with regard to inference from all surveys, but especially for establishment surveys (businesses, farms, and organizations, as noted by the International Conference[s] on Establishment Surveys, ICES).
Ken Brewer (formerly of the Australian National University, and elsewhere) has not only contributed greatly to the development of inference from surveys and the foundation of statistical inference, he also has been a recorder of this history. We often learn a great deal from history, and the cycles perhaps contained within it. Often one might see a claim that an approach is new, when it may be simply a rediscovery of something old. Synthesizing knowledge and devising the best/optimum approach for a given situation takes both innovation, and a basic knowledge of all that has been learned.
A prime example is Ken Brewer's Waksberg article:
Brewer, K.R.W. (2014), "Three controversies in the history of survey sampling," Survey Methodology (December 2013/January 2014), Vol. 39, No. 2, pp. 249-262. Statistics Canada, Catalogue No. 12-001-X.
Ken believed in using probability sampling and models together, but he explains the different approaches.
And there is this amusing account:
Brewer, K. (2005). Anomalies, probing, insights: Ken Foreman’s role in the sampling inference controversy of the late 20th century. Aust. & New Zealand J. Statist., 47(4), 385-399.
There is also the following:
Hanif, M. (2011),
Brewer, K.R.W. (1994). Survey sampling inference: some past perspectives and present prospects. Pak. J. Statist., 10A, 213-233.
Brewer, K.R.W. (1999). Design-based or prediction-based inference: stratified random vs stratified balanced sampling. Int. Statist. Rev., 67(1), 35-47.
On page 2, "History and Ubiquity," in https://www.researchgate.net/publication/261474011_The_Classical_Ratio_Estimator_Model-Based
there is a very short account which I included, as recommended by Ken Brewer. Note that the model example that Ken gave in his Waksberg Award paper was for the case of the classical ratio estimator.
There is also this:
Brewer, K. and Gregoire, T.G. (2009). Introduction to survey sampling. Sample surveys: design, methods, and applications, handbook of statistics, 29A, eds. Pfeffermann, D. and Rao, C.R., Elsevier, Amsterdam, 9-37.
But I especially like the humorous aspect of Ken Brewer's account of the trials and tribulations of Ken Foreman, referenced above.
Other characters who may loom large would include William Cochran, Morris Hansen, Leslie Kish, Carl-Erik Särndal, Fritz Scheuren, and many others. A good current example would be Sharon Lohr.
Do you have other references to the history of survey inference?
I have a dependent variable, "rational decision making," and two independent variables, "leadership" and "self-efficacy." When I use only the one independent variable, leadership, it shows a positive relation with rational decision making, which is consistent with theory. However, when I add self-efficacy as another independent variable, leadership shows a negative relation with rational decision making, which is inconsistent with theory. Why does this happen?
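One classic mechanism behind such a sign flip is strong correlation between the two predictors. A made-up simulation (the coefficients and the 0.9 correlation are purely illustrative, not a claim about these particular data) shows how it can happen:

```python
import numpy as np

# Illustrative simulation: a predictor's coefficient can flip sign once a
# correlated second predictor enters the model.
rng = np.random.default_rng(0)
n = 10_000
leadership = rng.normal(size=n)
# self-efficacy strongly correlated with leadership (r = 0.9)
self_eff = 0.9 * leadership + np.sqrt(1 - 0.9**2) * rng.normal(size=n)
# invented "true" model: leadership hurts slightly, self-efficacy helps a lot
y = -1.0 * leadership + 2.0 * self_eff + rng.normal(size=n)

# Simple regression: y on leadership alone
X1 = np.column_stack([np.ones(n), leadership])
b_simple = np.linalg.lstsq(X1, y, rcond=None)[0]

# Multiple regression: y on both predictors
X2 = np.column_stack([np.ones(n), leadership, self_eff])
b_multi = np.linalg.lstsq(X2, y, rcond=None)[0]

print(b_simple[1])  # ≈ -1 + 2*0.9 = +0.8 (positive!)
print(b_multi[1])   # ≈ -1 (negative)
```

Here the simple slope is positive only because leadership proxies for self-efficacy; once self-efficacy is in the model, the leadership coefficient reflects its effect holding self-efficacy fixed.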
I am building a statistical distribution model for my data. And I am working with several data sets, each has different data range.
In order to compare the results of my model I needed to unify the ranges of my data sets, so I thought about normalizing them. A technique called feature scaling can be used; it is performed as:
x'_i = (x_i - min(X)) / (max(X) - min(X)), where x_i is a data sample and X is the data set.
However, on checking the mean and standard deviation of the normalized data, I found them to be almost zero. Does that affect the statistical distribution of my original data?
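For what it's worth, min-max scaling is a linear transformation, so it changes the location and spread but not the shape of the distribution; a quick check (the skewed lognormal toy data are my own choice):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.lognormal(mean=3.0, sigma=1.0, size=1000)  # skewed example data

# Feature scaling (min-max normalization) as in the formula above
x_scaled = (x - x.min()) / (x.max() - x.min())

print(x_scaled.min(), x_scaled.max())   # 0.0 and 1.0 by construction
# The transformation is linear, so the *shape* is unchanged:
print(np.corrcoef(x, x_scaled)[0, 1])   # correlation = 1 (up to rounding)

# Standardized skewness is invariant under any linear rescaling
def skew(v):
    return np.mean(((v - v.mean()) / v.std())**3)
print(skew(x), skew(x_scaled))          # identical
```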
I'm going to try to make this kind of general as I tend to be a bit wordy:
I am dealing with 3 data sets (18 treated w/ X, 18 treated w/ X2, 18 controls) at 3 time points (3 x 3 = 9 sets total).
Each of my 9 individual data sets is non-normal (so I'm treating them nonparametrically) :(
To test for any sig. differences between control (untreated) groups across the 3 time points, I used:
Kruskal-Wallis test -> result: no sig. difference in control group medians over time.
I also used the Minitab Express software to perform a test for equal variances [I specified that the data were not normal (apparently "Multiple Comparisons and Levene's methods" were used)] -> result: no significant differences in control group variance over time.
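For reference, both tests are also available in open tools; a sketch in Python's scipy (the gamma-distributed numbers below are invented stand-ins, not the actual mussel data):

```python
import numpy as np
from scipy import stats

# Hypothetical control measurements at three time points (made-up numbers)
rng = np.random.default_rng(7)
t1 = rng.gamma(2.0, 1.5, size=18)
t2 = rng.gamma(2.0, 1.5, size=18)
t3 = rng.gamma(2.0, 1.5, size=18)

# Kruskal-Wallis: are the medians of the three groups plausibly equal?
h_stat, p_kw = stats.kruskal(t1, t2, t3)

# The median-centered (Brown-Forsythe) variant of Levene's test is the
# robust choice for non-normal data
w_stat, p_lev = stats.levene(t1, t2, t3, center='median')

print(p_kw, p_lev)
```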
Now, My question:
- Since I have established that there is no significant difference in control groups w/ respect to both median and variance over time...does this give me carte blanche to compare treated data sets from different time points to each other using the same tests?
- In other words, does finding no sig. difference in median & variance over time in the controls effectively rule out time alone (rather than the treatments; these were mussels kept in tanks) as having a significant effect on the data?
Also: Are there any other tests I could perform to identify significant differences in distribution of (nonparametric) data with dose/duration?
I know next to nothing about stats and would appreciate any help I can get! Thanks!!!
In the first model: a = b0 + b1·x + b2·y + b3·z.
b1 is insignificant.
Second model: a = b0 + b1·x + b2·y + b3·z + b4·(y·z).
b1 becomes significant.
I get what it means when a regular control suppresses a relationship, but what does it mean when an interaction variable suppresses a relationship? What is the interaction coefficient supposed to mean by itself?
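One way to see how an interaction can "unsuppress" b1: if y·z has a large effect on a, leaving it out inflates the residual variance, and hence the standard error of b1. A made-up simulation (all coefficients below are illustrative) with hand-rolled OLS t-tests:

```python
import numpy as np
from scipy import stats

def ols_pvalues(X, y):
    """OLS fit with classical t-test p-values for each coefficient."""
    n, p = X.shape
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    t = beta / np.sqrt(np.diag(cov))
    return beta, 2 * stats.t.sf(np.abs(t), df=n - p)

# Made-up data: x has a modest effect, and y*z has a large effect on a
rng = np.random.default_rng(3)
n = 400
x, y, z = rng.normal(size=(3, n))
a = 0.3 * x + 3.0 * (y * z) + 0.5 * rng.normal(size=n)

ones = np.ones(n)
X1 = np.column_stack([ones, x, y, z])           # model 1: no interaction
X2 = np.column_stack([ones, x, y, z, y * z])    # model 2: with y*z

b1, p1 = ols_pvalues(X1, a)
b2, p2 = ols_pvalues(X2, a)
print(p1[1], p2[1])  # p-value of b1 drops once the interaction is included
```

The interaction coefficient b4 by itself is the amount by which the slope on y changes per unit of z (and vice versa); here its role as a "suppressor" is simply soaking up variance that model 1 dumps into the error term.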
Thank you for any help in advance!!
When fitting a non-linear trend, how does one judge whether the function used is overfitting or underfitting? Is there any hypothesis test? For example, Model 1: LogRR = Bi·X, where Bi = B1 when X < 3 and Bi = B2 when X >= 3; Model 2: LogRR = B1·X + B2·X². We assume X = 3 is the cut-point of the curve. How can one judge which model is the best?
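One common, if informal, answer is to fit both candidate models and compare an information criterion such as AIC (with a held-out sample or cross-validation as a stronger guard against overfitting). A sketch with simulated data (the true curve and noise level are my own choices):

```python
import numpy as np

# Simulated dose-response (made-up): the true curve bends smoothly
rng = np.random.default_rng(5)
X = rng.uniform(0, 6, size=200)
logRR = 0.4 * X - 0.05 * X**2 + rng.normal(scale=0.2, size=200)

def fit_aic(design, y):
    beta, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    n, k = design.shape
    rss = resid @ resid
    return n * np.log(rss / n) + 2 * k   # Gaussian AIC up to a constant

# Model 1: two slopes joined at the assumed cut-point X = 3 (through origin)
D1 = np.column_stack([np.where(X < 3, X, 0.0), np.where(X >= 3, X, 0.0)])
# Model 2: linear + quadratic term (through origin)
D2 = np.column_stack([X, X**2])

print(fit_aic(D1, logRR), fit_aic(D2, logRR))  # lower AIC = preferred model
```

Since the two models here have the same number of parameters, the comparison reduces to residual sums of squares; AIC earns its keep when the candidates differ in complexity.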
Please, does anyone have some references regarding the influence of sampling on inference when using Bayesian statistics?
I am just beginning to use Bayesian methods, and I am trying to better understand some results on personal data with very heterogeneous sample sizes.
Thanks in advance.
At the following link, on the first page, you will see a categorization of heteroscedasticity into that which naturally should often be expected due to population member size differences, and that which may indicate an omitted variable:
posted under Lung-Fei Lee - Ohio State:
This is nicely related to the following YouTube video - about five minutes long:
Anonymous (? ):
There are a number of very nice presentations by the following author, which may be found on the internet. Here I supply two such links:
Universidad de San Andrés,
University of Illinois
Though those presentations are excellent, in my experience I think it better to account for heteroscedasticity in the error structure, using a coefficient of heteroscedasticity, than to use the OLS estimate and adjust the variance estimate. At least in a great deal of work that I did, though the expected slope should be no problem, in practice the OLS and WLS slopes for a single regressor, for linear regression through the origin, can vary substantially for highly skewed establishment survey data. (I would expect this would also have impact on some nonlinear and multiple regression applications as well.)
Finally, here is one more good posting that starts with omitted variables, though the file is named after the last topic in the file:
posted under Christine Zulehner
Goethe University Frankfurt
University of Vienna:
My question is, why adjust OLS for omitted variables in the analysis, rather than start with WLS, when any heteroscedasticity may be largely from that which is naturally found? Shouldn't one start with perhaps sigma_i-squared proportionate to x, as a size measure (or in multiple regression, an important regressor, or preliminary predictions of y as the size measure), and see if residual analysis plots show something substantially larger for heteroscedasticity, before seeking an omitted variable, unless subject matter theory argues that something in particular, and relevant, has been ignored? -- Further, if there is omitted variable bias, might a WLS estimator be less biased than an OLS one???
Thank you in advance for any constructive comments. - Jim
PS - The video example, however, does seem somewhat contrived, as one might just use per capita funding, rather than total funding.
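For concreteness, here is a small simulation of the setup discussed above (regression through the origin with sigma_i² proportional to x; all numbers are illustrative), comparing the OLS slope with the WLS slope, which for these weights is the classical ratio estimator:

```python
import numpy as np

# Linear regression through the origin with variance proportional to the
# size measure x (coefficient of heteroscedasticity gamma = 0.5, i.e.,
# sigma_i proportional to sqrt(x_i)). All numbers are invented.
rng = np.random.default_rng(11)
n = 300
x = rng.lognormal(mean=2.0, sigma=1.0, size=n)      # highly skewed sizes
beta_true = 1.7
y = beta_true * x + rng.normal(scale=0.8 * np.sqrt(x))

# OLS slope through the origin
b_ols = np.sum(x * y) / np.sum(x * x)

# WLS slope with weights w_i = 1/x_i (sigma_i^2 proportional to x_i);
# for regression through the origin this is the classical ratio estimator
w = 1.0 / x
b_wls = np.sum(w * x * y) / np.sum(w * x * x)       # = sum(y)/sum(x)

print(b_ols, b_wls)  # both near 1.7, but they can differ noticeably
```

Residual plots from a fit like this (residuals divided by x^gamma against x) are one practical way to judge whether the remaining heteroscedasticity exceeds what the size measure alone explains, before reaching for an omitted variable.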
In Galit Shmueli's "To Explain or Predict," https://www.researchgate.net/publication/48178170_To_Explain_or_to_Predict, on pages 5 and 6, there is a reference to Hastie, Tibshirani, and Friedman (2009), for statistical learning, which breaks the expected square of the prediction error into two parts: the variance of the prediction error and the square of the bias due to model misspecification. (The variance-bias tradeoff is discussed in Hastie et al. and other sources.)
An example of another kind of variance-bias tradeoff that comes to mind would be the use of cutoff or quasi-cutoff sampling for highly skewed establishment surveys using model-based estimation (i.e., prediction from regression in such a cross-sectional survey of a finite population). The much smaller variance obtained is partially traded for a higher bias applied to the small members of the population, which should not contribute very much to the population totals (as may be studied by cross-validation and other means). Thus some model misspecification will often not be crucial, especially if applied to carefully grouped (stratified) data.
[Note that if a BLUE (best linear unbiased estimator) is considered desirable, it is the estimator with the best variance, so bias must be considered under control, or you have to do something about it.]
Other means to trade off variance and bias seem apparent: General examples include various small area estimation (SAE) methods. -
Shrinkage estimators trade off increased bias for lower variance.
Are there other general categories of applications that come to mind?
Do you have any specific applications that you might share?
Perhaps you may have a paper on ResearchGate that relates to this.
Any example of any kind of bias variance tradeoff would be of possible interest.
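As a toy version of the shrinkage case mentioned above (all constants chosen only for illustration):

```python
import numpy as np

# Shrinking the sample mean toward zero trades a little bias for a large
# variance reduction; for a small true mean the MSE goes down.
rng = np.random.default_rng(2)
mu, sigma, n, reps = 0.1, 1.0, 10, 5000
c = 0.5                                   # shrinkage factor toward zero

xbar = rng.normal(mu, sigma / np.sqrt(n), size=reps)  # sampling dist. of mean
mse_mle = np.mean((xbar - mu)**2)         # MSE of the usual estimator
mse_shrunk = np.mean((c * xbar - mu)**2)  # MSE of the shrunken estimator

# Theory: MSE(c*xbar) = c^2*sigma^2/n + (1-c)^2*mu^2
print(mse_mle)     # ≈ sigma^2/n = 0.100
print(mse_shrunk)  # ≈ 0.25*0.100 + 0.25*0.01 = 0.0275
```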
Article To Explain or to Predict?
How does one obtain statistical inference for the difference of two coefficients of variation (CV)?
We know some methods, such as Levene's test, can be used to compare two standard deviations. But the standard deviation is often proportional to the mean. Is there any method/tool that can be used to conduct a comparison between CVs?
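One generic, assumption-light option is a bootstrap interval for the difference of the two CVs; a sketch with invented gamma samples (if I recall correctly, Feltz and Miller's 1996 asymptotic test addresses the parametric version of exactly this question):

```python
import numpy as np

def cv(x):
    return x.std(ddof=1) / x.mean()

# Hypothetical samples whose SDs scale with their means (made-up data)
rng = np.random.default_rng(9)
a = rng.gamma(shape=16.0, scale=1.0, size=80)   # true CV = 1/4
b = rng.gamma(shape=4.0, scale=5.0, size=80)    # true CV = 1/2

# Percentile bootstrap for the difference of CVs
boot = np.empty(4000)
for i in range(4000):
    boot[i] = cv(rng.choice(a, a.size)) - cv(rng.choice(b, b.size))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(lo, hi)  # a 95% CI for CV(a) - CV(b); if it excludes 0, the CVs differ
```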
Sometimes researchers apply inference to samples that were selected intentionally (purposively); for many authors this is a mistake. Must statistical inference always be applied to probabilistic sample selection?
Thank you in advance.
Let T be a random variable with a Gamma distribution.
What is the probability density function of θ̂, where
θ̂ = v1 / v2,
v1 = a0·T/(n-1) + a1·T²/((n-1)(n-2)) + a2·T³/((n-1)(n-2)(n-3)),
v2 = a0 + a1·T/(n-1) + a2·T²/((n-1)(n-2)),
and a0, a1, a2 are constants?
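A Monte Carlo sketch can at least show the shape of the distribution (the Gamma parameters, n, and the constants below are arbitrary choices of mine); if the map T -> θ̂ is monotone, as it appears to be for positive constants, the exact pdf would then follow by a change of variables:

```python
import numpy as np

# Monte Carlo sketch of the distribution of theta-hat
rng = np.random.default_rng(4)
n = 10
a0, a1, a2 = 1.0, 0.5, 0.25
T = rng.gamma(shape=5.0, scale=2.0, size=100_000)

v1 = (a0 * T / (n - 1)
      + a1 * T**2 / ((n - 1) * (n - 2))
      + a2 * T**3 / ((n - 1) * (n - 2) * (n - 3)))
v2 = a0 + a1 * T / (n - 1) + a2 * T**2 / ((n - 1) * (n - 2))
theta_hat = v1 / v2

# Summaries of the simulated distribution (a histogram of theta_hat gives
# an empirical density estimate)
print(theta_hat.mean(), np.percentile(theta_hat, [2.5, 50, 97.5]))
```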
Many thanks in advance
With Best Regards
Most of the time I read about the fundamentals of statistical inference in a mathematical way, but I want to know how, when, and why to use the different concepts involved, like sufficiency, efficiency, and the Rao-Blackwell theorem.
Some of my teachers and elders said to read Mood, Graybill and Boes; Hogg and Craig; or Casella and Berger, but I want to learn easily, like a person from a non-statistical background.
I would be really happy if you could give me some references to books, video lectures, or lecture notes.
I want to learn more; please, anyone, help me.
I use Fisher's exact test a lot. Here is an example in http://en.wikipedia.org/wiki/Fisher's_exact_test
             Men   Women
Dieting        1       9
Non-dieting   11       3
Using Fisher's exact test I can know whether the proportion of dieting in women is significantly higher than in men.
The problem is that there might exist other factors affecting dieting besides gender, such as age, current weight, etc. For example, I also have age data for each individual, and I want to control for the factor "age". That is, I want to know whether the proportion of dieting in women is significantly higher than in men while excluding the influence of age. Which statistical method should I use?
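For the 2x2 table itself, the computation looks like this in scipy (just reproducing the Wikipedia example above):

```python
from scipy import stats

# The 2x2 table from the Wikipedia example:
# rows = dieting/non-dieting, columns = men/women
table = [[1, 9],
         [11, 3]]
odds_ratio, p_value = stats.fisher_exact(table, alternative='two-sided')
print(odds_ratio, p_value)  # small p: dieting proportion differs by gender
```

To control for a covariate such as age, two standard routes are the Cochran-Mantel-Haenszel test on age-stratified 2x2 tables, or logistic regression with dieting as the outcome and gender plus age as predictors.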
Fisher introduced the concept of fiducial inference in his paper on Inverse probability (1930), as a new mode of reasoning from observation to the hypothetical causes without any a priori probability. Unfortunately as Zabell said in 1992: “Unlike Fisher’s many original and important contributions to statistical methodology and theory, it had never gained widespread acceptance, despite the importance that Fisher himself attached to the idea. Instead, it was the subject of a long, bitter and acrimonious debate within the statistical community, and while Fisher’s impassioned advocacy gave it viability during his own lifetime, it quickly exited the theoretical mainstream after his death”.
However during the 20th century, Fraser (1961, 1968) proposed a structural approach which follows the fiducial one closely, but avoids some of its complications. Similarly, Dempster proposed direct probability statements (1963), which may be considered fiducial statements, and he believed that Fisher's arguments can be made more consistent through modification into a direct probability argument. And Efron, in his lecture on Fisher (1998), said about the fiducial distribution: "Maybe Fisher's biggest blunder will become a big hit in the 21st century!"
And it was mainly during the 21st century that the statistical community began to recognise its importance. In his 2009 paper, Hannig extended Fisher's fiducial argument and obtained a generalized fiducial recipe that greatly expands the applicability of fiducial ideas. In their 2013 paper, Xie and Singh proposed a confidence distribution function to estimate a parameter in frequentist inference in the style of a Bayesian posterior. They said that this approach may provide a potential conciliation point for the Bayesian-fiducial-frequentist controversies of the past.
I already discussed these points with some other researchers and I think that a more general discussion seems to be of interest for Research Gate members.
Dempster, A.P. (1963). On direct probabilities. Journal of the Royal Statistical Society. Series B, 25 (1), 100-110.
Fisher, R.A. (1930). Inverse probability. Proceedings of the Cambridge Philosophical Society, 26, 528-535.
Fraser, D. (1961). The fiducial method and invariance. Biometrika, 48, 261-280.
Fraser, D. (1968). The structure of inference. John Wiley & Sons, New York-London-Sydney.
Hannig, J. (2009). On generalized fiducial inference. Statistica Sinica, 19, 491-544.
Xie, M., Singh, K. (2013). Confidence distribution, the frequentist distribution estimator of a parameter: A review. International Statistical Review, 81 (1), 3-77.
Zabell, S.L. (1992). R.A. Fisher and the fiducial argument. Statistical Science, 7 (3), 369-387.
I am studying seasonal changes in the abundance of a fish species along a disturbance gradient. I sampled three locations in four seasons. My sampling sites at each location were very heterogeneous, and the overall data were overdispersed. I am planning to analyze the data using a GLMM with a zero-inflated model, considering LOCATION as a fixed factor and sampling site as a random factor. Should I also consider SEASON as a random factor (due to probable autocorrelation), or just nest it within LOCATION?
Have you used this estimator, or know of it being used? Are you familiar with its properties? Do you agree or disagree with the analysis given in the attached notes, and why or why not? When would you use a chain ratio-type estimator or related estimator?
(Note that this is a replacement for a question submitted a couple of days ago, which received a good reply regarding use of the chain ratio-type estimator for estimating crop yield with biometric auxiliary data, but ResearchGate dropped the question and Naveen's reply. The crop yield reply seemed to me to reinforce the idea of a nonlinear model, as noted in the attached paper.)
Technical Report Note on Ratio and Chain Ratio-Type Estimators: Gamma and Alp...
Let X be distributed as a Laplace distribution with parameters (a, b), where a and b are the location and scale parameters respectively. If we want to estimate the scale parameter b, we can assume one of:
1. The location parameter is a constant and known.
2. The location parameter is unknown, so we can use the median as an estimator of the location (the median is the maximum-likelihood estimator for the location of the Laplace distribution) and then derive the scale estimator for the Laplace distribution.
Do you have another approach or suggestion, please?
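A numerical sketch of the two approaches above (the true parameter values are invented just for the check):

```python
import numpy as np

# Approach 2: plug in the sample median for the location; the MLE of the
# Laplace scale is then the mean absolute deviation about it.
rng = np.random.default_rng(6)
a_true, b_true = 10.0, 2.0
x = rng.laplace(loc=a_true, scale=b_true, size=100_000)

a_hat = np.median(x)                   # MLE of the location
b_hat = np.mean(np.abs(x - a_hat))     # MLE of the scale given a_hat

# Approach 1 (location known) just replaces a_hat with the known constant a
b_hat_known = np.mean(np.abs(x - a_true))
print(a_hat, b_hat, b_hat_known)       # all close to (10, 2, 2)
```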
Thank you in advance for any help you can provide!
In the article "Does climate change affect period, available field time and required capacities for grain harvesting in Brandenburg, Germany?" by A. Prochnow et al. (Agricultural and Forest Meteorology 203, 2015, 43-53), the authors first calculate (simple) moving averages of the considered quantities ("number of hours per year available for harvesting grain with defined moisture content"; coefficient of variation; total sunshine duration; etc.) for time periods of 30 years in steps of one year (i.e., 1961-1990, 1962-1991, ..., 1984-2013). After that, they derive the trend of these averaged values and estimate its significance with the t-test for the regression slope (see their section 2.4 and Tables 5 and 6). In this way most of the trends are shown to be significant with p < 0.01.
I am convinced that this procedure is wrong. I learned that the values y(i) and y(j) (or the residuals) entering the regression procedure (especially if the significance is to be tested with a t-test) must be statistically independent (see, e.g., http://en.wikipedia.org/wiki/Student%27s_t-test#Slope_of_a_regression_line). But the moving-averaged values are highly correlated and not at all independent. This violation of the precondition results in a much too small estimate of the standard deviation s of the y(i). The reduction factor is even smaller than 1/sqrt(30) = 1/5.5 (because the computation interval is shorter than the correlation length). My own tests (see attached figures) showed that the standard deviation is reduced by a factor of about 1/15. Computing the significance level with this very small standard deviation gives highly significant results (i.e., very small p-values). But the truth of the matter is that there is no significance at all, and all trends could have emerged by pure chance.
Does anybody agree with me? (I cannot believe that four authors and two or three reviewers/referees did not recognize this pitfall!) I would also appreciate hints from the authors in case I have misunderstood anything.
I have written a review of the above-mentioned article because there are more shortcomings in this paper (e.g., no consideration of multiplicity at p = 0.05 despite 12 or 16 tests in one table (see their Tables 3 and 4) with only a few significant results; very sophisticated regression functions (their Table 1) for estimating the "hours within classes for grain moisture contents" with r^2 up to 0.99 (the given references are not downloadable); I assume these regressions were derived by stepwise multiple regression but are overfitted and would not withstand an external validation; the probabilities of the results of Figure 4 ("inclusive all more severe cases") can be derived by means of the binomial distribution and are not small enough (all greater than 0.1) to indicate any significance; et cetera).
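The objection above is easy to verify by simulation: feed the same procedure pure white noise (no trend at all) and count how often the naive slope t-test declares p < 0.01. A sketch:

```python
import numpy as np
from scipy import stats

# Regress 30-year moving averages of PURE white noise (53 "years", as in
# 1961-2013) on time and count how often the naive t-test on the slope
# reports p < 0.01.
rng = np.random.default_rng(10)
n_years, window, reps = 53, 30, 500
n_ma = n_years - window + 1                  # 24 moving-average values
t_idx = np.arange(n_ma, dtype=float)

false_hits = 0
for _ in range(reps):
    y = rng.normal(size=n_years)             # no trend whatsoever
    ma = np.convolve(y, np.ones(window) / window, mode='valid')
    res = stats.linregress(t_idx, ma)        # naive OLS slope + t-test
    if res.pvalue < 0.01:
        false_hits += 1

print(false_hits / reps)  # far above the nominal 0.01 level
```

With independent data the rate should be about 1%; with 30-year moving averages it is dramatically higher, which supports the objection.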
I would like to know if there are drawbacks to using aAUC to study tumor growth in xenograft mice. I would also like to know whether the bootstrap method is good for estimating a confidence interval, given that the samples are small (100 mice per group).
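On the second point, a percentile bootstrap over mice is straightforward; a sketch with invented growth curves (per-mouse trapezoidal AUC, resampling whole mice):

```python
import numpy as np

# Percentile-bootstrap sketch for a mean tumor-growth AUC (all numbers are
# invented; 'curves' stands in for per-mouse volume measurements over time)
rng = np.random.default_rng(8)
n_mice, n_times = 10, 6
days = np.array([0, 3, 7, 10, 14, 17], dtype=float)
curves = np.cumsum(rng.gamma(2.0, 20.0, size=(n_mice, n_times)), axis=1)

auc = np.trapz(curves, days, axis=1)   # per-mouse AUC (trapezoidal rule)
point = auc.mean()

# Resample whole mice with replacement and recompute the mean AUC
boot = np.empty(4000)
for i in range(4000):
    boot[i] = auc[rng.integers(0, n_mice, n_mice)].mean()
lo, hi = np.percentile(boot, [2.5, 97.5])
print(point, lo, hi)
```

With very small groups the percentile interval can undercover, so a bias-corrected (BCa) variant is often preferred.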
Thanks for your answers!
Many might first think of Bayesian statistics.
"Synthetic estimates" may come to mind. (Ken Brewer included a chapter on synthetic estimation: Brewer, KRW (2002), Combined survey sampling inference: Weighing Basu's elephants, Arnold: London and Oxford University Press.)
My first thought is for "borrowing strength." I would say that if you do that, then you are using small area estimation (SAE). That is, I define SAE as any estimation technique for finite population sampling which "borrows strength."
Some references are given below.
What do you think?
To my knowledge, you can obtain more than 2 cutoffs if you use a scatterplot. I don't think the same can be achieved using a ROC analysis, since it is based on a binary classification. Any suggestions?
I have a scenario where I get negative R2 values for a linear fit. My algorithm relies on a goodness-of-fit measure to improve the model. But in the specific cases that I am working on, the R2 values never become positive, let alone close to 1. What are other measures by which I can quantify the goodness of fit?
PS: If it helps, I am getting negative values because the data is non-stationary. I am trying to break the data into stationary chunks by looking at R2 values. But they seem to never become positive.
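For what it's worth, R² = 1 - SS_res/SS_tot is negative exactly when the fit predicts worse than the constant mean of the evaluation data, which is easy to trigger with non-stationary series. A toy demonstration, plus two alternatives (RMSE, MAE) that remain interpretable in that regime:

```python
import numpy as np

def r2(y, y_hat):
    ss_res = np.sum((y - y_hat)**2)
    ss_tot = np.sum((y - y.mean())**2)
    return 1 - ss_res / ss_tot

# Non-stationary toy series: a level shift halfway through (made-up data)
rng = np.random.default_rng(12)
y = np.concatenate([rng.normal(0, 1, 50), rng.normal(10, 1, 50)])
t = np.arange(100, dtype=float)

# Fit a line on the first half only, then score it on the second half
coef = np.polyfit(t[:50], y[:50], 1)
pred = np.polyval(coef, t[50:])
print(r2(y[50:], pred))            # strongly negative: worse than the
                                   # chunk's own constant mean

# Scale-dependent alternatives to report alongside (or instead of) R^2
rmse = np.sqrt(np.mean((y[50:] - pred)**2))
mae = np.mean(np.abs(y[50:] - pred))
print(rmse, mae)
```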
Does anyone have a reference for logarithmic transformation of both dependent and independent variables in child growth (mixed-effects models)?
I have the formula Y = (A-B)/m where A and B are averages from samples with sizes nA and nB, and m is a "slope" determined from a linear regression from q points. There are standard errors given for A, B and m (sA,sB,sm). I can calculate the standard error of Y by error-propagation as
sY = (1/m) * sqrt( (sm)²*(A-B)²/m² + (sA)² + (sB)² )
Now I want to get a confidence interval for Y, so I need the degrees of freedom for the t-quantile. A rough guess would be nA+nB+q-3.
However, somehow I doubt this, because if m were known theoretically, sY would simply be sqrt((sA)² + (sB)²)/m with nA+nB-2 d.f. But when m becomes known because q -> infinity, then sm -> 0 and sY -> sqrt((sA)² + (sB)²)/m but, following the guess above, with infinitely many d.f. (d.f. = nA + nB + infinity - 3). Both cannot be correct at the same time.
So what is the correct way to get the d.f. and, hence, the confidence interval for Y?
(please assume that the errors of A, B and m are all normally distributed; please do not discuss alternatives to or applicabilities and problems of confidence intervals. You may well assume that this is a stupid question, because I may have overlooked some simple fact or made a wrong derivation... this can easily be the case, and I still would be thankful for any help)
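For completeness, one common approximate answer to this kind of question is the Welch-Satterthwaite effective degrees of freedom, applied to the separate variance contributions in the error-propagation formula for Y = (A-B)/m. A numerical sketch (all inputs are invented, and the per-component d.f. assignments are my assumptions):

```python
import numpy as np

# Invented inputs for illustration
A, sA, nA = 12.0, 0.4, 8
B, sB, nB = 9.5, 0.3, 8
m, sm, q = 2.0, 0.05, 12

# Variance contributions to Y from the propagation formula
u = np.array([
    sA**2 / m**2,               # from A,  with nA - 1 d.f. (assumed)
    sB**2 / m**2,               # from B,  with nB - 1 d.f. (assumed)
    (A - B)**2 * sm**2 / m**4,  # from m,  with q - 2 d.f. (slope from q points)
])
df = np.array([nA - 1, nB - 1, q - 2])

sY = np.sqrt(u.sum())
df_eff = u.sum()**2 / np.sum(u**2 / df)   # Welch-Satterthwaite
print(sY, df_eff)

# Note the behavior in the limit discussed above: as sm -> 0 the third
# contribution vanishes and df_eff tends toward the value based on A and B
# alone, not to infinity.
```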