Article

Some observations on inverse probability including a new indifference rule

Authors:
W. Perks

Abstract

‘If the first button is buttoned wrongly, the whole vest sits askew.’ The main object of this paper is to propound and discuss a new indifference rule for the prior probabilities in the theory of inverse probability. Being invariant in form on transformation, this new rule avoids the mathematical inconsistencies associated with the classical rule of ‘uniform distribution of ignorance’ and yields results which, particularly in certain critical extreme cases, do not appear to be unreasonable. Such a rule is, of course, a postulate and is not susceptible of proof; its object is to enable inverse probability to operate as a unified principle upon which methods may be devised of allowing a set of statistics to tell their complete and unbiased story about the parameters of the distribution law of the population from which they have been drawn, without the introduction of any knowledge beyond and extraneous to the statistics themselves. The forms appropriate for the prior probabilities in certain other circumstances are also discussed, including the important case where the unknown parameter is a probability, or proportion, for which it is desired to allow for prior bias. Before proceeding to the main purpose of the paper, however, it is convenient to provide some background to the subject. Reference is first made to certain modern writers to indicate how the problem with which inverse probability is concerned occupies a central place in the foundations of scientific method and in modern philosophy. In quoting from these writers I am not to be taken as suggesting that they necessarily support the inverse probability approach to the problem. The next section of the paper contains some brief comments on the direct statistical methods which have been devised in recent times to side-step induction and inverse probability, and this is followed by a few remarks on the various definitions of probability.
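The rule itself is not spelled out in the abstract, but the citing works collected below consistently attribute to Perks (1947) the symmetric Dirichlet prior with weight 1/K on each of K categories, so that the total prior weight stays at one observation however finely the sample space is divided. A minimal sketch under that reading, comparing posterior mean cell probabilities for a sparse count vector under the uniform (add-1), Jeffreys (add-1/2) and Perks (add-1/K) priors:

```python
import numpy as np

def posterior_mean(counts, alpha):
    """Posterior mean of multinomial cell probabilities under a
    symmetric Dirichlet(alpha, ..., alpha) prior."""
    counts = np.asarray(counts, dtype=float)
    return (counts + alpha) / (counts.sum() + alpha * counts.size)

counts = np.array([7, 3, 0, 0, 0, 0, 0, 0, 0, 0])  # sparse 10-category table
K = counts.size
for name, alpha in [("uniform (add-1)", 1.0),
                    ("Jeffreys (add-1/2)", 0.5),
                    ("Perks (add-1/K)", 1.0 / K)]:
    est = posterior_mean(counts, alpha)
    print(f"{name:20s} total prior mass = {alpha * K:4.1f}  "
          f"P(cell 1) = {est[0]:.3f}  P(empty cell) = {est[2]:.3f}")
```

Only the Perks choice keeps the total pseudo-count fixed at 1 as K grows, which is why several of the citing works below treat it as the least informative of the three.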


... The entries of all gestures observed by one encoder of one specific video including Jeffreys prior (Perks, 1947). For each category, the sum of observed loadings is a constant (namely, 4). ...
... including Jeffreys prior) along with the most likely probability s_ML. The exponents for Jeffreys prior for each category for four gesture words (Fig. 2) are the fractions in that graph (Perks, 1947). One advantage of Bayesian statistics (in addition to the possibility of defining probabilities for very small sample sizes) is the possibility of calculating probability uncertainties. ...
... The likelihood function of a Dirichlet distribution using a Jeffreys prior for an encoding of a gesture category with loadings {7,3,4} and a Jeffreys prior { , , }(Perks, 1947). (a) The likelihood surface shown in 3D; the light gray contours show the 95%, 90%, … fractions of the maximum likelihood. ...
... Though expressed here in the more familiar marginal clique and separator marginal cell probabilities, this second expression of the ratio of prior densities may be more difficult to apprehend than the first one in terms of cell probabilities. We note also that for the saturated model, the prior with α = 1 and all fictive cell counts equal, is the prior advocated by Perks [25] (see also Dellaportas and Forster [9]). ...
Preprint
In Bayesian analysis of multi-way contingency tables, the selection of a prior distribution for either the log-linear parameters or the cell probabilities parameters is a major challenge. In this paper, we define a flexible family of conjugate priors for the wide class of discrete hierarchical log-linear models, which includes the class of graphical models. These priors are defined as the Diaconis--Ylvisaker conjugate priors on the log-linear parameters subject to "baseline constraints" under multinomial sampling. We also derive the induced prior on the cell probabilities and show that the induced prior is a generalization of the hyper Dirichlet prior. We show that this prior has several desirable properties and illustrate its usefulness by identifying the most probable decomposable, graphical and hierarchical log-linear models for a six-way contingency table.
... This class of probability estimates is also prominent in the information theory literature [6,10,24,25] and the statistical language modeling community [20]. The most widely advocated single value is 1/2, for a diverse set of reasons [18,31,26]. For this reason, the special case of 1/2 has even been given its own name, namely, the Jeffreys-Perks law of succession [13]. ...
Preprint
Consider the problem of multinomial estimation. You are given an alphabet of k distinct symbols and are told that the i-th symbol occurred exactly n_i times in the past. On the basis of this information alone, you must now estimate the conditional probability that the next symbol will be i. In this report, we present a new solution to this fundamental problem in statistics and demonstrate that our solution outperforms standard approaches, both in theory and in practice.
... We considered three initial estimates for the λ values: uniform, the Jeffreys-Perks rule of succession [6,14,10], and the natural law of succession [17]. The uniform estimate sets all λ values to 0.5. ...
Preprint
A statistical language model assigns probability to strings of arbitrary length. Unfortunately, it is not possible to gather reliable statistics on strings of arbitrary length from a finite corpus. Therefore, a statistical language model must decide that each symbol in a string depends on at most a small, finite number of other symbols in the string. In this report we propose a new way to model conditional independence in Markov models. The central feature of our nonuniform Markov model is that it makes predictions of varying lengths using contexts of varying lengths. Experiments on the Wall Street Journal reveal that the nonuniform model performs slightly better than the classic interpolated Markov model. This result is somewhat remarkable because both models contain identical numbers of parameters whose values are estimated in a similar manner. The only difference between the two models is how they combine the statistics of longer and shorter strings. Keywords: nonuniform Markov model, interpolated Markov model, conditional independence, statistical language model, discrete time series.
... with lim α→0 P(X_2 = X_1) = 1 and lim α→+∞ P(X_2 = X_1) = 1/2^(T−1). Therefore, to ensure a significant probability of a tie when dealing with large values of T, in the following we set α = 1/2^(T−1), in the spirit of Perks (1947). ...
Preprint
Full-text available
We propose a novel model-based clustering approach for samples of time series. We assume as a unique commonality that two observations belong to the same group if structural changes in their behaviours happen at the same time. We resort to a latent representation of structural changes in each time series based on random orders to induce ties among different observations. Such an approach results in a general modeling strategy and can be combined with many time-dependent models known in the literature. Our studies have been motivated by an epidemiological problem, where we want to provide clusters of different countries of the European Union, where two countries belong to the same cluster if the spreading processes of the COVID-19 virus had structural changes at the same time.
... It is not hard to show that setting α = C/K for any constant C satisfies aggregation consistency (for any K and any kind of aggregated binning), while a constant value (like α = 1, corresponding to the uniform prior used for Fig. 5) does not. (Perks 1947 appears to have offered the first argument along these lines.) Fig. 6 shows samples from an aggregation-consistent prior with C = 2, so α = 2/K. ...
Preprint
Full-text available
Bayesian inference gets its name from *Bayes's theorem*, expressing posterior probabilities for hypotheses about a data generating process as the (normalized) product of prior probabilities and a likelihood function. But Bayesian inference uses all of probability theory, not just Bayes's theorem. Many hypotheses of scientific interest are *composite hypotheses*, with the strength of evidence for the hypothesis dependent on knowledge about auxiliary factors, such as the values of nuisance parameters (e.g., uncertain background rates or calibration factors). Many important capabilities of Bayesian methods arise from use of the law of total probability, which instructs analysts to compute probabilities for composite hypotheses by *marginalization* over auxiliary factors. This tutorial targets relative newcomers to Bayesian inference, aiming to complement tutorials that focus on Bayes's theorem and how priors modulate likelihoods. The emphasis here is on marginalization over parameter spaces -- both how it is the foundation for important capabilities, and how it may motivate caution when parameter spaces are large. Topics covered include the difference between likelihood and probability, understanding the impact of priors beyond merely shifting the maximum likelihood estimate, and the role of marginalization in accounting for uncertainty in nuisance parameters, systematic error, and model misspecification.
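The aggregation-consistency claim in the excerpt above is easy to check numerically: by the aggregation property of the Dirichlet distribution, a symmetric Dirichlet(C/K) prior on K cells induces exactly a symmetric Dirichlet(C/K') prior on any K' equal-sized super-bins, so posterior summaries of the coarse table do not depend on how finely it was originally binned. A small sketch of that check (the constant C = 2 follows the excerpt; the counts and binning are illustrative):

```python
import numpy as np

def posterior_mean(counts, alphas):
    counts = np.asarray(counts, float)
    return (counts + alphas) / (counts.sum() + alphas.sum())

C, K, Kp = 2.0, 12, 4                       # fine table with K cells, coarse table with K' bins
rng = np.random.default_rng(0)
fine_counts = rng.integers(0, 5, size=K)

# Route 1: put Dirichlet(C/K) on the fine cells, then aggregate the posterior means.
fine_alpha = np.full(K, C / K)
fine_post = posterior_mean(fine_counts, fine_alpha)
aggregated = fine_post.reshape(Kp, K // Kp).sum(axis=1)

# Route 2: aggregate the counts first, then apply the same rule with alpha = C/K'.
coarse_counts = fine_counts.reshape(Kp, K // Kp).sum(axis=1)
coarse_post = posterior_mean(coarse_counts, np.full(Kp, C / Kp))

print(np.allclose(aggregated, coarse_post))   # True: the two routes agree
```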
... When α_i = 1/2, we have Jeffreys' prior [4,5], which is uniform on the n-dimensional unit ball in the L2 norm. In contrast to Bayes-Laplace and Jeffreys', Perks' prior [6] has parameter values that depend on the number of variables, α_i = 1/(n+1), maximising the variance of the entropy of the distribution that is sampled from that distribution (i.e. the Dirichlet distribution with α_i = 1/(n+1), i = 1, ..., n+1). ...
Preprint
Full-text available
We give copula densities for all symmetric Dirichlet distributions. In fact, since the number of variables in a Dirichlet distribution can be viewed in two different ways, there are two copulas for every Dirichlet distribution of which one is singular. We pay particular attention to the copulas of the Dirichlet distribution whose density is the normalised product of its own one variable marginal densities. The singular copula of that Dirichlet distribution is uniform and spherical in the Lp space with p equal to the number of free variables while the non singular copula is radial in the Lp space.
... This family of prior distributions divides a total prior count of a evenly across the 4 cells. Setting a = 1, 2 and 4, hence α_ij = 1/4, 1/2 and 1 for all i, j, gives respectively Perks' prior (Perks, 1946), Jeffreys' prior (Jeffreys, 1967). We proceed by treating the missing cell counts as parameters to be estimated. We sample in turn from the conditional distribution of these missing counts given the current response parameters, then the response parameters given the current augmented cell counts. ...
Thesis
When making predictions, analysing incomplete data from a medical trial or drawing inference from artificially altered data, one is required to make conditional probability statements concerning unobserved individuals or data. This thesis provides a collection of statistical techniques for inference when data is only partially observed. An efficient reversible jump Markov chain Monte Carlo algorithm for generalised linear models is constructed. This provides a formal framework for Bayesian prediction under model uncertainty. The construction of the algorithm is unique, relying on a simple and novel reversible jump transformation function. The resulting algorithm is easy to implement and requires no ‘expert’ knowledge. An inference framework for multivariate survey data subject to non-response is provided. Deviations from a ‘close to ignorable’ model are permitted through realistic a-priori changes in log-odds ratios. These a-priori deviations encode the prior belief that the non-response mechanism is non-ignorable. A current disclosure control technique is studied. This technique rounds partially observed data prior to release. A Bayesian assessment of this technique is given. This requires the construction of a Metropolis-Hastings algorithm, and the algorithm's irreducibility is proven and discussed.
... With k the number of cells, Dirichlet(1/k, . . . , 1/k) was originally suggested by Perks (1947) and recommended as an "overall objective" prior by Berger et al. (2015). Fienberg and Holland (1972) evaluated the variation of the risks of the posterior means of the cell probabilities with respect to the Dirichlet parameters. ...
Preprint
Full-text available
In the analysis of two-way contingency tables, the measures for representing the degree of departure from independence, symmetry or asymmetry are often used. These measures in contingency tables are expressed as functions of the probability structure of the tables. Hence, the value of a measure is estimated. Plug-in estimators of measures with sample proportions are used to estimate the measures, but without sufficient sample size, the bias and mean squared error (MSE) of the estimators become large. This study proposes an estimator that can reduce the bias and MSE, even without a sufficient sample size, using the Bayesian estimators of cell probabilities. We asymptotically evaluate the MSE of the estimator of the measure plugging in the posterior means of the cell probabilities when the prior distribution of the cell probabilities is the Dirichlet distribution. As a result, we can derive the Dirichlet parameter that asymptotically minimizes the MSE of the estimator. Numerical experiments show that the proposed estimator has a smaller bias and MSE than the plug-in estimator with sample proportions, uniform prior, and Jeffreys prior. Another advantage of our approach is the construction of credible intervals for measures using Monte Carlo simulations.
... Because no queen was accepted in the P-M treatment, we could not run a GLM with binomial error distribution. Therefore, we calculated an odds ratio with a Bayes prior (Perks, 1947) for each replicate to compare queen acceptance between treatments. The odds ratio was calculated as: log ((number of alien queens accepted + 1)/ (number of alien queens rejected + 1)). ...
Article
Full-text available
Relatedness underlies the evolution of reproductive altruism, yet eusocial insect colonies occasionally accept unrelated reproductive queens. Why would workers living in colonies with related queens accept unrelated ones, when they do not gain indirect fitness through their reproduction? To understand this seeming paradox, we investigated whether acceptance of unrelated queens by workers is an incidental phenomenon resulting from failure to recognize non-nestmate queens, or whether it is adaptively favored in contexts where cooperation is preferable to rejection. Our study system is the socially polymorphic Alpine silver ant, Formica selysi. Within populations some colonies have a single queen, and others have multiple, sometimes unrelated, breeding queens. Social organization is determined by a supergene with two haplotypes. In a first experiment we investigated whether the number of reproductive queens living in colonies affects the ability of workers to reject alien queens, as multiple matrilines within colonies could increase colony odor diversity and reduce workers' recognition abilities. As workers rejected all alien queens, independently of the number of queens heading their colony, we then investigated whether their acceptance is flexible and favored in specific conditions. We found that workers frequently accepted alien queens when these queens came with a workforce. Our results show that workers flexibly adjust their acceptance of alien queens according to the situation. We discuss how this conditional acceptance of unrelated queens may be adaptive by providing benefits through increased colony size and/or genetic diversity, and by avoiding rejection costs linked to fighting.
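The '+1' in the odds formula quoted in the excerpt above (citing Perks, 1947) acts as a prior pseudo-count that keeps the quantity defined even when one of the counts is zero, as in the P-M treatment. A one-line version of the per-replicate statistic, with illustrative counts:

```python
import math

def smoothed_log_odds(accepted, rejected):
    """log((accepted + 1) / (rejected + 1)), the prior-smoothed log odds from the excerpt."""
    return math.log((accepted + 1) / (rejected + 1))

print(smoothed_log_odds(0, 6))   # defined even when no queen was accepted
print(smoothed_log_odds(3, 3))   # balanced acceptance gives 0
```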
... The marginal probability density function for each individual component is Be(1, 2). Perks' prior [45] or reference distance prior [40]. It is a Dirichlet distribution with parameter α = 1/m!. ...
Article
We present the Bayesian estimation of Permutation Entropy. In particular, we studied the bias and the mean squared error in the entropy estimation when the length of the time series embedded in the m-dimension space is much less than the limit 5m! necessary for all the patterns to be expressed. Using objective Dirichlet distributions as priors, we found that for low dimensions, when there are few missing patterns, the Bayes-Laplace distribution is the one that presents the best performance, while for high dimensions, when many missing patterns can be present, the Perks distribution minimizes the mean squared error and bias. We also show how the posterior distribution of each parameter could behave in the presence of missing values and give some discussion about the potential uses of this new approach for Permutation Entropy estimation.
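A stripped-down version of the kind of estimator described here: ordinal patterns of embedding dimension m are counted, and the pattern probabilities are replaced by posterior means under a symmetric Dirichlet prior, with α = 1/m! for the Perks choice mentioned in the excerpt. This is only a sketch of the plug-in posterior-mean variant, not the paper's full bias and MSE analysis.

```python
import math
from itertools import permutations
import numpy as np

def pattern_counts(x, m):
    """Count ordinal patterns of length m in the series x."""
    pats = {p: 0 for p in permutations(range(m))}
    for i in range(len(x) - m + 1):
        pats[tuple(int(j) for j in np.argsort(x[i:i + m]))] += 1
    return np.array(list(pats.values()), dtype=float)

def bayes_permutation_entropy(x, m, alpha):
    """Normalised permutation entropy using Dirichlet(alpha) posterior-mean probabilities."""
    n = pattern_counts(x, m)
    p = (n + alpha) / (n.sum() + alpha * n.size)
    return -np.sum(p * np.log(p)) / math.log(math.factorial(m))

rng = np.random.default_rng(1)
x = rng.normal(size=200)          # short series: many of the m! patterns may be missing
m = 4
print("Perks    (alpha = 1/m!):", bayes_permutation_entropy(x, m, 1 / math.factorial(m)))
print("Jeffreys (alpha = 1/2): ", bayes_permutation_entropy(x, m, 0.5))
```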
... As an objective setting, Phillips (1991b) proposed a Jeffreys' prior (Jeffreys, 1946; Perks, 1947). Jeffreys' priors (often called "ignorance priors") are defined up to a proportionality factor as ∝ √|i|, where |i| is the determinant of the expected Fisher information matrix i. Jeffreys' priors render the posterior invariant under one-to-one reparametrizations and enjoy a number of desirable properties (Ly et al., 2016). ...
Preprint
Full-text available
This paper introduces a feasible and practical Bayesian method for unit root testing in financial time series. We propose a convenient approximation of the Bayes factor in terms of the Bayesian Information Criterion as a straightforward and effective strategy for testing the unit root hypothesis. Our approximate approach relies on few assumptions, is of general applicability, and preserves a satisfactory error rate. Among its advantages, it does not require the prior distribution on model's parameters to be specified. Our simulation study and empirical application on real exchange rates show great accordance between the suggested simple approach and both Bayesian and non-Bayesian alternatives.
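For a single parameter the recipe quoted in the excerpt above reduces to the square root of the Fisher information. A self-contained textbook illustration for the Bernoulli model (not the unit-root model of the paper), where the expected information is 1/(θ(1−θ)) and the Jeffreys prior is therefore the Beta(1/2, 1/2) density:

```python
import numpy as np
from scipy.stats import beta

def bernoulli_fisher_info(theta):
    """Expected Fisher information of a single Bernoulli observation."""
    return 1.0 / (theta * (1.0 - theta))

theta = np.linspace(0.01, 0.99, 99)
jeffreys_kernel = np.sqrt(bernoulli_fisher_info(theta))   # proportional to theta^(-1/2) (1-theta)^(-1/2)

# Dividing by the normalising constant pi gives exactly the Beta(1/2, 1/2) density.
print(np.max(np.abs(jeffreys_kernel / np.pi - beta.pdf(theta, 0.5, 0.5))))
```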
... = α_k = 0. A Dirichlet prior with parameters α_1 = ... = α_k = 1/k suggested by Perks (1947) was also considered in this study because of the coherence with the Jeffreys prior for the binomial case. More specifically, the Jeffreys prior for the binomial distribution with success probability is proportional to ...
Article
Full-text available
When a person takes alternative forms of the same test across replications of the testing procedure, the test taker’s observed scores on the alternative forms are rarely identical. In educational and psychological measurement, inconsistencies in a test taker’s scores that are irrelevant to the construct being measured are attributed to errors of measurement. Typically, errors of measurement are summarized as the standard deviation of a test taker’s observed scores over replication of the same testing procedure. Assuming that errors of measurement follow a multinomial distribution (i.e., multinomial error model), the main goal of this study was to propose two interval estimation procedures, which are referred to as the score-like and Perks procedures, for true scores of a test with polytomous items. The performance of the score-like and Perks procedures was compared with that of two normal approximation procedures under the multinomial error model and a procedure based on item response theory (IRT) through simulation. In general, the score-like and Perks procedures outperformed the other three procedures when data were generated under the multinomial error theory framework and showed reasonable results when data were generated under the IRT framework.
... For the final stage, only the maximum likelihood estimates are used because the RM algorithm is applied and the estimates gradually approach a stationary point. We set a_c = 1/C (Perks 1947), which was found effective in the subsequent Monte Carlo studies. ...
Article
In diagnostic classification models (DCMs), the Q matrix encodes which attributes are required for each item. The Q matrix is usually predetermined by the researcher but may in practice be misspecified, which yields incorrect statistical inference. Instead of using a predetermined Q matrix, it is possible to estimate it simultaneously with the item and structural parameters of the DCM. Unfortunately, current methods are computationally intensive when there are many attributes and items. In addition, the identification constraints necessary for DCMs are not always enforced in the estimation algorithms which can lead to non-identified models being considered. We address these problems by simultaneously estimating the item, structural and Q matrix parameters of the Deterministic Input Noisy “And” gate model using a constrained Metropolis–Hastings Robbins–Monro algorithm. Simulations show that the new method is computationally efficient and can outperform previously proposed Bayesian Markov chain Monte-Carlo algorithms in terms of Q matrix recovery, and item and structural parameter estimation. We also illustrate our approach using Tatsuoka’s fraction–subtraction data and Certificate of Proficiency in English data.
... The fourth choice, which we will call the Jeffreys-Perks prior, is based on invariance on the circle or sphere, i.e. ignorance of angles; see Jeffreys (1961) and Perks (1947). Consider assigning a probability to a binomial distribution with parameters ρ and 1 − ρ. ...
... • setting c = 1 recovers Perks' estimator [Perks, 1947; Hutter, 2013], with a regret of O(m ln T) where m is the number of experts that make at least one good prediction, • setting c = N recovers Laplace's rule of succession, with a regret of O(N ln(T/N)), • setting c = N/2 recovers the KT estimator [Krichevsky and Trofimov, 1981], with a regret of O((N/2) ln T). See [Hutter, 2013] for more details and comparison of these estimators. ...
Preprint
We consider prediction with expert advice under the log-loss with the goal of deriving efficient and robust algorithms. We argue that existing algorithms such as exponentiated gradient, online gradient descent and online Newton step do not adequately satisfy both requirements. Our main contribution is an analysis of the Prod algorithm that is robust to any data sequence and runs in linear time relative to the number of experts in each round. Despite the unbounded nature of the log-loss, we derive a bound that is independent of the largest loss and of the largest gradient, and depends only on the number of experts and the time horizon. Furthermore we give a Bayesian interpretation of Prod and adapt the algorithm to derive a tracking regret.
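The constants quoted in the excerpt above all belong to one family of sequential "add-c" probability estimators: with N possible outcomes and counts n_i after t rounds, the prediction is (n_i + c/N)/(t + c). A toy comparison of cumulative log-loss for the three named choices on a sequence where only a few of the N outcomes ever occur (sequence and horizon are illustrative, not taken from the paper):

```python
import numpy as np

def cumulative_log_loss(seq, n_outcomes, c):
    """Sequential add-c estimator: P(x_t = i) = (n_i + c/N) / (t + c). Returns total log-loss."""
    counts = np.zeros(n_outcomes)
    loss = 0.0
    for x in seq:
        p = (counts[x] + c / n_outcomes) / (counts.sum() + c)
        loss -= np.log(p)
        counts[x] += 1
    return loss

rng = np.random.default_rng(3)
N, T = 8, 500
seq = rng.choice(N, size=T, p=[0.55, 0.25, 0.1, 0.1, 0, 0, 0, 0])  # only 4 of the 8 outcomes occur

for name, c in [("Perks   c = 1", 1.0), ("KT      c = N/2", N / 2), ("Laplace c = N", float(N))]:
    print(f"{name:16s} log-loss = {cumulative_log_loss(seq, N, c):.1f}")
```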
... , x_K) = ψ_k when γ = −1, the aforementioned interval is in fact the Haldane interval. The second interval is constructed with γ = −.5 and is referred to as the Jeffreys-Perks interval because the prior with γ = −.5 is the famous Jeffreys-Perks prior (Jeffreys, 1946; Perks, 1947). In contrast to the compound normal approximation interval estimation procedure, the Haldane and Jeffreys-Perks interval estimation procedures provide nondegenerate confidence intervals for all values of π_1, . . . ...
Article
Reporting confidence intervals with test scores helps test users make important decisions about examinees by providing information about the precision of test scores. Although a variety of estimation procedures based on the binomial error model are available for computing intervals for test scores, these procedures assume that items are randomly drawn from an undifferentiated universe of items, and therefore might not be suitable for tests developed according to a table of specifications. To address this issue, four interval estimation procedures that use category subscores for the computation of confidence intervals are presented in this article. All four estimation procedures assume that subscores instead of test scores follow a binomial distribution (i.e., compound binomial error model). The relative performance of the four compound binomial–based interval estimation procedures is compared to each other and to the better known normal approximation and Wilson score procedures based on the binomial error model.
... Perks' prior or reference distance prior. This prior was first proposed by Perks (1947), but it has recently also been obtained as the reference distance prior by Berger et al. (2015). This is a Dirichlet distribution with parameters α = 1/M. ...
Article
The Dirichlet-multinomial process can be seen as the generalisation of the binomial model with beta prior distribution when the number of categories is larger than two. In such a scenario, setting informative prior distributions when the number of categories is large becomes difficult, so the need for an objective approach arises. However, what does objective mean in the Dirichlet-multinomial process? To deal with this question, we study the sensitivity of the posterior distribution to the choice of an objective Dirichlet prior from those presented in the available literature. We illustrate the impact of the selection of the prior distribution in several scenarios and discuss the most sensible ones.
... The literature on Bayesian statistics includes various proposals for prior distributions of α with minimum information (Alvares, 2015). Our choice here is α_k = 1/K because it has been shown to be an objective prior (Berger et al., 2015) with the reference distance approach (see also Perks, 1947). Figure 3 shows the 95% posterior credible intervals for the probability associated with asymptomatic, mild and severe symptom tubers depending on the health of the seed from which they have grown. ...
Article
Full-text available
Tigernut tubers are the main ingredient in the production of orxata in Valencia, a white soft sweet popular drink. In recent years, the appearance of black spots in the skin of tigernuts has led to important economic losses in orxata production because severely diseased tubers must be discarded. In this paper, we discuss three complementary statistical models to assess the disease incidence of harvested tubers from selected or treated seeds, and propose a measure of effectiveness for different treatments against the disease based on the probability of germination and the incidence of the disease. Statistical methods for these studies are approached from Bayesian reasoning and include mixed-effects models, Dirichlet-multinomial inferential processes and mixed-effects logistic regression models. Statistical analyses provide relevant information to carry out measures to palliate the black spot disease and achieve a high-quality production. For instance, the study shows that avoiding affected seeds increases the probability of harvesting asymptomatic tubers. It is also revealed that the best chemical treatment, when prioritizing germination, is disinfection with hydrochloric acid while sodium hypochlorite performs better if the priority is to have a reduced disease incidence. The reduction of the incidence of the black spots syndrome by disinfection with chemical agents supports the hypothesis that the causal agent is a pathogenic organism.
... In that context, the family defined in (12) is quite appropriate. We note that the well-known non-informative prior due to Perks [36], with α(y_I) = 1/|Y_I| for all y_I, belongs to Γ. ...
Article
We present a novel framework for performing statistical sampling, expectation estimation, and partition function approximation using *arbitrary* heuristic stochastic processes defined over discrete state spaces. Using a highly parallel construction we call the *sequential constraining process*, we are able to simultaneously generate states with the heuristic process and accurately estimate their probabilities, even when they are far too small to be realistically inferred by direct counting. After showing that both theoretically correct importance sampling and Markov chain Monte Carlo are possible using the sequential constraining process, we integrate it into a methodology called *state space sampling*, extending the ideas of state space search from computer science to the sampling context. The methodology comprises a dynamic data structure that constructs a robust Bayesian model of the statistics generated by the heuristic process subject to an accuracy constraint, the posterior Kullback-Leibler divergence. Sampling from the dynamic structure will generally yield partial states, which are completed by recursively calling the heuristic to refine the structure and resuming the sampling. Our experiments on various Ising models suggest that state space sampling enables heuristic state generation with accurate probability estimates, demonstrated by illustrating the convergence of a simulated annealing process to the Boltzmann distribution with increasing run length. Consequently, heretofore unprecedented direct importance sampling using the *final* (marginal) distribution of a generic stochastic process is allowed, potentially augmenting the range of algorithms at the Monte Carlo practitioner's disposal.
Article
We developed a statistical theory of zero-count-detector (ZCD), which is defined as a zero-class Poisson under conditions outlined in this paper. ZCD is often encountered in the studies of rare events in physics, health physics, and many other fields where counting of events occurs. We found no acceptable solution to ZCD in classical statistics and affirmed the need for the Bayesian statistics. Several uniform and reference priors were studied, and we derived Bayesian posteriors, point estimates, and upper limits. It was shown that the maximum-entropy prior, containing the most information, resulted in the smallest bias and the lowest risk, making it the most admissible and acceptable among the priors studied. We also investigated application of zero-inflated Poisson and Negative-binomial distributions to ZCD. It was shown using Bayesian marginalization that, under limited information, these distributions reduce to the Poisson distribution.
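As a simple point of reference for the zero-count problem: under a flat prior on the Poisson mean (one of several priors one might consider; the paper argues among its candidates for the maximum-entropy prior), observing zero counts in exposure t gives a Gamma(1, rate t) posterior, so the 95% upper limit is ln(20)/t ≈ 3.0/t. A minimal check of that special case, not a reproduction of the paper's full prior comparison:

```python
import numpy as np
from scipy.stats import gamma

def zcd_upper_limit(exposure, cl=0.95, prior_shape=1.0):
    """Bayesian upper limit on a Poisson mean after observing zero counts.

    With a Gamma(prior_shape, rate -> 0) prior, the posterior given zero counts
    in `exposure` is Gamma(prior_shape, rate=exposure); prior_shape=1 is the flat prior.
    """
    return gamma.ppf(cl, a=prior_shape, scale=1.0 / exposure)

print(zcd_upper_limit(1.0))       # about 3.0, the familiar 'rule of three'
print(-np.log(0.05))              # closed form for the flat prior: ln(1/0.05)
```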
Preprint
Data collection biases are a persistent issue for studies of social networks. This issue has been particularly important in Animal Social Network Analysis (ASNA), where data are unevenly sampled and such biases may potentially lead to incorrect inferences about animal social behavior. Here, we address the issue by developing a Bayesian generative model, which not only estimates network structure, but also explicitly accounts for sampling and censoring biases. Using a set of simulation experiments designed to reflect various sampling and observational biases encountered in real-world scenarios, we systematically validate our model and evaluate its performance relative to other common ASNA methodologies. By accounting for differences in node-level censoring (i.e., the probability of missing an individual interaction), our model permits the recovery of true latent social connections, even under a wide range of conditions where some key individuals are intermittently unobserved. Our model outperformed all other existing approaches and accurately captured network structure, as well as individual-level and dyad-level effects. In contrast, permutation-based and simple linear regression approaches performed the worst across many conditions. These results highlight the advantages of generative network models for ASNA, as they offer greater flexibility, robustness, and adaptability to real-world data complexities. Our findings underscore the importance of generative models that jointly estimate network structure and adjust for measurement biases typical in empirical studies of animal social behaviour.
Article
In many applications in biology, engineering, and economics, identifying similarities and differences between distributions of data from complex processes requires comparing finite categorical samples of discrete counts. Statistical divergences quantify the difference between two distributions. However, their estimation is very difficult and empirical methods often fail, especially when the samples are small. We develop a Bayesian estimator of the Kullback-Leibler divergence between two probability distributions that makes use of a mixture of Dirichlet priors on the distributions being compared. We study the properties of the estimator on two examples: probabilities drawn from Dirichlet distributions and random strings of letters drawn from Markov chains. We extend the approach to the squared Hellinger divergence. Both estimators outperform other estimation techniques, with better results for data with a large number of categories and for higher values of divergences.
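A reduced version of the idea, using a single symmetric Dirichlet prior on each distribution rather than the mixture of Dirichlet priors developed in the paper: with independent Dirichlet posteriors, the posterior mean of the KL divergence has a closed form in digamma functions. The prior weight a = 1/K below is the Perks-style choice; this is an illustrative sketch, not the authors' estimator.

```python
import numpy as np
from scipy.special import digamma

def posterior_mean_kl(counts_p, counts_q, a=None):
    """Posterior mean of KL(p || q) when p and q get independent symmetric Dirichlet(a) priors.

    Uses E[p_i log p_i] = (a_i/a_0)(psi(a_i + 1) - psi(a_0 + 1)) and
    E[log q_i] = psi(b_i) - psi(b_0) for the two Dirichlet posteriors.
    """
    counts_p = np.asarray(counts_p, float)
    counts_q = np.asarray(counts_q, float)
    if a is None:
        a = 1.0 / counts_p.size               # Perks-style symmetric prior weight
    ap, aq = counts_p + a, counts_q + a       # posterior Dirichlet parameters
    ap0, aq0 = ap.sum(), aq.sum()
    e_p = ap / ap0
    e_p_log_p = e_p * (digamma(ap + 1) - digamma(ap0 + 1))
    e_log_q = digamma(aq) - digamma(aq0)
    return float(np.sum(e_p_log_p - e_p * e_log_q))

print(posterior_mean_kl([30, 10, 5, 0], [10, 10, 10, 10]))
print(posterior_mean_kl([30, 10, 5, 0], [31, 9, 6, 1]))   # similar samples give a smaller estimate
```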
Article
In contingency table analysis, one is interested in testing whether a model of interest (e.g., the independent or symmetry model) holds using goodness-of-fit tests. When the null hypothesis where the model is true is rejected, the interest turns to the degree to which the probability structure of the contingency table deviates from the model. Many indexes have been studied to measure the degree of the departure, such as the Yule coefficient and Cramér coefficient for the independence model, and Tomizawa’s symmetry index for the symmetry model. The inference of these indexes is performed using sample proportions, which are estimates of cell probabilities, but it is well known that the bias and mean square error (MSE) values become large without a sufficient number of samples. To address the problem, this study proposes a new estimator for indexes using Bayesian estimators of cell probabilities. Assuming the Dirichlet distribution for the prior of cell probabilities, we asymptotically evaluate the value of MSE, when plugging the posterior means of cell probabilities into the index, and propose an estimator of the index using the Dirichlet hyperparameter that minimizes the value. Numerical experiments show that when the number of samples per cell is small, the proposed method has smaller values of bias and MSE than other methods of correcting estimation accuracy. We also show that the values of bias and MSE are smaller than those obtained by using the uniform and Jeffreys priors.
Chapter
In this chapter we describe approaches for Bayesian estimation and model determination for multivariate categorical data, typically summarised in the form of a multiway contingency table. The focus is on log-linear models, particularly log-linear interaction models which describe associations and (conditional) independence of the variables. We view modelling both as a mechanism for obtaining smoothed estimates of cell probabilities, and for drawing inferences about associations between the categorical variables being studied. We discuss possible families of prior distributions, with a focus on conjugate prior distributions. For some models these lead to convenient, analytically tractable, posterior summaries, but generally further computation is required. Computational methodology is introduced and discussed.
Article
Full-text available
In this work we present the application of Automatic Bayesian Procedures to the decontextualized lithic assemblages from the terraces near the Pas de l’Ase (Ribera d’Ebre, Tarragona) in order to provide chronological probabilities and spatio-temporal diachrony. With the aims of obtaining a broader empirical base and of helping to fill information gaps through the study of palimpsests, a reference framework is established that encompasses the dated archaeological contexts of the peninsular Mediterranean for the period between 14000 and 3000 cal. BP. We evaluate the application of this new methodology, its effectiveness and its adaptation to the different case studies and the reference contexts used.
Thesis
Full-text available
The acoustic emission technique is a passive, non-destructive method for assessing the structural health of structures. Under stress, structures release energy in the form of transient waves called acoustic emissions, which carry the characteristics of their sources. These waves are represented in a common descriptor space. Since the number of sources and the relevant descriptors are unknown, clustering is performed for different subsets of descriptors while varying the number of clusters. Validation consists of determining the number of clusters and the subset of relevant descriptors. This validation has long relied on criteria based on the shape of the clusters, without taking their chronology into account. In this thesis we defined sequential validation criteria dedicated to acoustic emission. A clustering evaluation method based on cluster onset histograms was also proposed for chronological tracking of events. Finally, a partially supervised clustering model was proposed to guide the clustering with partial information. This model also makes it possible to determine the number of natural clusters automatically, without resorting to a criterion. Keywords: cluster validation, acoustic emission, uncertainty, algorithm comparison, monitoring, SHM, NDT
Article
The daily counts of COVID-19 cases differed significantly from one region to another at the beginning of the COVID-19 pandemic in any given country. The disease first hit some regions before spreading to others. The Poisson distribution is frequently used to analyze disease occurrence in certain locations at certain times. However, in highly heterogeneous situations, the estimator of multiple Poisson means is not close to the actual population parameter. The estimator of multinomial probabilities under an existing prior is also not close to the actual population parameter in highly heterogeneous situations. We propose a Bayesian estimator of multinomial probabilities under a data-dependent prior. This prior is built using zeta distribution coefficients and depends only on the rank of data. Using simulation studies, the proposed estimator is evaluated with two well-known risks. Finally, the daily counts of COVID-19 cases are analyzed to show how the proposed estimator can be used in practice.
Article
Full-text available
The purpose of this work is to show an automatic Bayesian procedure to obtain accurate chronological information of archaeological assemblages characterized by palimpsests or without radiocarbon dates and whose temporal information comes only from bifacial flint arrowheads. In this paper, a classification method based on the Dirichlet-multinomial inferential process and its posterior predictive probability distribution is discussed. Its purpose is to predict the chronological period of undated archaeological assemblages (levels or sites) by means of a Bayesian predictive process based on the posterior distribution of each bifacial flint arrowhead type in Eastern Iberia during the 4th and 3rd millennia cal. BC. The results obtained suggest that this approach is very useful to achieve an accurate chronology when other archaeological information is not available, or it is not conclusive.
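In outline, the classification step described here can be written as a posterior predictive comparison: for each candidate period, the probability of the new assemblage's type counts is a Dirichlet-multinomial with parameters given by that period's training counts plus the prior, and Bayes' theorem turns these into period probabilities. The sketch below uses made-up counts and a symmetric prior purely to show the mechanics, not the paper's actual typology or data.

```python
import numpy as np
from scipy.special import gammaln

def dirichlet_multinomial_loglik(x, alpha):
    """log P(x | alpha) for the Dirichlet-multinomial (multivariate Polya) distribution."""
    x, alpha = np.asarray(x, float), np.asarray(alpha, float)
    n, a0 = x.sum(), alpha.sum()
    return (gammaln(n + 1) - gammaln(x + 1).sum()
            + gammaln(a0) - gammaln(n + a0)
            + gammaln(x + alpha).sum() - gammaln(alpha).sum())

# Hypothetical training counts of arrowhead types for two periods, and a new undated assemblage.
training = {"period A": np.array([40, 5, 2]), "period B": np.array([8, 20, 15])}
new_assemblage = np.array([6, 1, 0])
prior = 1.0 / 3                                   # symmetric Dirichlet weight per type (illustrative)

logpost = {k: dirichlet_multinomial_loglik(new_assemblage, c + prior) for k, c in training.items()}
z = np.logaddexp(*logpost.values())               # equal prior probability for the two periods
for k, lp in logpost.items():
    print(k, round(float(np.exp(lp - z)), 3))
```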
Article
Multiple category prevalence represents the prevalence of the specific disease with different k-category (k > 2) statuses, such as mild, moderate and severe. This study proposed a Bayesian method for the meta-analysis of studies with multiple category prevalence. The Dirichlet-multinomial model was used to obtain the Bayesian approach. In this way, it was possible both to take prior information about the prevalences into account and to obtain an effective estimate regardless of the value of the prevalences (around 0.5 or close to 0 or 1). The proposed method was compared with the frequentist method based on simulation and on the data of Barendregt et al. (2013). It was demonstrated that, without requiring any transformation, the Bayesian method gave more consistent results, smaller relative errors and mean squared errors, and more powerful accepted-probability estimators, especially for small total sample sizes and for prevalences close to 0 or 1.
Chapter
Bayesian methods for the analysis of categorical data use the same classes of models as the classical approach. However, Bayesian analyses can be more informative and may provide more natural solutions in certain situations such as those involving sparse or missing data or unidentifiable parameters. In this article, we review some of the most common Bayesian methods for categorical data. We focus on the analysis of contingency tables, but several other useful models are also discussed. For ease of exposition, we describe most of the ideas in terms of two‐way contingency tables.
Article
This article describes an extension to the use of heteroskedastic ordered probit (HETOP) models to estimate latent distributional parameters from grouped, ordered-categorical data by pooling across multiple waves of data. We illustrate the method with aggregate proficiency data reporting the number of students in schools or districts scoring in each of a small number of ordered “proficiency” levels. HETOP models can be used to estimate means and standard deviations of the underlying (latent) test score distributions but may yield biased or very imprecise estimates when group sample sizes are small. A simulation study demonstrates that the pooled HETOP models described here can reduce the bias and sampling error of standard deviation estimates when group sample sizes are small. Analyses of real test score data demonstrate the use of the models and suggest the pooled models are likely to improve estimates in applied contexts.
Article
Short Tandem Repeats (STRs) are a type of DNA polymorphism. This study considers discriminant analysis to determine the population of test individuals using an STR database containing the lengths of STRs observed at more than one locus. The discriminant method based on the Bayes factor is discussed and an improved method is proposed. The main issues are to develop a method that is relatively robust to sample size imbalance, identify a procedure to select loci, and treat the parameter in the prior distribution. A previous study achieved a classification accuracy of 0.748 for the g-mean (geometric mean of classification accuracies for two populations) and 0.867 for the AUC (area under the receiver operating characteristic curve). We improve the maximum values for the g-mean to 0.830 and the AUC to 0.935. Computer simulations indicate that the previous method is susceptible to sample size imbalance, whereas the proposed method is more robust while achieving almost identical classification accuracy. Furthermore, the results confirm that threshold adjustment is an effective countermeasure to sample size imbalance.
Article
For the last thirty years the teaching of statistics in universities in this country has been dominated by the relative frequency theory of probability, exemplified in Richard von Mises's book, Probability, Statistics and Truth. This statistical definition along the lines of the long run frequency concept of probability can be illustrated by asking what meaning is to be given to the statement ‘the probability of getting a head on a single toss of a penny is one half.’ The relative frequency adherent would answer something like ‘in a long sequence of repeated tosses, the proportion of outcomes that are heads is one half’. In the last ten years, however, an alternative approach has come to the fore, under the general title of the Bayesian Approach to Statistics. A Bayesian adherent would answer something along the lines that he would be ‘prepared to offer even money on getting a head on a single toss’.
Article
In many current problems, the actual class of the instances, the ground truth, is unavailable. Instead, with the intention of learning a model, the labels can be crowdsourced by harvesting them from different annotators. In this work, among those problems, we focus on binary classification problems. Specifically, our main objective is to explore the evaluation and selection of models through the quantitative assessment of the goodness of evaluation methods capable of dealing with this kind of context. That is a key task for the selection of evaluation methods capable of performing a sensible model selection. Regarding the evaluation and selection of models in such contexts, we identify three general approaches, each one based on a different interpretation of the nature of the underlying ground truth: deterministic, subjectivist or probabilistic. For the analysis of these three approaches, we propose how to estimate the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve within each interpretation, thus deriving three evaluation methods. These methods are compared in extensive experimentation whose empirical results show that the probabilistic method generally outperforms the other two, as a result of which we conclude that it is advisable to use that method when performing the evaluation in such contexts. In further studies, it would be interesting to extend our research to multiclass classification problems.
Article
Suppose that we require estimates of the population frequencies corresponding to small entries in a large pure contingency table containing heterogeneity. It would be natural to lump rows or columns together with weights depending on the correlation coefficients. But if the rows, say, are nearly orthogonal, no lumping method will be reasonable. Moreover, if many of the entries in the table are missing, a lumping method may fail. A method is given here (which it may be possible to combine with a lumping method when appropriate) that depends on an assumption for the initial distribution of the “association factor” in each cell, i.e. the ratio of the population frequency in the cell to the product of the population frequencies of its row and column. If the logarithm of the association factor is assumed to have, initially, a normal distribution, then the final expectations and variances of the population frequencies and of their logarithms can be expressed in terms of the “after‐effect” function which was originally tabulated by K. W. Wagner for an electrodynamical application. Instead of a log‐normal distribution, a Pearson Type III distribution may be assumed for the association factor. This assumption may be less accurate but is easier to handle. The paper concludes with a list of properties of the after‐effect function (and of some of its generalizations) that may be useful for further tabulation.
Article
A Bayesian significance test for multinomial distributions is discussed, together with the results of 18 numerical experiments in which the test is compared with non‐Bayesian methods. Several different non‐Bayesian criteria are considered because the circumstances under which their tail‐area probabilities can be conveniently approximated differ from one to the other. A provisional empirical formula, connecting the Bayes factors with the tail‐area probabilities, is found to be correct within a factor of 6 in the 18 experiments. As a by‐product, a new non‐Bayesian statistic is suggested by the theory, and its asymptotic distribution obtained. It seems to be useful sometimes when chi‐squared is not, although chi‐squared also has a Bayesian justification for large samples. The work originated in, and is relevant to, the problem of the estimation of multinomial probabilities, but significance tests are a better proving ground for assumptions concerning the initial (prior) distribution of the physical probabilities.
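For reference, the core quantity in such a test: the Bayes factor for a point null p = p0 against a Dirichlet-distributed alternative is the ratio of the Dirichlet-multinomial marginal likelihood to the null multinomial likelihood (the multinomial coefficient cancels). This is a generic sketch of that ratio, not a reconstruction of the paper's specific factors or its empirical formula.

```python
import numpy as np
from scipy.special import gammaln

def log_bayes_factor_multinomial(counts, p0, alpha):
    """log BF of a Dirichlet(alpha) alternative vs the point null p = p0 for multinomial counts."""
    counts = np.asarray(counts, float)
    p0 = np.asarray(p0, float)
    alpha = np.asarray(alpha, float)
    n, a0 = counts.sum(), alpha.sum()
    log_marginal_alt = (gammaln(a0) - gammaln(n + a0)
                        + gammaln(counts + alpha).sum() - gammaln(alpha).sum())
    log_null = np.sum(counts * np.log(p0))
    return log_marginal_alt - log_null

counts = np.array([18, 32, 28, 22])                                # illustrative cell counts
p0 = np.full(4, 0.25)                                               # null: all cells equiprobable
print(log_bayes_factor_multinomial(counts, p0, np.full(4, 1.0)))    # uniform Dirichlet alternative
print(log_bayes_factor_multinomial(counts, p0, np.full(4, 0.25)))   # Perks-style 1/k alternative
```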
Article
Modifications of the logit and likelihood-ratio methods of analysing two- and three-dimensional contingency tables are developed from a semi-empirical approximation to the multinomial distribution. The modified likelihood-ratio test is shown to have an approximate relationship to the Freeman and Halton (1951) criterion for choosing a critical region for the exact test. A Bayesian argument leading to the same results is discussed.
Article
In the context of an objective Bayesian approach to the multinomial model, Dirichlet(a, …, a) priors with a < 1 have previously been shown to be inadequate in the presence of zero counts, suggesting that the uniform prior (a = 1) is the preferred candidate. In the presence of many zero counts, however, this prior may not be satisfactory either. A model selection approach is proposed, allowing for the possibility of zero parameters corresponding to zero count categories. This approach results in a mixture of Dirichlet posteriors and mixtures of marginal beta posteriors, which seem to avoid the problems that potentially result from the various proposed Dirichlet priors, in particular in the context of extreme data with zero counts.
Article
Welch and Peers have shown that it is possible to determine a prior distribution for a one‐dimensional parameter by requiring that the associated one‐sided Bayes intervals are asymptotically confidence intervals. It is shown in this note that their procedure does not apply if certain natural two‐sided Bayes intervals are used.
Article
Full-text available
The goal of this paper is to show that there exists a simple, yet universal statistical logic of spectral graph analysis by recasting it into a nonparametric function estimation problem. The prescribed viewpoint appears to be good enough to accommodate most of the existing spectral graph techniques as a consequence of just one single formalism and algorithm. Dedicated to the beloved memory of Emanuel (Manny) Parzen.
Article
This article features an analysis of the relationship between the DOW JONES Industrial Average (DJIA) Index and a sentiment news series using daily data obtained from the Thomson Reuters News Analytics (TRNA) provided by SIRCA (The Securities Industry Research Centre of the Asia Pacific). The recent growth in the availability of on-line financial news sources, such as internet news and social media sources provides instantaneous access to financial news. Various commercial agencies have started developing their own filtered financial news feeds which are used by investors and traders to support their algorithmic trading strategies. TRNA is one such data set. In this study, we use the TRNA data set to construct a series of daily sentiment scores for DJIA stock index component companies. We use these daily DJIA market sentiment scores to study the relationship between financial news sentiment scores and the stock prices of these companies using entropy measures. The entropy and mutual information (MI) statistics permit an analysis of the amount of information within the sentiment series, its relationship to the DJIA and an indication of how the relationship changes over time.
Chapter
The essence of Bayes theory is giving probability values to bets. Methods of generating such probabilities are what separate the various theories.
Article
Spectral graph theory is undoubtedly the most favored graph data analysis technique, both in theory and practice. It has emerged as a versatile tool for a wide variety of applications including data mining, web search, quantum computing, computer vision, image segmentation, among others. However, the way in which spectral graph theory is currently taught and practiced is rather mechanical, consisting of a series of matrix calculations that at first glance seem to have very little to do with statistics, thus posing a serious limitation to our understanding of graph problems from a statistical perspective. Our work is motivated by the following question: How can we develop a general statistical foundation of "spectral heuristics" that avoids the cookbook mechanical approach? A unified method is proposed that permits frequency analysis of graphs from a nonparametric perspective by viewing it as a function estimation problem. We show that the proposed formalism incorporates seemingly unrelated spectral modeling tools (e.g., Laplacian, modularity, regularized Laplacian, etc.) under a single general method, thus providing better fundamental understanding. It is the purpose of this paper to bridge the gap between two spectral graph modeling cultures: Statistical theory (based on nonparametric function approximation and smoothing methods) and Algorithmic computing (based on matrix theory and numerical linear algebra based techniques) to provide transparent and complementary insight into graph problems.
Article
The complete final product of Bayesian inference is the posterior distribution of the quantity of interest. Important inference summaries include point estimation, region estimation and precise hypotheses testing. Those summaries may appropriately be described as the solution to specific decision problems which depend on the particular loss function chosen. The use of a continuous loss function leads to an integrated set of solutions where the same prior distribution may be used throughout. Objective Bayesian methods are those which use a prior distribution which only depends on the assumed model and the quantity of interest. As a consequence, objective Bayesian methods produce results which only depend on the assumed model and the data obtained. The combined use of intrinsic discrepancy, an invariant information-based loss function, and appropriately defined reference priors, provides an integrated objective Bayesian solution to both estimation and hypothesis testing problems. The ideas are illustrated with a large collection of non-trivial examples.