FIGURE 2 - uploaded by Daniel Lüdecke


# Impact of sample size on the different indices, for linear and logistic models, and when the null hypothesis is true or false. Gray vertical lines for p-values and Bayes factors represent commonly used thresholds.

Source publication

Turmoil has engulfed psychological science. Causes and consequences of the reproducibility crisis are in dispute. With the hope of addressing some of its aspects, Bayesian methods are gaining increasing attention in psychological science. Some of their advantages, as opposed to the frequentist framework, are the ability to describe parameters in pr...

## Contexts in source publication

**Context 1**

... Bayesian indices were calculated using the bayestestR package (Makowski et al., 2019). Figure 2 shows the sensitivity of the indices to sample size. The p-value, the pd and the MAP-based p-value are sensitive to sample size only when a true effect is present (i.e., when the null hypothesis is false). ...

**Context 2**

... with Figure 2 and Table 1, the model investigating the sensitivity of the different indices to sample size suggests that BF indices are sensitive to sample size both when an effect is present (null hypothesis is false) and when it is absent (null hypothesis is true). ROPE indices are particularly sensitive to sample size when the null hypothesis is true, while the p-value, pd and MAP-based p-value are sensitive to sample size only when the null hypothesis is false, in which case they are more sensitive than ROPE indices. ...
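To see why a pd-style index can track sample size only when a true effect exists, consider the simplest possible case. The sketch below illustrates the idea under stated assumptions and is not the simulation behind Figure 2. It assumes a normal mean with known standard deviation σ and a flat prior. In that case the posterior of the mean is Normal(x̄, σ/√n) and pd = Φ(|x̄|·√n/σ). The helper names `normal_cdf` and `pd_normal_mean` are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pd_normal_mean(sample_mean: float, sigma: float, n: int) -> float:
    """Probability of direction for a normal mean (hypothetical helper).

    Assumes a known sigma and a flat prior, so the posterior of the mean is
    Normal(sample_mean, sigma / sqrt(n)). The posterior mass on the same side
    of zero as the sample mean is then Phi(|sample_mean| * sqrt(n) / sigma).
    """
    z = abs(sample_mean) * math.sqrt(n) / sigma
    return normal_cdf(z)


# Effect present: the observed mean stays near a fixed delta = 0.3,
# so the z-score grows like sqrt(n) and pd climbs towards 1.
for n in (20, 80, 320):
    print("effect present", n, round(pd_normal_mean(0.3, 1.0, n), 4))

# Null true: the observed mean shrinks like sigma / sqrt(n), so a typical
# z-score of 1 gives the same pd whatever the sample size.
for n in (20, 80, 320):
    print("effect absent", n, round(pd_normal_mean(1.0 / math.sqrt(n), 1.0, n), 4))
```

Under these assumptions the two-sided p-value equals 2·(1 − pd), so it changes with sample size in exactly the same way. This is consistent with the two indices behaving alike in the excerpts above.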

## Similar publications

Background
Stable Isotope Resolved Metabolomics (SIRM) is a new biological approach that uses stable isotope tracers such as uniformly ¹³C...

The current paper highlights a new, interactive Shiny App that can be used to aid in understanding and teaching the important task of conducting a prior sensitivity analysis when implementing Bayesian estimation methods. In this paper, we discuss the importance of examining prior distributions through a sensitivity analysis. We argue that conductin...

An innovative Bayesian motion-based wave inference method is derived and assessed in this work. The evaluation of the accuracy of the proposed prior distribution has been carried out using the results obtained during a dedicated experimental campaign with a scale model of an Oil and Gas (O&G) semisubmersible platform. As for the Bayesian statistical i...

Understanding the formation of feeding links provides insights into processes underlying food webs. Generally, predators feed on prey within a certain body‐size range, but a systematic quantification of such feeding niches is lacking. We developed a size‐constrained feeding‐niche (SCFN) model and parameterized it with information on both realized a...

Background. Our previous studies showed that N-of-1 trials could reflect the individualized characteristics of traditional Chinese medicine (TCM) syndrome differentiation with good feasibility, but the sensitivity was low. Therefore, this study will use a hierarchical Bayesian statistical method to improve the sensitivity and applicability of N-of-1...

## Citations

... For factors, we investigated whether the credible intervals of parameter i overlapped between categories to detect a significant difference. We also assessed the probability of direction, i.e., the probability that an effect was different from zero in a given direction. It was estimated as the proportion of MCMC samples that were positive for a positive effect or negative for a negative effect (Kéry and Schaub 2011; Makowski et al. 2019). Because we ...
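The proportion-of-samples rule described above can be sketched in a few lines of Python. The sketch assumes the MCMC draws for a single parameter are available as a plain list of floats. `probability_of_direction` is a hypothetical helper, not a function from any particular package.

```python
from typing import Sequence


def probability_of_direction(draws: Sequence[float]) -> float:
    """Share of posterior draws lying on the dominant side of zero.

    The dominant side is whichever of "positive" or "negative" holds more
    draws. When no draw is exactly zero the result ranges from 0.5 to 1.0.
    Draws equal to zero count towards neither side.
    """
    if not draws:
        raise ValueError("need at least one posterior draw")
    n = len(draws)
    share_positive = sum(1 for d in draws if d > 0) / n
    share_negative = sum(1 for d in draws if d < 0) / n
    return max(share_positive, share_negative)


# Toy draws: 6 of 8 are positive, so pd = 6 / 8.
draws = [0.8, 1.2, -0.3, 0.5, 0.9, 1.1, -0.1, 0.7]
print(probability_of_direction(draws))  # 0.75
```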

Explaining changes in waterfowl distribution and abundance is requested by wetland managers for a better understanding of their population dynamics and habitat use. The objective of this study was to assess the influence of interannual changes in wetland management, both through direct data and proxies, on the distribution dynamics of Teal day-roosts in the Camargue, a large wetland complex in southern France. We constructed a state-space model accounting for a conditional detection probability by aerial observers during duck counts, since changes in observers have a strong influence on variations in detection probability. First, we showed that the distribution of Teal day-roosts within the Camargue delta has changed over the last 35 years. Second, on a sub-sample of 18 years, we showed that annual changes in Teal abundance depended on salinity and open water area at the day-roost, and on the availability of potential feeding grounds surrounding the day-roost (available wetland area within 5 km). No association was detected between changes in Teal abundance and changes in the typology of wetland hydrology, or with changes in site protection status (i.e. hunted to protected). Our results reinforce the importance of considering management at the scale of functional units, by considering the complementarity of nocturnal feeding areas (mainly hunted areas) specifically managed for waterfowl, and diurnal roosts (mainly nature reserves, which have high conservation value for other animal and plant species). A good understanding of the factors affecting the localisation of waterfowl day-roosts is becoming more important in the context of climate change, which is likely to redistribute local birds with rising sea levels and increasing salinity in wetlands.

... = 8.09. For a directional hypothesis like this, an evidence ratio > 19 (a posterior probability > 0.95) is somewhat analogous to a p-value < 0.05 [63,64], and we refer to such a ratio as 'strong evidence'; for a bidirectional exploratory hypothesis, this threshold is > 39 (a posterior probability of 0.975 for one direction and 0.025 for the opposite direction, hence the total probability is still 0.95). ...
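These thresholds follow from the relation between an evidence ratio (posterior odds) and a posterior probability p: ER = p / (1 − p). The minimal sketch below shows both directions of the conversion. The helper names are hypothetical.

```python
import math


def evidence_ratio(posterior_prob: float) -> float:
    """Posterior odds that the effect lies in the hypothesized direction."""
    if not 0.0 < posterior_prob < 1.0:
        raise ValueError("posterior probability must lie strictly between 0 and 1")
    return posterior_prob / (1.0 - posterior_prob)


def posterior_probability(ratio: float) -> float:
    """Inverse of evidence_ratio: turn posterior odds back into a probability."""
    if ratio < 0.0:
        raise ValueError("an evidence ratio cannot be negative")
    return ratio / (1.0 + ratio)


# One-sided threshold: odds of 19 correspond to a posterior probability of 0.95.
print(posterior_probability(19.0))  # 0.95
# Bidirectional convention: odds of 39 correspond to 0.975 in one direction.
print(posterior_probability(39.0))  # 0.975
# Going the other way recovers the thresholds (up to floating-point rounding).
print(math.isclose(evidence_ratio(0.95), 19.0))
```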

We provide evidence that the roughness of chords – a psychoacoustic property resulting from unresolved frequency components – is associated with perceived musical stability (operationalized as finishedness) in participants with differing levels and types of exposure to Western or Western-like music. Three groups of participants were tested in a remote cloud forest region of Papua New Guinea (PNG), and two groups in Sydney, Australia (musicians and non-musicians). Unlike prominent prior studies of consonance/dissonance across cultures, we framed the concept of consonance as stability rather than as pleasantness. We find a negative relationship between roughness and musical stability in every group including the PNG community with minimal experience of musical harmony. The effect of roughness is stronger for the Sydney participants, particularly musicians. We find an effect of harmonicity – a psychoacoustic property resulting from chords having a spectral structure resembling a single pitched tone (such as produced by human vowel sounds) – only in the Sydney musician group, which indicates this feature's effect is mediated via a culture-dependent mechanism. In sum, these results underline the importance of both universal and cultural mechanisms in music cognition. They also have powerful implications for understanding the origin of pitch structures in Western tonal music and for the possibility of new musical forms that align with humans' perceptual and cognitive biases. They also highlight the importance of how consonance/dissonance is operationalized and explained to participants, particularly those with minimal prior exposure to musical harmony.

... The strength of evidence for each directional hypothesis was obtained from Bayesian evidence ratios, which are the posterior odds of the effect being in the direction specified in the hypothesis. For a one-sided hypothesis, an evidence ratio greater than 19 is loosely analogous to a one-sided p-value below 0.05 [21,22]; that is, there is a posterior probability greater than 95% that the effect is in the hypothesized direction. We also used ROPE tests to quantify the probability that the effect is practically equivalent to zero. ...

Appendix for 'Evidence for a universal association of auditory roughness with musical stability'

... The differences between the individual protection methods and the control were tested using the probability of direction. The probability of direction varies between 50% and 100%. It can be interpreted as the probability (expressed as a percentage) that a parameter (described by its posterior distribution) is strictly positive or negative, whichever is more probable (Makowski et al. 2019a). The probability of direction was then converted to a p-value using the one-sided method. ...
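The excerpt does not spell out the conversion. The sketch below assumes the commonly used relation discussed by Makowski et al. (2019): a one-sided p-value of 1 − pd and a two-sided p-value of 2·(1 − pd). Whether this matches the paper's exact "one-sided method" is an assumption. Here pd is a fraction, so a percentage such as 97.5% must first be divided by 100. The function `pd_to_p` and its `sided` parameter are hypothetical.

```python
def pd_to_p(pd: float, sided: str = "two") -> float:
    """Convert a probability of direction (fraction, 0.5 to 1) to a p-value-like index.

    Assumed relation: one-sided p = 1 - pd, two-sided p = 2 * (1 - pd).
    """
    if not 0.5 <= pd <= 1.0:
        raise ValueError("pd must be a fraction between 0.5 and 1")
    one_sided = 1.0 - pd
    if sided == "one":
        return one_sided
    if sided == "two":
        return 2.0 * one_sided
    raise ValueError("sided must be 'one' or 'two'")


# A pd of 97.5% gives roughly 0.025 one-sided and 0.05 two-sided.
print(pd_to_p(97.5 / 100, sided="one"))
print(pd_to_p(97.5 / 100, sided="two"))
```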

Abstract: National parks (NP) are the last refugia of forests dominated by Nothofagus species in Chile. However, frequent, careless human-caused fires are destroying these forests even within the national parks. After large-scale fires, N. pumilio stands are unable to recover naturally, either generatively or vegetatively, and artificial regeneration must be resorted to in order to maintain their extent. Yet even artificial regeneration is not successful without protection against browsing. The aim of this study was therefore to experimentally test a range of repellents and mechanical methods of protecting seedlings against browsing. Five replicates of plots were placed in Torres del Paine NP, in different habitat conditions and with different methods of protection against browsing (11 repellents, wire mesh, plastic tube and control). In each plot, 12 seedlings were treated with each type of protection. In our experiment, only 8% of the seedlings were damaged by browsing, while the mortality rate was 38%. The results indicate that abiotic factors (mainly frost, drought or wind) had a stronger effect on seedling mortality than browsing. At the same time, compared with the control, six of the eleven repellents used in the experiment showed a significantly positive effect. We suggest plastic tubes as the best option to protect seedlings. In addition to providing 100% protection against browsing, they are likely to provide more favourable microclimatic conditions for seedlings, similar to leaving the burned snags.
Keywords: browsing; forest fires; guanaco; lenga; repellents

... Prob. direction denotes the probability of direction, defined as the proportion of the parameter posterior distribution that has the same sign as the distribution median (Makowski et al., 2019 ...

... A negative posterior median was indicative of downregulation in the ME/CFS group, and a positive posterior median was indicative of upregulation. ChemRICH [24] was performed for set enrichment statistics, with the posterior median used as an estimate of the effect size, and the probability of direction was used as a Bayesian analog of the p-value [25]. Figure 2 shows the results of the classic p-value (frequentist) analyses for the Nagy-Szakal and Che datasets. ...

... When a 95% credible interval does not overlap with zero, the posterior probability that the effect lies on the opposite side of zero is below 5%. Checking whether the credible interval overlaps with zero can therefore be considered a Bayesian analog of a significance test [25]. Notably, the similar-sounding "confidence intervals" are different: if the same study were repeated many times, 95% of the confidence intervals computed in this way would contain the true effect size [3]. Figure 4 shows the compounds whose 95% credible intervals do not overlap with zero and that have a BF > 3. Again, phosphatidylcholines were found to be downregulated, along with tyrosine and one phosphatidylethanolamine (PE) ether-lipid, PE (p-38:6). ...
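Whether a credible interval overlaps zero can be checked directly from posterior draws. The sketch assumes an equal-tailed (quantile-based) 95% interval, with linear interpolation between sorted draws. The excerpt does not say which interval type was used, and a highest-density interval would need a different routine. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def quantile(sorted_draws: Sequence[float], q: float) -> float:
    """Linearly interpolated quantile of already-sorted draws (0 <= q <= 1)."""
    position = q * (len(sorted_draws) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return sorted_draws[lower] + (sorted_draws[upper] - sorted_draws[lower]) * weight


def equal_tailed_interval(draws: Sequence[float], mass: float = 0.95) -> Tuple[float, float]:
    """Credible interval leaving (1 - mass) / 2 of the posterior in each tail."""
    if not draws:
        raise ValueError("need at least one posterior draw")
    ordered = sorted(draws)
    tail = (1.0 - mass) / 2.0
    return quantile(ordered, tail), quantile(ordered, 1.0 - tail)


def excludes_zero(interval: Tuple[float, float]) -> bool:
    """True when the whole interval lies on one side of zero."""
    low, high = interval
    return low > 0.0 or high < 0.0


# Toy draws 1..100 lie entirely above zero; shifting them by -50 straddles zero.
draws = [float(x) for x in range(1, 101)]
print(excludes_zero(equal_tailed_interval(draws)))  # True
print(excludes_zero(equal_tailed_interval([d - 50.0 for d in draws])))  # False
```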

Univariate analyses of metabolomics data currently follow a frequentist approach, using p-values to reject a null hypothesis. We here propose the use of Bayesian statistics to quantify evidence supporting different hypotheses and to discriminate between support for the null hypothesis and a lack of statistical power. We used metabolomics data from three independent human cohorts that studied the plasma signatures of subjects with myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS). The data are publicly available, covering 84–197 subjects in each study with 562–888 identified metabolites, of which 777 were common between the two studies and 93 were reported in all three studies. We show how Bayesian statistics incorporates results from one study as “prior information” into the next study, thereby improving the overall assessment of the likelihood of finding specific differences between plasma metabolite levels. Using classic statistics and Benjamini–Hochberg FDR corrections, Study 1 detected 18 metabolic differences and Study 2 detected no differences. Using Bayesian statistics on the same data, we found a high likelihood that 97 compounds were altered in concentration in Study 2, after using the results of Study 1 as the prior distributions. These findings included lower levels of peroxisome-produced ether-lipids, higher levels of long-chain unsaturated triacylglycerides, and the presence of exposome compounds that are explained by the difference in diet and medication between healthy subjects and ME/CFS patients. Although Study 3 reported only 92 compounds in common with the other two studies, these major differences were confirmed. We also found that prostaglandin F2alpha, a lipid mediator of physiological relevance, was reduced in ME/CFS patients across all three studies. The use of Bayesian statistics led to biological conclusions from metabolomic data that were not found through frequentist approaches.
We propose that Bayesian statistics is highly useful for studies with similar research designs if similar metabolomic assays are used.

... It is important to note that such interval null-hypothesis tests have been found to demonstrate substantially lower false-positive rates than traditional frequentist tests of a point-null hypothesis [138], thereby guarding against errors relevant to the large number of hypothesis tests in the current study (see also [139,140] for Bayesian perspectives on multiple testing). These interval null hypotheses were assessed using a Bayesian hypothesis testing procedure based on the region of practical equivalence (ROPE [136]) and the ROPE Bayes factor (BF_ROPE [141][142][143]). From the posterior distribution of effect sizes, we calculated the following indices: (a) P_ROPE, the posterior probability that the null hypothesis is true (i.e., the summary effect size is too small to be practically meaningful), and (b) log(BF_ROPE), a measure of evidence that the summary effect size falls within versus outside the ROPE. ...
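Both ROPE indices can be estimated from draws. This minimal sketch rests on three assumptions. The ROPE bounds are a hypothetical ±0.1. BF_ROPE is taken as the ratio of the posterior odds to the prior odds that the effect lies inside rather than outside the ROPE. Both masses are estimated by counting prior and posterior draws. The excerpt gives neither the bounds nor the estimation method, and the helper names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def share_inside(draws: Sequence[float], rope: Tuple[float, float]) -> float:
    """Fraction of draws falling inside the closed interval rope = (low, high)."""
    low, high = rope
    return sum(1 for d in draws if low <= d <= high) / len(draws)


def rope_indices(posterior: Sequence[float], prior: Sequence[float],
                 rope: Tuple[float, float] = (-0.1, 0.1)) -> Tuple[float, float]:
    """Return (P_ROPE, log BF_ROPE) estimated from posterior and prior draws.

    P_ROPE is the posterior mass inside the ROPE. BF_ROPE divides the posterior
    odds of "inside vs outside the ROPE" by the prior odds of the same event,
    so log BF_ROPE > 0 favours a practically null effect.
    """
    p_post = share_inside(posterior, rope)
    p_prior = share_inside(prior, rope)
    if not (0.0 < p_post < 1.0 and 0.0 < p_prior < 1.0):
        raise ValueError("odds are undefined when all or no draws fall inside the ROPE")
    posterior_odds = p_post / (1.0 - p_post)
    prior_odds = p_prior / (1.0 - p_prior)
    return p_post, math.log(posterior_odds / prior_odds)


# Toy draws: 3 of 4 posterior draws and 1 of 4 prior draws fall inside the ROPE,
# so the posterior odds are 3, the prior odds 1/3 and BF_ROPE = 9.
p_rope, log_bf = rope_indices([0.0, 0.05, -0.05, 0.2], [0.0, 0.5, -0.5, 1.0])
print(p_rope)            # 0.75
print(round(log_bf, 3))  # log(9), about 2.197
```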

Background:
Differences in responding to sensory stimuli, including sensory hyperreactivity (HYPER), hyporeactivity (HYPO), and sensory seeking (SEEK) have been observed in autistic individuals across sensory modalities, but few studies have examined the structure of these "supra-modal" traits in the autistic population.
Methods:
Leveraging a combined sample of 3868 autistic youth drawn from 12 distinct data sources (ages 3-18 years and representing the full range of cognitive ability), the current study used modern psychometric and meta-analytic techniques to interrogate the latent structure and correlates of caregiver-reported HYPER, HYPO, and SEEK within and across sensory modalities. Bifactor statistical indices were used to both evaluate the strength of a "general response pattern" factor for each supra-modal construct and determine the added value of "modality-specific response pattern" scores (e.g., Visual HYPER). Bayesian random-effects integrative data analysis models were used to examine the clinical and demographic correlates of all interpretable HYPER, HYPO, and SEEK (sub)constructs.
Results:
All modality-specific HYPER subconstructs could be reliably and validly measured, whereas certain modality-specific HYPO and SEEK subconstructs were psychometrically inadequate when measured using existing items. Bifactor analyses supported the validity of a supra-modal HYPER construct (ωH = .800) but not a supra-modal HYPO construct (ωH = .653), and supra-modal SEEK models suggested a more limited version of the construct that excluded some sensory modalities (ωH = .800; 4/7 modalities). Modality-specific subscales demonstrated significant added value for all response patterns. Meta-analytic correlations varied by construct, although sensory features tended to correlate most with other domains of core autism features and co-occurring psychiatric symptoms (with general HYPER and speech HYPO demonstrating the largest numbers of practically significant correlations).
Limitations:
Conclusions may not be generalizable beyond the specific pool of items used in the current study, which was limited to caregiver report of observable behaviors and excluded multisensory items that reflect many "real-world" sensory experiences.
Conclusion:
Of the three sensory response patterns, only HYPER demonstrated sufficient evidence for valid interpretation at the supra-modal level, whereas supra-modal HYPO/SEEK constructs demonstrated substantial psychometric limitations. For clinicians and researchers seeking to characterize sensory reactivity in autism, modality-specific response pattern scores may represent viable alternatives that overcome many of these limitations.

... Wives indicates the certainty that the estimated effect (based upon its posterior distribution) is different from zero in either a positive or negative direction (Makowski et al., 2019). A PD equal to 50% indicates that the parameter is equally likely to be positive or negative and 100% indicates the parameter is very likely to be different from zero in either a positive or negative direction. ...

The assumption that stress negatively impacts marital relationships is widely accepted; however, the majority of research has focused on marital satisfaction as the outcome of interest. Relational turbulence is a quality of romantic associations on par with—but distinct from—satisfaction, in which partners conceptualize their relationship as chaotic or tumultuous. This paper draws on relational turbulence theory (RTT) and stress spillover research to propose that day‐to‐day stress corresponds with perceptions of relational uncertainty and interdependence, which contributes to increases in relational turbulence. We evaluated these assumptions using data from 64 heterosexual married partners who experienced work‐related disruptions due to the COVID‐19 pandemic. Spouses completed a pre‐test survey, 10 weekly surveys, and a post‐test survey over 12 weeks from June to August 2020. Results from longitudinal actor‐partner interdependence models indicated that (a) wives' weekly stress corresponded positively with their own partner uncertainty, relationship uncertainty, and perceptions of partner interference, and (b) the magnitude of wives' stress spillover and husbands' change in relational turbulence were positively associated. Implications for RTT and research on stress spillover are discussed.

... First, inference about the parameter estimates in the Bayesian framework makes use of 95% credible intervals (instead of confidence intervals) and transformation into relative risk (RR) ratios (given the multinomial nature of the outcome variable). Another useful metric for interpretation is the probability of direction, which is the probability that a parameter (based upon its posterior distribution) is positive or negative, that is, different from zero (Makowski et al., 2019). The probability of direction ranges from 50% to 100%, with 50% indicating that the parameter is equally likely to be positive or negative and 100% indicating the parameter is very likely to be either positive or negative. ...

Several theoretical perspectives suggest that dyadic experiences are distinguished by patterns of behavioral change that emerge during interactions. Methods for examining change in behavior over time are well elaborated for the study of change along continuous dimensions. Extensions for charting increases and decreases in individuals' use of specific, categorically defined behaviors, however, are rarely invoked. Greater accessibility of Bayesian frameworks that facilitate formulation and estimation of the requisite models is opening new possibilities. This article provides a primer on how multinomial logistic growth models can be used to examine between-dyad differences in within-dyad behavioral change over the course of an interaction. We describe and illustrate how these models are implemented in the Bayesian framework using data from support conversations between strangers (N = 118 dyads) to examine (RQ1) how six types of listeners' and disclosers' behaviors change as support conversations unfold and (RQ2) how the disclosers' preconversation distress moderates the change in conversation behaviors. The primer concludes with a series of notes on (a) implications of modeling choices, (b) flexibility in modeling nonlinear change, (c) necessity for theory that specifies how and why change trajectories differ, and (d) how multinomial logistic growth models can help refine current theory about dyadic interaction. (PsycInfo Database Record (c) 2023 APA, all rights reserved).

... We calculated Bayesian p values from posterior predictive distributions, for model components of detection probability (p_d) and availability for detection (p_a) to evaluate goodness of fit, where values approaching 0 or 1 imply lack of fit (Amundson et al., 2014; Gelman et al., 1996). We report mean values of the posterior distribution and 95% credible intervals (CRIs) for parameter estimates, unless otherwise stated, and we determined the strength of evidence for covariate effects based on the probability of direction (pd) representing the proportion of the posterior distribution having the same sign as its median value (Makowski et al., 2019). We provide code and data to fit our models using R and JAGS through USGS GitLab and ScienceBase ( ...
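A Bayesian p-value of this kind compares two values for each posterior draw. One is a discrepancy statistic for the observed data; the other is the same statistic for data replicated from the posterior predictive distribution. The excerpt does not name the discrepancy (chi-square and Freeman-Tukey statistics are common choices), so the sketch takes per-draw discrepancy values as input. The function names and the 0.1 margin in `flags_lack_of_fit` are hypothetical.

```python
from typing import Sequence


def bayesian_p_value(observed: Sequence[float], replicated: Sequence[float]) -> float:
    """Posterior predictive p-value from paired discrepancy values.

    observed[i] is the discrepancy D(y, theta_i) of the real data under posterior
    draw i. replicated[i] is D(y_rep_i, theta_i) for data simulated from that draw.
    The result is the share of draws in which the replicated data are at least
    as extreme as the observed data.
    """
    if len(observed) != len(replicated) or not observed:
        raise ValueError("need one observed and one replicated discrepancy per draw")
    exceed = sum(1 for obs, rep in zip(observed, replicated) if rep >= obs)
    return exceed / len(observed)


def flags_lack_of_fit(p_value: float, margin: float = 0.1) -> bool:
    """Hypothetical rule of thumb: values close to 0 or 1 suggest lack of fit."""
    return p_value < margin or p_value > 1.0 - margin


# Toy discrepancies: the replicated value exceeds the observed one in 2 of 4 draws.
p = bayesian_p_value([10.2, 9.8, 11.0, 10.5], [12.1, 8.7, 11.4, 9.9])
print(p)                     # 0.5
print(flags_lack_of_fit(p))  # False
```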

Anthropogenic resource subsidization across western ecosystems has contributed to widespread increases in generalist avian predators, including common ravens (Corvus corax; hereafter, raven). Ravens are adept nest predators and can negatively impact species of conservation concern. Predation effects from ravens are especially concerning for greater sage-grouse (Centrocercus urophasianus; hereafter, sage-grouse), which have experienced prolonged population decline. Our objectives were to quantify spatiotemporal patterns in raven density, evaluate sage-grouse nest success concurrent with fluctuating raven densities, and demonstrate a spatially explicit decision support tool to guide management applications to appropriate conflict areas. We combined ~28,000 raven point count surveys with data from more than 900 sage-grouse nests between 2009 and 2019 within the Great Basin, USA. We modeled variation in raven density using a Bayesian hierarchical distance sampling approach with environmental covariates on detection and abundance. Concurrently, we modeled sage-grouse nest survival using a hierarchical frailty model as a function of raven density and other environmental covariates that influence the risk of nest failure. Raven density commonly exceeded 0.5 ravens km⁻² and increased at low elevations with more anthropogenic development and/or agriculture. Reduced sage-grouse nest survival was strongly associated with elevated raven density (e.g., >0.5 ravens km⁻²) and varied with topographic ruggedness, shrub cover, and burned areas. For conservation application, we developed a spatially explicit planning tool that predicts nest survival under current and reduced raven numbers within the Great Basin to help direct management actions to localized areas where sage-grouse nests are at highest risk of failure. Our modeling framework can be generalized to multiple species where spatially registered abundance and demographic data are available.