Anton Olsson-Collentine’s research while affiliated with Tilburg University and other places


Publications (18)


[Figures: overview of the stratified sampling and matching procedure; Gaussian kernel density plots of included preprints over time and of the percentage of inconsistent statistics per preprint, by preprint category (COVID-19 vs. non-COVID-19); number of extracted statistics per type of statistic and percentage of internally inconsistent statistics per category (COVID-19 in purple, non-COVID-19 in yellow; error bars are standard errors of the percentages).]
Comparing the prevalence of statistical reporting inconsistencies in COVID-19 preprints and matched controls: a registered report
  • Article
  • Full-text available

August 2023 · 79 Reads · 4 Citations

Robbie C. M. van Aert · [...] · Anton Olsson-Collentine · [...]

The COVID-19 outbreak has led to an exponential increase of publications and preprints about the virus, its causes, consequences, and possible cures. COVID-19 research has been conducted under high time pressure and has been subject to financial and societal interests. Doing research under such pressure may influence the scrutiny with which researchers perform and write up their studies. Either researchers become more diligent, because of the high-stakes nature of the research, or the time pressure may lead to cutting corners and lower quality output. In this study, we conducted a natural experiment to compare the prevalence of incorrectly reported statistics in a stratified random sample of COVID-19 preprints and a matched sample of non-COVID-19 preprints. Our results show that the overall prevalence of incorrectly reported statistics is 9–10%, but frequentist as well as Bayesian hypothesis tests show no difference in the number of statistical inconsistencies between COVID-19 and non-COVID-19 preprints. In conclusion, the literature suggests that COVID-19 research may on average have more methodological problems than non-COVID-19 research, but our results show that there is no difference in the statistical reporting quality.
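The consistency screening described here follows the same logic as tools such as statcheck: recompute the p-value from the reported test statistic and degrees of freedom, and flag mismatches. A minimal Python sketch of that logic (the function name and tolerance are illustrative, not the authors' pipeline):

```python
from scipy import stats

def check_t_test(t_value: float, df: int, reported_p: float,
                 tol: float = 0.0005) -> bool:
    """Return True if a reported two-sided p-value is consistent with
    the reported t statistic and degrees of freedom."""
    recomputed_p = 2 * stats.t.sf(abs(t_value), df)
    # Reported p-values are rounded, so allow a small tolerance.
    return abs(recomputed_p - reported_p) <= tol

# "t(28) = 2.20, p = .036" recomputes to p ≈ .036: consistent.
print(check_t_test(2.20, 28, 0.036))   # True
# "t(28) = 2.20, p = .012" does not: flagged as inconsistent.
print(check_t_test(2.20, 28, 0.012))   # False
```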


Unreliable Heterogeneity: How Measurement Error Obscures Heterogeneity in Meta-analyses in Psychology

June 2023 · 90 Reads · 1 Citation

Measurement error (imperfect reliability) is present in any empirical effect size estimate and systematically attenuates observed effect sizes compared to true underlying effect sizes. Yet there exist broad concerns that proper measurement tends to be neglected in much of psychological research. We examined how measurement error in primary studies affects meta-analytic heterogeneity estimates using Monte Carlo simulations. Our results indicate that although measurement error in primary studies can both inflate and suppress heterogeneity, under most circumstances measurement error in primary studies leads to a severe underestimate of heterogeneity in meta-analysis. Our simulations showed expected heterogeneity to be underestimated by about 15%–60% when considering a typical effect size around r = 0.2 and true heterogeneity levels that are common in the meta-analytic literature (τ > 0.1 in Pearson’s r). The underestimate primarily depends on average reliability in primary studies (higher reliability leads to a smaller underestimate), but also worsens with smaller primary study sample sizes. We observed a positive bias in heterogeneity estimates due to measurement error only under specific and arguably uncommon circumstances of (1) actual zero heterogeneity, particularly when mean effect sizes are large, or (2) combinations of very small true heterogeneity, large variance in primary study reliabilities, large mean effect sizes, and a limited number of primary studies. Severe underestimates of heterogeneity due to measurement error may affect many meta-analyses in psychology and obscure true differences between studies that could be relevant for theory, practice, and future research efforts. Research providing concrete guidance for applied meta-analysts is needed, as sophisticated methods for correcting measurement unreliability such as meta-analytic structural equation modeling (MASEM) are only applicable in exceptional cases and corrections based on classical test theory come with caveats and strong assumptions.
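The core of such a simulation can be sketched compactly: generate heterogeneous true correlations, attenuate them by reliability, add sampling error, and see what a standard random-effects estimator recovers. A Python sketch with illustrative parameter values (the authors' simulations were more extensive; this is not their code), using the DerSimonian-Laird estimator on the Fisher-z scale:

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_tau_hat(k=20, n=100, mu_r=0.2, tau=0.15, reliability=0.8,
                 n_sim=2000):
    """Average DerSimonian-Laird tau estimate (Fisher-z scale) when the
    observed correlations are attenuated by measurement error."""
    tau_hats = []
    for _ in range(n_sim):
        # Heterogeneous true effects around mu_r (tau on the z scale)
        true_r = np.tanh(rng.normal(np.arctanh(mu_r), tau, k))
        # Attenuation: r_obs = r_true * sqrt(rxx * ryy), here rxx = ryy
        observed_r = true_r * reliability
        # Sampling error of the Fisher-z transformed correlation
        z = rng.normal(np.arctanh(observed_r), 1 / np.sqrt(n - 3))
        w = np.full(k, n - 3.0)            # inverse-variance weights
        z_bar = np.average(z, weights=w)
        Q = np.sum(w * (z - z_bar) ** 2)   # Cochran's Q
        c = w.sum() - (w ** 2).sum() / w.sum()
        tau_hats.append(np.sqrt(max(0.0, (Q - (k - 1)) / c)))
    return np.mean(tau_hats)

print(mean_tau_hat(reliability=1.0))  # close to the true tau of 0.15
print(mean_tau_hat(reliability=0.8))  # smaller: heterogeneity suppressed
```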


Meta-Analyzing the Multiverse: A Peek Under the Hood of Selective Reporting

May 2023 · 338 Reads · 10 Citations

Psychological Methods

Researcher degrees of freedom refer to arbitrary decisions in the execution and reporting of hypothesis-testing research that allow for many possible outcomes from a single study. Selective reporting of results (p-hacking) from this "multiverse" of outcomes can inflate effect size estimates and false positive rates. We studied the effects of researcher degrees of freedom and selective reporting using empirical data from extensive multistudy projects in psychology (Registered Replication Reports) featuring 211 samples and 14 dependent variables. We used a counterfactual design to examine what biases could have emerged if the studies (and ensuing meta-analyses) had not been preregistered and could have been subjected to selective reporting based on the significance of the outcomes in the primary studies. Our results show the substantial variability in effect sizes that researcher degrees of freedom can create in relatively standard psychological studies, and how selective reporting of outcomes can alter conclusions and introduce bias in meta-analysis. Despite the typically thousands of outcomes appearing in the multiverses of the 294 included studies, only in about 30% of studies did significant effect sizes in the hypothesized direction emerge. We also observed that the effect of a particular researcher degree of freedom was inconsistent across replication studies using the same protocol, meaning multiverse analyses often fail to replicate across samples. We recommend that hypothesis-testing researchers preregister their preferred analysis and openly report multiverse analyses. We propose a descriptive index (underlying multiverse variability) that quantifies the robustness of results across alternative ways to analyze the data. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
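A multiverse analysis of this kind enumerates all combinations of defensible analysis decisions and inspects the spread of the resulting effect sizes. A toy Python sketch (the data, decision options, and robustness summary are illustrative; the paper's underlying multiverse variability index may be defined differently):

```python
import itertools
import numpy as np

rng = np.random.default_rng(7)
# Toy two-group reaction-time data (illustrative, not the paper's data)
a = rng.lognormal(0.00, 0.4, 150)
b = rng.lognormal(0.15, 0.4, 150)

def cohens_d(x, y):
    pooled_sd = np.sqrt((x.var(ddof=1) + y.var(ddof=1)) / 2)
    return (y.mean() - x.mean()) / pooled_sd

# Two researcher degrees of freedom spanning a small multiverse:
cutoffs = [None, 2.0, 2.5, 3.0]       # SD-based outlier exclusion
transforms = [lambda v: v, np.log]    # raw vs. log-transformed values

effects = []
for cutoff, f in itertools.product(cutoffs, transforms):
    xa, xb = f(a), f(b)
    if cutoff is not None:
        xa = xa[np.abs((xa - xa.mean()) / xa.std()) < cutoff]
        xb = xb[np.abs((xb - xb.mean()) / xb.std()) < cutoff]
    effects.append(cohens_d(xa, xb))

effects = np.array(effects)
# A simple robustness summary over all specifications (the paper's
# index may aggregate this differently).
print(f"{effects.size} specifications, d from {effects.min():.2f} "
      f"to {effects.max():.2f}")
```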


Preprint_vanAert_etal_Covid19

May 2023 · 9 Reads

(Preprint version of "Comparing the prevalence of statistical reporting inconsistencies in COVID-19 preprints and matched controls: a registered report" listed above; the abstract is identical.)


[Figures 2–4: meta-plots and summary meta-plots for McCall and Carriger (1993) and Rabelo et al. (2015); an annotated overview of a summary meta-plot; and a screenshot of the meta-plot web application applied to the meta-analysis of Rabelo et al. (2015).]
The Meta-Plot

January 2023 · 1,811 Reads · 2 Citations

Zeitschrift für Psychologie

The meta-plot is a descriptive visual tool for meta-analysis that provides information on the primary studies in the meta-analysis and the results of the meta-analysis. More precisely, the meta-plot portrays (1) the precision and statistical power of the primary studies in the meta-analysis, (2) the estimate and confidence interval of a random-effects meta-analysis, (3) the results of a cumulative random-effects meta-analysis yielding a robustness check of the meta-analytic effect size with respect to primary studies’ precision, and (4) evidence of publication bias. After explaining the underlying logic and theory, the meta-plot is applied to two cherry-picked meta-analyses that appear to be biased and to 10 randomly selected meta-analyses from the psychological literature. We recommend accompanying any meta-analysis of common effect size measures with the meta-plot.
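Element (3), the cumulative random-effects meta-analysis, is easy to illustrate: re-estimate the meta-analytic effect as studies are added from most to least precise and watch for drift. A Python sketch with toy numbers (the meta-plot itself is a richer graphical display):

```python
import numpy as np

def dl_estimate(y, se):
    """DerSimonian-Laird random-effects estimate and 95% CI half-width."""
    w = 1 / se ** 2
    y_fixed = np.sum(w * y) / w.sum()
    if len(y) == 1:
        tau2 = 0.0
    else:
        Q = np.sum(w * (y - y_fixed) ** 2)
        c = w.sum() - (w ** 2).sum() / w.sum()
        tau2 = max(0.0, (Q - (len(y) - 1)) / c)
    w_re = 1 / (se ** 2 + tau2)
    estimate = np.sum(w_re * y) / w_re.sum()
    return estimate, 1.96 / np.sqrt(w_re.sum())

# Toy standardized mean differences and their standard errors
y = np.array([0.45, 0.30, 0.62, 0.15, 0.80, 0.55])
se = np.array([0.10, 0.12, 0.20, 0.15, 0.30, 0.25])

# Add studies from most to least precise; drift of the estimate as
# imprecise studies enter is one visual cue the meta-plot encodes.
order = np.argsort(se)
for k in range(1, len(y) + 1):
    est, hw = dl_estimate(y[order[:k]], se[order[:k]])
    print(f"{k} most precise studies: {est:.3f} ± {hw:.3f}")
```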


A many-analysts approach to the relation between religiosity and well-being

July 2022 · 1,472 Reads · 69 Citations

Religion Brain & Behavior

The relation between religiosity and well-being is one of the most researched topics in the psychology of religion, yet the directionality and robustness of the effect remains debated. Here, we adopted a many-analysts approach to assess the robustness of this relation based on a new cross-cultural dataset (N = 10,535 participants from 24 countries). We recruited 120 analysis teams to investigate (1) whether religious people self-report higher well-being, and (2) whether the relation between religiosity and self-reported well-being depends on perceived cultural norms of religion (i.e., whether it is considered normal and desirable to be religious in a given country). In a two-stage procedure, the teams first created an analysis plan and then executed their planned analysis on the data. For the first research question, all but 3 teams reported positive effect sizes with credible/confidence intervals excluding zero (median reported b = 0.120). For the second research question, this was the case for 65% of the teams (median reported b = 0.039). While most teams applied (multilevel) linear regression models, there was considerable variability in the choice of items used to construct the independent variables, the dependent variable, and the included covariates.
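As a rough illustration of the modal team choice: a multilevel linear regression of well-being on religiosity with random intercepts per country, plus a religiosity-by-norms interaction for the second research question. A Python sketch on simulated stand-in data (all variable names and coefficients here are hypothetical, not the project's dataset):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n, n_countries = 2400, 24
country = rng.integers(0, n_countries, n)
norms = rng.normal(size=n_countries)   # country-level religiosity norms
religiosity = rng.normal(size=n)
wellbeing = (0.12 * religiosity
             + 0.04 * religiosity * norms[country]
             + rng.normal(size=n))
df = pd.DataFrame({"country": country, "religiosity": religiosity,
                   "cultural_norms": norms[country],
                   "wellbeing": wellbeing})

# Random intercepts for country; the interaction term addresses the
# second research question (moderation by perceived cultural norms).
model = smf.mixedlm("wellbeing ~ religiosity * cultural_norms",
                    df, groups=df["country"]).fit()
print(model.summary())
```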


The meta-plot: A graphical tool for interpreting the results of a meta-analysis

June 2022 · 448 Reads · 3 Citations

The meta-plot is a descriptive visual tool for meta-analysis that provides information on the primary studies in the meta-analysis and the results of the meta-analysis. More precisely, the meta-plot portrays (i) the precision and statistical power of the primary studies in the meta-analysis, (ii) the estimate and confidence interval of a random-effects meta-analysis, (iii) the results of a cumulative random-effects meta-analysis yielding a robustness check of the meta-analytic effect size with respect to primary studies’ precision, and (iv) evidence of publication bias. After explaining the underlying logic and theory, the meta-plot is applied to two cherry-picked meta-analyses that appear to be biased and to ten meta-analyses randomly selected from the psychological literature. We recommend using the meta-plot in addition to any meta-analysis of common effect size measures, rather than variants of the funnel plot.


[Figure 1: unconstrained t-distributed posterior for the difference in means θ based on the birthwt data (solid line), and the unconstrained fractional Cauchy prior (dashed line). Note that P_u(θ > 0) = 0.5, P_u(θ > 0 | y) = 0.996, π_u(θ = 0) = 2.261e−4, and π_u(θ = 0 | y) = 1.156e−4.]
BFpack: Flexible Bayes Factor Testing of Scientific Theories in R

November 2021 · 104 Reads · 50 Citations

Journal of Statistical Software

There have been considerable methodological developments of Bayes factors for hypothesis testing in the social and behavioral sciences, and related fields. This development is due to the flexibility of the Bayes factor for testing multiple hypotheses simultaneously, the ability to test complex hypotheses involving equality as well as order constraints on the parameters of interest, and the interpretability of the outcome as the weight of evidence provided by the data in support of competing scientific theories. The available software tools for Bayesian hypothesis testing are still limited, however. In this paper we present a new R package called BFpack that contains functions for Bayes factor hypothesis testing for many common testing problems. The software includes novel tools for (i) Bayesian exploratory testing (e.g., zero vs positive vs negative effects), (ii) Bayesian confirmatory testing (competing hypotheses with equality and/or order constraints), (iii) common statistical analyses, such as linear regression, generalized linear models, (multivariate) analysis of (co)variance, correlation analysis, and random intercept models, (iv) the use of default priors, and (v) support for data containing observations that are missing at random.
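BFpack itself is an R package built on fractional priors. As a language-agnostic illustration of the underlying idea of weighing competing hypotheses, here is a Bayes factor for a two-group comparison in Python using the common BIC approximation, which is a much cruder default than BFpack's method:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
y = np.concatenate([rng.normal(0.0, 1, 50), rng.normal(0.4, 1, 50)])
group = np.repeat([0.0, 1.0], 50)

# H0: no group difference; H1: a group difference. The BIC-based
# approximation BF10 ≈ exp((BIC0 - BIC1) / 2) stands in for a proper
# default-prior Bayes factor here.
bic0 = sm.OLS(y, np.ones_like(y)).fit().bic          # intercept-only
bic1 = sm.OLS(y, sm.add_constant(group)).fit().bic   # with group effect
bf10 = np.exp((bic0 - bic1) / 2)
print(f"BF10 ≈ {bf10:.2f} (evidence for H1 over H0)")
```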


Preprint - Meta-Analyzing the Multiverse: A Peek Under the Hood of Selective Reporting

April 2021 · 34 Reads · 4 Citations

Researcher degrees of freedom refer to arbitrary decisions in the execution and reporting of hypothesis-testing research that allow for many possible outcomes from a single study. Selective reporting of results (p-hacking) from this ‘multiverse’ of outcomes can inflate effect size estimates and false positive rates. We studied the effects of researcher degrees of freedom and selective reporting using empirical data from extensive multi-study projects in psychology (Registered Replication Reports) featuring 211 samples and 14 dependent variables. Our results show the substantial variability in effect sizes that researcher degrees of freedom can create in relatively standard psychological studies, and how selective reporting of outcomes can alter conclusions and introduce bias in meta-analysis. Despite the typically thousands of outcomes appearing in the multiverses of the 294 included studies, only in about 30% of studies did significant effect sizes in the hypothesized direction emerge. We also observed that the effect of a particular researcher degree of freedom was inconsistent across studies using the same protocol, meaning multiverse analyses often fail to replicate across samples. We recommend that hypothesis-testing researchers preregister their preferred analysis and openly report multiverse analyses. We propose a descriptive index (Underlying Multiverse Variability) that quantifies the robustness of results across alternative ways to analyze the data.


[Figures: simulation relating I² values to the between-studies standard deviation; simulated I² densities across 68 meta-analyses for zero, small, medium, and large heterogeneity (Higgins et al., 2003) alongside the observed I² estimates; Pearson correlations between absolute effect size and τ̂ and I²; and variation in observed effect sizes as a function of true effect size and measurement reliability.]
Heterogeneity in Direct Replications in Psychology and Its Association With Effect Size

July 2020 · 746 Reads · 70 Citations

Psychological Bulletin

We examined the evidence for heterogeneity (of effect sizes) when only minor changes to sample population and settings were made between studies and explored the association between heterogeneity and average effect size in a sample of 68 meta-analyses from 13 preregistered multilab direct replication projects in social and cognitive psychology. Among the many examined effects, examples include the Stroop effect, the "verbal overshadowing" effect, and various priming effects such as "anchoring" effects. We found limited heterogeneity; 48/68 (71%) meta-analyses had nonsignificant heterogeneity, and most (49/68; 72%) were most likely to have zero to small heterogeneity. Power to detect small heterogeneity (as defined by Higgins, Thompson, Deeks, & Altman, 2003) was low for all projects (mean 43%), but good to excellent for medium and large heterogeneity. Our findings thus show little evidence of widespread heterogeneity in direct replication studies in social and cognitive psychology, suggesting that minor changes in sample population and settings are unlikely to affect research outcomes in these fields of psychology. We also found strong correlations between observed average effect sizes (standardized mean differences and log odds ratios) and heterogeneity in our sample. Our results suggest that heterogeneity and moderation of effects are unlikely for a zero average true effect size, but increasingly likely for larger average true effect sizes. (PsycInfo Database Record (c) 2020 APA, all rights reserved).
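The heterogeneity quantities on which these conclusions rest (Cochran's Q, I², and the estimated between-study standard deviation τ) are straightforward to compute. A Python sketch with toy replication data (illustrative numbers, not the projects' data):

```python
import numpy as np
from scipy import stats

def heterogeneity(y, se):
    """Cochran's Q with its p-value, I-squared, and the
    DerSimonian-Laird estimate of tau."""
    w = 1 / se ** 2
    y_fixed = np.sum(w * y) / w.sum()
    Q = np.sum(w * (y - y_fixed) ** 2)
    k = len(y)
    p = stats.chi2.sf(Q, k - 1)
    i2 = 100 * max(0.0, (Q - (k - 1)) / Q) if Q > 0 else 0.0
    c = w.sum() - (w ** 2).sum() / w.sum()
    tau = np.sqrt(max(0.0, (Q - (k - 1)) / c))
    return Q, p, i2, tau

# Toy multilab replication: 8 labs, standardized mean differences
y = np.array([0.21, 0.35, 0.10, 0.28, 0.17, 0.40, 0.05, 0.30])
se = np.full(8, 0.12)
Q, p, i2, tau = heterogeneity(y, se)
print(f"Q = {Q:.2f} (p = {p:.3f}), I² = {i2:.0f}%, tau = {tau:.3f}")
```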


Citations (12)


... COVID-related articles were more likely to suffer from selection biases in randomized control trials, lack of representativeness in cohort studies, comparability issues in case-control studies, etc. [11,12]). However, other analyses suggest that neither the prevalence of statistical reporting errors nor the level of scrutiny in peer reviews differed systematically between COVID-related and non-COVID articles [13,14]. ...

Reference:

How the pandemic affected psychological research
Comparing the prevalence of statistical reporting inconsistencies in COVID-19 preprints and matched controls: a registered report

... For meta-analyses, the DerSimonian and Laird random-effects model was used to provide a conservative estimate of the pooled effect size. Subgroup analyses and metaregressions explored potential sources of heterogeneity, such as study design differences, patient populations, or miR-150 measurement techniques [58]. Publication bias was assessed using funnel plots and Egger's test, which evaluates funnel plot symmetry to detect small-study effects [59]. ...
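The Egger test mentioned in this passage regresses the standardized effect on precision; an intercept far from zero indicates funnel-plot asymmetry. A minimal Python sketch with toy values:

```python
import numpy as np
import statsmodels.api as sm

# Toy effect sizes and standard errors (illustrative values only)
y = np.array([0.50, 0.42, 0.38, 0.60, 0.25, 0.70])
se = np.array([0.28, 0.22, 0.15, 0.30, 0.10, 0.35])

# Egger's test: regress the standardized effect (y / se) on precision
# (1 / se); an intercept far from zero signals small-study effects.
fit = sm.OLS(y / se, sm.add_constant(1 / se)).fit()
print(f"Egger intercept = {fit.params[0]:.2f}, p = {fit.pvalues[0]:.3f}")
```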

Unreliable Heterogeneity: How Measurement Error Obscures Heterogeneity in Meta-analyses in Psychology
  • Citing Preprint
  • June 2023

... Further, one could argue that the results of our analyses depend upon many parameters chosen by the researcher (see multiverse debate 113,114), which questions the replicability of the findings. We propose that the methodological choices made in our study set a new standard for future research. ...

Meta-Analyzing the Multiverse: A Peek Under the Hood of Selective Reporting

Psychological Methods

... Hence, a wealth of studies addressing the same research question is considered valuable in reaching robust generalizations. Meta-analysis allows for reaching consistent results through the effect size unit (Assen et al., 2023; Bayraktar, 2021). ...

The Meta-Plot

Zeitschrift für Psychologie

... The impact of beliefs on well-being also depends on the cultural context in which people live. A study of 10,535 participants from 24 countries, recruiting 120 analysis teams, provided evidence that the effect of religious belief systems on well-being depends on the cultural norms of religiosity, which refer to the perceived importance of religious beliefs and behaviours for the average person within a given culture (Hoogeveen et al., 2023). In addition, research has found that there appears to be a relationship between spiritual beliefs and SWB in religious countries, but not in secular ones (Pérez & Rohde, 2022). ...

A many-analysts approach to the relation between religiosity and well-being

Religion Brain & Behavior

... Bayes factors between all pairs of hypotheses were computed using the R package BFpack [69]. BFpack uses default priors that do not require the specification of the scale of the expected effects (for details see [69]). ...

BFpack : Flexible Bayes Factor Testing of Scientific Theories in R

Journal of Statistical Software

... Heirene et al. and Bakker et al. found similar results: Decisions relating to study design were relatively well-restricted compared to decisions regarding data collection and statistical analysis. This is problematic because the many decisions in analyzing data could still create sizeable variation in outcomes that researchers could selectively report (Olsson-Collentine et al., 2023). ...

Preprint - Meta-Analyzing the Multiverse: A Peek Under the Hood of Selective Reporting
  • Citing Preprint
  • April 2021

... The GRADE is rated in four categories: 1) Very low: the true effect is probably markedly different from the estimated effect; 2) Low: the true effect might be markedly different from the estimated effect; 3) Moderate: the authors believe that the true effect is probably close to the estimated effect; and 4) High: the authors have high confidence that the true effect is similar to the estimated effect. For meta-analysis, we classified three levels of evidence certainty based on published statistical metrics [27][28][29]: Level I (strong evidence): heterogeneity I² < 50% (p > 0.10) and significance of the overall effect p < 10⁻⁵; Level II (moderate evidence): heterogeneity I² < 50% (p > 0.10) or significance of the overall effect p < 10⁻⁵; and Level III (weak evidence): heterogeneity I² > 50% (p < 0.10) or significance of the overall effect p > 10⁻⁵. ...
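The quoted three-level rule is mechanical enough to state as code. A small Python sketch of that classification (thresholds exactly as quoted; the overlap between Levels II and III is resolved top-down):

```python
def evidence_level(i_squared: float, p_overall: float) -> str:
    """Three-level evidence-certainty rule from the passage above."""
    homogeneous = i_squared < 50      # I² < 50% (heterogeneity p > 0.10)
    strong_effect = p_overall < 1e-5  # overall effect p < 10^-5
    if homogeneous and strong_effect:
        return "Level I (strong evidence)"
    if homogeneous or strong_effect:
        return "Level II (moderate evidence)"
    return "Level III (weak evidence)"

print(evidence_level(30, 2e-6))   # Level I (strong evidence)
print(evidence_level(65, 2e-6))   # Level II (moderate evidence)
print(evidence_level(65, 0.01))   # Level III (weak evidence)
```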

Heterogeneity in Direct Replications in Psychology and Its Association With Effect Size

Psychological Bulletin

... Meta-analysis, often included in systematic reviews, summarizes quantitative results from a number of research reports on a particular topic to formulate general conclusions from multiple datasets. It relies primarily on statistical analysis to produce a common metric, called an "effect size," which facilitates the identification of patterns and anomalies among publications [22]. ...

Reproducibility of individual effect sizes in meta-analyses in psychology

... In Supplemental Materials, we report results based on the analyses described in the original preregistration, but the results of the original and amended analyses were not qualitatively different. We also report the results of non-preregistered Bayesian analyses, calculated using the BFpack package in R (Mulder et al., 2019), to evaluate the strength of the evidence for our hypotheses. We created interaction plots using the interact_plot() function from the Interactions package in R. All materials and procedures were approved by the institutional review boards at the University of Miami and the University of California, San Diego. ...

BFpack: Flexible Bayes Factor Testing of Scientific Theories in R