Results of the consistent thresholding and ROI selection analysis
n = 64. a, Activation for each hypothesis as determined using consistent thresholding (black, P < 0.001 and cluster size (k) > 10 voxels; blue, FDR correction with P < 0.05) and ROI selection across teams (y axis), versus the actual proportion of teams reporting activation (x axis). Numbers next to each symbol represent the hypothesis number for each point. b, Results from re-thresholding of unthresholded maps, using either uncorrected values with the threshold (P < 0.001, k > 10) or FDR correction (P_FDR < 5%) and common anatomical ROIs for each hypothesis. A team is recorded as having an activation if one or more significant voxels are found in the ROI. Results for image-based meta-analysis (IBMA) for each hypothesis are presented, also thresholded at P_FDR < 5%.
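As a concrete illustration of the decision rule described in panel b, the sketch below applies both thresholding schemes to a synthetic statistical map and asks whether any suprathreshold voxel falls inside an ROI; the array shapes, the injected blob, and the ROI bounds are placeholders, not the NARPS data.

```python
# Minimal sketch of the re-thresholding decision rule from the caption: a team counts
# as "reporting activation" if at least one significant voxel lies inside the ROI.
# Synthetic arrays stand in for the unthresholded map and anatomical ROI.
import numpy as np
from scipy import ndimage, stats

rng = np.random.default_rng(0)
z_map = rng.normal(size=(20, 20, 20))          # placeholder unthresholded z-map
z_map[8:12, 8:12, 8:12] += 4.0                 # injected "activation" blob
roi_mask = np.zeros(z_map.shape, dtype=bool)
roi_mask[5:15, 5:15, 5:15] = True              # placeholder anatomical ROI

p_map = stats.norm.sf(z_map)                   # one-sided p-values from z

# Uncorrected threshold: p < 0.001 with cluster extent k > 10 voxels
suprathreshold = p_map < 0.001
labels, n_clusters = ndimage.label(suprathreshold)
cluster_sizes = ndimage.sum(suprathreshold, labels, index=range(1, n_clusters + 1))
surviving_labels = [i + 1 for i, size in enumerate(cluster_sizes) if size > 10]
surviving = np.isin(labels, surviving_labels)
active_uncorrected = bool(np.any(surviving & roi_mask))

# FDR correction (Benjamini-Hochberg) at q < 0.05 across all voxels
p_sorted = np.sort(p_map.ravel())
m = p_sorted.size
passes = p_sorted <= (np.arange(1, m + 1) / m) * 0.05
p_crit = p_sorted[passes].max() if passes.any() else 0.0
active_fdr = bool(np.any((p_map <= p_crit) & roi_mask))

print(f"uncorrected (p<0.001, k>10): {active_uncorrected}; FDR q<0.05: {active_fdr}")
```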

Source publication
Article
Full-text available
Data analysis workflows in many scientific domains have become increasingly complex and flexible. Here we assess the effect of this flexibility on the results of functional magnetic resonance imaging by asking 70 independent teams to analyse the same dataset, testing the same 9 ex-ante hypotheses¹. The flexibility of analytical approaches is exempl...

Citations

... Indeed, variations in processing methods across different modalities, research groups, studies, and even individual researchers have contributed to inconsistencies and discrepancies in reported findings [9][10][11]. This problem surfaced more recently with the Neuroimaging Analysis Replication and Prediction Study (NARPS; [12]), in which 70 teams of functional MRI (fMRI) experts were provided with the same dataset and tasked with testing a closed set of nine hypotheses. The results highlighted overall poor agreement in conclusions across teams. ...
Chapter
Full-text available
This chapter critically examines the standardization of preprocessing in neuroimaging, exploring the field’s evolution, the necessity of methodological consistency, and the future directions shaped by artificial intelligence (AI). It begins with an overview of the technical advancements and the emergence of software tools with standardized neuroimaging processes. It also emphasizes the importance of the Brain Imaging Data Structure (BIDS) and data sharing to improve reproducibility. The chapter then discusses the impact of methodological choices on research reliability, advocating for standardization to mitigate analytical variability. The multifaceted approach to standardization is explored, including workflow architecture, quality control, and community involvement in open-source projects. Challenges such as method selection, resource optimization, and the integration of AI are addressed, highlighting the role of openly available data and the potential of AI-assisted code writing in enhancing productivity. In conclusion, the chapter underscores NiPreps’ contribution to providing reliable and reproducible preprocessing solutions, inviting community engagement to advance neuroimaging research. The chapter envisions a collaborative and robust scientific culture in neuroimaging by promoting standardized practices.
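For readers unfamiliar with BIDS, the following minimal sketch lays out the kind of directory and filename structure the standard prescribes; the dataset name, task label, and metadata values are illustrative placeholders rather than a complete, validated dataset.

```python
# A minimal sketch of the BIDS naming convention: one subject, one functional run,
# plus the required dataset_description.json. Paths and values are placeholders.
import json
from pathlib import Path

root = Path("my_bids_dataset")                       # hypothetical dataset root
func_dir = root / "sub-01" / "func"
func_dir.mkdir(parents=True, exist_ok=True)

# Required top-level metadata file
(root / "dataset_description.json").write_text(
    json.dumps({"Name": "Example dataset", "BIDSVersion": "1.8.0"}, indent=2)
)

# Key/value filename encodes subject, task, and suffix; the sidecar JSON holds metadata.
(func_dir / "sub-01_task-rest_bold.json").write_text(
    json.dumps({"RepetitionTime": 2.0, "TaskName": "rest"}, indent=2)
)
# The imaging file itself (sub-01_task-rest_bold.nii.gz) would sit alongside the sidecar.

print(sorted(str(p) for p in root.rglob("*")))
```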
... In this work, we perform a re-analysis of the NARPS dataset (Botvinik-Nezer et al., 2019; Botvinik-Nezer et al., 2020a), openly available on https://openneuro.org/ (Poldrack et al., 2013). ...
    Article
    Full-text available
Is irrational behavior the incidental outcome of biological constraints imposed on neural information processing? In this work, we consider the paradigmatic case of gamble decisions, where gamble values integrate prospective gains and losses. Under the assumption that neurons have a limited firing response range, we show that mitigating the ensuing information loss within artificial neural networks that synthesize value involves a specific form of self-organized plasticity. We demonstrate that the ensuing efficient value synthesis mechanism induces value range adaptation. We also reveal how the ranges of prospective gains and/or losses eventually determine both the behavioral sensitivity to gains and losses and the information content of the network. We test these predictions on two fMRI datasets from the OpenNeuro.org initiative that probe gamble decision-making but differ in terms of the range of gain prospects. First, we show that people's loss aversion eventually adapts to the range of gain prospects they are exposed to. Second, we show that the strength with which the orbitofrontal cortex (in particular: Brodmann area 11) encodes gains and expected value also depends upon the range of gain prospects. Third, we show that, when fitted to participants' gambling choices, self-organizing artificial neural networks generalize across gain range contexts and predict the geometry of information content within the orbitofrontal cortex. Our results demonstrate how self-organizing plasticity aimed at mitigating information loss induced by neurons’ limited response range may result in value range adaptation, eventually yielding irrational behavior.
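As a loose, generic illustration of the range-adaptation idea invoked in this abstract (not the authors' self-organizing network), the toy sketch below rescales the slope of a bounded sigmoidal value unit to the width of the gain range, so the marginal response to a fixed gain shrinks as the range widens; all numbers are invented.

```python
# Generic range-adaptation illustration: a bounded sigmoidal value unit whose slope
# adapts to the width of the gain range responds less to a fixed gain increment when
# the range widens, which would read out behaviourally as stronger apparent loss aversion.
import numpy as np

def adapted_response(gain, gain_range):
    """Sigmoidal response whose slope is rescaled to the width of the gain range."""
    midpoint = gain_range / 2.0
    slope = 4.0 / gain_range          # steeper coding for narrow ranges
    return 1.0 / (1.0 + np.exp(-slope * (gain - midpoint)))

fixed_gain = 10.0
for gain_range in (20.0, 40.0, 80.0):
    sensitivity = adapted_response(fixed_gain + 1, gain_range) - adapted_response(fixed_gain, gain_range)
    print(f"gain range 0-{gain_range:>4.0f}: marginal response to +1 gain = {sensitivity:.4f}")
```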
      ... They are therefore by no means as direct or objective representations of experience and thought as is often portrayed (Racine et al., 2010). In fact, it has now been shown that different neuroscientists can draw different conclusions from the same brain data (Botvinik-Nezer et al., 2020). ...
      Chapter
      Full-text available
      The introduction explains how the brain became the focus of scientific attention as early as the nineteenth century, but then increasingly since the 1980s. Before we relate this to moral and legal issues, we first look at the development of biological or neuropsychiatry. Using real case studies, we will learn how people and their brains need to be seen in a psychosocial context. In particular, Nancy Andreasen's view that mental disorders are caused by “broken brains” is problematized. Finally, the emergence of neuroethics and neurolaw since the turn of the millennium is described and the issues that are relevant to these disciplines are discussed.
... Following the PROBAST guidelines, this study is rated as being of high concern with regard to applicability. Not only are neuroimaging datasets prone to variability in analysis,54 but the limited sample size of these studies also limits the applicability of the results.55,56 The specialised techniques required for imaging, and their associated complexity and cost, may lower the applicability for widespread clinical deployment. ...
... In existing biomarker-based models, the great variability in raw data processing and feature type selection potentially hampers successful external validation.54,55 Some even question whether biomarkers are a necessity to include when estimating treatment outcomes. In Meehan et al, 8 studies reporting models with a majority of biomarkers were excluded because of pragmatic concerns. ...
      Article
      Full-text available
Background Suboptimal treatment outcomes contribute to the high disease burden of mood, anxiety or psychotic disorders. Clinical prediction models could optimise treatment allocation, which may result in better outcomes. While ample research on prediction models is performed, model performance in other clinical contexts (i.e. external validation) is rarely examined. This gap hampers generalisability and, as such, implementation in clinical practice.
Aims To systematically appraise studies on externally validated clinical prediction models for estimating treatment outcomes for mood, anxiety and psychotic disorders by (1) reviewing the methodological quality and applicability of studies and (2) investigating how model properties relate to differences in model performance.
Method The review and meta-analysis protocol was prospectively registered with PROSPERO (registration number CRD42022307987). A search was conducted on 8 November 2021 in the databases PubMed, PsycINFO and EMBASE. Random-effects meta-analysis and meta-regression were conducted to examine between-study heterogeneity in discriminative performance and its relevant influencing factors.
Results Twenty-eight studies were included. The majority of studies (n = 16) validated models for mood disorders. Clinical predictors (e.g. symptom severity) were most frequently included (n = 25). Low methodological and applicability concerns were found for two studies. The overall discrimination performance of the meta-analysis was fair, with wide prediction intervals (0.72 [0.46; 0.89]). The between-study heterogeneity was not explained by the number or type of predictors but by disorder diagnosis.
Conclusions Few models seem ready for further implementation in clinical practice to aid treatment allocation. Besides the need for more external validation studies, we recommend close examination of the clinical setting before model implementation.
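The pooled discrimination figure quoted above comes from a random-effects meta-analysis; the sketch below shows the standard DerSimonian-Laird calculation on invented study-level AUCs and standard errors, purely to make the mechanics concrete. It does not reproduce the review's data.

```python
# A sketch of DerSimonian-Laird random-effects pooling, the usual machinery behind
# summaries like the pooled discrimination above. AUCs and standard errors are invented.
import numpy as np

auc = np.array([0.68, 0.75, 0.62, 0.80, 0.71])     # hypothetical study-level AUCs
se = np.array([0.04, 0.05, 0.06, 0.03, 0.05])      # hypothetical standard errors
v = se ** 2

w_fixed = 1.0 / v
mu_fixed = np.sum(w_fixed * auc) / np.sum(w_fixed)

# Between-study heterogeneity (DerSimonian-Laird estimator of tau^2)
Q = np.sum(w_fixed * (auc - mu_fixed) ** 2)
df = len(auc) - 1
c = np.sum(w_fixed) - np.sum(w_fixed ** 2) / np.sum(w_fixed)
tau2 = max(0.0, (Q - df) / c)

# Random-effects pooled estimate and 95% confidence interval
w_re = 1.0 / (v + tau2)
mu_re = np.sum(w_re * auc) / np.sum(w_re)
se_re = np.sqrt(1.0 / np.sum(w_re))
print(f"pooled AUC = {mu_re:.2f} "
      f"(95% CI {mu_re - 1.96 * se_re:.2f} to {mu_re + 1.96 * se_re:.2f}), tau^2 = {tau2:.4f}")
```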
      ... In the methodological part of a study, data analysis demands a lot of decisions. More recently, 70 independent analysis teams tested nine prespecified hypotheses using the same task-functional magnetic resonance imaging (fMRI) dataset (Botvinik-Nezer et al., 2020). The 70 teams selected 70 different analytical pipelines, and this variation affected the results, including the statistical maps and conclusions drawn regarding the preselected hypotheses tested. ...
      Article
      Full-text available
One of the key ingredients of scientific progress is the ability to repeat, replicate, and reproduce important scientific findings independently. Recently, independent groups have failed to replicate the results of several experiments in various research areas, opening the so-called “reproducibility crisis.” The reasons behind these failures may include the excessive trust placed in results obtained by digital computers. Indeed, little attention has been given to the implementation of a principal algorithm or method, to the variation introduced by the use of different software and hardware systems, to how difficult a finding can be to recover after weeks or years, or to the level of precision at which a computational experiment was performed (Donoho et al., 2009; Peng, 2011).
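One concrete, easily reproduced source of the computational variability this abstract points to is that floating-point arithmetic is not associative, so even the order in which numbers are summed (for example across library versions or parallel hardware) changes the result at machine precision. The snippet below demonstrates this with a single random vector; the array size and partitioning are arbitrary choices.

```python
# Floating-point summation is order-dependent: the same numbers summed in a different
# order, or in "parallel-like" chunks, typically give slightly different results.
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=1_000_000).astype(np.float32)

forward = np.sum(x)                                      # one summation order
reverse = np.sum(x[::-1])                                # same numbers, opposite order
chunked = sum(np.sum(c) for c in np.array_split(x, 8))   # partial sums, as a parallel run might do

print(forward, reverse, chunked)                         # typically three slightly different values
```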
... About 70 teams analyzed the same dataset, and when strict thresholding was applied before comparisons, conclusions were judged to be inconsistent across a large fraction of teams.48 Even though this conclusion of inconsistency has been the most widely cited and broadcast in the literature and on social media, we can recognize that thresholding actually plays the role of filtering, and that conditioning on stringent thresholding produces a strong selection bias that distorts interpretation. In fact, without inserting dichotomization, the very same results were evaluated as being predominantly consistent. ...
... In fact, without inserting dichotomization, the very same results were evaluated as being predominantly consistent.35,48 Moreover, instead of solely reporting statistical values through artificial dichotomization... Common practices in neuroimaging association analyses should be approached with caution from a causal inference perspective. Statistical modeling typically falls into two categories: prediction and causal inference. ...
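The selection-bias argument in these two excerpts can be made concrete with a toy simulation: two "teams" whose continuous statistics are strongly correlated can still look inconsistent once each statistic is dichotomized at a strict threshold and attention is restricted to "detected" effects. All parameters below are invented for illustration.

```python
# Toy simulation: dichotomizing highly correlated continuous statistics at a strict
# threshold, then conditioning on detection, makes agreement across analysts look poor.
import numpy as np

rng = np.random.default_rng(1)
n_hypotheses = 10_000
true_effect = rng.normal(size=n_hypotheses)                 # shared underlying signal
team_a = true_effect + 0.4 * rng.normal(size=n_hypotheses)  # analytic noise, team A
team_b = true_effect + 0.4 * rng.normal(size=n_hypotheses)  # analytic noise, team B

continuous_correlation = np.corrcoef(team_a, team_b)[0, 1]

threshold = 2.3                                             # strict cut-off (toy value)
binary_a, binary_b = team_a > threshold, team_b > threshold
overall_agreement = np.mean(binary_a == binary_b)
either = binary_a | binary_b                                # condition on at least one "detection"
conditional_agreement = np.mean(binary_a[either] == binary_b[either])

print(f"correlation of continuous statistics: {continuous_correlation:.2f}")
print(f"overall binary agreement:             {overall_agreement:.2f}")
print(f"agreement among 'detected' cases:     {conditional_agreement:.2f}")
```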
      Article
      Full-text available
      The critical importance of justifying the inclusion of covariates is a facet often overlooked in data analysis. While the incorporation of covariates typically follows informal guidelines, we argue for a comprehensive exploration of underlying principles to avoid significant statistical and interpretational challenges. Our focus is on addressing three common yet problematic practices: the indiscriminate lumping of covariates, the lack of rationale for covariate inclusion, and the oversight of potential issues in result reporting. These challenges, prevalent in neuroimaging models involving covariates such as reaction time, demographics, and morphometric measures, can introduce biases, including overestimation, underestimation, masking, sign flipping, or spurious effects. Our exploration of causal inference principles underscores the pivotal role of domain knowledge in guiding covariate selection, challenging the common reliance on statistical measures. This understanding carries implications for experimental design, model-building, and result interpretation. We draw connections between these insights and reproducibility concerns, specifically addressing the selection bias resulting from the widespread practice of strict thresholding, akin to the logical pitfall associated with “double dipping.” Recommendations for robust data analysis involving covariates encompass explicit research question statements, justified covariate inclusions/exclusions, centering quantitative variables for interpretability, appropriate reporting of effect estimates, and advocating a “highlight, don’t hide” approach in result reporting. These suggestions are intended to enhance the robustness, transparency, and reproducibility of covariate-driven analyses, encompassing investigations involving consortium datasets such as ABCD and UK Biobank. We discuss how researchers can use a transparent depiction of the covariate relationships to enhance the ethos of open science and promote research reproducibility.
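One of the failure modes listed in this abstract, a spurious effect introduced by an ill-chosen covariate, is easy to reproduce in a toy simulation: adjusting for a collider, i.e. a variable caused by both the exposure and the outcome, manufactures an association between variables that are truly independent. The variable names below are illustrative, not taken from the paper.

```python
# Toy collider-bias demonstration: conditioning on a variable caused by both the
# "exposure" and the "outcome" induces a spurious (here roughly -0.5) association
# between two variables that are generated independently.
import numpy as np

rng = np.random.default_rng(7)
n = 100_000
brain_measure = rng.normal(size=n)                           # "exposure", unrelated to outcome
behaviour = rng.normal(size=n)                               # "outcome", unrelated to exposure
collider = brain_measure + behaviour + rng.normal(size=n)    # caused by both

def ols_slope(y, predictors):
    """Least-squares coefficient of the first predictor (intercept included)."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

print("unadjusted slope:      ", round(ols_slope(behaviour, [brain_measure]), 3))            # ~0
print("adjusted for collider: ", round(ols_slope(behaviour, [brain_measure, collider]), 3))  # ~-0.5
```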
      ... Another issue is the methodological heterogeneity of neuroimaging research. The choice of hardware, data acquisition protocol, pre-processing steps, and analysis pipelines can have unexpected and substantial effects on the results of studies using a variety of neuroimaging modalities [175][176][177] . While it is impossible to prescribe a similar set of best practices for every study, the design should be appropriate to specific contexts of use if the results are to contribute to biomarker development. ...
      Article
      Full-text available
      As a neurobiological process, addiction involves pathological patterns of engagement with substances and a range of behaviors with a chronic and relapsing course. Neuroimaging technologies assess brain activity, structure, physiology, and metabolism at scales ranging from neurotransmitter receptors to large-scale brain networks, providing unique windows into the core neural processes implicated in substance use disorders. Identified aberrations in the neural substrates of reward and salience processing, response inhibition, interoception, and executive functions with neuroimaging can inform the development of pharmacological, neuromodulatory, and psychotherapeutic interventions to modulate the disordered neurobiology. Closed- or open-loop interventions can integrate these biomarkers with neuromodulation in real time or offline to personalize stimulation parameters and deliver precise intervention. This Analysis provides an overview of neuroimaging modalities in addiction medicine, potential neuroimaging biomarkers, and their physiologic and clinical relevance. Future directions and challenges in bringing these putative biomarkers from the bench to the bedside are also discussed.
... The vast variability in processing and analysis pipelines for neuroimaging data is another potential source of a lack of reproducibility, which is addressed with meta- and multiverse analyses, or multi-analyst studies47. With our multiverse analysis, we find an influence of some of the investigated processing steps, and observe that some of the biomarkers that perform at chance level on average have acceptable performance and robustness in individual pipelines, e.g. the alpha peak frequency48,49. ...
      Preprint
Major depressive disorder (MDD) and other psychiatric diseases can greatly benefit from objective decision support in diagnosis and therapy. Machine learning approaches based on electroencephalography (EEG) have the potential to serve as low-cost decision support systems. Despite the successful demonstration of this approach, contradictory findings regarding the diagnostic value of those biomarkers hamper their deployment in a clinical setting. Therefore, the reproducibility and robustness of these biomarkers need to be established first. We employ a multiverse analysis to systematically investigate variations in five data processing steps, which may be one source of contradictory findings. These steps are normalization, time-series segment length, biomarker from the alpha band, aggregation, and classification algorithm. For replicability of our results, we utilize two publicly available EEG datasets with eyes-closed resting-state data containing 16/19 MDD patients and 14/14 healthy control subjects. Diagnostic classification accuracy ranges from chance level up to 85%, depending on the dataset and the combination of processing steps. We find a large influence of the choice of processing steps and their combinations; however, only the biomarker has an overall significant effect on both datasets. We find one biomarker candidate that shows robust and reproducible high performance for MDD diagnostic support: the relative centroid frequency. Overall, the replicability of our findings across the two datasets is rather inconsistent. This study is a showcase for the advantages of employing a multiverse approach in EEG data analysis and advocates for larger, well-curated datasets to further neuroscience research that can be translated to clinical practice.
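The multiverse logic described in this preprint boils down to enumerating a grid of processing choices and scoring each resulting pipeline. The sketch below shows that general pattern on simulated data; the choice names, the toy "alpha-power" feature, and the classifier are stand-ins, not the preprint's actual five processing steps.

```python
# General multiverse pattern: enumerate a grid of processing choices and score a
# classifier for every combination. Features and choices are simulated placeholders.
import itertools
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n_subjects = 60
labels = np.repeat([0, 1], n_subjects // 2)                       # controls vs. patients
alpha_power = rng.lognormal(mean=0.0, sigma=0.5, size=n_subjects) * (1 + 0.5 * labels)

multiverse = {
    "biomarker": ["power", "log_power"],       # raw alpha power vs. its log
    "normalization": ["none", "zscore"],
    "classifier_C": [0.1, 1.0, 10.0],          # regularization strength of the classifier
}

results = {}
for biomarker, norm, C in itertools.product(*multiverse.values()):
    x = np.log(alpha_power) if biomarker == "log_power" else alpha_power.copy()
    if norm == "zscore":
        x = (x - x.mean()) / x.std()
    score = cross_val_score(LogisticRegression(C=C), x.reshape(-1, 1), labels,
                            cv=5, scoring="accuracy").mean()
    results[(biomarker, norm, C)] = score

for pipeline, score in sorted(results.items(), key=lambda kv: kv[1]):
    print(pipeline, f"mean CV accuracy = {score:.2f}")
```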
... Similar concerns have been observed in psychology, where concepts such as 'researcher degrees of freedom' (Simmons et al., 2011) and the 'garden of forking paths' (Gelman and Loken, 2013) emphasise how flexibility in data collection and multiple potential tests based on the same dataset, along with the pursuit of meaningful parameters, can increase the risk of false positives. Furthermore, crowd-science experiments have demonstrated significant variability in research processes, a lack of consensus in decision-making, and divergent outcomes when different researchers analyse the same data (Botvinik-Nezer et al., 2020; Wicherts et al., 2016; Silberzahn et al., 2018). While this degree of freedom promotes exploration and efficient methodological progress, it also carries a risk of poor decisions and undesirable outcomes. ...
      Preprint
      Full-text available
Discrete Choice Modelling serves as a robust framework for modelling human choice behaviour across various disciplines. Building a choice model is a semi-structured research process that involves a combination of a priori assumptions, behavioural theories, and statistical methods. This complex set of decisions, coupled with diverse workflows, can lead to substantial variability in model outcomes. To better understand these dynamics, we developed the Serious Choice Modelling Game, which simulates the real-world modelling process and tracks modellers' decisions in real time using a stated preference dataset. Participants were asked to develop choice models to estimate Willingness to Pay values to inform policymakers about strategies for reducing noise pollution. The game recorded actions across multiple phases, including descriptive analysis, model specification, and outcome interpretation, allowing us to analyse both individual decisions and differences in modelling approaches. While our findings reveal a strong preference for using data visualisation tools in descriptive analysis, they also identify gaps in the handling of missing values before model specification. We also found significant variation in modelling approaches, even when modellers were working with the same choice dataset. Despite the availability of more complex models, simpler models such as Multinomial Logit were often preferred, suggesting that modellers tend to avoid complexity when time and resources are limited. Participants who engaged in more comprehensive data exploration and iterative model comparison tended to achieve better model fit and parsimony, demonstrating that the methodological choices made throughout the workflow have significant implications, particularly when modelling outcomes are used for policy formulation.
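The core quantity in this abstract, Willingness to Pay, falls out of a fitted logit choice model as a ratio of coefficients. The sketch below fits a simple binary logit by maximum likelihood on simulated cost/noise trade-offs and recovers WTP per decibel; all data, attribute names, and parameter values are invented rather than drawn from the game's dataset.

```python
# Simple logit choice model by maximum likelihood on simulated data; WTP per dB is
# recovered as the ratio of the noise coefficient to the cost coefficient.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(11)
n_choices = 5_000
# Attribute differences between alternative 1 and alternative 0 (cost in euros, noise in dB)
d_cost = rng.uniform(-5, 5, n_choices)
d_noise = rng.uniform(-10, 10, n_choices)

beta_true = np.array([-0.4, -0.15])              # true (cost, noise) coefficients
v_diff = beta_true[0] * d_cost + beta_true[1] * d_noise
choice = (rng.uniform(size=n_choices) < 1 / (1 + np.exp(-v_diff))).astype(float)

def neg_log_likelihood(beta):
    v = beta[0] * d_cost + beta[1] * d_noise
    p = 1 / (1 + np.exp(-v))                     # logit probability of picking alternative 1
    return -np.sum(choice * np.log(p) + (1 - choice) * np.log(1 - p))

fit = minimize(neg_log_likelihood, x0=np.zeros(2), method="BFGS")
beta_cost, beta_noise = fit.x
wtp_per_db = beta_noise / beta_cost              # euros a respondent would pay to avoid 1 dB
print(f"estimated WTP: {wtp_per_db:.3f} euros/dB (true value {beta_true[1] / beta_true[0]:.3f})")
```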
... For many years, those choices have been considered "implementation details", but evidence is growing that they can lead to different and sometimes contradictory results. For instance, the same functional magnetic resonance imaging (fMRI) dataset was independently analyzed by 70 teams, testing 9 ex-ante hypotheses [10]. Significant variations appeared in the reported results, with substantial effects on scientific conclusions, thus jeopardizing the confidence one could have in these studies. ...
      Article
      Full-text available
Uncertainty in Informatics can stem from various sources, whether ontological (inherent unpredictability, such as aleatory factors) or epistemic (due to insufficient knowledge). Effectively handling uncertainty, encompassing both ontological and epistemic aspects, to create predictable systems is a key objective for a significant portion of the software engineering community, particularly within the model-driven engineering (MDE) realm. Numerous techniques have been proposed over the years, leading to evolving trends in model-based software development paradigms. This paper revisits the history of MDE, aiming to pinpoint the primary aspects of uncertainty that these paradigms aimed to tackle upon their introduction. Our claim is that MDE has progressively taken on more and more aspects of uncertainty, to the point that it can now help fully embrace it.