Tal Yarkoni’s research while affiliated with Mountain View College and other places


Publications (128)


141. Neurosynth Compose: A Free and Open Platform for Precise Large-Scale Neuroimaging Meta-Analysis
  • Article

May 2024 · 26 Reads

Biological Psychiatry

James Kent · Nicolas Lee · Julio Peraza · [...] · Alejandro De La Vega

Figure 2. A graphical timeline of the historical development of the BIDS project, including important publications, meetings, and other developments.
Figure 4. Growing usage of BIDS over time. Left: Growth of the OpenNeuro database since its inception in 2017, adapted from (Markiewicz et al., 2021). Right: Cumulative number of unique T1-weighted anatomical (t1w) and BOLD images from BIDS datasets submitted to the MRIQC web API (Esteban, Blair, et al., 2019) from 2018 to June 2023. Source data and code to generate figures available at https://osf.io/x7fh8/.
The Past, Present, and Future of the Brain Imaging Data Structure (BIDS)
  • Article

March 2024 · 583 Reads · 10 Citations

Imaging Neuroscience

The Brain Imaging Data Structure (BIDS) is a community-driven standard for the organization of data and metadata from a growing range of neuroscience modalities. This paper is meant as a history of how the standard has developed and grown over time. We outline the principles behind the project, the mechanisms by which it has been extended, and some of the challenges being addressed as it evolves. We also discuss the lessons learned through the project, with the aim of enabling researchers in other domains to learn from the success of BIDS.


Fig. 1. A graphical representation of tools and methods implemented in NiMARE. This diagram outlines six of the most common use-cases for NiMARE. (A) Coordinate-Based Meta-Analysis (CBMA) is performed by creating a NiMARE Dataset with coordinate information stored in the Dataset.coordinates attribute, which is then used in a CBMA Estimator. This produces a MetaResult object with statistical maps, which can then be used in a Corrector object for multiple comparisons correction. Once the Corrector has been fitted, it will produce a corrected version of the MetaResult object, containing updated statistical maps. (B) Image-Based Meta-Analysis (IBMA) operates similarly to CBMA, except that IBMA Estimators use statistical maps stored in the Dataset.images attribute. (C) Meta-Analytic Coactivation Modeling (MACM) uses a region of interest to select coordinate-based studies within a Dataset, after which the standard CBMA workflow is performed. (D) Automated Annotation infers labels from textual (and sometimes other) data associated with the Dataset, as stored in the Dataset.texts attribute. The annotation functions produce labels which may be integrated into the Dataset as the Dataset.annotations attribute. (E) Functional decoding of continuous statistical maps operates similarly to discrete decoding, in that the input Dataset must have both coordinates and annotations attributes. The Dataset, along with an unthresholded statistical map to decode, is provided to the Decoder object, which then outputs measures of similarity or associativeness with each label. (F) Functional decoding of discrete inputs applies a selection criterion to a Dataset with both coordinates and annotations attributes, using a Decoder object. The decoding algorithm will output measures of similarity or associativeness with each label in the annotations.
Fig. 2. A schematic figure of Datasets, Estimators, Transformers, and MetaResults in NiMARE.
Fig. 3. A flowchart of the typical workflow for coordinate-based meta-analyses in NiMARE.
Fig. 4. Modeled activation maps produced by NiMARE's KernelTransformer classes.
NiMARE: Neuroimaging Meta-Analysis Research Environment

August 2023 · 302 Reads · 19 Citations

Aperture Neuro

We present NiMARE (Neuroimaging Meta‑Analysis Research Environment; RRID:SCR_017398), a Python library for neuroimaging meta‑analyses and meta‑analysis‑related analyses. NiMARE is an open source, collaboratively‑developed package that implements a range of meta‑analytic algorithms, including coordinate‑ and image‑based meta‑analyses, automated annotation, functional decoding, and meta‑analytic coactivation modeling. By consolidating meta‑analytic methods under a common library and syntax, NiMARE makes it straightforward for users to employ the appropriate approach for a given analysis. In this paper, we describe NiMARE’s architecture and the methods implemented in the library. Additionally, we provide example code and results for each of the available tools in the library.



Figure 1. Overview of the predictive approach. 1) Features from non-brain, structGS (global and subcortical structural), and func (functional connectivity) modalities are extracted from baseline data. The number of features is provided in parentheses. 2) Feature concatenation produces sets of multimodal input features. For instance, red represents non-brain features only, while orange represents a combination of non-brain and structGS. 3) Extraction of slopes representing cognitive change from CDR (Clinical Dementia Rating) and MMSE (Mini-Mental State Examination). 4) Models are trained to predict cognitive decline based on the input features. Here, we used a multi-target random forest model within a nested cross-validation approach to predict CDR and MMSE change simultaneously.
Figure 2. Adding structural data (orange) to non-brain data (red) improved the prediction of cognitive decline. Test performance (R², coefficient of determination, x-axis) across splits (N splits = 1000) for the combinations of input modalities (y-axis). Targets: cognitive change measured via CDR (Clinical Dementia Rating, middle) and MMSE (Mini-Mental State Examination, right). Input modalities: non-brain, structGS (global and subcortical structural volumes), func (functional connectivity). The left panel represents combinations of input modalities (e.g., orange is non-brain + structGS). The number represents the median; the dashed vertical line marks the median of the best-performing combination of modalities (within a target measure). For the full results that include single-modality brain imaging, see Figure S2.
Figure 3. Cognitive performance, daily functioning, and subcortical volume were among the most informative features. Permutation importance of the top 15 features of the non-brain + structGS model (median across splits). Permutation importance is quantified as the decrease in test performance R² with the feature permuted. Red: non-brain features, light orange: structGS features. CDR: Clinical Dementia Rating, SOB: Sum of Boxes, FAQ: Functional Assessment Questionnaire, REMDATES: difficulty remembering dates, L: left, MMSE: Mini-Mental State Examination, R: right, SOB: sum of boxes score, TRAIL B: Trail Making Test B, WF: word fluency, WMS: Wechsler Memory Scale, MEMUNITS: Total number of story units recalled (delayed), LOGIMEM: Total number of story units recalled from this current test administration.
Figure 4. Multimodal imaging improves brain-age prediction. Input modalities: non-brain, structGS (global and subcortical structural volumes), func (functional connectivity). The number represents the median, the dashed vertical line marks the median of the best-performing combination of modalities. For the full results that include single-modality brain imaging, see Figure S6.
List of abbreviations of clinical tests
Predicting future cognitive decline from non-brain and multimodal brain imaging data in healthy and pathological aging

October 2022 · 223 Reads · 11 Citations

Neurobiology of Aging

Previous literature has focused on predicting a diagnostic label from structural brain imaging. Since subtle changes in the brain precede cognitive decline in healthy and pathological aging, our study predicts future decline as a continuous trajectory instead. Here, we tested whether baseline multimodal neuroimaging data improve the prediction of future cognitive decline in healthy and pathological aging. Non-brain data (demographics, clinical and neuropsychological scores), structural MRI and functional connectivity data from OASIS-3 (N=662; age=46–96y) were entered into cross-validated multi-target random forest models to predict future cognitive decline (measured by CDR and MMSE), on average 5.8y into the future. The analysis was preregistered, and all analysis code is publicly available. Combining non-brain with structural data improved the continuous prediction of future cognitive decline (best test-set performance: R²=0.42). Cognitive performance, daily functioning, and subcortical volume drove the performance of our model. Including functional connectivity did not improve predictive accuracy. In the future, the prognosis of age-related cognitive decline may enable earlier and more effective individualized cognitive, pharmacological, and behavioral interventions.


Figure 2. Overview schematic of analysis creation and model execution. (a) Interactive analysis creation is made possible through an easy-to-use web application, resulting in a fully specified reproducible analysis bundle. (b) Automated model execution is achieved with little-to-no configuration through a containerized model fitting workflow. Results are automatically made available in NeuroVault, a public repository for statistical maps.
Figure 4. Comparison of a sample of four single study results with meta-analysis (N=20) for three features: 'building' and 'text' extracted through Clarifai visual scene detection models, and sound 'loudness' (root mean squared of the auditory signal). Images were thresholded at Z=3.29 (p<0.001) voxel-wise. Regions with a priori association with each predictor are highlighted: PPA, parahippocampal place area; VWFA, visual word form area; STS, superior temporal sulcus. Datasets: Budapest, Learning Temporal Structure (LTS), 500daysofsummer task from Naturalistic Neuroimaging Database, and Sherlock.
Figure 6. Meta-analytic statistical maps for concreteness and frequency controlling for speech, text length, number of syllables and phonemes, and phone-level Levenshtein distance. N=33 tasks; images were thresholded at Z=3.29 (p<0.001) voxel-wise. Visual word form area, VWFA.
Neuroscout, a unified platform for generalizable and reproducible fMRI research

August 2022 · 106 Reads · 7 Citations

eLife

Functional magnetic resonance imaging (fMRI) has revolutionized cognitive neuroscience, but methodological barriers limit the generalizability of findings from the lab to the real world. Here, we present Neuroscout, an end-to-end platform for analysis of naturalistic fMRI data designed to facilitate the adoption of robust and generalizable research practices. Neuroscout leverages state-of-the-art machine learning models to automatically annotate stimuli from dozens of fMRI studies using naturalistic stimuli, such as movies and narratives, allowing researchers to easily test neuroscientific hypotheses across multiple ecologically-valid datasets. In addition, Neuroscout builds on a robust ecosystem of open tools and standards to provide an easy-to-use analysis builder and a fully automated execution engine that reduce the burden of reproducible research. Through a series of meta-analytic case studies, we validate the automatic feature extraction approach and demonstrate its potential to support more robust fMRI research. Owing to its ease of use and a high degree of automation, Neuroscout makes it possible to overcome modeling challenges commonly arising in naturalistic analysis and to easily scale analyses within and across datasets, democratizing generalizable fMRI research.


Bambi: A Simple Interface for Fitting Bayesian Linear Models in Python

August 2022 · 1,072 Reads · 73 Citations

Journal of Statistical Software

The popularity of Bayesian statistical methods has increased dramatically in recent years across many research areas and industrial applications. This is the result of a variety of methodological advances with faster and cheaper hardware as well as the development of new software tools. Here we introduce an open source Python package named Bambi (BAyesian Model Building Interface) that is built on top of the PyMC probabilistic programming framework and the ArviZ package for exploratory analysis of Bayesian models. Bambi makes it easy to specify complex generalized linear hierarchical models using a formula notation similar to those found in R. We demonstrate Bambi’s versatility and ease of use with a few examples spanning a range of common statistical models including multiple regression, logistic regression, and mixed-effects modeling with crossed group specific effects. Additionally we discuss how automatic priors are constructed. Finally, we conclude with a discussion of our plans for the future development of Bambi.



Figure 6: Meta-analytic statistical maps for concreteness and frequency controlling for speech, text length, number of syllables and phonemes, and phone-level Levenshtein distance. N=33 tasks; images were thresholded at Z=3.29 (p<0.001) voxel-wise. Visual word form area, VWFA.
Neuroscout, a unified platform for generalizable and reproducible fMRI research

April 2022 · 58 Reads · 5 Citations

Functional magnetic resonance imaging (fMRI) has revolutionized cognitive neuroscience, but methodological barriers limit the generalizability of findings from the lab to the real world. Here, we present Neuroscout, an end-to-end platform for analysis of naturalistic fMRI data designed to facilitate the adoption of robust and generalizable research practices. Neuroscout leverages state-of-the-art machine learning models to automatically annotate stimuli from dozens of naturalistic fMRI studies, allowing researchers to easily test neuroscientific hypotheses across multiple ecologically-valid datasets. In addition, Neuroscout builds on a robust ecosystem of open tools and standards to provide an easy-to-use analysis builder and a fully automated execution engine that reduce the burden of reproducible research. Through a series of meta-analytic case studies, we validate the automatic feature extraction approach and demonstrate its potential to support more robust fMRI research. Owing to its ease of use and a high degree of automation, Neuroscout makes it possible to overcome modeling challenges commonly arising in naturalistic analysis and to easily scale analyses within and across datasets, democratizing generalizable fMRI research.


Replies to commentaries on the generalizability crisis

February 2022 · 28 Reads · 19 Citations

Behavioral and Brain Sciences

The 38 commentaries on the target article span a broad range of disciplines and perspectives. I have organized my response to the commentaries around three broad questions: First, how serious are the problems discussed in the target article? Second, are there other, potentially more productive, ways to think about the issues that the target article framed in terms of generalizability? And third, what, if anything, should we collectively do about these problems?


Citations (74)


... At least some of these interoperability issues are likely to be solved in the not-too-distant future. For example, while not entirely stemming from the existence of multiple BRAIN Initiative archives (as even a central database can support multiple standards), the challenge of making separate data standards themselves interoperable is receiving considerable attention from data stewards and neuroscientists (Markiewicz et al., 2021; Poldrack et al., 2024; Rübel et al., 2022). In addition, the efforts of data stewards and standards developers in launching and improving data standards, providing training for users, and beginning to forge paths of interoperability so far have been facilitated by a number of data resource selection, evaluation, and self-study tools keyed not just to the FAIR Principles, but also to similar criteria, such as the TRUST (Transparency, Responsibility, User focus, Sustainability, and Technology) Principles (Donaldson and Koepke, 2022; FAIR Data Maturity Model Working Group, 2020; Lin et al., 2020; Murphy et al., 2021; Poline et al., 2022; Sandström et al., 2022; Wilkinson et al., 2016). ...

Reference:

The BRAIN Initiative data-sharing ecosystem: Characteristics, challenges, benefits, and opportunities
The Past, Present, and Future of the Brain Imaging Data Structure (BIDS)

Imaging Neuroscience

... 2.9. Behavioral characterization based on the Neurosynth database. To infer mental processes most likely related to the identified brain regions in our meta-analyses, we performed a series of exploratory functional characterization analyses using data derived from Neurosynth and NiMARE (Salo et al., 2023; Yarkoni, Poldrack, Nichols, Van Essen, & Wager, 2011) which contain a large pool of automatically generated meta-analytic activation maps across a multitude of terms/topics. This approach enables us to discuss our results in relation to these terms/topics, without relying on acquiring data from a wide range of functional neuroimaging tasks in the same cohort. ...

NiMARE: Neuroimaging Meta-Analysis Research Environment

Aperture Neuro

... Language is receiving increasing attention as a possible source of such digital markers (Corona Hernández et al. 2023). Language production encodes rich information about individual traits (e.g., personality, Yarkoni 2010; Park et al. 2015), mental states and psychopathology (Nguyen et al. 2014; Williamson et al. 2016), both in its content (what we talk about) and its style (e.g., lexical, syntactic, and discourse-level choices in how we talk about it) (Rocca and Yarkoni 2022). Indeed, language is central in psychiatric assessment and diagnostics, and diagnostic criteria for many conditions include symptoms that are primarily inferred from linguistic behavior (Low, Bentley, and Ghosh 2020). ...

Language as a fingerprint: Self-supervised learning of user encodings using transformers
  • Citing Conference Paper
  • January 2022

... The effort kicked off with an initial document by Chris Gorgolewski, Tal Yarkoni, and Satra Ghosh in September 2016, which became BEP002. Subsequent efforts to develop the specification and the tooling were driven by Neuroscout (de la Vega et al., 2022), which was the first project to use the specification in production. An effort to substantially complete the specification was undertaken at a meeting at Stanford in October 2018, and was subsequently led by Tal Yarkoni, Alejandro de la Vega, and Chris Markiewicz until Yarkoni left academia in 2021. ...

Neuroscout, a unified platform for generalizable and reproducible fMRI research

eLife

... To analyse the dataset, we perform a Bayesian logistic regression using Bambi (Capretto et al., 2022), a package for Bayesian regression models based on PyMC (Oriol et al., 2023). We fit the regression on the combined training and evaluation splits of the data using only tokens that were tagged as content words using spaCy (Honnibal et al., 2020). ...

Bambi: A Simple Interface for Fitting Bayesian Linear Models in Python

Journal of Statistical Software

... There are many examples of the use of machines, computer technology, and software to support workflow management, data collecting, simulations, or decision making [1][2][3][4]. This approach can be applied across a variety of scientific fields, from engineering [5] and biology [6] to social sciences [7]. A considerable amount of emphasis is being placed on the creation of novel algorithms that can be conveniently stored in the cloud, which is easy to implement in open networking systems [8]. ...

Enhancing and Accelerating Social Science Via Automation: Challenges and Opportunities

... Almost every year, studies are published where the FSQ is employed. Its versatility allows for application in research areas such as workplace productivity, targeted industry-specific functional status measurement, predicting future cognitive decline, or measuring the effectiveness of social support [7][8][9][10]. ...

Predicting future cognitive decline from non-brain and multimodal brain imaging data in healthy and pathological aging

Neurobiology of Aging

... Two activation likelihood estimation meta-analyses were conducted with text extraction from abstracts in the NeuroSynth database (Yarkoni et al., 2011), using the NiMARE automated meta-analysis package (Salo et al., 2022). Each analysis contained a salience condition (threat, reward), restricted to studies involving only "threat" or "reward" respectively. ...

NiMARE: Neuroimaging Meta-Analysis Research Environment

... Columns: Data; Experimental setting; Queries. OpenNeuro [7,8]: original; HED (few datasets only); keywords. BrainMap [17,18,19]: derivatives only; BrainMap taxonomy; only existing labels. NeuroVault [20,21]: derivatives only; arbitrary labels; keywords. NeuroSynth [22,23]: derivatives only; keywords from publication texts; keywords. NeuroScout [24,25]: original; ML classifier labels; ML classifier labels ...

Neuroscout, a unified platform for generalizable and reproducible fMRI research

... The failure to generalize may indicate that the theory is false or low in utility. Yarkoni (2022a, 2022b) carefully detailed many of the difficulties in generalizing results, and a potential inference (not Yarkoni's) is that it could be sensible for psychology researchers not to focus on that which is too difficult to do, which justifies researchers de-emphasizing external validity in favor of maximizing internal validity. A potential riposte is that regardless of the difficulty in generalizing results, it is nevertheless necessary for researchers to generalize, and so difficulty provides a poor excuse for researchers to sacrifice external validity to maximize internal validity. ...

Replies to commentaries on the generalizability crisis
  • Citing Article
  • February 2022

Behavioral and Brain Sciences