David R. Shanks’s research while affiliated with University College London and other places


Publications (308)


Validating AI-assisted evaluation of open science practices in brain sciences: ChatGPT, Claude, and human expert comparisons
  • Preprint

February 2025 · 2 Reads

Daryl Yu Heng Lee · David Shanks

This study investigates the efficacy of AI-assisted evaluation of open science practices in brain sciences, comparing ChatGPT 4 and Claude 3.5 Sonnet against human expert assessment. We analysed 100 randomly selected journal articles across various brain science disciplines using a 6-item transparency checklist. Three human experts and two AI chatbots independently evaluated the articles. Results showed strong correlations between human and AI chatbot overall ratings. Both chatbots demonstrated high concordance with humans in assessing code sharing, materials availability, preregistration, and sample size rationales. However, they struggled with accurately identifying the presence of data availability statements and assessing public accessibility of shared data. These findings suggest that AI chatbots can effectively support the evaluation of some open science practices and potentially expedite the assessment process in academic research. However, their limitations in certain areas highlight the continued importance of human oversight in ensuring comprehensive and accurate evaluations of scientific transparency.
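The validation logic (correlating overall ratings and checking item-level agreement between raters) can be illustrated with a minimal sketch; all numbers below are invented, and the paper's actual analysis pipeline is not reproduced here:

```python
# Sketch: quantifying human-chatbot agreement on a transparency
# checklist. All data are invented for illustration.
import numpy as np
from scipy.stats import spearmanr

# Overall transparency scores (0-6) given to the same ten articles.
human   = np.array([4, 2, 5, 1, 3, 6, 2, 4, 0, 5])
chatbot = np.array([4, 3, 5, 1, 2, 6, 2, 5, 1, 5])
rho, p = spearmanr(human, chatbot)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")

# Item-level concordance, e.g. "is code shared?" (1 = yes, 0 = no).
human_item = np.array([1, 0, 1, 0, 1, 1, 0, 1, 0, 1])
bot_item   = np.array([1, 0, 1, 0, 0, 1, 0, 1, 0, 1])
print(f"percent agreement = {(human_item == bot_item).mean():.0%}")
```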




Figure 2. PRISMA 2020 flow diagram of the literature search strategy (Page et al., 2021).
Figure 3. Forest plot of the d′ meta-analysis. Experiments are coded in the left column by the authors' initials, the last two digits of the publication year, and the experiment's number within the study or its category (Pilot, Main).
Figure 4. Forest plot of the Cohen's d_z meta-analysis, with experiments coded as in Figure 3.
Figure 5. Violin plots of the distributions of permuted split-half reliability estimates (r_xx) for the priming and visibility tasks across the five experiments of Berkovitch and Dehaene (2019). Diamonds mark the mean r_xx for each experiment; crosses mark the corresponding mean Spearman-Brown-corrected values (r*_xx), with negative values treated as zero.
Figure 6. Correlation between performance in the masked priming task and the visibility task across the five experiments. Each dot is an individual participant, coloured by experiment; coloured lines show the per-experiment trends and the black line the trend for all five experiments combined.


The Conscious Side of ‘Subliminal’ Linguistic Priming: A Systematic Review With Meta-Analysis and Reliability Analysis of Visibility Measures

January 2025 · 54 Reads · 2 Citations

Journal of Cognition

Research on unconscious processing has been a valuable source of evidence in psycholinguistics for shedding light on the cognitive architecture of language. The automaticity of syntactic processing, in particular, has long been debated. One strategy to establish this automaticity involves detecting significant syntactic priming effects in tasks that limit conscious awareness of the stimuli. Criteria for assessing unconscious priming include the visibility (d′) of masked words not differing significantly from zero and no positive correlation between visibility and priming. However, such outcomes could also arise for strictly methodological reasons, such as low statistical power in visibility tests or low reliability of dependent measures. In this study, we aimed to address these potential limitations. Through meta-analysis and Bayesian re-analysis, we find evidence of low statistical power and of participants having above-chance awareness of ‘subliminal’ words. Moreover, we conducted reliability analyses on a dataset from Berkovitch and Dehaene (2019), finding that low reliability in both syntactic priming and visibility tasks may better explain the absence of a significant correlation. Overall, these findings cast doubt on the validity of previous conclusions regarding the automaticity of syntactic processing based on masked priming effects. The results underscore the importance of revisiting the methods employed when exploring unconscious processing in future psycholinguistic research.
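For readers unfamiliar with the reliability analyses referred to above, here is a minimal sketch of permutation-based split-half reliability with the Spearman-Brown correction, applied to simulated trial-level data. This is not the authors' code; the negative-values-to-zero rule follows the Figure 5 caption:

```python
# Sketch: permuted split-half reliability with Spearman-Brown
# correction, on simulated trial-level data.
import numpy as np

rng = np.random.default_rng(0)
# rows = participants, columns = trials; a true per-person effect is
# added so the simulated measure has some reliability to detect.
true = rng.normal(0.3, 0.2, size=(40, 1))
data = true + rng.normal(0, 1, size=(40, 80))

def split_half(data, n_perm=2000, rng=rng):
    n_trials = data.shape[1]
    rs = np.empty(n_perm)
    for i in range(n_perm):
        idx = rng.permutation(n_trials)          # random split of trials
        half_a = data[:, idx[: n_trials // 2]].mean(axis=1)
        half_b = data[:, idx[n_trials // 2 :]].mean(axis=1)
        rs[i] = np.corrcoef(half_a, half_b)[0, 1]
    r_xx = rs.mean()                     # mean permuted split-half r
    r_sb = 2 * r_xx / (1 + r_xx)         # Spearman-Brown correction
    return r_xx, max(r_sb, 0.0)          # negative values treated as zero

r_xx, r_sb = split_half(data)
print(f"r_xx = {r_xx:.2f}, corrected r*_xx = {r_sb:.2f}")
```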


Instructional Intervention Effects on Interleaving Preference and Distance During Self-Regulated Inductive Learning

November 2024 · 81 Reads · 1 Citation

Interleaving (intermixing exemplars from different categories) is more effective in promoting inductive learning than blocking (massing exemplars from a given category together). Yet learners typically prefer blocking over interleaving during self-regulated inductive learning, highlighting the need to develop effective interventions to overcome this metacognitive illusion and promote learners’ practical use of the interleaving strategy. Drawing on a sample of university students, three experiments examined the effects of an instructional intervention on (a) correction of metacognitive fallacies regarding the superiority of blocking over interleaving for inductive learning, (b) adoption of the interleaving strategy during self-regulated learning when learners are allowed to make study choices exemplar-by-exemplar, (c) classification performance, and (d) transfer of category learning across diverse domains. Experiments 1 and 2 showed that instructions about the benefits of interleaving over blocking improved metacognitive awareness of the efficacy of interleaving and enhanced self-usage of the interleaving strategy during learning of new categories. However, this intervention had negligible influence on interleaving distance and did not improve classification performance. Experiment 3 found that informing learners about the benefits of extensive interleaving, as compared to minimal interleaving or no interleaving, successfully increased interleaving distance and boosted classification performance, and the intervention effects transferred to learning categories in a different domain. These findings support the practical use of the instructional intervention in promoting self-usage of the interleaving strategy and highlight the important role of enlarging interleaving distance in facilitating inductive learning.
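A small sketch can make the blocking/interleaving contrast and the notion of interleaving distance concrete. Here distance is operationalized as the mean lag between successive exemplars of the same category, an illustrative assumption rather than necessarily the measure used in the study:

```python
# Sketch: blocked vs. interleaved study schedules and one way to
# quantify "interleaving distance" (hypothetical operationalisation).
from itertools import chain

categories = {"A": ["A1", "A2", "A3"], "B": ["B1", "B2", "B3"],
              "C": ["C1", "C2", "C3"]}

# Blocked: all exemplars of a category massed together.
blocked = list(chain.from_iterable(categories.values()))
# Round-robin interleaving: A1 B1 C1 A2 B2 C2 ...
interleaved = [ex for trio in zip(*categories.values()) for ex in trio]

def mean_lag(schedule):
    """Mean number of trials between successive same-category items."""
    last_pos, lags = {}, []
    for pos, item in enumerate(schedule):
        cat = item[0]                     # category label = first char
        if cat in last_pos:
            lags.append(pos - last_pos[cat])
        last_pos[cat] = pos
    return sum(lags) / len(lags)

print("blocked:", blocked, "mean lag =", mean_lag(blocked))          # 1.0
print("interleaved:", interleaved, "mean lag =", mean_lag(interleaved))  # 3.0
```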


A Grain of Truth in the Grain Size Effect: Retrieval Practice Is More Effective When Interspersed During Learning

November 2024 · 22 Reads

Retrieval practice is a powerful method for consolidating long-term learning. When learning takes place over an extended period, how should tests be scheduled to obtain the maximal benefit? In an end-test schedule, all material is studied prior to a large practice test on all studied material, whereas in an interim test schedule, learning is divided into multiple study/test cycles in which each test is smaller and only assesses material from the preceding study block. Past investigations have generally found a difference between these schedules during practice but not during a final assessment, although they may have been underpowered. Five experiments confirmed that final assessment performance was better in students taught using interim than end tests in list (Experiments 1, 2, and 5) and paired associate (Experiments 3 and 4) learning, with a meta-analysis of all available studies (k = 19) yielding a small- to medium-sized effect, g = 0.25, 95% confidence interval [0.09, 0.42]. Experiment 5 found that the higher level of practice retrieval success in interim tests contributes to the grain size effect, but that the effect is eliminated if these tests are too easy. Additional analyses also suggest that the forward testing effect, in which tests promote subsequent learning, may be a major cause of the grain size effect. The practical and theoretical implications of these demonstrations of robust grain size effects are discussed.
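The pooled estimate quoted above (g = 0.25, 95% CI [0.09, 0.42]) is the kind of output a random-effects meta-analysis produces. A minimal DerSimonian-Laird sketch, with invented per-study effect sizes:

```python
# Sketch: DerSimonian-Laird random-effects pooling of Hedges' g.
# Effect sizes and variances below are invented for illustration.
import numpy as np

g = np.array([0.35, 0.10, 0.42, 0.18, 0.30])   # per-study Hedges' g
v = np.array([0.02, 0.03, 0.04, 0.02, 0.05])   # sampling variances

w = 1 / v                                       # fixed-effect weights
g_fixed = (w * g).sum() / w.sum()
Q = (w * (g - g_fixed) ** 2).sum()              # heterogeneity statistic
df = len(g) - 1
C = w.sum() - (w ** 2).sum() / w.sum()
tau2 = max(0.0, (Q - df) / C)                   # between-study variance

w_re = 1 / (v + tau2)                           # random-effects weights
g_re = (w_re * g).sum() / w_re.sum()
se = np.sqrt(1 / w_re.sum())
print(f"g = {g_re:.2f}, 95% CI [{g_re - 1.96*se:.2f}, {g_re + 1.96*se:.2f}]")
```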


Mapping the Reliability Multiverse of Contextual Cuing

October 2024 · 57 Reads · 3 Citations

Cronbach (1957) famously noted the divergence between the experimental and psychometric traditions in psychology and called for a unification, but many domains of cognitive experimental psychology continue to pay minimal heed to basic psychometric principles. The present article considers the lack of attention devoted to the reliability of measures extracted from contextual cuing, a popular visual search task for studying putatively unconscious mental processes, and the inferential fallacies that this neglect can cause. Two experiments (total N = 200) demonstrated that the reliability of contextual cuing and awareness measures can be increased by three manipulations designed to increase between-participant variability in search performance. At the same time, the data were subjected to a multiverse analysis, which found that specific data preprocessing pipelines result in more reliable estimates. Nevertheless, the reliability estimates remained too low for drawing firm conclusions from standard statistical techniques. Interpreting results from analyses based on individual differences, such as the typical low correlations between implicit and explicit measures, will be challenging so long as the underlying measures have poor reliability.
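The multiverse logic, crossing preprocessing decisions and recording the reliability each pipeline yields, can be sketched as follows; the data, cutoffs, and centring options are illustrative assumptions, not the exact pipelines tested in the paper:

```python
# Sketch: a toy "reliability multiverse" over RT preprocessing
# choices, recording the split-half reliability each pipeline yields.
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
base = rng.normal(6.0, 0.3, size=(50, 1))        # stable person-level speed
rts = rng.lognormal(base, 0.4, size=(50, 100))   # participants x trials

def clean(rts, cutoff_sd, use_median):
    """Replace RTs beyond cutoff_sd SDs of each person's centre with NaN."""
    center = np.median(rts, axis=1, keepdims=True) if use_median \
             else rts.mean(axis=1, keepdims=True)
    sd = rts.std(axis=1, keepdims=True)
    mask = np.abs(rts - center) <= cutoff_sd * sd
    return np.where(mask, rts, np.nan)

def split_half(rts):
    """Odd/even split-half correlation with Spearman-Brown correction."""
    a = np.nanmean(rts[:, ::2], axis=1)
    b = np.nanmean(rts[:, 1::2], axis=1)
    r = np.corrcoef(a, b)[0, 1]
    return max(2 * r / (1 + r), 0.0)

for cutoff, use_median in product([2.0, 2.5, 3.0], [False, True]):
    r = split_half(clean(rts, cutoff, use_median))
    print(f"cutoff={cutoff} SD, median={use_median}: r* = {r:.2f}")
```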


Kelley’s Paradox and strength skewness in research on unconscious mental processes

Psychonomic Bulletin & Review

A widely adopted approach in research on unconscious perception and cognition involves contrasting behavioral or neural responses to stimuli that have been presented to participants (e.g., old items in a memory test) against those that have not (e.g., new items), and which participants do not discriminate in their conscious reports. We demonstrate that such contrasts do not license inferences about unconscious processing, for two reasons. One is Kelley’s Paradox, a statistical phenomenon caused by regression to the mean. In the inevitable presence of measurement error, true awareness of the contrasted stimuli is not equal. The second is a consequence, within the framework of Signal Detection Theory, of unequal skewness in the strengths of target and nontarget items. The fallacious reasoning that underlies the employment of this contrast methodology is illustrated through both computational simulations and formal analysis, and its prevalence is documented in a narrative literature review. Additionally, a recognition memory experiment is reported which tests and confirms a prediction of our analysis of the contrast methodology and corroborates the susceptibility of this method to artifacts attributable to Kelley’s Paradox and strength skewness. This work challenges the validity of conclusions drawn from this popular analytic approach.
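A short simulation makes the Kelley's Paradox argument tangible: items matched on a noisy awareness report can still differ in true awareness whenever the underlying distributions differ. All parameters below are invented:

```python
# Sketch: regression-to-the-mean artifact (Kelley's Paradox) in the
# contrast methodology, using simulated old and new items.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
true_old = rng.normal(0.5, 1.0, n)    # old items: higher true awareness
true_new = rng.normal(0.0, 1.0, n)    # new items
noise = 1.0                           # measurement error SD
obs_old = true_old + rng.normal(0, noise, n)
obs_new = true_new + rng.normal(0, noise, n)

crit = 0.0                            # both sets reported as "unaware"
sel_old = true_old[obs_old < crit]
sel_new = true_new[obs_new < crit]
# Despite identical observed reports (below criterion), true awareness
# of the selected old items exceeds that of the selected new items.
print(f"mean true awareness, old items: {sel_old.mean():.2f}")
print(f"mean true awareness, new items: {sel_new.mean():.2f}")
```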


Figure. Ratings of the experimental and neutral (control) abstracts in Experiments 1–3. Dots are individual data points; boxplots show the median and the first and third quartiles.
Figure. Forest plot of the data from Handley et al. [22], Xiao et al. [32], and the present experiments.
Figure. Effect sizes from Handley et al. [22], Xiao et al. [32], and the present experiments plotted against month/year of data collection.
A re-evaluation of gender bias in receptiveness to scientific evidence of gender bias

September 2024 · 192 Reads · 1 Citation

Gender bias has been documented in many aspects of Science, Technology, Engineering and Mathematics (STEM) careers, yet efforts to identify the underlying causes have been inconclusive. To what extent do cognitive biases, including unequal receptiveness in women and men to evidence of gender bias, contribute to gender bias in STEM? We investigated receptiveness in a STEM context among members of the general public by undertaking a high-powered (total N = 1171) replication, comprising three experiments (two pre-registered), of the prominent study by Handley et al. [22]. It was hypothesized that men would evaluate a research summary reporting evidence of gender bias less favourably than women, but that there would be no difference between men's and women's evaluations of research summaries unrelated to gender bias. The results revealed no effect of the assessor's gender on receptiveness to scientific evidence of gender bias. The different results compared to those of Handley et al. [22] suggest either that the gender bias they detected has diminished in the past decade or that their findings are a false positive. The present research adds to a growing body of evidence suggesting that some influential studies on cognitive ‘markers’ of gender bias warrant re-examination.



Citations (60)


... The results showed that the JOL group recalled significantly more related word pairs than the no-JOL group, with no significant difference observed in recall of unrelated word pairs. Numerous studies have consistently demonstrated that making JOLs has a positive reactivity effect on memory for related word pairs (e.g., Halamish & Undorf, 2023; Janes et al., 2018; Li et al., 2022, 2023; Maxwell & Huff, 2022; Myers et al., 2020; Rivers et al., 2021, 2023; Zhao et al., 2025), and this effect remains robust even after long delays (at least 48 hr; Witherby & Tauber, 2017). However, making JOLs typically has no effect (Double et al., 2018; Maxwell & Huff, 2022), or sometimes even has a negative effect, on memory for unrelated word pairs (Mitchum et al., 2016; Undorf et al., 2024). ...

Reference:

Age Differences in the Reactivity Effect of Judgments of Learning on Recognition Memory
Individual differences in the reactivity effect of judgments of learning: Cognitive factors
  • Citing Article
  • February 2025

Journal of Memory and Language

... Before continuing, it is important to note that this problem is not specific to the contextual cuing paradigm. Hundreds of studies on unconscious mental processes rely on the same approach: if performance on a particular task is independent of participants' awareness of the crucial regularities or the stimuli driving their performance, this is systematically taken as evidence that the cognitive processes involved must be unconscious, almost always without paying adequate attention to the reliabilities of the dependent measures (Hernández-Gutiérrez et al., 2024; Lee & Shanks, 2023; Shanks et al., 2021). We have already referred to some classic studies, but this correlational approach has been used more recently, for instance, in studies on unconscious syntactic priming (Berkovitch & Dehaene, 2019) and unconscious memory suppression (Salvador et al., 2018). ...

The Conscious Side of ‘Subliminal’ Linguistic Priming: A Systematic Review With Meta-Analysis and Reliability Analysis of Visibility Measures

Journal of Cognition

... However, learners frequently hold flawed mental models of how learning and memory function, resulting in inaccurate metacognitive assessments and ineffective management of their own learning (Bjork et al. 2013; Kornell and Bjork 2009). For instance, learners tend to prefer studying category exemplars in blocks and erroneously believe that blocked learning is more beneficial for category induction than interleaved learning, even though interleaved learning is actually more effective (Li et al. 2025; Sun et al. 2022). This underscores the need to understand why learners frequently misjudge their knowledge mastery level, identify potential factors affecting metacognitive monitoring accuracy, and develop effective interventions to promote the accuracy of self-assessments (Rhodes and Tauber 2011; Yang et al. 2021b). ...

Instructional Intervention Effects on Interleaving Preference and Distance During Self-Regulated Inductive Learning

... If any of the measures were not normally distributed, we used Spearman correlations instead. Furthermore, as suggested by studies evaluating the reliability of experimental measures, data preprocessing decisions that work well for group-level inferences are less optimal for correlational studies (Garre-Frutos et al., 2024; Parsons, 2022; Vadillo et al., 2024). For that reason, for our correlational analysis, we also filtered RTs 3 SDs above or below each participant's distribution (see Garre-Frutos et al., 2024). ...

Mapping the Reliability Multiverse of Contextual Cuing
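The per-participant filter described in the excerpt above, trimming RTs more than 3 SDs from each participant's own distribution, might look like this in pandas (hypothetical data; not the cited study's code):

```python
# Sketch: drop RTs more than 3 SDs from each participant's own mean.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
df = pd.DataFrame({
    "participant": np.repeat(np.arange(20), 50),   # 20 people x 50 trials
    "rt": rng.lognormal(6, 0.5, 20 * 50),
})

# z-score each RT within its participant, then keep |z| <= 3.
z = df.groupby("participant")["rt"].transform(
    lambda x: (x - x.mean()) / x.std()
)
filtered = df[z.abs() <= 3]
print(f"kept {len(filtered)} of {len(df)} trials")
```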

... As such, there is considerable variability in their pre-processing across studies/lab groups, which can impact their outcomes and reliability. For example, studies have shown that a priori decisions on the removal of outliers in reaction time distributions (e.g. using the mean vs. median, or removing reaction times greater than 2 or 3 standard deviations around the individual mean) can impact the reliability of a widely used task to measure attentional bias (the Visual Probe task: Jones et al., 2018; Price et al., 2015), but also the Stroop and flanker tasks (Parsons, 2020), and contextual cueing tasks (Vadillo et al., 2023). However, other methods of removal also exist, such as transformation (e.g. ...

Mapping the reliability multiverse of contextual cuing
  • Citing Preprint
  • July 2023

... The lack of specific research on STEM college students is notable, given the significant impact implicit biases can have on career choice and retention in these disciplines (Montgomery et al., 2024; Sebastián-Tirado et al., 2023). This study provides an empirical basis for understanding how implicit beliefs influence STEM students and helps develop strategies to promote gender equity in these fields (Howell et al., 2024; Shanks et al., 2024). By addressing these biases, students can make decisions that are more aligned with their personal values, thus mitigating automatic prejudices and fostering more inclusive, equitable environments (Marini & Banaji, 2022). ...

A re-evaluation of gender bias in receptiveness to scientific evidence of gender bias

... In the past, statisticians have focused on the details of the methodological procedure (cf. the discussion between the captain and the sailor in Figure 1) while ignoring the more fundamental problems of model myopia and bias [20]; the captain does not realize that his view on the iceberg is warped and incomplete, and this means that his recommendations, however well-intentioned and statistically sophisticated they may be, amount to nothing more than rearranging the deck chairs on the Titanic. The cure for model myopia may appear straightforward: conduct a many-analysts study, assess the heterogeneity across analysis teams, and judge the degree to which the qualitative conclusions are fragile or robust [see also 21]. However, many-analysts studies take considerable time and effort to coordinate, and this prohibits their routine application. ...

Subjective evidence evaluation survey for many-analysts studies

... If the increment is inversely related to an item's strength (Bjork & Bjork, 1992;Storm et al., 2008), then the increment for HCMs might be expected to be greater than that for HCCRs, and their strength for the second test will then be more likely to be greater than that of HCCRs, giving rise to the residual memory effect for HCMs. A problem with this account is that, for the other single-item recognition rating categories (e.g., items receiving a "2-medium confidence new" rating), the expected strength of nonstudied items is lower than that of studied items under the UVSD model (discussed further in Lee et al., 2024). If the increment these items receive from the first test is also inversely related to strength, the nonstudied items would be expected to receive a greater strength increment than the studied items receiving the same rating. ...

Kelley's Paradox and strength skewness in research on unconscious mental processes
  • Citing Presentation
  • July 2024

... There were no significant differences in the average breakthrough times between objects manipulated under different sensorimotor conditions (BF10 < 0.333 indicates substantial evidence for the null hypothesis). ...

Studying unconscious processing: towards a consensus on best practices

... Consequently, it is emphasized that test anxiety should be seriously addressed by both educators and educational institutions. This indicates that test anxiety represents a crucial area for intervention that requires attention to mitigate its adverse effects on learning and success (Liu et al., 2024). ...

Effects of Test Anxiety on Self-Testing and Learning Performance

Educational Psychology Review