Tom Stafford’s research while affiliated with The University of Sheffield and other places


Publications (29)


Changes in cognition of ADRD patients: effects of social isolation proxies, utilising data from UK electronic health records
  • Article
  • Full-text available

January 2025 · 1 Read · Tom Stafford

Background: Marital status and living status are components of social isolation (SI), a modifiable factor thought to impact cognitive resilience, which has the potential to impact cognition throughout the course of Alzheimer’s and related dementia (ADRD) diagnosis. Electronic health records (EHRs) offer access to large scale clinical data, capable of longitudinal analyses.
Method: Cognitive function measurement (Montreal Cognitive Assessment, MoCA) data, demographic data (including marital and living status as SI proxies) and ADRD diagnosis data from patients aged 50+ years from Oxford Health NHS Foundation Trust (UK) were extracted using natural language processing algorithms from EHRs dated 1995 to 2022. Longitudinal multilevel models were used to predict cognition as a function of the interaction between diagnosis duration and SI proxies, controlling for age, sex and diagnosis cause.
Result: ‘Lifelong single’ marital status significantly predicted reduced cognition intercept scores for the MoCA dataset (𝛽 = −1.61, SE = 0.67, t = −2.42, p = 0.016). No significant marital status predictors for slope were found. Living in supported accommodation significantly predicted steeper slopes for cognition (𝛽 = −2.37, SE = 0.33, t = −7.20, p < 0.001). No living status levels significantly predicted slopes.
Conclusion: Worldwide ADRD incidence is predicted to increase dramatically within the next 30 years, therefore studies investigating the impact of modifiable factors on the rates of cognitive change in ADRD patients are valuable to enhancing understanding of patient care. SI data extracted from EHRs can be used to predict differences in patient cognition scores.


Figure 6: Subject accuracy vs. conformity level of Llama-3-8B-Instruct over 57 subjects in MMLU.
Conformity in Large Language Models

October 2024 · 96 Reads

Xiaochen Zhu · Caiqi Zhang · Tom Stafford · [...]

The conformity effect describes the tendency of individuals to align their responses with the majority. Studying this bias in large language models (LLMs) is crucial, as LLMs are increasingly used in various information-seeking and decision-making tasks as conversation partners to improve productivity. Thus, conformity to incorrect responses can compromise their effectiveness. In this paper, we adapt psychological experiments to examine the extent of conformity in state-of-the-art LLMs. Our findings reveal that all models tested exhibit varying levels of conformity toward the majority, regardless of their initial choice or correctness, across different knowledge domains. Notably, we are the first to show that LLMs are more likely to conform when they are more uncertain in their own prediction. We further explore factors that influence conformity, such as training paradigms and input characteristics, finding that instruction-tuned models are less susceptible to conformity, while increasing the naturalness of majority tones amplifies conformity. Finally, we propose two interventions--Devil's Advocate and Question Distillation--to mitigate conformity, providing insights into building more robust language models.


Figure 1: Zool Redimensioned (image retrieved from Games Industry.biz website)
Figure 2: Histogram of players discontinuing their gameplay. Players that reached level 28 were defined as censored events where the event (dropout) did not happen.
Figure 3: Survival probability for an interaction between warning setting and game over event. Groups illustrated in red (Control group without game over outcome) and green (Experimental group without game over outcome) overlap and are least likely to stop their engagement with the game. Groups illustrated in purple (Experimental group with a game over outcome) and blue (Control group with a game over outcome) have a higher likelihood of dropping out from the game.
Table: Longitudinal Cox regression coefficients (columns: Estimate, Hazard ratio, Standard error, z statistic, p-value, 95% CI of hazard ratio).
Experiments in games: modding the Zool Redimensioned warning system to support players' skill acquisition and attrition rate

The scientific potential of digital game studies in psychology is limited by the observational nature of the data that they investigate. However, digital environments present us with a perfect opportunity to incorporate experimental paradigms in complex interactive and multivariate worlds where each decision made by participants can be tracked and recorded. In this study, we demonstrate an industry-academic research collaboration that offers a proof of concept of how minor modifications of game settings could be used to test psychological research questions. We modify the settings of the Zool platform game so that players allocated to the experimental group are provided with more information when in danger of dying in the game. The results show that the manipulation does not influence behaviour in the game, such as achieved score or number of deaths, but it does change whether players continue playing after the disappointing event of losing all their lives (the game over event). In line with previous studies, the additional information provided through the experimental manipulation made death in the game more informative to the players.


Where next for partial randomisation of research funding? The feasibility of RCTs and alternatives

May 2024 · 3 Reads · 1 Citation

We outline essential considerations for any study of partial randomisation of research funding, and consider scenarios in which randomised controlled trials (RCTs) would be feasible and appropriate. We highlight the interdependence of target outcomes, sample availability and statistical power for determining the cost and feasibility of a trial. For many choices of target outcome, RCTs may be less practical and more expensive than they at first appear (in large part due to issues pertaining to sample size and statistical power). As such, we briefly discuss alternatives to RCTs. It is worth noting that many of the considerations relevant to experiments on partial randomisation may also apply to other potential experiments on funding processes (as described in The Experimental Research Funder’s Handbook. RoRI, June 2022).


DeliData: A Dataset for Deliberation in Multi-party Problem Solving

October 2023 · 6 Reads · 2 Citations

Proceedings of the ACM on Human-Computer Interaction

Group deliberation enables people to collaborate and solve problems; however, it is understudied due to a lack of resources. To this end, we introduce the first publicly available dataset containing collaborative conversations on solving a well-established cognitive task, consisting of 500 group dialogues and 14k utterances. In 64% of these conversations, the group members are able to find a better solution than they had identified individually, and in 43.8% of the groups who had a correct answer as their final solution, none of the participants had solved the task correctly by themselves. Furthermore, we propose a novel annotation schema that captures deliberation cues and release all 14k utterances annotated with it. Finally, we use the proposed dataset to develop and evaluate two methods for generating deliberation utterances. The data collection platform, dataset and annotated corpus are publicly available at https://delibot.xyz.


Unsupervised identification of internal perceptual states influencing psychomotor performance

September 2023 · 80 Reads

When humans perform repetitive tasks over long periods, their performance is not constant. People may drift in and out of states that might be loosely categorised as engagement, disengagement or ‘flow’, and these will be reflected in multiple aspects of their performance (for example, reaction time, accuracy, criteria shifts and potentially longer-term strategy), but until recently it has been challenging to relate these behavioural states to the underlying neural mechanisms that generate them. Here, we took magnetoencephalography recordings of participants performing an engaging task that required rapid, strategic behavioural responses. In this way we acquired both high density neural data and contemporaneous, dense behavioural data. Specifically, participants played a laboratory version of Tetris which collects detailed recordings of player input and game-state throughout performance. We asked whether it was possible to infer the presence of distinct behavioural states from the behavioural data and, if so, whether these states would have distinct neural correlates. We used hidden Markov modelling to segment the behavioural time series into states with unique behavioural signatures, finding that we could identify three distinct and robust behavioural states. We then computed occipital alpha power across each state. These within-participant differences in alpha power were statistically significant, suggesting that individuals shift between behaviourally and neurally distinct states during complex performance, and that visuo-spatial attention changes across these states.


Where next for partial randomisation of research funding? The feasibility of RCTs and alternatives

July 2023 · 4 Reads · 2 Citations

We outline essential considerations for any study of partial randomisation of research funding, and consider scenarios in which randomised controlled trials (RCTs) would be feasible and appropriate. We highlight the interdependence of target outcomes, sample availability and statistical power for determining the cost and feasibility of a trial. For many choices of target outcome, RCTs may be less practical and more expensive than they at first appear (in large part due to issues pertaining to sample size and statistical power). As such, we briefly discuss alternatives to RCTs. It is worth noting that many of the considerations relevant to experiments on partial randomisation may also apply to other potential experiments on funding processes (as described in The Experimental Research Funder’s Handbook. RoRI, June 2022).


Figure 1: A dialogue excerpt from our dataset about veganism between a participant (P) and a Wizard (W).
Figure 3: Ratings for chat experiences for the argubot. The y-axis corresponds to the proportion of the dialogues, the x-axis corresponds to chat experiences and the different colors refer to the ratings on the 7-point Likert scale, where 1=strongly disagree and 7=strongly agree.
The percentage of dialogues that have zero, positive or negative OUM scores in the three OUM categories.
Opening up Minds with Argumentative Dialogues

January 2023 · 63 Reads

Recent research on argumentative dialogues has focused on persuading people to take some action, changing their stance on the topic of discussion, or winning debates. In this work, we focus on argumentative dialogues that aim to open up (rather than change) people's minds to help them become more understanding of views that are unfamiliar or in opposition to their own convictions. To this end, we present a dataset of 183 argumentative dialogues about 3 controversial topics: veganism, Brexit and COVID-19 vaccination. The dialogues were collected using the Wizard of Oz approach, where wizards leverage a knowledge-base of arguments to converse with participants. Open-mindedness is measured before and after engaging in the dialogue using a questionnaire from the psychology literature, and success of the dialogue is measured as the change in the participant's stance towards those who hold opinions different to theirs. We evaluate two dialogue models: a Wikipedia-based and an argument-based model. We show that while both models perform similarly in terms of opening up minds, the argument-based model is significantly better on other dialogue properties such as engagement and clarity.


Figure 3: Illustration of Graham's Hierarchy of disagreement.
Statistics of the three pilot annotation rounds, including the number of utterances, the average rebuttal level assigned per annotator, and the Cohen's Kappa and Pearson's r.
How to disagree well: Investigating the dispute tactics used on Wikipedia

December 2022 · 850 Reads

Disagreements are frequently studied from the perspective of either detecting toxicity or analysing argument structure. We propose a framework of dispute tactics that unifies these two perspectives, as well as other dialogue acts which play a role in resolving disputes, such as asking questions and providing clarification. This framework includes a preferential ordering among rebuttal-type tactics, ranging from ad hominem attacks to refuting the central argument. Using this framework, we annotate 213 disagreements (3,865 utterances) from Wikipedia Talk pages. This allows us to investigate research questions around the tactics used in disagreements; for instance, we provide empirical validation of the approach to disagreement recommended by Wikipedia. We develop models for multilabel prediction of dispute tactics in an utterance, achieving the best performance with a transformer-based label powerset model. Adding an auxiliary task to incorporate the ordering of rebuttal tactics further yields a statistically significant increase. Finally, we show that these annotations can be used to provide useful additional signals to improve performance on the task of predicting escalation.


Mind the gap: Distributed practice enhances performance in a MOBA game

October 2022 · 177 Reads · 5 Citations

Understanding how humans master complex skills has the potential for wide-reaching societal benefit. Research has shown that one important aspect of effective skill learning is the temporal distribution of practice episodes (i.e., distributed practice). Using a large observational sample of players (n = 162,417) drawn from a competitive and popular online game (League of Legends), we analysed the relationship between practice distribution and performance through time. We compared groups of players who exhibited different play schedules using data slicing and machine learning techniques, to show that players who cluster gameplay into shorter time frames ultimately achieve lower performance levels than those who space their games across longer time windows. Additionally, we found that the timing of intensive play periods does not affect final performance—it is the overall amount of spacing that matters. These results extend some of the key findings in the literature on practice and learning to an ecologically valid environment with huge n. We discuss our work in relation to recent studies that have examined practice effects using Big Data and suggest solutions for salient confounds.


Citations (15)


... Graesser et al. (2018) describes the need for team members to externalize their knowledge. Karadzhov et al. (2022) explores how deliberation may lead to team members changing their minds, which is critical for building group common ground (Stalnaker, 1978). ...

Reference:

“Any Other Thoughts, Hedgehog?” Linking Deliberation Chains in Collaborative Dialogues
What makes you change your mind? An empirical investigation in online group decision-making conversations
  • Citing Conference Paper
  • January 2022

... (3) We investigate whether different prompting styles, such as zero-shot, few-shot, and chain-of-thought (CoT), improve moderation. (4) We introduce a new dataset of real-world deliberations, filling the gap left by existing datasets like DELIDATA [11], which are not about real-world deliberations to solve problems. ...

DeliData: A Dataset for Deliberation in Multi-party Problem Solving
  • Citing Article
  • October 2023

Proceedings of the ACM on Human-Computer Interaction

... Similar situation was explained by Fuentes-García, Pulido, Morales & Menayo [18], training that was carried out continuously can affect the results of learning motion. Meanwhile, distributed training has advantages in terms of taking advantage of rest breaks to analyze the movements that have been learned, such as which movements are right and wrong and can prepare energy to face the next exercise [32]. The results of this study were also reinforced by previous research which reported that both methods had positive effectiveness in teaching hitting skills in tennis [17]. ...

Mind the gap: Distributed practice enhances performance in a MOBA game

... A total of 12 studies published from 2019 to 2023 were analyzed and summarized (Altay et al., 2022; Amith et al., 2019, 2020; Brand & Stafford, 2022; EI Ayadi et al., 2022; Hong et al., 2021; Kobayashi et al., 2022, 2023; Lee et al., 2022; Luk et al., 2022; Wang et al., 2023; Weeks et al., 2023) (Appendix C). Regarding the time of the studies, two were conducted before 2020 (before the COVID-19 pandemic), and ten were conducted in 2020 or the following years (during the COVID-19 pandemic). ...

Using dialogues to increase positive attitudes towards COVID-19 vaccines in a vaccine-hesitant UK population

... In light of this point, we propose that directly measuring AVG performance (we refer to this as AVG proficiency) may better reflect the individual differences in skill gained through playing an AVG over experience alone [26, 27]. This is due to proficiency more directly indexing the cognitive and perceptual processes that are being honed while playing AVGs. ...

Maximizing the Potential of Digital Games for Understanding Skill Acquisition

Current Directions in Psychological Science

... Online video game platforms, in particular, provide an unprecedented opportunity to reach more than 400 million intrinsically motivated people who voluntarily hone their cognitive, perceptual (Li et al., 2009; Appelbaum et al., 2013), and motor skills over weeks, months and even years (Green and Bavelier, 2003; Eichenbaum et al., 2014). Specifically, first person shooter games, where players use tools and weapons as intuitive extensions of themselves in a virtual 3D environment, provide an ideal testbed for studying motor skill acquisition over a long duration of practice (Stafford and Vaci, 2021). Successful shooters are those who can efficiently identify and localize relevant visual targets (Green and Bavelier, 2006), as well as rapidly and accurately hit their targets - a refined motor skill. ...

Digital games as a platform for understanding skill acquisition from novice to expert

... It is important because it leads to unrealistically high performance [65]. A study [21] also considered the inputs and adapted natural language processing and deep learning to achieve an AUC of 0.91. In recent history, neural networks have been common in implementing predictive models to determine PD. ...

Identifying Robust Markers of Parkinson's Disease in Typing Behaviour Using a CNN-LSTM Network

... At least, it is mistaken as a statement about what criminal justice can, and should, achieve-as opposed to what it currently achieves. Philosophical work on epistemic injustice has shown that epistemic oppression-that is, any unjust exclusion that prevents individuals or groups from participating in interpersonal epistemic enterprises-is 71 See Sullivan (2017, p. 297) and Owusu-Bempah (2022a, p. 148 (2017) and Scaife et al. (2020). The latter study shows that a blaming response to an individual's implicit bias can reduce this individual's IAT score in the short term and can motivate the individual to change behaviour influenced by implicit bias. ...

To Blame? The Effects of Moralized Feedback on Implicit Racial Bias

Collabra Psychology

... Firstly, a motor-cognitive interference effect may have occurred due to the increased cognitive effort required to engage in AO+MI compared to the baseline and control conditions. Proficient typists, like the participants in this experiment, are likely to use automatic processes to execute habitual typing (Bannard et al., 2019;Rieger, 2004), but the AO+MI conditions forced participants to think consciously about their typing. This process may have engaged more executive processes and frontal neural networks than would have otherwise been engaged in habitual typing, such as the SMA, and disrupted typing execution in line with the constrained action hypothesis (Wulf et al., 2001). ...

Reduced habit-driven errors in Parkinson’s Disease

... However, other authors assign the concept a negative connotation; that is, for them it is a normative concept (Stafford et al., 2018). Thus, biases are linked to distorted and negative evaluations of particular individuals or groups. ...

Confronting bias in judging: A framework for addressing psychological biases in decision making
  • Citing Preprint
  • September 2018