A multi-modal panel dataset to understand the psychological impact of the pandemic

Besides far-reaching public health consequences, the COVID-19 pandemic had a significant psychological impact on people around the world. To gain further insight into this matter, we introduce the Real World Worry Waves Dataset (RW3D). The dataset combines rich open-ended free-text responses with survey data on emotions, significant life events, and psychological stressors in a repeated-measures design in the UK over three years (2020: n = 2441, 2021: n = 1716 and 2022: n = 1152). This paper provides background information on the data collection procedure, the recorded variables, participants’ demographics, and higher-order psychological and text-derived variables that emerged from the data. The RW3D is a unique primary data resource that could inspire new research questions on the psychological impact of the pandemic, especially those that connect modalities (here: text data, psychological survey variables and demographics) over time.
Scientific DATA | (2023) 10:537 | https://doi.org/10.1038/s41597-023-02438-y
www.nature.com/scientificdata
Isabelle van der Vegt ✉ & Bennett Kleinberg





n =n =
n =





Since the start of the pandemic, social and behavioural scientists have collected data on the psychological impact on individuals of COVID-19 and the measures introduced around it. The global health crisis severely impacted lives around the world. At the same time, it enabled social scientists across disciplines to study the response of humans to unprecedented circumstances. Several papers and associated datasets have emerged as a result, including those that adopted a psychological perspective. For instance, the COVIDiSTRESS Global Survey includes measures such as perceived stress, trust in authorities, and compliance with anti-COVID measures collected between 30 March and 30 May 2020 from 173,426 individuals across 39 countries and regions [1]. Similarly, the PsyCorona dataset consists of data collected at the start of the pandemic (n = 34,526) from 41 societies worldwide, measuring psychological variables and behaviours such as leaving the home and physical distancing [2]. That dataset has been used in follow-up studies to measure, for example, cooperation and trust across societies [3] and associations between emotion and risk perception of COVID-19 [4]. Others have studied the concept of ‘pandemic fatigue’ (i.e., the perceived inability to “keep up” with restrictions), for which data are available from eight countries [5]. Pandemic fatigue was found to be associated with the severity of restrictions and to elicit political discontent.
Of particular promise for understanding how individuals fared during and in the aftermath of the pandemic are free-text responses, which allow for more depth and coverage of topics than targeted survey-style data collection. Some initiatives have used and made available linguistic data on the consequences of the pandemic, usually from Twitter [6,7]. In another study, Reddit and survey data were analysed to measure shifts in psychological states throughout the pandemic [8]. However, the two modalities of data were collected from different participants, which does not allow for deeper exploration of ground truth psychological states of text authors by connecting survey and text modalities. Collecting text and survey data from the same participants is desirable for several reasons. Firstly, free-text responses enable participants to report their experiences in the pandemic in an unconstrained manner, potentially offering deeper insight into psychological processes. Secondly, simultaneously obtained survey responses offer ground truth measures of the psychological variables potentially underlying what is written about in text. Thirdly, advances in natural language processing allow for in-depth quantitative analyses of the text data, thereby making text data a resource that reaches beyond the qualitative analyses typically conducted manually. However, collecting data that connect the textual dimension to survey data is costly, as it requires primary data collection and cannot be realised through “found data” (e.g., posts on social media). Consequently, such datasets remain scarce, and their scarcity has impeded the study of the psychological impact of the pandemic.

1 Utrecht University, Department of Sociology, Utrecht University, 3584 CH, Utrecht, The Netherlands. 2 Tilburg University, Department of Methodology and Statistics, 5037 AB, Tilburg, The Netherlands. 3 University College London, Department of Security and Crime Science, London, WC1E 6BT, UK. 4 These authors contributed equally: Isabelle van der Vegt, Bennett Kleinberg. e-mail: i.w.j.vandervegt@uu.nl

The current paper fills that gap and introduces the Real World Worries Waves Dataset (RW3D), offering the unique combination of ground-truth survey data on emotions with free-text responses describing emotions in relation to the pandemic. The richness of this dataset allows us to examine, for example, emotional responses and the content of worries as a consequence of COVID-19. Given the broad scope of potential research questions and the scarcity and necessity of these data sources, we make this dataset available to the research community. Hereafter, we provide detailed background on the data collection procedure, recorded variables, and participant demographics, as well as an attrition analysis and descriptive statistics. We also provide evidence for latent clusters of how participants’ emotions changed over time and to what extent they were realistically or overly worried about various concerns in their lives. Our aim with this paper is to offer detail on a unique resource that could inspire plenty of research questions.
Methods
Ethics. The data collection was approved by the departmental ethics review board at University College London. No personal data were collected from participants and all participants provided informed consent for participation and for their data to be shared.
The dataset was collected in three waves in April of 2020, 2021 and 2022. Data collection started in April 2020 on the crowdsourcing platform Prolific with an initial sample size of n = 2500. We then contacted the same participants through the crowdsourcing platform one year later about a follow-up data collection and made participation slots available for all participants whose data were collected in the first wave. That procedure was repeated another year later with those participants whose data were collected in wave 2. This resulted in sample sizes of n = 1839 in 2021 and n = 1227 in 2022. See Fig. 1 for an overview of the data collection procedure and retention across waves.
In all data collection phases, participants were informed about the purpose of the study, namely, to collect data about emotions and worries regarding the pandemic (see Supplemental Materials Table 1 for the full task intro and debrief). Participants started with the self-rated emotions questionnaire and the single emotion selection, then proceeded to the textual expressions and finally provided control variables (wave 1 and 2) and life events and psychological stressor variables (wave 3 only, see Fig. 1).
Fig. 1 Data collection procedure and retention across waves.
Only UK-based Prolific users who used Twitter (at least once a month) as per Prolific’s prescreening were eligible for participation. Upon completion of the survey, each participant was paid GBP 0.50. Even though the effective time spent on the task was somewhat longer than anticipated, we did not adjust the rewards so as not to introduce a change in reward as a confounding variable for the repeated-measures design. The task was administered through Qualtrics.
Timing and societal context. The first wave of data collection occurred in early April 2020, when the UK was under lockdown with death tolls increasing. Queen Elizabeth II had just addressed the nation and then Prime Minister Boris Johnson was admitted to hospital due to COVID-19 symptoms [9]. In wave 2 (April 2021), many people in the UK had been vaccinated, and schools, retail and the hospitality sectors were (partially) re-opening. The delta-variant of the Coronavirus had just been identified at this time [10]. Finally, in wave 3 (April 2022), all travel restrictions for those entering the UK had been lifted, the Omicron variant was surging and news around the Partygate affair (i.e., a political scandal surrounding parties held at Downing Street during lockdown) was ongoing [11].
Demographic variables. We obtained participants’ demographics from Prolific. These are data that registered participants volunteered to provide and consist of their age, gender, country of birth, nationality, first language, employment status, student status, country of residence as well as their participation on the crowdsourcing platform (number of tasks completed and approved). We added one demographic question to the survey about their native language (as this may differ from their first language).
Participants were on average 37.10 years old (SD = 11.98) in April 2022, and 68.4% were female (31.4% male, remaining: prefer not to say; see Supplemental Materials Table 2). The vast majority indicated the UK as their country of birth (90.5%) and as their current country of residence (99.7%), which matches the recruitment pre-selection that we made. Regarding their employment status, in 2020, 52.4% indicated being full-time employed, 22.7% in part-time work and 10.5% not in paid work (e.g., retired). Interestingly, the percentage of people in full-time work decreased somewhat in 2022 (42.4%). Similarly, the percentage of students decreased from 16.9% in 2020 to 10.9% in 2022.
Emotion data. Self-rated emotions. Participants were asked to indicate on a 9-point scale how worried they were about the Corona situation (with labels at 1 = not worried at all; 5 = moderately worried; 9 = very worried) and how they felt at this moment about the Corona situation. For the latter, they indicated how strongly they felt each of the following eight emotions (1 = none at all; 5 = moderately; 9 = very much): anger, disgust, fear, anxiety, sadness, happiness, relaxation, desire [12]. The scale judgments were indicated using a slider in steps of 1 with labels at the extremes and in the middle for orientation.
Single emotion selection. From the eight emotions listed above (i.e., excluding worry), each participant was asked to select one: “If you have to choose just one, which of the emotions below best characterises how you feel at this moment?”. Table 1 shows the descriptive statistics for the emotion variables (self-rated scale values and discrete choice). While the pattern overall suggests improvement, in that the positive emotions increase while the negative ones decrease, there are latent patterns at play. Previous work using earlier waves of this data found clusters of participants in how their emotion scores changed from 2020 to 2021 [10] and we provide additional evidence for sub-groups below.
Text data. We elicited two textual responses from each participant. The first text was obtained through the following instruction: “Please write in a few sentences how you feel about the Corona situation at this very moment. This text should express your feelings at this moment.” Participants typed their response in a text field and received a prompt if their response was shorter than 500 characters. The second text response was obtained directly thereafter and was aimed at eliciting a shorter, Tweet-length text as follows: “Suppose you had to express your current feeling about the Corona situation in a Tweet (max. 280 characters). Please write in the text box below.” In this case, participants were prompted if their text input was shorter than 10 or longer than 280 characters.
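The two length checks described above can be sketched as follows. This is a minimal Python illustration, not the Qualtrics validation used in the study: the function names are hypothetical, and counting raw characters (including whitespace) is an assumption.

```python
LONG_MIN_CHARS = 500   # long text: prompt if shorter than this
TWEET_MIN_CHARS = 10   # Tweet-size text: prompt if shorter than this
TWEET_MAX_CHARS = 280  # Tweet-size text: prompt if longer than this


def needs_prompt_long(text: str) -> bool:
    """Return True if the long text would trigger the length prompt."""
    return len(text) < LONG_MIN_CHARS


def needs_prompt_tweet(text: str) -> bool:
    """Return True if the Tweet-size text falls outside 10 to 280 characters."""
    return len(text) < TWEET_MIN_CHARS or len(text) > TWEET_MAX_CHARS
```

Both boundaries are treated as inclusive here, so a text of exactly 500, 10 or 280 characters would not trigger a prompt.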
Emotion     M Wave 1  SD Wave 1  prop. Wave 1  M Wave 2  SD Wave 2  prop. Wave 2  M Wave 3  SD Wave 3  prop. Wave 3
worry       6.67      1.70       NA            5.07      2.03       NA            3.98      2.16       NA
anger       3.76      2.18       0.04          3.47      2.35       0.08          2.85      2.19       0.06
disgust     3.06      2.12       0.01          2.79      2.16       0.02          2.39      2.03       0.03
fear        5.63      2.30       0.09          3.77      2.30       0.02          2.85      2.05       0.02
anxiety     6.51      2.30       0.58          5.05      2.52       0.36          4.09      2.45       0.30
sadness     5.55      2.31       0.15          4.64      2.57       0.19          3.48      2.35       0.13
happiness   3.55      1.84       0.01          4.29      1.98       0.05          4.76      2.10       0.07
relaxation  3.83      2.05       0.12          4.54      2.25       0.23          5.14      2.35       0.38
desire      2.73      1.90       0.01          3.42      2.19       0.05          3.22      2.09       0.02
Table 1. Descriptive statistics per wave (M, SD) for the self-rated emotions (scale: 1 = not at all; 5 = moderately; 9 = very much) and the proportion of individuals who chose the respective emotion as “best fitting” emotion.
The corpus descriptives (Supplementary Materials Table 3) show a stable length of both long and short texts over the three waves. In total, the corpus consists of 430,751 tokens (2020: 145,348; 2021: 144,191; 2022: 141,212). Figure 2 shows example texts written over all three waves.
In the first two waves (April 2020 and 2021), we recorded two sets of control variables: the self-rated ability to express emotions in text and Twitter usage. We decided to drop these from the third wave. The rationale for dropping these variables was that we assumed these to change little within the individual and we already had two measurements (wave 1 and 2) that correlated substantially (see Table 2).
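To illustrate how the stability of a control variable across two waves could be quantified, the sketch below computes a Pearson correlation with an approximate 95% confidence interval. The ratings are made up, the function names are hypothetical, and the Fisher z-transform is an assumption; the procedure behind the intervals in Table 2 is not specified here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for r via the Fisher z-transform."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Made-up ratings on the 1-9 scale for five hypothetical participants.
wave1 = [7, 5, 8, 3, 6]
wave2 = [6, 5, 9, 4, 6]

r = pearson_r(wave1, wave2)
low, high = fisher_ci(r, n=len(wave1))
```

With so few made-up observations the interval is very wide; with samples of the size collected here it would be far narrower.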
Emotion expression. As a potential control for the link between self-reported emotions in a survey and the expression of emotion in text, we asked participants to indicate on a 9-point scale (1 = not at all; 5 = moderately; 9 = very well) how well they could express their feelings (i) in general, (ii) in the Tweet-size text, and (iii) in the longer text.
Twitter usage. As an additional potential confounding variable specifically for the Tweet-size text, we asked about participants’ Twitter usage. Using a 9-point scale (1 = never; 5 = every month; 9 = every day), participants indicated how often they (i) are on Twitter, (ii) send Tweets themselves, and (iii) participate in conversations on Twitter.
The most recent wave (April 2022) included two additional constructs that replaced the control variables from the previous waves. To better understand potential moderating variables of participants’ emotional adjustment in the pandemic and their textual expression, we collected data on important life events during the pandemic and used a crisis coping questionnaire [13].
Life events. Participants were asked retrospectively about any important events or changes in their life that had happened to them over the past two years. First, they were asked whether “anything - positive or negative - in [their] life [has] over the past two years impacted how [they] dealt with the Corona situation?” Those who answered yes were then asked to describe the event, date the event (month and year) and rate the event’s impact on a scale from −10 (very negative) to +10 (very positive). If there was an additional event, participants could also submit one more (for a maximum of two events). All life events were subsequently qualitatively coded by the authors to arrive at overarching categories. For instance, being fired, changing jobs, and obtaining a first job after college were mapped to the category ‘job’; getting married, finding a partner, and a break-up were all mapped to ‘romantic’.
A third of the participants (33.9%) reported a significant life event during data collection. The most common life event category was ‘death’ (e.g., a death in the family), which was almost exclusively rated as a negative life event (97.6%). Life events related to work (e.g., a job change) were also common, and most participants (69.9%) rated these with a positive intensity (Table 3). Other life events such as ‘mental health’ (e.g., experiencing panic attacks, receiving a mental health diagnosis) and ‘financial’ (e.g., paying off loans, loss of income) show a more ambivalent pattern and were rated as positive and negative in approximately equal proportions. The median date of the reported life events was December 2021. See Table 4 for examples of each life event category.
Fig. 2 Text data of a single participant (long text and Tweet-size text).
Variable                       M Wave 1  SD Wave 1  M Wave 2  SD Wave 2  r
Emotion expression general     6.90      1.72       6.84      1.71       0.45 [0.40; 0.49]
Emotion expression short text  5.95      2.15       6.02      2.06       0.35 [0.30; 0.40]
Emotion expression long text   7.05      1.84       7.08      1.77       0.39 [0.34; 0.44]
Using Twitter                  6.24      2.79       5.94      2.86       0.68 [0.65; 0.71]
Sending Tweets                 3.77      2.55       3.45      2.49       0.73 [0.70; 0.75]
Conversations on Twitter       3.53      2.43       3.29      2.38       0.68 [0.65; 0.71]
Table 2. Descriptives for the control variables collected in wave 1 and 2 (M, SD). All variables correlated substantially, albeit somewhat more strongly for the Twitter usage variables than for the emotion expression ratings.
Stressors during crisis. To measure psychological stressors, we used a part of the Crisis Coping Assessment Questionnaire (CCAQ) [13]. Specifically, we asked several items from two perspectives: how worried participants were about a range of concerns over the past two years (we refer to this below as the worry score) and how problematic each of the concerns turned out to be (the problem score). For each perspective, participants answered on a 9-point scale (1 = did not worry me at all/not problematic at all; 9 = worried me extremely/extremely problematic) to the following 12 concerns: their own physical health, mental health, and safety, the physical and mental health and safety of people they love, losing their job, not having enough money to survive, getting basic everyday things (food, etc.), social unrest, separation from their family, a close person being violent.
Responses to the CCAQ showed that participants were most worried about the physical safety and mental health of their loved ones. The extent to which these stressors occurred in reality showed that participants’ own mental health and that of their loved ones were impacted (Table 5). For all concerns measured, the worry score was never exceeded by the actual problem score. That is, participants were consistently more worried about an issue than it turned out to be a problem. This worry-problem discrepancy is not evenly distributed across the concerns from the CCAQ; below we provide evidence for two latent clusters of participants on that worry-problem discrepancy.
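As a worked example of the worry-problem discrepancy, the sketch below subtracts the mean problem score from the mean worry score for each concern, using the group means reported in Table 5. It operates on means only; the participant-level deltas used for clustering are not reproduced, and the variable names are illustrative.

```python
# (mean worry score, mean problem score) per concern, copied from Table 5.
ccaq_means = {
    "Own physical safety": (5.16, 3.58),
    "Own mental state": (5.69, 4.92),
    "Own safety": (4.73, 2.72),
    "Physical safety loved ones": (6.85, 4.33),
    "Mental health loved ones": (6.31, 4.77),
    "Safety loved ones": (6.22, 3.57),
    "Losing job": (3.45, 2.45),
    "Financial problems": (5.06, 3.88),
    "Getting basics": (4.65, 3.55),
    "Social unrest": (4.45, 3.10),
    "Being separated from family": (5.13, 4.13),
    "Violence close person": (1.67, 1.44),
}

# Discrepancy = worry minus problem, rounded to two decimals.
discrepancy = {
    concern: round(worry - problem, 2)
    for concern, (worry, problem) in ccaq_means.items()
}

# Concerns ordered from the largest to the smallest discrepancy.
ranked = sorted(discrepancy, key=discrepancy.get, reverse=True)
```

Every difference is positive, and the ranking places the safety and physical safety of loved ones at the top and violence by a close person at the bottom.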
Data Records
The RW3D dataset is available on the Open Science Framework at https://osf.io/9b85r/ [14]. The repository also contains all supplementary materials and a variable code book with detail and naming conventions for the full dataset.
The dataset contains columns for emotion ratings, long and short texts, linguistic metadata (number of characters, punctuation) and demographics separated per wave, indicated by the suffixes ‘_wave1’, ‘_wave2’, ‘_wave3’. For data collected in wave 3, we additionally provide - where applicable - up to two descriptions of life events and their associated impact (‘life event’ variables), as well as all participants’ responses to the CCAQ scale (‘ccaq’ variables). Please see the codebook for a full description of each column.
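Because wave-specific columns share a suffix, a wide row can be reshaped into one record per wave. The sketch below assumes hypothetical column names (`participant_id`, `worry_wave1`, and so on) and a hypothetical helper `wide_to_long`; the actual column names are documented in the codebook.

```python
import re

# Hypothetical wide-format row; real column names are listed in the codebook.
row = {
    "participant_id": "p001",
    "worry_wave1": 7, "worry_wave2": 5, "worry_wave3": 4,
    "anxiety_wave1": 8, "anxiety_wave2": 6, "anxiety_wave3": 3,
}

# Matches names ending in one of the three wave suffixes.
WAVE_SUFFIX = re.compile(r"^(?P<name>.+)_wave(?P<wave>[123])$")


def wide_to_long(wide_row, id_key="participant_id"):
    """Turn one wide row into one record per wave, keyed by the wave suffix."""
    per_wave = {}
    for column, value in wide_row.items():
        match = WAVE_SUFFIX.match(column)
        if match is None:
            continue  # columns without a wave suffix (e.g. the id) are skipped
        wave = int(match.group("wave"))
        record = per_wave.setdefault(
            wave, {id_key: wide_row[id_key], "wave": wave}
        )
        record[match.group("name")] = value
    return [per_wave[w] for w in sorted(per_wave)]


long_rows = wide_to_long(row)
```

The long format is convenient for panel models, where each participant contributes one observation per wave.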
Technical Validation
This section describes (i) the steps taken to ensure data quality through participant exclusion criteria and (ii) how data derivatives were obtained.
Data retention and exclusion. After each wave of data collection, we excluded participants based on two text-based criteria: participants were excluded if the long text was not written in the English language, as determined with the cld3 R package (https://cran.r-project.org/web/packages/cld3/index.html), or if it contained more than 20% punctuation tokens. The latter criterion was applied to remove participants who filled their textual response with superfluous continuous punctuation (e.g., dots, commas, exclamation marks) to reach the character length requirement. Both criteria were deemed necessary to ensure text data quality. For the third wave, the English-language criterion resulted in the exclusion of 38 participants and the punctuation criterion in the exclusion of a further four participants (after the English-language criterion had already been applied). The retention over the years was 70.3% and 67.1% in the second and third wave, respectively (see Supplementary Materials Table 2 for sample descriptives over the three waves).
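The punctuation criterion and the retention figures can be sketched as follows. The tokenisation (word runs versus single non-word, non-space characters) and the function names are assumptions, and the language check performed with cld3 is not reproduced. The retention calls in the usage note use the final sample sizes reported in the abstract (2441, 1716 and 1152), which are consistent with the percentages above.

```python
import re

WORD = re.compile(r"\w+")       # a run of letters, digits or underscores
PUNCT = re.compile(r"[^\w\s]")  # any single non-word, non-space character


def punctuation_share(text: str) -> float:
    """Fraction of tokens that are punctuation marks (0.0 for empty text)."""
    n_words = len(WORD.findall(text))
    n_punct = len(PUNCT.findall(text))
    total = n_words + n_punct
    return n_punct / total if total else 0.0


def exceeds_punctuation_limit(text: str, limit: float = 0.20) -> bool:
    """True if more than `limit` (default 20%) of the tokens are punctuation."""
    return punctuation_share(text) > limit


def retention(previous_n: int, current_n: int) -> float:
    """Percentage of the previous wave's sample retained, to one decimal."""
    return round(100 * current_n / previous_n, 1)
```

For example, `retention(2441, 1716)` gives 70.3 and `retention(1716, 1152)` gives 67.1, while a response padded with dots such as `"ok ........"` exceeds the 20% limit under this tokenisation.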
Data derivatives. We obtained two kinds of derivatives from the data, one based on the text data and the other on the emotion and CCAQ questionnaires. From the text data, we arrived at higher-order topics that provide an overarching theme for each written text and can be used to study what participants are writing about. The psychological variables (emotion scales and CCAQ) were mapped to higher-order psychological constructs characterised by latent clusters of participants on the emotion change (from 2020 to 2021 and from 2021 to 2022) as well as the discrepancy between their worry and problem score (i.e., the extent to which their worry about a concern was aligned with how problematic that concern turned out to be).
Event             Prop.  M (SD) intensity  Median intensity  Prop. neg. intensity  Prop. pos. intensity
no life event     66.15  NA                NA                NA                    NA
death             7.20   8.43 (2.83)       10 [10;6]         97.59                 2.41
job               7.20   3.08 (6.3)        6 [10;10]         28.92                 69.88
romantic          3.12   0.33 (7.95)       4 [10;10]         52.78                 47.22
family            3.04   1.88 (7.48)       6 [10;10]         65.71                 37.14
reproduction      3.04   5.94 (5.97)       8 [10;10]         17.14                 82.86
health            2.52   5 (6.38)          8 [10;10]         75.86                 24.14
move              2.34   5.85 (4.93)       8 [8;10]          11.11                 88.89
health of family  1.91   7.91 (2.58)       8 [10;0]          95.45                 0.00
mental health     1.48   1.18 (7.65)       4 [10;10]         52.94                 47.06
education         0.69   2.75 (5.23)       5 [8;4]           62.5                  37.5
financial         0.61   0.29 (7.61)       4 [10;10]         57.14                 42.86
lifestyle         0.43   9.6 (0.89)        10 [8;10]         0.00                  100.00
friendship        0.26   1.33 (9.87)       6 [10;8]          33.33                 66.67
Table 3. Summary of the life events data collected during the third wave with intensity (M, SD, Median) and proportion of participants who indicated a positive and negative intensity, on a scale from −10 (very negative) to +10 (very positive).
Topics. To capture overarching themes in the text data, we constructed a correlated topic model using the stm R package [15] for the text data for each data collection wave. This probabilistic model is based on the assumption that a piece of text consists of a mix of topics, which in turn are a mix of words with probabilities of belonging to a topic [15,16]. Table 6 shows the top three most prevalent topics per wave for the long texts (see Supplementary Materials Tables 4, 5 for a full list of topics and terms for long and short texts). We assigned labels to each topic based on the most frequent terms per topic.
Higher-order psychological clusters. Earlier work found evidence for latent clusters within the data in the change of emotions from wave 1 to wave 2 [10]. We assessed whether there were additional emotion clusters in this extended dataset and also in the discrepancy between the worry score and the actual problem score on the concerns listed in the CCAQ. For each concept, we proceeded as follows (Fig. 3): we took the delta value of the emotion ratings for two time shifts (wave 2 minus wave 1, and wave 3 minus wave 2) and used the delta between the CCAQ worry score and problem score (worry minus actual problem score). For each change - emotion change from 2020 to 2021, emotion change from 2021 to 2022 - and the worry-problem discrepancy, we then ran k-means clustering [17]. We decided on the number of clusters through convergence of the scree plot and the Silhouette method. For all three delta values, there was evidence of two clusters.
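The clustering step can be sketched as follows, assuming made-up ratings on three emotions, a plain Lloyd's k-means with a deterministic farthest-first initialisation, and the mean silhouette width as a quality score. None of these choices (nor any scaling of the deltas) is confirmed as the authors' configuration, and all function names are hypothetical.

```python
import math


def init_centroids(points, k):
    """Farthest-first initialisation: start at the first point, then keep
    adding the point that is farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return [list(c) for c in centroids]


def kmeans(points, k=2, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(d) / len(members) for d in zip(*members)]
    return labels, centroids


def mean_silhouette(points, labels):
    """Mean silhouette width; assumes at least two clusters are present."""
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # convention for single-member clusters
            continue
        a = sum(same) / len(same)  # mean distance within the own cluster
        b = min(                   # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) else 0.0)
    return sum(scores) / len(scores)


# Made-up ratings (worry, anxiety, happiness) for six hypothetical participants.
wave1 = [(8, 8, 2), (7, 9, 3), (8, 7, 2), (6, 6, 4), (7, 6, 3), (6, 7, 4)]
wave2 = [(3, 4, 6), (3, 4, 7), (4, 3, 6), (6, 5, 4), (6, 6, 3), (5, 7, 5)]

# Delta per participant and emotion: wave 2 minus wave 1.
deltas = [tuple(b - a for a, b in zip(w1, w2)) for w1, w2 in zip(wave1, wave2)]

labels, centroids = kmeans(deltas, k=2)
score = mean_silhouette(deltas, labels)
```

In this toy data the first three participants improve markedly and the last three barely change, so the two groups separate cleanly. Repeating the call for several values of k and comparing the silhouette scores is one way to choose the number of clusters.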
The emotion clusters (Table 7) for the change from 2020 to 2021 were characterised by one group of participants (44.4% of the sample) showing a marked improvement in emotional well-being, while another group (55.6%) showed emotional responses resembling resignation (i.e., these participants reported higher anger, disgust and sadness but also lower worry and fear). The subsequent year’s clusters on the change from 2021 to 2022 again showed a well-coping group of participants (40.5%) with a very similar pattern to the earlier well-coping cluster, with the exception of no increase in desire. This was juxtaposed with a maladjusted group of participants (59.6%) who overlapped somewhat with the resignation cluster but which we termed differently due to this group’s increase in fear and anxiety and decrease in desire.
Event category: death
  Example 1: We have lost 8 people, family and friends, to covid only one with underlying health condition. Very sad time.
  Example 2: Death of grandparents
  Example 3: My father passed away
Event category: job
  Example 1: My wife got a new job
  Example 2: I got a new job during lockdown which meant I was no longer forced into Work 5 days a week but stay at home half the week - I felt safer
  Example 3: Job loss due to government lockdown
Event category: romantic
  Example 1: Getting into a committed relationship
  Example 2: I have been dating the most beautiful woman ever.
  Example 3: breakup
Event category: family
  Example 1: Having to move back home and isolate caused me to realise how much I dont like my family
  Example 2: my son moved out
  Example 3: My Mother in Law accusing me of not letting her son have a vaccine.
Event category: reproduction
  Example 1: Grandughter born has given us reason to feel positive
  Example 2: 2 years ago I had a baby
  Example 3: We had our 2nd child a few days before we went into the first national lockdown in March 2020
Event category: health
  Example 1: Getting Covid was a huge thing for me and it scared me enough to know that I never want to take risks and get it again having been in hospital with it.
  Example 2: Finally had an operation I had been waiting 2 1/2 years for.
  Example 3: I’ve gained a lot of weight and my health has suffered too quite a lot
Event category: move
  Example 1: I moved home to a new apartment,needed a change of scenery and life is good here.
  Example 2: I bought a house with my husband
  Example 3: moving to a different city
Event category: health of family
  Example 1: My father was ill, I live in Scotland and he lives in England. I have to be careful as I am immunocompromised, but seeing my father was more important than the risk to myself, so I travelled
  Example 2: My nan took seriously I’ll and we’ve missed 2 years of being able to visit and spend time with her she has dementia and now no longer remembers us
  Example 3: My father went blind and broke a bone in his back in a fall which robbed him of mobility as well as sight. It led to his latest fall and a positive COVID test.
Event category: mental health
  Example 1: I was diagnosed with depression.
  Example 2: I started taking antidepressants
  Example 3: I stopped caring and putting my time and energy into being invested in the daily news of corona. Made my anxiety and depression more stable.
Event category: education
  Example 1: Started teacher training Pgce
  Example 2: Going to university
  Example 3: Finishing uni
Event category: financial
  Example 1: Loss of income
  Example 2: Initial reaction of financial markets to the pandemic. I lost a large chunk of my retirement savings.
  Example 3: Work dried up for my partner and he is self employed so had an impact on our daily life.
Event category: lifestyle
  Example 1: I changed my lifestyle, started on a low-carb diet, started exercising and meditating and making time for myself.
  Example 2: It give me the push to do better with my running and lose some weight
  Example 3: The key thing is to be kind to myself. Try to eat healthily. Take some exercise where we can. I had stick to sleep routine. Pace myself. Take time to do the things I enjoy. Even if I can’t go outside.
Event category: friendship
  Example 1: The groups of friends I felt close by had moved away during the pandemic, in the country I had been living in for 3 years.
  Example 2: End of a relationship with a best friend, left one day and didnt see them again.
  Example 3: Break down of friendships
Table 4. Examples (verbatim) of life events per category.
With regard to the worry-problem discrepancy clustering (Table 8), the larger of the two clusters (58.2%) was characterised by a markedly stronger “over-worry” (i.e., these participants indicated that they worried about the various concerns much more than the concerns turned out to be a problem). Over-worrying was particularly evident on questions about the physical health and safety of loved ones. In contrast, the group that we termed the realistic worriers (41.8%) showed consistently lower worry-problem discrepancies. Only on the question about someone close being violent (domestic violence) were both groups in agreement (low worry and low problem score).

Understanding and addressing the psychological impact of the COVID-19 pandemic, and possibly preparing for the impact of future global crises, remains an ongoing research challenge. One of the impediments is the lack of high-quality data that connect different modalities of how individuals experienced the pandemic. The current dataset paper introduced the RW3D, a repeated-measures dataset of UK participants, combining psychological variables examined via survey methods with rich textual responses. The explanatory relationships of coping in the pandemic are as yet poorly understood. With the RW3D, we can examine via panel models to what extent life events, concerns raised in text data, socio-demographic changes (e.g., job loss) and variables (e.g., gender) explain changes over time in the various emotional response styles. Gaining insights into these complex relationships could also be a way forward to target interventions at those who most need them. Importantly, the inclusion of various control variables in the dataset allows researchers to control for potential confounds.
Moreover, we can also learn about some fundamental aspects of the relationship between text data and
psychological variables. By connecting the modalities, we can test to what extent ground truth emotions are
predictable from text data and whether a lagged design can help anticipate emotion changes at a later moment
based on text data from previous years. Similarly, since we know about participants’ life events and stressors, we
can assess to what extent these are retrievable from the text data. One implicit assumption of much applied
text-based research is that these psychological variables are apparent from text data, but rich datasets to
critically assess that assumption are scarce.
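The lagged design mentioned above can be illustrated with a minimal sketch that pairs each participant's text from one wave with their self-reported emotion score in the following wave. The field names (`participant_id`, `wave`, `text`, `worry`) and the toy records are hypothetical placeholders and do not reflect the actual RW3D column names or data.

```python
def build_lagged_pairs(records):
    """Pair each participant's text at wave t with their emotion score at wave t + 1."""
    # Index records by participant and wave (field names are assumptions).
    by_participant = {}
    for record in records:
        by_participant.setdefault(record["participant_id"], {})[record["wave"]] = record

    pairs = []
    for participant_id in sorted(by_participant):
        waves = by_participant[participant_id]
        for wave in sorted(waves):
            later = waves.get(wave + 1)
            if later is None:
                continue  # no response in the following wave, so no lagged pair
            pairs.append({
                "participant_id": participant_id,
                "text_wave": wave,
                "text": waves[wave]["text"],
                "target_wave": wave + 1,
                "target_worry": later["worry"],
            })
    return pairs


# Invented toy records for illustration only (not taken from the dataset).
toy_records = [
    {"participant_id": "p1", "wave": 1, "text": "placeholder text A", "worry": 7},
    {"participant_id": "p1", "wave": 2, "text": "placeholder text B", "worry": 5},
    {"participant_id": "p1", "wave": 3, "text": "placeholder text C", "worry": 4},
    {"participant_id": "p2", "wave": 1, "text": "placeholder text D", "worry": 6},
    {"participant_id": "p2", "wave": 3, "text": "placeholder text E", "worry": 3},
]

pairs = build_lagged_pairs(toy_records)
for pair in pairs:
    print(pair["participant_id"], pair["text_wave"], "->", pair["target_wave"], pair["target_worry"])
```

Participants who skipped a wave contribute no pair for that transition. This is one possible way of handling panel attrition, not a recommendation from the paper.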
| Variable | M_worry | SD_worry | M_actual | SD_actual |
|---|---|---|---|---|
| Own physical safety | 5.16 | 2.23 | 3.58 | 2.34 |
| Own mental state | 5.69 | 2.40 | 4.92 | 2.63 |
| Own safety | 4.73 | 2.24 | 2.72 | 1.96 |
| Physical safety loved ones | 6.85 | 1.94 | 4.33 | 2.45 |
| Mental health loved ones | 6.31 | 2.09 | 4.77 | 2.40 |
| Safety loved ones | 6.22 | 2.20 | 3.57 | 2.36 |
| Losing job | 3.45 | 2.52 | 2.45 | 2.24 |
| Financial problems | 5.06 | 2.59 | 3.88 | 2.63 |
| Getting basics | 4.65 | 2.32 | 3.55 | 2.33 |
| Social unrest | 4.45 | 2.14 | 3.10 | 2.13 |
| Being separated from family | 5.13 | 2.63 | 4.13 | 2.65 |
| Violence close person | 1.67 | 1.49 | 1.44 | 1.30 |

Table 5. Summary of worries about psychological stressors and how problematic each stressor turned out to
be (M, SD) on a scale of 1–9 (1 = did not worry me at all/not problematic at all; 9 = worried me extremely/
extremely problematic).
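The gap between anticipated worry and experienced problems in Table 5 can be made explicit with a small worked example. The sketch below assumes the discrepancy is defined as the worry mean minus the actual-problem mean, and it operates on the aggregate means reported in Table 5 only. The clustering described in the paper works on participant-level scores, which this example does not reproduce.

```python
# Aggregate means from Table 5 (scale 1-9): (mean worry, mean actual problem).
table5_means = {
    "Own physical safety": (5.16, 3.58),
    "Own mental state": (5.69, 4.92),
    "Own safety": (4.73, 2.72),
    "Physical safety loved ones": (6.85, 4.33),
    "Mental health loved ones": (6.31, 4.77),
    "Safety loved ones": (6.22, 3.57),
    "Losing job": (3.45, 2.45),
    "Financial problems": (5.06, 3.88),
    "Getting basics": (4.65, 3.55),
    "Social unrest": (4.45, 3.10),
    "Being separated from family": (5.13, 4.13),
    "Violence close person": (1.67, 1.44),
}

# Assumed definition: discrepancy = mean worry - mean actual problem.
discrepancies = {
    name: round(worry - actual, 2)
    for name, (worry, actual) in table5_means.items()
}

largest = max(discrepancies, key=discrepancies.get)
smallest = min(discrepancies, key=discrepancies.get)

for name, value in sorted(discrepancies.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:.2f}")
print("largest gap:", largest)
print("smallest gap:", smallest)
```

On these aggregate means, the largest gaps concern the safety and physical safety of loved ones, and the smallest gap concerns violence by a close person. This is in line with the cluster-level pattern described for Table 8.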
| Wave | Topic | % documents | Terms |
|---|---|---|---|
| wave 1 | rule following | 10.29 | peopl, feel, see, mani, die, govern, rule, think, will, follow |
| wave 1 | how long will this last? | 10.16 | will, worri, feel, famili, normal, back, long, know, life, hope |
| wave 1 | worry about loved ones | 9.53 | worri, also, famili, friend, time, anxious, work, home, feel, will |
| wave 2 | hope for normality | 15.04 | will, feel, normal, hope, back, get, vaccin, thing, look, forward |
| wave 2 | missing normality | 14.38 | work, want, friend, famili, feel, see, miss, time, abl, home |
| wave 2 | anxiety | 11.91 | will, worri, vaccin, concern, also, feel, covid, anxious, effect, virus |
| wave 3 | normal life | 9.99 | normal, back, now, life, live, covid, worri, feel, get, can |
| wave 3 | still worried | 8.70 | covid, feel, still, worri, get, peopl, test, though, affect, don’t |
| wave 3 | new variants | 8.67 | still, variant, vaccin, will, case, virus, new, concern, number, peopl |

Table 6. Top 3 most prevalent topics for the long texts in each wave, with assigned labels and most frequent terms.
Limitations. Some limitations need to be considered when making use of this dataset. First, the data
were collected from UK participants only. While this allows for a rich analysis of the UK given country-specific
circumstances (e.g., infection spread, government responses), the results may not be generalisable to other
populations. Second, some variables were collected retrospectively, requiring participants to report significant
life events up to 2 years after they happened and to think back about worries and actual problems of crisis coping.
Third, while there is considerable spread in demographics, the dataset does not make use of a nationally
representative sample. A way to mitigate that concern post hoc might be to weight sample characteristics according
to their prevalence in the UK population18.
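The post-hoc weighting idea can be sketched as simple post-stratification: each participant receives a weight equal to the population share of their demographic group divided by that group's share in the sample. The age bands and population shares below are invented for illustration and are not UK census figures. The function name is likewise hypothetical.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Return one weight per group: population share divided by sample share."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    weights = {}
    for group, count in counts.items():
        sample_share = count / n
        weights[group] = population_shares[group] / sample_share
    return weights


# Hypothetical sample of 10 participants and made-up population shares.
sample_groups = ["18-34"] * 5 + ["35-54"] * 3 + ["55+"] * 2
population_shares = {"18-34": 0.3, "35-54": 0.3, "55+": 0.4}

weights = poststratification_weights(sample_groups, population_shares)

# Over-represented groups are weighted down, under-represented groups up.
for group in sorted(weights):
    print(group, round(weights[group], 2))

# The weights sum to the sample size when applied to every participant.
weighted_n = sum(weights[group] for group in sample_groups)
print(round(weighted_n, 2))
```

Weighting on a single characteristic is the simplest case. Adjusting for several characteristics at once would typically require joint population tables or an iterative scheme such as raking, which this sketch does not cover.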
Fig. 3 Clustering approach for two emotion changes and for the worry-problem discrepancy. The k-means
algorithm showed an ideal number of clusters for k = 2 as per the Elbow method and the Silhouette method.
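As a rough illustration of the Elbow method named in the caption, the sketch below runs a plain k-means on toy two-dimensional points and reports the within-cluster sum of squares (WSS) for k = 1 to 3. It is a simplified stand-in written for this example: the paper used the R package factoextra, the evenly spaced initialisation is a simplification chosen to keep the example deterministic, the toy points are invented, and the Silhouette method is not shown.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_wss(points, k, max_iterations=50):
    """Run a basic k-means and return (centroids, within-cluster sum of squares)."""
    # Simplified deterministic initialisation: evenly spaced points from the list.
    centroids = [points[(i * len(points)) // k] for i in range(k)]
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: squared_distance(point, centroids[i]))
            clusters[nearest].append(point)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                dims = len(members[0])
                new_centroids.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dims))
                )
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    wss = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, wss


# Invented toy data: two well-separated groups of four points each.
toy_points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
]

wss_by_k = {k: kmeans_wss(toy_points, k)[1] for k in (1, 2, 3)}
for k, wss in wss_by_k.items():
    print(k, wss)
```

On this toy data the WSS drops sharply from k = 1 to k = 2 and only marginally afterwards, which is the "elbow" pattern that suggests two clusters.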
| Emotion | 2021 well-coping | 2021 resignation | 2022 well-coping | 2022 maladjusted |
|---|---|---|---|---|
| anger | 1.68 | 0.83 | 2.38 | 0.56 |
| anxiety | 3.16 | 0.11 ns | 2.82 | 0.30 |
| desire | 1.24 | 0.25 | 0.27 ns | 0.52 |
| disgust | 1.26 | 0.53 | 1.74 | 0.51 |
| fear | 3.53 | 0.53 | 2.52 | 0.17 |
| happiness | 1.85 | 0.14 ns | 1.65 | 0.34 |
| relaxation | 2.21 | 0.48 | 2.24 | 0.53 |
| sadness | 2.77 | 0.56 | 3.27 | 0.28 |
| worry | 2.40 | 0.97 | 1.79 | 0.62 |
| Size | 44.4% | 55.6% | 40.5% | 59.6% |

Table 7. Means per emotion for the latent emotion clusters at the 2020–2021 and 2021–2022 change (all
significantly different from 0 at p < 0.01, two-sided, except for those marked ns). A positive value denotes an
increase in the respective emotion score in the later wave, while a negative value denotes a decrease. The two
well-coping clusters show similar patterns that suggest an increase in positive and a decrease in negative
emotions. The resignation (2021) and maladjusted clusters show some overlap but differ in their change of desire
and fear.
| Variable | Realistic worriers | Over-worriers |
|---|---|---|
| Own physical safety | 0.76 | 2.74 |
| Own mental health | 0.35 | 1.37 |
| Own safety | 1.17 | 3.17 |
| Physical health loved ones | 1.21 | 4.33 |
| Mental health loved ones | 0.69 | 2.72 |
| Safety loved ones | 1.35 | 4.47 |
| Losing job | 0.59 | 1.57 |
| Financial problems | 0.66 | 1.91 |
| Getting basic needs | 0.53 | 1.88 |
| Social unrest | 0.85 | 2.04 |
| Separation from family | 0.54 | 1.64 |
| Domestic violence | 0.22 | 0.24 |
| Size | 41.8% | 58.2% |

Table 8. Clustering on the worry-problem discrepancy (all significantly different from 0 at p < 0.01, two-sided).

Code availability
There is no custom code associated with this data descriptor. For data (pre)processing and obtaining data
derivatives, we used existing R packages. These included cld for English language checks, quanteda19 and stringr
(https://stringr.tidyverse.org/) for text metadata (number of characters, tokens, punctuation), the stm package15
for constructing topic models, and the factoextra package (https://rpkgs.datanovia.com/factoextra/) for
determining the number of clusters when obtaining the higher-order psychological constructs.
Received: 1 February 2023; Accepted: 2 August 2023;
Published: xx xx xxxx
References
1. Yamada, Y. et al. COVIDiSTRESS Global Survey dataset on psychological and behavioural consequences of the COVID-19 outbreak.
Scientific Data 8, 3, https://doi.org/10.1038/s41597-020-00784-9 (2021).
2. Kreienkamp, J., Agostini, M., Krause, J. & Pontus Leander, N. PsyCorona: A World of Reactions to COVID-19. APS Observer 33 (2020).
3. Romano, A. et al. Cooperation and Trust Across Societies During the COVID-19 Pandemic. Journal of Cross-Cultural Psychology 52,
622–642, https://doi.org/10.1177/0022022120988913 (2021).
4. Han, Q. et al. Associations of risk perception of COVID-19 with emotion and mental health during the pandemic. Journal of
Affective Disorders 284, 247–255, https://doi.org/10.1016/j.jad.2021.01.049 (2021).
5. Jørgensen, F., Bor, A., Rasmussen, M. S., Lindholt, M. F. & Petersen, M. B. Pandemic fatigue fueled political discontent during the
covid-19 pandemic. Proceedings of the National Academy of Sciences 119, e2201266119 (2022).
6. Banda, J. M. et al. A Large-Scale COVID-19 Twitter Chatter Dataset for Open Scientific Research–An International Collaboration.
Epidemiologia 2, 315–324, https://doi.org/10.3390/epidemiologia2030024 (2021).
7. Naseem, U., Razzak, I., Khushi, M., Eklund, P. W. & Kim, J. COVIDSenti: A Large-Scale Benchmark Twitter Data Set for COVID-19
Sentiment Analysis. IEEE Transactions on Computational Social Systems 8, 1003–1015, https://doi.org/10.1109/TCSS.2021.3051189 (2021).
8. Ashokkumar, A. & Pennebaker, J. Social media conversations reveal large psychological shifts caused by COVID-19’s onset across U.S.
cities. https://doi.org/10.1126/sciadv.abg7843 (2021).
9. Kleinberg, B., van der Vegt, I. & Mozes, M. Measuring emotions in the covid-19 real world worry dataset. In Proceedings of the 1st
Workshop on NLP for COVID-19 at ACL 2020 (2020).
10. Mozes, M., van der Vegt, I. & Kleinberg, B. A repeated-measures study on emotional responses after a year in the pandemic.
Scientific Reports 11, 1–11 (2021).
11. Strick, K. Partygate – a timeline of the Covid Downing Street parties scandal | Evening Standard (2022).
12. Harmon-Jones, C., Bastian, B. & Harmon-Jones, E. The discrete emotions questionnaire: A new tool for measuring state
self-reported emotions. PloS one 11, e0159915 (2016).
13. Lahlou, S. et al. Ccaq: A shared, creative commons crisis coping assessment questionnaire. World Pandemic Research Network
(2016).
14. Van Der Vegt, I. & Kleinberg, B. The Real World Worry Waves Dataset, Open Science Framework,
https://doi.org/10.17605/osf.io/9b85r (2023).
15. Roberts, M. E., Stewart, B. M. & Tingley, D. stm: An R package for structural topic models. Journal of Statistical Software 91, 1–40,
https://doi.org/10.18637/jss.v091.i02 (2019).
16. Blei, D. M. & Lafferty, J. D. Dynamic topic models. In Proceedings of the 23rd international conference on Machine learning, 113–120
(2006).
17. Likas, A., Vlassis, N. & Verbeek, J. J. The global k-means clustering algorithm. Pattern recognition 36, 451–461 (2003).
18. Bradley, V. C. et al. Unrepresentative big surveys significantly overestimated US vaccine uptake. Nature 600, 695–700 (2021).
19. Benoit, K. et al. quanteda: An R package for the quantitative analysis of textual data. Journal of Open Source Software 3, 774,
https://doi.org/10.21105/joss.00774 (2018).
Author contributions
Both authors contributed equally to the conceptualisation of the data collection, the study design, the analysis of
the data and the writing and finalising of the manuscript.

Competing interests
The authors declare no competing interests.
Additional information
Supplementary information The online version contains supplementary material available at
https://doi.org/10.1038/s41597-023-02438-y.
Correspondence and requests for materials should be addressed to I.v.d.V.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International
License, which permits use, sharing, adaptation, distribution and reproduction in any medium or
format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons licence, and indicate if changes were made. The images or other third party material in this
article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the
material. If material is not included in the article’s Creative Commons licence and your intended use is not
permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the
copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
© The Author(s) 2023