Recommendations for Conducting Longitudinal Experience Sampling Studies
Niels van Berkel and Vassilis Kostakos
N. van Berkel
Aalborg University, Aalborg, Denmark
e-mail: nielsvanberkel@cs.aau.dk

V. Kostakos
The University of Melbourne, Melbourne, Australia
e-mail: vassilis.kostakos@unimelb.edu.au

© Springer Nature Switzerland AG 2021
E. Karapanos et al. (eds.), Advances in Longitudinal HCI Research, Human–Computer Interaction Series, https://doi.org/10.1007/978-3-030-67322-2_4
Abstract The Experience Sampling Method is used to collect participant self-reports over extended observation periods. These self-reports offer a rich insight into the individual lives of study participants by intermittently asking participants a set of questions. However, the longitudinal and repetitive nature of this sampling approach introduces a variety of concerns regarding the data contributed by participants. A decrease in participant interest and motivation may negatively affect study adherence, as well as potentially affecting the reliability of participant data. In this chapter, we reflect on a number of studies that aim to better understand participant performance with Experience Sampling. We discuss the main issues relating to participant data for longitudinal studies and provide hands-on recommendations for researchers to remedy these concerns in their own studies.
Keywords Experience sampling method · Ecological momentary assessment · ESM · EMA · Self-report · Data quality · Reliability
1 Introduction
Responding to an increased interest in studying human life more systematically
than traditional surveys—and in a more realistic and longitudinal setting than pos-
sible through observations—Larson and Csikszentmihalyi introduced the Experi-
ence Sampling Method in 1983 [1]. Researchers using the Experience Sampling
Method (ESM) ask their participants to intermittently complete a short question-
naire assessing their current state, context, or experience over an extended period of
time (typically a couple of weeks). Questionnaires are typically designed to ensure
that participants focus on their current experience rather than reflect over a longer
period of time, thus minimising the effects of participants’ (in)ability to accurately
recollect past events [2].
Early ESM studies focused on capturing the daily activities and corresponding experiences of study participants [3]. In those studies, participants were repeatedly asked to report what they were currently doing. Collecting self-reports at random times throughout the day, as opposed to through a one-off survey or interview, ensured that responses were collected during the participant’s “interaction with the material and social environment” [3]. In other words, the idea to collect self-report data in situ, and thereby increase the ecological validity of a study, was motivated by a desire to increase the reliability of participant responses.
A recent survey indicated an increased adoption of the Experience Sampling Method, with a focus on (personal) mobile devices [4]. The use of mobile devices as opposed to paper-based questionnaires provides a number of advances in terms of control over participant entries (e.g. preventing ‘parking lot compliance’ [5]), interactive design opportunities [6, 7], and contextual sensing possibilities [8–10]. We discuss
how these opportunities provided by mobile devices can be utilised in the assess-
ment, improvement, and analysis of the reliability of participant data in longitudinal
experience sampling studies.
1.1 Longitudinal Experience Sampling
The timescale of ESM studies varies significantly, with a recent literature review
(analysing 461 papers) reporting studies ranging between 1 and 365 days [4]. The
median duration of an ESM study was found to be 14 days, while a majority of
70.9% of studies reported a duration of less than one month [4]. The one-day studies
in the sample are mostly trials to investigate the (technological) feasibility of a
given study configuration (e.g. Westerink et al. [11]). The longest study, totalling a year, investigated long-term patterns in location sharing among a large sample of Foursquare users [12]. ESM studies typically run for weeks rather than months, as researchers aim to find a “balance between study duration and intervention frequency” [13].
Longitudinal experience sampling is relatively short-term when compared to
cross-sectional repeated surveys (also called periodic surveys or simply a survey
using a longitudinal design), typically covering months or years [14]. These survey-
type designs are often used to investigate changes in attitudes or behaviours over
extended periods of time [14], for example in consumer research [15] or within pro-
fessional organisations [16]. Besides the typically shorter duration of ESM studies, there are a number of other key differences between repeated surveys and longitudinal experience sampling: the frequency of the questionnaires, the reflective nature of surveys vs. the in-the-moment perspective of ESM questionnaires, and the fact that ESM questionnaires are collected ‘in the wild’, aiming to cover a variety of contexts. The ESM
shares many of the same challenges encountered in other methodologies employing
human sensing [10,17], such as citizen science or situated crowdsourcing.
1.2 Challenges
The sustained effort required of participants over an extended period of time intro-
duces a number of challenges. First, the motivation of participants is likely to decrease
over time as initial interest drops. Techniques to maintain a base level of motivation,
whether through intrinsic or extrinsic motivation, are therefore key in enabling suc-
cessful longitudinal use of the ESM. Participant motivation, or lack thereof, plays
a key role in relation to data quality and quantity, the two remaining challenges.
Second, adherence to study protocol—typically quantified as the number of ques-
tionnaires that have been answered—has been shown to decline over time due to study
fatigue [18]. Another concern is the variance in the number of responses between
participants, which could skew the analysis of ESM results—a critical type of bias
introduced by such variance is ‘selective non-responses’, in which the responses of
specific groups of the study’s sample are over- or under-represented [19]. An anal-
ysis of four recent ESM studies reveals significant differences across participants
in terms of their response rate [20]. Third, ensuring a sufficient level of response reliability is key when collecting participant responses, and critical in generating sound study inferences. Novel sampling techniques and filtering mechanisms can help increase the reliability of participant responses.
Here, we discuss these three challenges in detail and provide concrete recommendations for researchers to address these challenges in their own studies (Sects. 2, 3, and 4). Following this, we discuss analysis techniques specific to the analysis of longitudinal response data (Sect. 5) as well as a number of concrete guidelines for the design and subsequent reporting of ESM studies through a ‘checklist for researchers’ (Sect. 6). Finally, we present a number of future trends in the area of longitudinal experience sampling studies (Sect. 7) and conclude this chapter (Sect. 8).
2 Participant Motivation
Larson and Csikszentmihalyi classify the “dependence on respondents’ self-reports” as the major limitation of the ESM, while simultaneously highlighting examples that show how these self-reports are “a very useful source of data” [2]. Regardless of
whether we consider the quantity or quality of participant responses, participant
motivation is key in ensuring a successful study outcome. Given the longitudinal
and oftentimes burdensome nature of ESM studies, a number of research streams
have explored how to increase and maintain participant motivation over time and its
subsequent effects on participant responses. Here, we distinguish between intrinsic
and extrinsic means of motivation.
2.1 Intrinsic Motivation
Intrinsic motivation has been defined simply as “doing something for its own sake” [21] rather than in expectation of a direct or indirect compensation. It would, however, be incorrect to conclude that researchers cannot (positively) influence a participant’s intrinsic motivation. As already stated by Larson and Csikszentmihalyi in
their original publication on the Experience Sampling Method: “Most participants
find that the procedure is rewarding in some way, and most are willing to share their
experience. However, cooperation depends on their trust and on their belief that
the research is worthwhile”[1]. Here, Larson and Csikszentmihalyi refer to what
they later classify as ‘establishing a research alliance’. This research alliance aims to
establish a vested interest of the participant in the study and the research outcome.
However, identifying how to give concrete form to such a research alliance remains
under-explored in the current ESM literature. Related methodologies such as citi-
zen science face similar challenges and have investigated how to build and sustain
engagement among participants. These results show that interest and curiosity, per-
ceived self-competence, and enjoyment in the task all contribute to an individual’s
intrinsic motivation [22, 23]. Furthermore, Measham and Barnett found that fulfilling a participant’s initial motivation for participation increases the duration of a participant’s engagement [24]. Although direct empirical evaluations of these factors are
scarce for the ESM, given the methodological overlap we can hypothesise that these
factors have a similar positive effect on participation motivation in ESM studies. We
note that the potential side effects of increasing participants’ motivation have not yet
been sufficiently explored, and could potentially influence study results.
Recommendation 1 Provide rich feedback regarding the study goals and the partic-
ipants’ contribution to those goals. Provide information throughout the study period.
Recommendation 2 Target participant recruitment to communities with a vested
interest in the study outcomes.
2.2 Extrinsic Motivation
Extrinsic motivation, which Reiss defines as “the pursuit of an instrumental goal” [21], consists of various methods of motivation, including (financial) rewards or a
competition between participants. Although earlier work in Psychology stated that
extrinsic motivators would undermine an individual’s intrinsic motivation (cf. the
self-determination theory [25]), recent work largely refutes this claim [21].
A (financial) compensation of participants is common for ESM studies, with a fixed compensation at the end of the study period being the most widely used (45.7%) [4]. The effect of different financial compensation structures on participant motivation has not been extensively explored, in part due to incomplete reporting of study details [4]. These initial reports do highlight, however, that the use of micro-compensations (a small payment for each completed response) motivates participants
in responding to ESM questionnaires. Although already applied by Consolvo and Walker in 2003 [26], this compensation structure has not been widely adopted in the HCI literature [4]. Musthag et al. compare three different micro-compensation structures but do not contrast their results with, e.g., a fixed compensation [27]. Although the use of micro-compensation warrants further investigation, we note that this compensation structure may not be applicable to all studies due to potential negative effects on the study’s ecological validity. As highlighted by Musthag et al., participant reactivity to micro-compensation may confound self-reports in studies focusing on participant affect. Stone et al. warn against using excessive financial incentives, which could attract participants solely interested in the monetary reward rather than in the study itself [28].
Recommendation 3 Avoid excessive financial compensation and consider the use of micro-compensation when applicable.
The literature on the ESM has also explored a number of extrinsic motivation
techniques besides financial compensation, with promising results. Hsieh et al. show
that providing participants with visual information on their provided self-reports increased participant adherence by 23% over a 25-day period (a study with desktop users) [6]. The visual feedback provided by Hsieh et al. allowed participants
to explore their prior answers on questions related to interruption or mood. The
authors state that such visualisations “makes the information personally relevant
and increases the value of the study to participants”[6]. Van Berkel et al. studied
the effect of gamification (e.g. points, leaderboard) on participant responses in a
between-subject study. Their results show that participants in the gamified condi-
tion significantly increased both their response quality (quantified through crowd-
evaluation) and their number of provided responses as compared to the participants
in the non-gamified condition [7].
Recommendation 4 Include interactive feedback mechanisms in the study protocol
to keep participants engaged and motivated.
3 Study Adherence
Participant adherence to protocol, i.e. the degree to which the questionnaire notifi-
cations are opened and answered, is critical in ensuring an informative study out-
come. In Experience Sampling, study adherence is typically quantified as ‘response
rate’ or ‘compliance rate’, defined as the “number of fully completed questionnaires
divided by the number of presented questionnaires” [4]. Unsurprisingly, studies typically report a decrease in study adherence over time; see, for example, [29–31]. As
researchers can expect a decrease in participant adherence over time, it is key to
consider the trade-offs when designing a longitudinal study. Balancing the number
of daily questionnaires, number of questionnaire items, questionnaire scheduling,
and duration of the study, as well as other factors such as participant compensa-
tion and availability, in accordance with the research question is key. A number of
studies have aimed to systematically study the effect of these variables, see, e.g., a
recent study by Eisele et al. on the effect of notification frequency and questionnaire
length on participant responses [32], or Van Berkel et al.’s investigation on the effect
of notification schedules [31]. We argue that any researcher should consider these
study parameters in relation to their research question and population sample. As
such, there is not one study configuration that would be applicable to every study.
Below, we outline some of the decisions that can motivate the balancing of these
variables.
3.1 Questionnaire Scheduling
The literature describes three global techniques for questionnaire scheduling: signal
contingent, interval contingent, and event contingent [33]. In a signal contingent
schedule configuration, notification arrival is randomised over the course of a given
timespan. In an interval contingent configuration, notification schedules follow a
predefined interval, for example every other hour between 08:00 and 17:00. For event contingent configurations, a predefined event is determined which triggers the notification (typically as recognised by the questionnaire system, though it can also refer to a ‘detection’ by the participant) [4, 33–35]. The use of an event-based notification
system enables more advanced study designs, and allows researchers to optimise the
moment of data collection to contexts which are most relevant.
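To make the distinction between these three scheduling techniques concrete, the sketch below generates one day of notification times under each schedule. It is a minimal illustration in Python: the 08:00–17:00 sampling window, the notification counts, and the `unlock_events` stream standing in for a detected event are all assumptions for the example, not prescriptions from the literature.

```python
import random
from datetime import datetime, timedelta

DAY_START = datetime(2021, 1, 1, 8, 0)   # assumed sampling window: 08:00-17:00
DAY_END = datetime(2021, 1, 1, 17, 0)

def signal_contingent(n=6):
    """Signal contingent: n notification times randomised over the window."""
    window = (DAY_END - DAY_START).total_seconds()
    offsets = sorted(random.uniform(0, window) for _ in range(n))
    return [DAY_START + timedelta(seconds=s) for s in offsets]

def interval_contingent(every_hours=2):
    """Interval contingent: a fixed schedule, e.g. every other hour."""
    times, t = [], DAY_START
    while t <= DAY_END:
        times.append(t)
        t += timedelta(hours=every_hours)
    return times

def event_contingent(events, max_per_day=6):
    """Event contingent: trigger on predefined events (here a hypothetical
    stream of unlock timestamps), capped at a daily maximum."""
    return [e for e in events if DAY_START <= e <= DAY_END][:max_per_day]

unlock_events = signal_contingent(n=20)  # stand-in for detected phone unlocks
for name, schedule in [("signal", signal_contingent()),
                       ("interval", interval_contingent()),
                       ("event", event_contingent(unlock_events))]:
    print(name, [t.strftime("%H:%M") for t in schedule])
```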
In a direct comparison between the three aforementioned scheduling techniques, results indicate that an interval-informed event contingent schedule, in which questionnaire notifications are presented upon smartphone unlock with a maximum number per given timespan, results in fewer total notifications sent but a higher overall number of completed responses as compared to a signal or interval contingent schedule [31]. Kapoor & Horvitz use contextual information to predict participant
availability and find that using such a predictive model outperforms randomised
scheduling in terms of identifying the availability of participants [36]. Church et
al. recommend that researchers adjust the questionnaire schedule to match the participant’s schedule [26]. Rather than imposing an identical start and end time on
all participants, this approach would allow for custom start and end times, e.g. in
the case of nightshift workers. Other work has explored more active scheduling techniques, where the presentation of questionnaires is determined based on the participant’s current contextual information. For example, Rosenthal et al. calculate individualised participant interruptibility costs [37], Mehrotra et al. expand on this through the notion of interruptibility prediction models [38], and Van Berkel et al. show that contextual information such as phone usage can be used to schedule questionnaires at opportune moments [39].
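As a sketch of how such context-aware scheduling can work, the snippet below fits a simple availability classifier on logged context features and only delivers a questionnaire when the predicted response probability exceeds a threshold. The feature set, training data, and threshold are illustrative assumptions rather than the actual models of the cited works; scikit-learn is used for brevity.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative context log: [hour_of_day, phone_in_use, at_home], paired with
# whether a past notification was answered (1) or dismissed/ignored (0).
X = np.array([[9, 1, 0], [13, 0, 0], [19, 0, 1], [23, 1, 1],
              [10, 0, 0], [18, 1, 1], [8, 1, 0], [21, 0, 1]])
y = np.array([0, 1, 1, 0, 1, 0, 0, 1])

model = LogisticRegression().fit(X, y)

def should_notify(context, threshold=0.6):
    """Deliver a questionnaire only when predicted availability is high."""
    p_answer = model.predict_proba([context])[0][1]
    return p_answer >= threshold

print(should_notify([19, 0, 1]))  # evening, phone idle, at home
```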
Regardless of the chosen scheduling approach, the timing of questionnaires can
have a significant impact on participants’ ability to respond to a questionnaire and
therefore the respective data being collected. The aforementioned scheduling tech-
niques all have their own strengths and weaknesses. Signal contingent scheduling
(i.e. randomised) can be used to capture participants’ spontaneous (psychological) states but can be skewed towards commonly occurring events. An interval contingent configuration is useful to capture events which are expected to occur regularly and provides a consistent sampling strategy which allows for the modelling of time as a factor in relation to the answers provided by the participant. Due to the regular schedule with which notifications are presented, however, it increases the risk of (over)sampling the same event (e.g. the start of a lecture). Finally, event contingent configurations are useful for capturing isolated or infrequently occurring events that can be detected either through sensor data or manually by the participant. Event-based schedules can result in an incomplete view of the participant’s life if the event of interest only occurs in a limited variety of contexts [40].
Recommendation 5 Carefully consider the effect of the chosen questionnaire
scheduling approach on the selection of participant responses.
3.2 Study Duration
The literature on ESM study design has recommended roughly similar durations for ESM studies, e.g. one week [1], two weeks [39],
and two–four weeks [28]. Determining an appropriate study duration is a careful
consideration that involves a variety of factors such as the frequency with which the
phenomenon of interest occurs, the required effort to complete the questionnaire,
and expected levels of motivation among the participant sample.
Researchers interested in longitudinal studies of extensive duration, e.g. months
or years, will find that participants are likely unable or unwilling to repeatedly answer a set of questionnaires for the duration of the study. Given the extensive participant burden in ESM studies, we advise against the collection of self-reports across the entire length of such extended studies. Instead, researchers should consider the
collection of manual responses for a (number of) period(s) within the duration of the
entire longitudinal study—embedding the ESM within a larger study design. As such,
researchers can combine the insights obtained through frequent ESM questionnaires
with the information gained from repeated data collection over an extensive period of
time. This approach, which has been called ‘wave-based’ experience sampling, has
been successfully employed in emotion research in a decade-long study consisting
of three one-week sampling periods investigating the effect of age on emotion [41].
Similarly, already in 1983 Savin-Williams & Demo ran a one-week ESM study with a cohort of participants enrolled in a six-year longitudinal study [42].
The use of modern mobile devices allows researchers to passively collect an exten-
sive amount of sensor data from study participants [9]. This data is collected unob-
trusively and without additional burden to the participant, and can provide additional
insights to the researcher. The unobtrusive nature of this data collection stands in stark
contrast to the continuous effort required from participants in human contributions
and can provide a continuous long-term data stream simply not feasible with manual
data collection. As such, we recommend that researchers interested in extensive lon-
gitudinal studies combine both continuous passive sensing with intermittent periods
of extensive questionnaire collection. Recent development work shows the possi-
bility of changing ESM questionnaire schedules throughout the study period [43],
enabling the possibility of intermittent periods of questionnaires.
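A wave-based design of this kind can be expressed as a simple study configuration: passive sensing runs for the entire enrolment, while ESM questionnaires are active only during predefined waves. The sketch below uses hypothetical dates and helper names to check which data streams are active on a given day.

```python
from datetime import date

ENROLMENT = (date(2021, 1, 1), date(2021, 12, 31))  # passive sensing throughout

# Three one-week ESM waves within the year-long study (illustrative dates).
WAVES = [(date(2021, 2, 1), date(2021, 2, 7)),
         (date(2021, 6, 1), date(2021, 6, 7)),
         (date(2021, 10, 1), date(2021, 10, 7))]

def esm_active(day: date) -> bool:
    """Questionnaires are only scheduled on days inside a sampling wave."""
    return any(start <= day <= end for start, end in WAVES)

def sensing_active(day: date) -> bool:
    """Passive sensing runs for the entire enrolment period."""
    return ENROLMENT[0] <= day <= ENROLMENT[1]

print(esm_active(date(2021, 6, 3)), sensing_active(date(2021, 6, 3)))  # True True
print(esm_active(date(2021, 7, 3)), sensing_active(date(2021, 7, 3)))  # False True
```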
From a participant perspective, being enrolled in a longitudinal study makes it easy to forget that sensor data is being collected. We stress that, given the potentially sensitive nature of the unobtrusively collected sensor data (naturally following the participant’s informed consent), researchers should aim to remind participants of any
ongoing data collection. A practical approach for this in the context of smartphone-
based studies is the continuous display of an icon in the smartphone’s notification
bar, reminding participants of their enrolment in the study and the active data col-
lection [18]. Researchers have also allowed participants to temporarily halt data
collection, see, e.g., Lathia et al. in which participants can (indefinitely) press a
button to pause data collection for 30 min [40].
Recommendation 6 Combine longitudinal passive sensing with focused periods of
ESM questionnaires to obtain both long-term and in-depth insights.
4 Response Reliability
A core idea behind the introduction of the ESM was to increase the reliability of self-report data by reducing the time between an event of interest and the moment when a participant provides data on this event, thus reducing reliance on a participant’s ability to recall past events [1]. Although this approach has been widely embraced in a number of disciplines, recent work points out that the quality of participant data in
ESM studies cannot be expected to be consistently of high reliability [18]. This is an
important concern for longitudinal studies, as response reliability typically
degrades over time. As such, recent work in the HCI community has explored tech-
niques to infer and improve the reliability of participant responses. Here, we discuss
the use of the crowd, quality-informed scheduling techniques, and the application of
additional validation questions to infer response quality.
4.1 Use of the Crowd
Although the ESM traditionally collects data on observations or experiences as cap-
tured by participants individually, recent work has drawn out creative ways of com-
bining the contributions of multiple individuals to increase the reliability of the
collected data.
One strand of work has explored the use of ‘peers’ to obtain multiple datapoints on one individual. Using this approach, which has been labelled ‘Peer-MA’ [44], a selected number of the participant’s peers report what they believe to be the participant’s current state with regard to the concept of interest. As described by Berrocal & Wac, this approach “has the potential to enrich the self-assessment datasets with
peers as pervasive data providers, whose observations could help researchers iden-
tify and manage data accuracy issues in human studies”[44]. Chang et al. show how
the use of peer-based data collection can also increase the quantity of the data col-
lected [45]. By recruiting a sufficiently large (and motivated) network of participant
peers, researchers may be able to distribute the burden of questionnaire notifications
and thereby sustain data input for a more extensive period of time—increasing the prospects of longitudinal ESM studies. A critical open question with regard to this
novel approach is the assessment and interpretation of the contributions of peers
and the potential biases introduced through, e.g., different peer-relationships and the
(absence of) peer physical presence.
In contrast to the aforementioned perspective in which the crowd contributions
are focused on individuals, others have applied the crowd to increase the reliability
of observations. For example, the aforementioned work by Van Berkel et al. not
only asked participants to contribute a label regarding a given place, but also asked
participants to judge the relevance of the contributions of others [7]. Based on these
relevance labels, the quality of participant contributions can be quantified. Another
example is the work by Solymosi et al., in which participants generated a map indi-
cating a crowd’s ‘fear of crime’ through repeated and localised experience sampling
data collection [46]. A main advantage of this approach, in which the quality assess-
ment is done by participants, is that the quality of contributions can be assessed
without the need for a priori ground truth on the presented data. From a longitudinal
study perspective, integrating crowd assessment into the study design may enable
the study population to rotate, i.e. for participants to drop out and new participants
to join, as study fatigue emerges.
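A minimal sketch of this kind of crowd-based quality assessment, assuming each contribution receives several 1–5 relevance ratings from other participants: contributions are scored by their mean peer rating, without requiring any a priori ground truth. Identifiers and the cutoff are hypothetical.

```python
from statistics import mean

# Hypothetical relevance ratings (1-5) given by peers to each contribution.
peer_ratings = {
    "label_001": [5, 4, 5],
    "label_002": [2, 1, 3],
    "label_003": [4, 4, 5, 3],
}

# Quality score per contribution: the mean peer-assigned relevance.
quality = {cid: mean(ratings) for cid, ratings in peer_ratings.items()}

# Flag contributions whose peer-assessed quality falls below a chosen cutoff.
LOW_QUALITY_CUTOFF = 2.5
flagged = [cid for cid, q in quality.items() if q < LOW_QUALITY_CUTOFF]
print(quality, flagged)
```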
Recommendation 7 Consider whether participant data can be validated or augmented through the use of the crowd.
4.2 Quality-Informed Scheduling
Literature on questionnaire scheduling has primarily focused on participant availability, driven by a motivation to increase participant compliance. However, as
pointed out by Mehrotra et al., an ill-timed questionnaire might lead participants to
respond to a questionnaire without paying much attention, reducing the overall reli-
ability of respondents’ data [38]. In addition to increasing the quantity of responses,
researchers have therefore also explored how the scheduling of questionnaires can
affect the quality of participant responses. In the study by Van Berkel et al., par-
ticipants completed a range of questions (working memory, recall, and arithmetic)
while contextual data was being passively collected [39]. Their results show that
participants were more accurate when they were not using their phone at the moment a questionnaire arrived. Optimising the quality of responses by not collecting data when participants are actively using their phone may, however, negatively affect
the quantity of answered questionnaires. Previous work shows participants are more
likely to respond to questionnaires (i.e. focused on response quantity) when question-
naires are presented upon phone unlock (as compared to randomised or interval-based
schedules) [31].
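One way to operationalise this trade-off is to hold a due questionnaire until the phone is idle, favouring response quality at some cost to quantity. The sketch below assumes a hypothetical `screen_is_on()` probe; a real deployment would query the sensing framework in use, and the waiting and polling intervals are illustrative.

```python
import time

def screen_is_on() -> bool:
    """Hypothetical probe for active phone use; a real study would query a
    screen-state or foreground-app sensor instead of this placeholder."""
    return False

def deliver_when_idle(send_notification, max_wait_s=600, poll_s=30):
    """Hold a due questionnaire until the phone is idle, favouring response
    quality; give up after max_wait_s so the sample is not lost entirely."""
    waited = 0
    while screen_is_on() and waited < max_wait_s:
        time.sleep(poll_s)
        waited += poll_s
    send_notification()

deliver_when_idle(lambda: print("questionnaire delivered"))
```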
Recommendation 8 Introduce intelligent scheduling techniques to avoid interrupt-
ing participants when they do not have time to respond.
4.3 Validation Questions
Here, we discern two types of validation questions: explicitly verifiable questions
(also known as ground truth questions) and reflective questions.
In order to assess the reliability and effort of online study participants, work
on crowdsourcing has recommended the use of ‘explicitly verifiable questions’, also
known as ‘golden questions’ [47]. These explicitly verifiable questions are often—but
not always—quantitative in nature, relatively easy to answer, and the responses can
be automatically assessed to be correct or incorrect. For example, Oleson et al. asked
crowdworkers to verify whether a given URL matched with a given local business
listing [48]. Kittur et al. describe two main benefits of using these questions. First,
explicitly verifiable questions allow researchers to easily identify and subsequently
exclude from data analysis those participants who do not provide serious input.
Second, by including these questions participants are aware of the fact that their
answers will be scrutinised, which Kittur et al. hypothesise may “play a role in both
reducing invalid responses and increasing time-on-task”[47].
Although widely used in crowdsourcing, the uptake of explicitly verifiable ques-
tions in ESM studies is thus far limited. A challenging aspect for the uptake of
explicitly verifiable questions in longitudinal ESM studies is the need to provide
participants with varying question content. This would require the creation of a
question database, use of an existing and labelled dataset, or automated genera-
tion of verifiable questions (see, e.g., Oleson et al. [48]). An earlier ESM study
with 25 participants included a simple, and randomly generated, arithmetic task as
means of verification [39]. In this task, participants were asked to add two numbers together; both numbers were randomly generated between 10 and 99 for each self-report questionnaire. Results showed a remarkably high accuracy of 96.6%, which
could be indicative of differences in motivation and effort between online crowd-
sourcing markets and the participant population often encountered in ESM studies.
However, whether the motivation of the respective study population indeed differs
between online crowdsourcing and ESM studies requires further investigation across
multiple studies as well as evaluation across a wider variety of explicitly verifiable
questions.
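A sketch of such an arithmetic attention check, following the setup described above: two random integers between 10 and 99 are generated for each questionnaire and responses are scored automatically. The scoring helper is an illustrative addition.

```python
import random

def make_arithmetic_check():
    """Generate a per-questionnaire verification item: the sum of two
    random integers between 10 and 99, as in the study described above."""
    a, b = random.randint(10, 99), random.randint(10, 99)
    return f"What is {a} + {b}?", a + b

def accuracy(responses):
    """Fraction of correctly answered verification items per participant."""
    correct = sum(1 for given, expected in responses if given == expected)
    return correct / len(responses)

question, answer = make_arithmetic_check()
print(question)
print(accuracy([(answer, answer), (answer + 1, answer)]))  # one correct -> 0.5
```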
Another approach which has seen recent uptake is the creation of verifiable ques-
tions based on participant sensor data [39]. This includes, for example, passive data
collection on the participants’ smartphone usage and subsequently asking partici-
pants to answer questions on, e.g., the duration of their phone use. The answer to such a question is verifiable, variable (it changes throughout the day), and often challenging to answer correctly. Assessing the correctness of participant answers does, however,
also raise questions. In particular, answer correctness should not be quantified as a
binary state as it is unlikely that answers are completely correct.
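One possible graded scoring scheme, given the point above that correctness should not be binary: score a participant's estimate of, say, phone-use duration by its relative error against the sensed value. The linear fall-off and tolerance parameter are assumptions made for illustration.

```python
def graded_correctness(reported: float, sensed: float,
                       tolerance: float = 1.0) -> float:
    """Score in [0, 1]: 1 for a perfect answer, decreasing linearly with the
    relative error, 0 once the answer is off by tolerance * sensed value."""
    if sensed <= 0:
        raise ValueError("sensed value must be positive")
    rel_error = abs(reported - sensed) / sensed
    return max(0.0, 1.0 - rel_error / tolerance)

# Participant estimates 90 minutes of phone use; sensors logged 120 minutes.
print(graded_correctness(90, 120))  # 0.75
```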
Recent work has also explored the use of ‘reflective questions’ in increasing
the reliability of participant contributions. In this approach, participants reflect on
earlier events while supported by earlier data points—either collected actively by
the participant or passively through, e.g., smartphone sensors. Rabbi et al. introduce ‘ReVibe’, which supports assisted recall by showing participants an overview of their
location, activity, and ambience during the past day [49]. Their results show a 5.6%
increase in the participants’ recall accuracy. Intille et al. propose an image-based
approach, in which participants take a photo or short video and use this material to
reflect on past experiences [50]. This concept was further explored by Yue et al., who
note that the images taken by participants can also provide additional information
and insights to researchers [51].
Recommendation 9 Consider including additional questions (verifiable, ground
truth, or reflective) to increase the reliability of participant answers.
5 Analysing Longitudinal ESM Data
Longitudinal research faces a unique set of challenges in the analysis of participant
data not typically encountered in short-term or lab-based studies. The longitudi-
nal nature of a study can alter a participant’s perception or understanding of the
variables of interest, and may result in an increasing inequality of the number of
responses between participants and different contexts. Here, we discuss these three
challenges—respectively known as response shift, compliance bias, and contextual
bias—as faced in the analysis of longitudinal ESM studies.
5.1 Response Shift
Response shift can refer to an individual’s change in the meaning of a given construct due to re-calibration (a change in internal standards), re-prioritisation (a change in values or priorities), or re-conceptualisation (a change in the definition of the construct) [52, 53].
As studies often focus on the same construct(s) for the entire study period, partic-
ipants may experience a shift in their assessment of this construct. As an example
by Ring et al. illustrates: “a patient rates her pre-treatment level of pain as 7 on a
10-point pain scale. She subsequently rates her post-treatment level of pain as 3.
This is taken to indicate that the treatment has caused an improvement of 4 points.
However, if she retrospectively rates her pre-treatment pain as having been a 5, the
actual treatment effect is 2. Likewise, if she retrospectively rates her pre-treatment
pain as having been 10, the actual treatment effect is 7.” [54]. Similar to a change in a participant’s internal standards of a given construct, a participant may also evaluate various constructs as carrying higher or lower importance as compared to the onset of the study. By asking participants to rate the relative importance of individual constructs prior to and following the study, the degree of re-prioritisation can be assessed.
Finally, re-conceptualisation can occur when participants re-evaluate the meaning
of a concept in relation to their personal circumstances. For example, a patient may
re-conceptualise their quality of life, either following their recovery or by adjusting
their perspective when confronted with a chronic disease.
A commonly used technique to identify the occurrence of response shift among
participants is the ‘thentest’, also known as the ‘retrospective pretest-posttest design’.
At the end of the study, participants complete a posttest questionnaire, immediately followed up with a retrospective questionnaire asking participants to think back to
their perception of a construct at the start of the study. By collecting these data points
at almost the same time, participants share the same internal standards during ques-
tionnaire completion. Therefore, the mean change between these two questionnaires
gives insight into the effect of time or treatment. For more details on the thentest, we
refer to Schwartz & Sprangers’s guidelines [55].
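As a sketch of the corresponding analysis, assuming each participant provides a conventional pretest rating, a posttest rating, and a retrospective ‘then’ rating of the same construct: the posttest-minus-thentest difference estimates the effect of time or treatment under shared internal standards, while the thentest-minus-pretest difference indicates the response shift itself. The data are invented for the example.

```python
from statistics import mean

# Illustrative ratings per participant: (pretest, posttest, thentest).
ratings = [(7, 3, 5), (6, 4, 6), (8, 3, 6)]

# Effect corrected for response shift: posttest compared against the
# retrospective pretest, both given under the same internal standards.
adjusted_effect = mean(post - then for _, post, then in ratings)

# Response shift itself: how the remembered baseline differs from the
# rating actually given at the start of the study.
response_shift = mean(then - pre for pre, _, then in ratings)

print(round(adjusted_effect, 2), round(response_shift, 2))
```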
Recommendation 10 Include a thentest in the design of your study when participant
perception of a given construct may change over the duration of the study.
5.2 Compliance Bias
Inevitable differences between participants’ availability and motivation will result
in a difference in the number of collected responses between participants. As such, the experiences of frequently responding participants can skew the overall study results, a phe-
nomenon known as compliance bias [20]. Participants with a higher than average
response rate may have a more vested interest in responding to notifications, for
example as they are personally affected by the phenomenon being investigated. Sim-
ilarly, participants with a high or low response rate may have different psychological
characteristics or simply different smartphone usage behaviours. It is not unlikely
that these factors are a confounding factor in relation to the phenomenon being
studied—capturing responses primarily from a subset of the study population may
therefore decrease the reliability of the results. Although not widely reported, recent
work that re-analysed four independent ESM studies finds substantial differences
between study participants in the number of responses collected [20]. Researchers
can reduce compliance bias by balancing data quantity between participants during
the study through intelligent scheduling techniques—i.e. increasing the likelihood
that questionnaires will be answered by targeting notifications to arrive at a time and
context suitable to the participant. Although this requires considerable infrastructure implementation, and researchers ought to be careful not to introduce other biases, reducing compliance bias can increase the usefulness and reliability of a collected dataset.
Recommendation 11 Use intelligent scheduling techniques to improve response rates among low-respondents to balance response quantity between participants.
Recommendation 12 Analyse and report the differences between the number of
participant responses post-data collection.
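A minimal post-hoc check along the lines of Recommendations 11 and 12: compute per-participant response rates and summarise their spread, so that imbalances can be reported and low-respondents targeted during the study or weighted for in the analysis. Counts and the cutoff are invented for the example.

```python
from statistics import mean, stdev

# Presented vs completed questionnaires per participant (illustrative counts).
counts = {"P01": (84, 70), "P02": (84, 22), "P03": (84, 55), "P04": (84, 80)}

rates = {p: done / presented for p, (presented, done) in counts.items()}
print({p: round(r, 2) for p, r in rates.items()})
print("mean =", round(mean(rates.values()), 2),
      "sd =", round(stdev(rates.values()), 2))

# Flag low-respondents, e.g. below half the mean response rate, for targeted
# scheduling during the study or for sensitivity analyses afterwards.
cutoff = mean(rates.values()) / 2
print("low-respondents:", [p for p, r in rates.items() if r < cutoff])
```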
5.3 Contextual Bias
The schedule through which questionnaires are presented to participants, i.e. the cho-
sen sampling technique, can significantly bias the responses of participants towards
a limited number of contexts over time. As stated by Lathia et al., “[...] time-based
triggers will skew data collection towards those contexts that occur more frequently,
while sensor-based triggers [...] generate a different view of behaviour than a more complete sampling would provide” [40]. These concerns are amplified for longitu-
dinal studies, in which researchers typically aim to cover a wide variety of contexts
and identify longitudinal trends. If participants, however, only provide self-reports in contexts most convenient to them (e.g. by dismissing questionnaires arriving in the early morning or while at work), the resulting data can be heavily skewed towards a limited number of contexts and therefore diminish the value of longitudinal data collection. The risk of contextual bias can be reduced by taking into account the context of completed self-reports when scheduling questionnaires. By considering the context in which individual participants have already answered questionnaires, researchers can diversify the context of collected responses.
Recommendation 13 Diversify the context of collected responses by scheduling
questionnaires in contexts underrepresented in the existing responses of a participant.
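A sketch of such context balancing, assuming completed responses are bucketed into coarse contexts (here, a hypothetical time-of-day by location grid): the next questionnaire is targeted at the context with the fewest completed responses so far.

```python
# Completed responses per coarse context bucket so far (illustrative counts).
context_counts = {
    ("morning", "home"): 14, ("morning", "work"): 3,
    ("afternoon", "work"): 12, ("evening", "home"): 9,
}

def next_target_context(counts):
    """Prefer the context currently underrepresented in the collected data."""
    return min(counts, key=counts.get)

print(next_target_context(context_counts))  # ('morning', 'work')
```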
6 Researcher Checklist
In order to increase a study’s replicability and allow for a correct interpretation of
presented results, it is critical that researchers report both the methodological choices
and the outcomes of a presented study in detail. Current practice does not align with
these standards, with prior work indicating that the majority of studies do not report
on, e.g., the compensation of participants [4]. As compensation can affect participant
motivation and compliance [28], it is important to report such metrics.
Building on previous work [4,26,56], we present a list of study design and result
decisions which should be considered by researchers. We hope that this ‘checklist’
proves a useful starting point for researchers designing their ESM studies, as well as
an overview of the variables we consider key in the reporting of the results of ESM
studies.
Study design
1. Consider the target participant population and their potential interest in partici-
pation.
2. Determine the duration of the study, taking into account the study fatigue of
prospective participants. Extensive longitudinal studies can combine longitudi-
nal passive sensing with focused periods of self-report data collection.
3. Determine the most suitable questionnaire schedule in light of the respective
trade-offs and benefits of scheduling techniques [31,40].
4. Determine the length and frequency of questionnaire items, aiming for a short
completion time of the questionnaire [18,26].
5. Determine the timeout time for individual questionnaires, especially when sampling participant responses following a predetermined event, so as to reduce participant recall time.
6. Consider whether it is valuable to assess response shift in participant responses
and consider including a thentest in the study design.
7. Consider the use of verifiable, ground truth, or reflective questionnaires to assess
the quality of participant responses.
8. Consider whether it is important to achieve a balanced number of responses
between participants. If desired, implement intelligent scheduling techniques to
increase response rates among low-respondents.
9. Assess how participants can be best motivated to enrol and maintain compliance
throughout the study period.
10. Assess the possibility of using the crowd to either assess or compare the contri-
butions of participants.
Study results
1. Report both the number of participants who completed and dropped out of the
study.
2. Report the (average) duration of participant enrolment.
3. Report the number of completed, dismissed, and timed-out responses.
4. Report the overall response rate.
5. Analyse and report the difference in response rate between participants [20].
6. Analyse and report any significant differences in the context of completed
responses (e.g. time or location of completion) [40].
7. If relevant, analyse and report on the (differences in the) accuracy of participants
on ground truth questions.
8. If relevant, analyse and report on any changes in the participants’ perception of
the study’s construct, e.g. with the help of the thentest [55].
6.1 Overview of Recommendations
Finally, we present an overview of the recommendations introduced in this chapter
in Table 1. The included references offer additional information on the motivation, methods, and guidelines with regard to the respective recommendation.
Table 1 Overview of recommendations with references for further reading

No. | Recommendation | References
1 | Provide rich feedback regarding the study goals and the participants’ contribution to those goals. Provide information throughout the study period | [1, 24]
2 | Target participant recruitment to communities with a vested interest in the study outcomes | [21, 23]
3 | Avoid excessive financial compensation and consider the use of micro-compensation when applicable | [27, 28]
4 | Include interactive feedback mechanisms in the study protocol to keep participants engaged and motivated | [6]
5 | Carefully consider the effect of the chosen questionnaire scheduling approach on the selection of participant responses | [31, 39, 40]
6 | Combine longitudinal passive sensing with focused periods of ESM questionnaires to obtain both long-term and in-depth insights | [41, 42]
7 | Consider whether participant data can be validated or augmented through the use of the crowd | [7, 44, 45]
8 | Introduce intelligent scheduling techniques to avoid interrupting participants when they do not have time to respond | [36–39]
9 | Consider including additional questions (verifiable, ground truth, or reflective) to increase the reliability of participant answers | [49–51]
10 | Include a thentest in the design of your study when participant perception of a given construct may change over the duration of the study | [55]
11 | Use intelligent scheduling techniques to improve response rates among low-respondents to balance response quantity between participants | [20]
12 | Analyse and report the differences between the number of participant responses post-data collection | [20]
13 | Diversify the context of collected responses by scheduling questionnaires in contexts underrepresented in the existing responses of a participant | [40]
7 Future Trends
Since the introduction of the Experience Sampling Method in the late 1970s [1],
its main use has been in the application of intensive but relatively short-term data
collection (i.e. weeks rather than months). In this foundational work, Larson & Csikszentmihalyi describe a typical ESM study as having a duration of one week. Technolog-
ical and methodological developments have had, and continue to have, a significant
impact on how the ESM is used by researchers throughout their projects. For exam-
ple, the introduction and widespread usage of smartphones has enabled researchers to
collect rich contextual information [8,9]. Similarly, researchers have come up with
novel scheduling techniques to increase the sampling possibilities offered through
the ESM. Following the impact of these developments on how the ESM is applied,
we expect future innovations to increase the ability for researchers to apply the ESM
in a longitudinal setting.
From a technological perspective, recent work has pointed to the further integra-
tion of self-report devices in the participants’ daily life. This includes (stationary)
devices physically located in a participant’s home or work location [57], integration
of questionnaires in mobile applications already frequently used by participants (e.g.
messaging applications [58]), or through the use of (tangible) wearables [59,60].
Although the effect of these alternative questionnaire delivery techniques on (sus-
tained) response rate and input accuracy still needs to be explored in more detail, these
alternative input methods can reduce participant strain as compared to a smartphone-
based approach (retrieving phone, unlocking, opening a specific application, locking
away the phone). Future studies can also consider the collection of questionnaires
across multiple platforms, such as the use of a stationary device at home and work,
combined with a mobile device or application for on-the-go.
Methodologically, a number of under-explored avenues may prove useful in
enabling longitudinal ESM studies. In Sect. 3.2, we refer to ‘wave-based’ experience sampling, in which participants actively contribute only for a number of (discon-
tinuous) periods within a larger duration consisting of passive sensing. Although
already explored in the early days of the ESM [42], this approach has thus far not
been extensively applied. Furthermore, although prior work shows the positive effect of including extrinsic motivators [6, 7], the respective studies were limited to weeks. Further work is required to study the impact of these incentives in longitudinal settings.
Finally, we note that an extensive amount of work has explored ways to infer partic-
ipant availability and willingness to answer a questionnaire, both within the scope
of ESM research [38, 61] as well as the broader research on attention and availability [36, 62–64]. Translating these findings into practical and shareable implemen-
tations which can be readily used by other researchers remains a formidable chal-
lenge. Addressing this, e.g., by releasing the source code of these implementations,
allows for experimentation with advanced scheduling techniques while simultane-
ously enabling research groups to validate, compare, and extend these scheduling
algorithms.
Numerous open questions regarding the use of the ESM beyond a couple of weeks (e.g. covering months of active data collection) remain. In this chapter, we outlined practical suggestions which researchers can apply today when designing their studies, and offered a number of potential areas for future work in the domain of longitudinal self-report studies.
8 Conclusion
The Experience Sampling Method has enabled researchers to collect frequent and
rich responses from study participants. Enabled by the wide uptake of mobile devices,
researchers can deliver a highly interactive and increasingly intelligent research tool
straight into the hands of participants. Our overview shows that the introduction of smaller and more connected mobile hardware alone is not sufficient to enable a push towards truly longitudinal studies. In order to extend the viable duration of ESM
studies, further development of methodological practices is required. Investigating
the effect of novel hardware solutions and study design configurations, both in the
lab and in situ, will require a focused effort from the research community.
References
1. Larson R, Csikszentmihalyi M (1983) The experience sampling method. New Directions
Methodol Soc Behav Sci
2. Csikszentmihalyi M, Larson R (1987) Validity and reliability of the Experience-Sampling
method. J Nerv Ment Dis 175:526–536
3. Csikszentmihalyi M, Larson R, Prescott S (1977) The ecology of adolescent activity and
experience. J Youth Adolescence 6:281–294
4. van Berkel N, Ferreira D, Kostakos V (2017) The experience sampling method on mobile
devices. ACM Comput Surv 50(6):93:1–93:40
5. Smyth JM, Stone AA (2003) Ecological momentary assessment research in behavioral
medicine. J Happiness Stud 4:35–52
6. Hsieh G, Li I, Dey A, Forlizzi J, Hudson SE (2008) Using visualizations to increase compliance
in experience sampling. In: Proceedings of the 10th international conference on ubiquitous
computing, UbiComp ’08, (New York, NY, USA). ACM, pp 164–167
7. van Berkel N, Goncalves J, Hosio S, Kostakos V (2017) Gamification of mobile experience
sampling improves data quality and quantity. In: Proceedings of the ACM on interactive, mobile,
wearable and ubiquitous technologies (IMWUT), vol 1, no 3, pp 107:1–107:21
8. Raento M, Oulasvirta A, Eagle N (2009) Smartphones: an emerging tool for social scientists.
Sociol Methods Res 37(3):426–454
9. Ferreira D, Kostakos V, Schweizer I (2017) Human sensors on the move. Springer International
Publishing, pp 9–19
10. van Berkel N, Goncalves J, Wac K, Hosio S, Cox AL (2020) Human accuracy in mobile data
collection. Int J Hum-Comput Stud, p 102396
11. Westerink J, Ouwerkerk M, de Vries G, de Waele S, van den Eerenbeemd J, van Boven M
(2009) Emotion measurement platform for daily life situations. In: 3rd international conference
on affective computing and intelligent interaction and workshops, pp 1–6
12. Guha S, Wicker SB (2015) Spatial subterfuge: an experience sampling study to predict decep-
tive location disclosures. In: Proceedings of the 2015 ACM international joint conference on
pervasive and ubiquitous computing, UbiComp ’15, (New York, NY, USA). Association for
Computing Machinery, pp 1131–1135
13. Heron KE, Smyth JM (2010) Ecological momentary interventions: incorporating mobile tech-
nology into psychosocial and health behaviour treatments. Brit J Health Psychol 15(1):1–39
14. Shaughnessy JJ, Zechmeister EB, Zechmeister JS (2011) Research methods in psychology.
McGraw-Hill, New York
15. Armantier O, Topa G, Van der Klaauw W, Zafar B (2017) An overview of the survey of
consumer expectations. Econ Policy Rev 23(2):51–72
16. Stein RE, Horwitz SM, Storfer-Isser A, Heneghan A, Olson L, Hoagwood KE (2008) Do
pediatricians think they are responsible for identification and management of child mental
health problems? Results of the AAP periodic survey. Ambulatory Pediatr 8(1):11–17
17. van Berkel N, Budde M, Wijenayake S, Goncalves J (2018) Improving accuracy in mobile
human contributions: an overview. In: Adjunct proceedings of the ACM international joint
conference on pervasive and ubiquitous computing, pp 594–599
18. van Berkel N (2019) Data quality and quantity in mobile experience sampling. Phd thesis, The
University of Melbourne
19. Hektner JM, Schmidt JA, Csikszentmihalyi M (2007) Experience sampling method: measuring
the quality of everyday life. Sage
20. van Berkel N, Goncalves J, Hosio S, Sarsenbayeva Z, Velloso E, Kostakos V (2020) Overcoming
compliance bias in self-report studies: a cross-study analysis. Int J Hum-Comput Stud 134:1–12
21. Reiss S (2012) Intrinsic and extrinsic motivation. Teach Psychol 39(2):152–156
22. Eveleigh A, Jennett C, Blandford A, Brohan P, Cox AL (2014) Designing for dabblers and
deterring drop-outs in citizen science. In: Proceedings of the SIGCHI conference on human
factors in computing systems, CHI ’14, (New York, NY, USA). ACM, pp 2985–2994
23. Rotman D, Preece J, Hammock J, Procita K, Hansen D, Parr C, Lewis D, Jacobs D (2012)
Dynamic changes in motivation in collaborative citizen-science projects. In: Proceedings of
the ACM 2012 conference on computer supported cooperative work, CSCW ’12, (New York,
NY, USA). ACM, pp 217–226
24. Measham TG, Barnett GB (2008) Environmental volunteering: motivations, modes and out-
comes. Australian Geographer 39(4):537–552
25. Deci E, Ryan RM (1985) Intrinsic motivation and self-determination in human behavior.
Springer, Berlin
26. Consolvo S, Walker M (2003) Using the experience sampling method to evaluate ubicomp
applications. IEEE Pervas Comput 2:24–31
27. Musthag M, Raij A, Ganesan D, Kumar S, Shiffman S (2011) Exploring micro-incentive
strategies for participant compensation in high-burden studies. In: Proceedings of the 13th
international conference on ubiquitous computing, UbiComp ’11, (New York, NY, USA).
ACM, pp 435–444
28. Stone AA, Kessler RC, Haythomthwatte JA (1991) Measuring daily events and experiences:
decisions for the researcher. J Personal 59(3):575–607
29. Shih F, Liccardi I, Weitzner D (2015) Privacy tipping points in smartphones privacy preferences.
In: Proceedings of the 33rd annual ACM conference on human factors in computing systems,
CHI ’15, (New York, NY, USA). Association for Computing Machinery, pp 807–816
30. Tollmar K, Huang C (2015) Boosting mobile experience sampling with social media. In:
Proceedings of the 17th international conference on human-computer interaction with mobile
devices and services, MobileHCI ’15, (New York, NY, USA). Association for Computing
Machinery, pp 525–530
31. van Berkel N, Goncalves J, Lovén L, Ferreira D, Hosio S, Kostakos V (2019) Effect of expe-
rience sampling schedules on response rate and recall accuracy of objective self-reports. Int J
Hum-Comput Stud 125:118–128
32. Eisele G, Vachon H, Lafit G, Kuppens P, Houben M, Myin-Germeys I, Viechtbauer W (2020)
The effects of sampling frequency and questionnaire length on perceived burden, compliance,
and careless responding in experience sampling data in a student population
33. Wheeler L, Reis HT (1991) Self-recording of everyday life events: origins, types, and uses. J
Personal 59(3):339–354
34. Barrett LF, Barrett DJ (2001) An introduction to computerized experience sampling in psy-
chology. Soc Sci Comput Rev 19(2):175–185
35. Bolger N, Davis A, Rafaeli E (2003) Diary methods: capturing life as it is lived. Ann Rev
Psychol 54(1):579–616
36. Kapoor A, Horvitz E (2008) Experience sampling for building predictive user models: a com-
parative study. In: Proceedings of the SIGCHI conference on human factors in computing
systems, CHI ’08, (New York, NY, USA). Association for Computing Machinery, pp 657–666
37. Rosenthal S, Dey AK, Veloso M (2011) Using decision-theoretic experience sampling to build
personalized mobile phone interruption models. In: Lyons K, Hightower J, Huang EM (eds)
Pervasive computing. Springer, Berlin, Heidelberg, pp 170–187
38. Mehrotra A, Vermeulen J, Pejovic V, Musolesi M (2015) Ask, but don’t interrupt: the case for
interruptibility-aware mobile experience sampling. In: Adjunct proceedings of the 2015 ACM
international joint conference on pervasive and ubiquitous computing and proceedings of the
2015 ACM international symposium on wearable computers, UbiComp/ISWC’15 Adjunct,
(New York, NY, USA). Association for Computing Machinery, pp 723–732
39. van Berkel N, Goncalves J, Koval P, Hosio S, Dingler T, Ferreira D, Kostakos V (2019) Context-
informed scheduling and analysis: improving accuracy of mobile self-reports. In: Proceedings
of ACM SIGCHI conference on human factors in computing systems, pp 51:1–51:12
40. Lathia N, Rachuri KK, Mascolo C, Rentfrow PJ (2013) Contextual dissonance: design bias in
sensor-based experience sampling methods. In: Proceedings of the 2013 ACM international
joint conference on pervasive and ubiquitous computing, UbiComp ’13, (New York, NY, USA).
Association for Computing Machinery, pp 183–192
41. Carstensen LL, Turan B, Scheibe S, Ram N, Ersner-Hershfield H, Samanez-Larkin GR, Brooks
KP, Nesselroade JR (2011) Emotional experience improves with age: evidence based on over
10 years of experience sampling. Psychol Aging 26(1):21–33
42. Savin-Williams RC, Demo DH (1983) Situational and transituational determinants of adolescent self-feelings. J Personal Soc Psychol 44(4):824
43. Bailon C, Damas M, Pomares H, Sanabria D, Perakakis P, Goicoechea C, Banos O (2019)
Smartphone-based platform for affect monitoring through flexibly managed experience sampling methods. Sensors 19(15):3430
44. Berrocal A, Wac K (2018) Peer-vasive computing: leveraging peers to enhance the accuracy
of self-reports in mobile human studies. In: Proceedings of the 2018 ACM international joint
conference and 2018 international symposium on pervasive and ubiquitous computing and
wearable computers. Association for Computing Machinery, pp 600–605
45. Chang Y-L, Chang Y-J, Shen C-Y (2019) She is in a bad mood now: leveraging peers to increase
data quantity via a chatbot-based ESM. In: Proceedings of the 21st international conference
on human-computer interaction with mobile devices and services, MobileHCI ’19 (New York,
NY, USA). Association for Computing Machinery
46. Solymosi R, Bowers K, Fujiyama T (2015) Mapping fear of crime as a context-dependent
everyday experience that varies in space and time. Legal Criminol Psychol 20(2):193–211
47. Kittur A, Chi EH, Suh B (2008) Crowdsourcing user studies with Mechanical Turk. In: Proceedings of the SIGCHI conference on human factors in computing systems, CHI ’08, (New
York, NY, USA). Association for Computing Machinery, pp 453–456
48. Oleson D, Sorokin A, Laughlin G, Hester V, Le J, Biewald L (2011) Programmatic gold:
targeted and scalable quality assurance in crowdsourcing. In: Workshops at the Twenty-Fifth
AAAI conference on artificial intelligence
49. Rabbi M, Li K, Yan HY, Hall K, Klasnja P, Murphy S (2019) Revibe: a context-assisted evening
recall approach to improve self-report adherence. In: Proceedings of the ACM on interactive, mobile, wearable and ubiquitous technologies, vol 3
50. Intille S, Kukla C, Ma X (2002) Eliciting user preferences using image-based experience
sampling and reflection. In: CHI ’02 extended abstracts on human factors in computing systems,
CHI EA ’02, (New York, NY, USA). Association for Computing Machinery, pp 738–739
51. Yue Z, Litt E, Cai CJ, Stern J, Baxter KK, Guan Z, Sharma N, Zhang GG (2014) Photographing
information needs: the role of photos in experience sampling method-style research. In: Proceedings of the SIGCHI conference on human factors in computing systems, CHI ’14, (New
York, NY, USA). Association for Computing Machinery, pp 1545–1554
52. Sprangers MA, Schwartz CE (1999) Integrating response shift into health-related quality of
life research: a theoretical model. Soc Sci Med 48(11):1507–1515
53. Schwartz CE, Sprangers MA, Carey A, Reed G (2004) Exploring response shift in longitudinal
data. Psychol Health 19(1):51–69
54. Ring L, Höfer S, Heuston F, Harris D, O’Boyle CA (2005) Response shift masks the treatment
impact on patient reported outcomes (PROs): the example of individual quality of life in
edentulous patients. Health Qual Life Outcomes 3(1):55
55. Schwartz CE, Sprangers MA (2010) Guidelines for improving the stringency of response shift
research using the thentest. Qual Life Res 19(4):455–464
56. Christensen TC, Barrett LF, Bliss-Moreau E, Lebo K, Kaschub C (2003) A practical guide to
experience-sampling procedures. J Happiness Stud 4(1):53–78
57. Paruthi G, Raj S, Gupta A, Huang C-C, Chang Y-J, Newman MW (2017) Heed: situated and
distributed interactive devices for self-reporting. In: Proceedings of the 2017 ACM international joint conference on pervasive and ubiquitous computing and proceedings of the 2017
ACM international symposium on wearable computers, UbiComp ’17, (New York, NY, USA).
Association for Computing Machinery, pp 181–184
58. Gong Q, He X, Xie Q, Lin S, She G, Fang R, Han R, Chen Y, Xiao Y, Fu X et al (2018) LBSLAB:
a user data collection system in mobile environments. In: Proceedings of the 2018 ACM
international joint conference and 2018 international symposium on pervasive and ubiquitous
computing and wearable computers, UbiComp ’18, (New York, NY, USA). Association for
Computing Machinery, pp 624–629
59. Adams AT, Murnane EL, Adams P, Elfenbein M, Chang PF, Sannon S, Gay G, Choudhury
T (2018) Keppi: a tangible user interface for self-reporting pain. In: Proceedings of the 2018
CHI conference on human factors in computing systems, CHI ’18 (New York, NY, USA).
Association for Computing Machinery
60. Hernandez J, McDuff D, Infante C, Maes P, Quigley K, Picard R (2016) Wearable ESM:
differences in the experience sampling method across wearable devices. In: Proceedings of the
18th international conference on human-computer interaction with mobile devices and services,
MobileHCI ’16, (New York, NY, USA). Association for Computing Machinery, pp 195–205
61. Liono J, Salim FD, van Berkel N, Kostakos V, Qin AK (2019) Improving experience sampling with multi-view user-driven annotation prediction. In: IEEE international conference on pervasive computing and communications (PerCom), pp 1–11
62. Pielot M, Vradi A, Park S (2018) Dismissed! a detailed exploration of how mobile phone users
handle push notifications. In: Proceedings of the 20th international conference on human-
computer interaction with mobile devices and services, MobileHCI ’18 (New York, NY, USA).
Association for Computing Machinery
63. Visuri A, van Berkel N, Okoshi T, Goncalves J, Kostakos V (2019) Understanding smartphone
notifications’ user interactions and content importance. Int J Hum-Comput Stud 128:72–85
64. Weber D, Voit A, Auda J, Schneegass S, Henze N (2018) Snooze! investigating the user-defined
deferral of mobile notifications. In: Proceedings of the 20th international conference on human-
computer interaction with mobile devices and services, MobileHCI ’18 (New York, NY, USA).
Association for Computing Machinery