
Somebody's Watching Me: Smartphone Use Tracking and Reactivity


Abstract

Like all media use, smartphone use is mostly being measured retrospectively with self-reports. This leads to misjudgments due to subjective aggregations and interpretations that are necessary for providing answers. Tracking is regarded as the most advanced, unbiased, and precise method for observing smartphone use and therefore employed as an alternative. However, it remains unclear whether people possibly alter their behavior because they know that they are being observed, which is called reactivity. In this study, we investigate first, whether smartphone and app use duration and frequency are affected by tracking; second, whether effects vary between app types; and third, how long effects persist. We developed an Android tracking app and conducted an anonymous quasi-experiment with smartphone use data from 25 people over a time span of two weeks. The app gathered not only data that were produced after, but also prior to its installation by accessing an internal log file on the device. The results showed that there was a decline in the average duration of app use sessions within the first seven days of tracking. Instant messaging and social media app use duration show similar patterns. We found no changes in the average frequency of smartphone and app use sessions per day. Overall, reactivity effects due to smartphone use tracking are rather weak, which speaks for the method's validity. We advise future researchers to employ a larger sample and control for external influencing factors so reactivity effects can be identified more reliably.
Computers in Human Behavior Reports 4 (2021) 100142
Available online 29 September 2021
2451-9588/© 2021 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license
Roland Toth, Tatiana Trifonova
Freie Universität Berlin, Germany
Max-Planck-Institute for Human Development, Germany
Smartphones are used by a large portion of society and have become
pervasive in everyday media use (Newzoo, 2019). The device also found
its place in media use research and introduced new methodological
challenges (e.g., Bayer, Campbell, & Ling, 2016; Harari et al., 2019;
Kaye, Orben, Ellis, Hunter, & Houghton, 2020). As with many other
subjects of interest, asking people about media use retrospectively in
questionnaires is the dominant data collection tool in the social sciences
(Grifoen, Rooij, Lichtwarck-Aschoff, & Granic, 2020; e.g.,; Guthrie,
2010). However, with technological advancements, more sophisticated
assessment methods were developed and employed.
Tracking is the technologically assisted, automatic, passive, and
precise observation of behavior while or shortly after it occurs. It can
take different forms. For example, log file analysis is used for assessing
visited websites (Scharkow, 2016), and phone system logs document the
use of smartphones in general and of specific apps, which is a function that
is already implemented in operating systems like Android (Harari et al.,
2019). Research consistently showed that questionnaire data on the
frequency and duration of media use differ from such tracking data to a
worrying extent, which also applies to smartphone use (for an overview,
see: Parry et al., 2021). This suggests that questionnaires do not repre-
sent media use adequately given the assumption that tracking is the
objective baseline all other assessment methods need to be checked
against. However, this is not necessarily true, as people tend to alter
their behavior when they are aware of being observed. In psychology,
this effect is known as reactivity (Gittelsohn, Shankar, West, Ram, &
Gnywali, 1997).
Smartphone tracking data may therefore also be biased because
people react to being observed in the first place. Then again, smartphone
use is oftentimes initiated habitually, and thus, unconsciously (e.g.,
Schnauber-Stockmann & Naab, 2019), which speaks against this
conclusion. Therefore, we investigate the following question: Are
smartphone use tracking data biased due to reactivity?
To address this issue, we conducted a quasi-experimental, anony-
mous tracking study with 25 Android users for two weeks. Tracking was
performed with an Android app that was developed specically for this
study. It captures not only recent use after installation, but also past
use occurring prior to it. This allowed us to juxtapose and compare un-
biased use before and potentially biased use after installation.
* Corresponding author. Garystrasse 55, 14195, Berlin, Germany.
E-mail address: (R. Toth).
Received 27 June 2021; Received in revised form 15 September 2021; Accepted 23 September 2021
1. Smartphone use measures
In questionnaires, media use is mostly assessed with measures of
frequency and duration of use. Frequency is operationalized in terms of
subjective assessments, for example, "never" to "all the time" (Marty-Dugas,
Ralph, Oakman, & Smilek, 2018), days per week (Lopez-Fernandez,
Männikkö, Kääriäinen, Griffiths, & Kuss, 2018) or as an absolute fre-
quency within a specied time frame (van Berkel et al., 2019). Duration
is operationalized with regard to xed time frames and a frame of
reference, for example, less than 30 min to more than 3 h per day (Chang
et al., 2018) or, similarly to frequency, in minutes per day or week
(Lemola, Perkinson-Gloor, Brand, Dewald-Kaufmann, & Grob, 2014).
However, there is a major problem with this approach. Questionnaire
data on media use lack absolute ground truth (van Berkel et al., 2019) as
people have to aggregate lots of information from a possibly long period
of time and countless use episodes in order to answer such questions.
This goes along with multiple cognitive issues. Respondents need to 1)
understand the question properly, 2) recall respective behavior, 3) infer
and estimate the frequency or duration of use, 4) allocate their estima-
tion within the scale of the question, and might 5) ultimately still answer
in a biased way due to social desirability (Schwarz & Oyserman, 2001).
As a result, answers are often quite different from the information that is
actually of interest and result in under- or overestimations (e.g., Naab,
Karnowski, & Schlütz, 2018; Valkenburg & Peter, 2013, p. 200). This is
especially problematic when assessing smartphone use due to the typi-
cally high frequency and short duration of use episodes. With techno-
logical advancements, more objective, passive observation techniques
were developed.
Passive observation is regarded as the most valid method for
assessing media use (Vandewater & Lee, 2009, p. 9), as it enables the
collection of behavioral data without the need of subjective assessment.
Tracking can be considered its modern implementation that does not
require researchers or obvious observation tools like cameras to be
present. Study participants are only made aware of the observation
setting in the beginning of the study, but not during data collection.
Methodologically, it is therefore a blend of an undisguised naturalistic
observation, where "the participants are made aware of the researcher's
presence and monitoring of their behavior," and a disguised naturalistic
observation, where "researchers make their observations as unobtru-
sively as possible so that participants are not aware that they are being
studied" (Price, Jhangiani, Chiang, Leighton, & Cuttler, 2017, p. 121).
As opposed to advanced survey methods like time diaries (Thulin &
Vilhelmson, 2007) or the Experience Sampling Method (ESM) (Csiks-
zentmihalyi, 2014), tracking is supposed to be unobtrusive and inde-
pendent of people's own perception and interpretation and can
therefore be considered the best option for assessing quantitative met-
rics of media use, namely duration and frequency (e.g., Boase & Ling,
2013; Scharkow, 2016).
Phone use tracking has been possible since before the smartphone's
release. For example, network providers have always kept track of customers'
incoming and outgoing calls and messages, and such data were used in
research before (e.g., Cohen & Lemish, 2003). With the advent of
smartphone technology, however, researchers soon recognized the de-
vice's potential not only for interpersonal and mass communication per
se, but also from a methodological perspective, as it enabled the auto-
mated collection of more refined use data (Raento, Oulasvirta, & Eagle,
2009). Since the iPhone was introduced in 2007 and marked the
inception of the smartphone as we know it today (Jackson, 2018),
tracking was employed in many studies for collecting precise data on
smartphone use with regard to total use and the use of specific appli-
cations. These data were then used to check relationships with other
concepts of interest, such as personality traits, college course perfor-
mance, and social connectedness (e.g., Andrews, Ellis, Shaw, & Piwek,
2015; Harari et al., 2019; Rosen et al., 2018; Walker, Koh, Wollersheim,
& Liamputtong, 2015). Among others, it was shown that smartphone use
is characterized by frequent and short use episodes because the device is
basically permanently active and in use (e.g., Klimmt, Hefner, Reinecke,
Rieger, & Vorderer, 2019; Rosen et al., 2018).
While smartphone use tracking is used in studies for various purposes
and applied reasonably, it is rarely questioned or evaluated regarding its
validity. One of the very few studies that explicitly investigated meth-
odological implications of methods that take advantage of mobile de-
vices dealt with ESM. ESM is an advanced survey method that focuses on
in-situ measurement of situational and emotional contexts while or
shortly after they occur, minimizing the risk of retrospection biases
(Csikszentmihalyi, 2014). In a comparison between survey and ESM
data on time use, Sonnenberg, Riediger, Wrzus, and Wagner (2011)
rightfully noted that interpretations of the differences between both
methods implicitly assume the superiority of ESM, although in principle
it might well be possible that ESM data are actually more error-prone
than survey data. However, as the authors argue, it is hard to falsify
this possibility as there is "a lack of empirical evidence on the perfor-
mance of ESM" (p. 24). Analogous to this issue, we would hereby like to
provide such evidence for the tracking method. While many studies have
shown that questionnaire data on the frequency and duration of media
use differ from respective tracking data, much less attention was
devoted to the more general problem of the validity of tracking data
themselves namely, whether even tracking data are valid representa-
tions of the concepts they are supposed to measure. In other words, the
question is not whether tracking is the most accurate method for
time-related use measures in theory (which it should be, due to the lack
of participants' active involvement) but whether the method produces
biased data to begin with.
2. Reactivity
Reactivity can constitute a potential threat to the validity of research
results. It usually occurs when actors change their behavior "due to the
presence of an observer" (Gittelsohn et al., 1997, p. 182).
Several forms of reactivity can be distinguished. Each highlights a
particular aspect of behavior change (Barnes, 2010). The most promi-
nent form is social desirability, which especially applies to sensitive
topics assessed in questionnaires or interviews. It emerges when par-
ticipants intentionally demonstrate (allegedly) positive behavior and
conceal behavior that they perceive as socially inappropriate (e.g., Fang,
Wen, & Prybutok, 2014; Jensen & Hurley, 2005; Krumpal, 2013).
Another form is the Hawthorne Effect, which originally suggested that
factory workers' productivity increased when observed, regardless of
manipulation or experimental condition (Adair, 1984; Barnes, 2010;
Lied & Kazandjian, 1998). However, in numerous studies, the Haw-
thorne Effect is used to refer to any change in participants' behavior and
therefore as an equivalent to the term reactivity (Barnes, 2010;
McCambridge, Wilson, Attia, Weaver, & Kypri, 2019).
Lots of research on reactivity dealt with subjects such as medical
personnel's professional behavior (Eckmanns, Bessert, Behnke, Gastme-
ier, & Rüden, 2006; Leonard & Masatu, 2006; Mangione-Smith, Elliott, McDonald,
& McGlynn, 2002), patients' performance (Berthelot, Nizard, & Mau-
gars, 2019; Bouchet, Guillemin, & Briançon, 1996; Feil, Grauer,
Gadbury-Amyot, Kula, & McCunniff, 2002; McCambridge et al., 2019),
academic performance (Adair, 1984; Cook, 1962; Haddad, Nation, &
Williams, 1975), and voting behavior (Gerber, Green, & Larimer, 2008;
Granberg & Holmberg, 1992). An improvement of people's behavior due
to observation was also shown concerning indoor air pollution (Barnes,
2010) and electricity consumption (Schwartz, Fischhoff, Krishnamurti,
& Sowell, 2013).
There is only little research on reactivity concerning media use. For
instance, it was shown that both children and adults alter their television
viewing behavior in presence of parents or researchers, respectively
(Christakis & Zimmerman, 2009; Otten, Littenberg, & Harvey-Berino,
2010). Also, former Nielsen panelists were used as research subjects in
order to rule out reactivity due to the panelists' adjustment to being
observed (Taneja & Viswanathan, 2014).
As mentioned before, the smartphone became a crucial and ubiqui-
tous device for digital communication in the course of a few years. Also,
tracking of smartphone use emerged as an accessible and rich data
source that is strongly intertwined with people's everyday lives and
should therefore provide access to behavioral data that are not biased
by subjective assessment. At first glance, it is tempting to trust in the
quality of such data. However, it remains unclear whether the tracking
itself leads to a disruption of usual use patterns and the collected data
therefore do not reflect usual use.
Smartphones offer two layers of use: the use of the device itself and
the use of apps, as representations of gratifications and possibilities
offered by the device (Schnauber-Stockmann & Naab, 2019; Turkle,
2008). We therefore ask the following research question:
RQ 1: Does smartphone use tracking lead to reactivity concerning
smartphone and app use?
Some media uses are more sensitive than others. For example, it was
shown that social desirability affects self-reports about the use of
pornography (Valkenburg & Peter, 2013, p. 200). Chances are that uses
of certain app types (e.g., social media, dating apps, gaming, enter-
tainment) might be more affected by reactivity than others due to social
desirability. We are not aware of existing research that investigated
perceived social desirability, intimacy or privacy concerns for specic
app types. However, it was shown that smartphone users perceive per-
missions granted for accessing the phone features multimedia storage,
SMS, camera, microphone, and GPS sensor as particularly sensitive
(Furini, Mirri, Montangero, & Prandi, 2020). While many apps nowa-
days require access to these features, they are most prominently
requested by instant messaging (e.g., WhatsApp), social media (e.g.,
Facebook), and gaming (e.g., Pokémon Go) apps.
We do not yet know whether these app types are associated with
more reactivity while being tracked due to the sensitive permissions, or
even less reactivity due to the desensitization regarding privacy invasion
experienced by users of these app types anyway. Considering that
investigating app types was shown to be more feasible than investigating
specic apps (David, Roberts, & Christenson, 2018, p. 271), we there-
fore pose the following research question:
RQ 2: Is reactivity regarding the use of instant messaging, social
media, and gaming apps different from overall reactivity?
Lastly, research suggests that reactivity effects decrease with time
due to participants' habituation to the setting (Cousens, Kanki, Toure,
Diallo, & Curtis, 1996; Harris, 1982; Wu, 2013). Some studies demon-
strate that the change in participant behavior occurs on the first day of
observation and fades away over the course of some days (Gittelsohn
et al., 1997; Leonard & Masatu, 2006; Schmitz, Stanat, Sang, & Tasche,
1996). Therefore, reactivity effects are more likely at the beginning of
the observation period than later (McCambridge, Witton, & Elbourne,
2014) which does not appear to vary with observational duration and
frequency (Harris, 1982). However, there is also evidence for reactivity
effects with no signs of habituation whatsoever (Harris, 1982; Kypri,
Langley, Saunders, & Cashell-Smith, 2007; Murray, Swan, Kiryluk, &
Clarke, 1988).
The smartphone is a highly versatile device with many short use
episodes which may quickly distract people from the observation
setting. Therefore, it is likely that potential smartphone use tracking
reactivity effects do not last long. How long the time frame of reactivity
is, though, is up for debate. If the app types mentioned in RQ 2 are
affected by reactivity, we will investigate the persistence of these effects,
too. This leads to our last research question:
RQ 3: How long do reactivity effects on smartphone and app use persist?
Duration and frequency were shown to measure different elements of
smartphone use quantity (Andrews et al., 2015; Wilcockson, Ellis, &
Shaw, 2018). For this reason, we investigate all research questions with
regard to both duration and frequency.
3. Method
In this study, we used phone system logs for data collection. Still, we
use the term tracking for better readability and as a reference to passive
observation methods in general.
For answering our research questions, we developed an Android app,
A Tricky Tracker (ATT), that collected data that were produced before as
well as after its installation. This way, we could compare them to each
other and derive insights on behavioral alterations caused by tracking.
Due to the complex structure of the data involved, our overall aim was to
exclude data only when absolutely necessary and separately for each
individual analysis, so we could leverage all available data the best way
possible. We report all data exclusions, all manipulations, and all measures in the study.
3.1. Procedure
ATT accessed a log le implemented within the Android operating
system which stores all actions occurring on the device, so-called events.
Events are further categorized by event types, which are listed in the
ofcial Android documentation (Google, 2019c). See Table 1 for an
overview of event types relevant for our analysis.
We describe the use of each of these event types for specific measures in
the section Measures.
After installation, ATT regularly synchronized event data from said
log le and produced one data set each that contained all events that
were captured since the last synchronization. Additionally, it actively
and regularly registered whether the device was locked, unlocked, shut
down, or booted, with a respective time stamp, as this information was
not logged in the form of events on devices running Android 8 and
lower. In these versions, it is therefore impossible to identify precisely
when the device was locked, unlocked, shut down, or booted in the past.
Devices running Android 9, however, logged locking and unlocking in-
stances without the need of additional implementation (Google, 2019c).
For this reason, we had to limit ourselves to data from Android 9 devices.
All resulting data were saved in the format .json. For this study, ATT
was set up in a way that all data were regularly synchronized with a
virtual Linux server. We used pseudonymized, unique identifiers for
each individual device so we could tell them apart in the analysis while
at the same time preserving participants' anonymity.
However, the benet of ATT did not only lie in capturing current use
during tracking. For this study, it was of crucial importance to assess
data that were denitely not biased due to observation and that were
comparable to data produced after participation began. The Android log
le typically contains data from up to two weeks in the past. Therefore,
Table 1
Event types used for data preparation.
Event type Description
1 Activity resumed
2 Activity paused
17 Keyguard shown
18 Keyguard hidden
26 Device shutdown
27 Device booted
Keyguard corresponds to the phone's lock screen (Google, 2020b). In Android
versions below 9, activity resumed corresponds to activity moved to the foreground
and activity paused to activity moved to the background.
right after participants installed ATT and accepted all policies, ATT
accessed event data that were produced earlier. Participants were
informed about this feature before and during installation, but given the
opportunity to cancel the installation and participation anytime. How-
ever, as they neither knew of the study itself nor ATT's features before
participation, these data can be considered truly free from any kind of
methodological bias. As such, they represent the baseline for regular
behavior in this study and allow for holding subsequent behavior (after
the installation of ATT) against it. While retrospective smartphone use
data were used in research before (e.g., David et al., 2018) they were not
yet used for investigating reactivity to tracking by juxtaposing them
with subsequent, real-time tracking data.
The main view of the app contained a timer that counted down from
14 days to let participants know when the study would end. In a menu,
participants could view the researchers' contact information and review
the data privacy statement.
We used ATT to conduct an anonymous quasi-experiment with 25
participants, a sample size similar to those used in previous smartphone
use tracking studies (e.g., Caine, 2016; van Berkel et al., 2019). Data
collection took place between December 12, 2019 and January 11, 2020.
3.2. Participants
We were interested in a rather unspecific issue that potentially af-
fects any smartphone user and therefore did not impose any restrictions
concerning participants' social characteristics. We recruited a conve-
nience sample through different channels, including the SoSci Panel
(SoSci Panel, 2020), survey websites like, advertise-
ments at the local university, and the researchers' private (social) net-
works. Due to financial limitations, we could not provide incentives for
participation. We would like to note that recruitment for a study that
involves tracking methods is hampered by a high inhibition threshold
and the effort necessary for installing an app and agreeing to being
observed for weeks, even when an incentive is offered (Andrews et al., 2015).
We set up a dedicated website containing important information
about the study, measures of data protection, and a registration form.
People who registered automatically received an email containing a
link to an anonymous questionnaire with questions concerning de-
mographic features for sample description, a link to the installation file
of ATT (in .apk format), and a detailed installation guide. Unfortu-
nately, many more people participated in the survey than ultimately in
the tracking procedure. It was not possible for us to link tracking and
questionnaire data because this would have made individual identifi-
cation possible and therefore would not have complied with anonymity. For this
reason, we could not identify which of the completed questionnaires
actually belonged to persons who participated in the tracking. The
questionnaire results indicated that people interested in participation
were 48% female, with a mean age of 36 years (SD = 44.26). During
installation, participants were again informed about the study pro-
cedure and required to accept a data privacy statement. After instal-
lation, participants were not notied or interrupted by ATT at all
during the period of data collection, eliminating a potential additional
source of reactivity (van Ballegooijen et al., 2016). After two weeks of
data collection, participants received a notication from the app
asking them to uninstall it.
3.3. Measures
3.3.1. Smartphone and app use session
A use session indicated the time span between the first and the last
events of a consistent use episode. With regard to smartphone use, event
types 18 (keyguard hidden) and 27 (device booted) were considered first
events, and event types 17 (keyguard shown) and 26 (device shutdown)
last. With regard to app use, event type 1 (activity resumed) was
considered a first event, and event types 2 (activity paused), 17 (key-
guard shown), and 26 (device shutdown) last. As many apps feature
multiple activities (Google, 2019a), any consistent sequence of activities
performed within a single app without interruption was considered part
of the same app use session.
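The session logic described above can be sketched in Python, which the authors also used for data preparation. The event codes follow Table 1; the function name and sample data are our own illustration, not ATT's actual implementation.

```python
# Sketch of smartphone use session reconstruction from an event stream.
# Event codes follow Table 1; names and sample data are illustrative.

FIRST_EVENTS = {18, 27}  # keyguard hidden, device booted
LAST_EVENTS = {17, 26}   # keyguard shown, device shutdown

def phone_sessions(events):
    """Pair first and last events into (start, end) use sessions.

    `events` is a chronologically sorted list of (timestamp, event_type)
    tuples, with timestamps in seconds.
    """
    sessions, start = [], None
    for ts, etype in events:
        if etype in FIRST_EVENTS and start is None:
            start = ts
        elif etype in LAST_EVENTS and start is not None:
            sessions.append((start, ts))
            start = None
    return sessions

events = [(100, 18), (160, 17), (300, 18), (420, 26)]
print(phone_sessions(events))  # [(100, 160), (300, 420)]
print([end - start for start, end in phone_sessions(events)])  # durations: [60, 120]
```

The duration measure (Section 3.3.2) then follows directly as the difference between each session's start and end time stamps, as in the last line.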
3.3.2. Duration
We represented the duration of use sessions by calculating the dif-
ference between the time stamps of the start and the end of a use session.
We calculated duration in seconds, minutes, and hours for different analyses.
3.3.3. Frequency
The frequency of use sessions was the number of use sessions per
device occurring within a specied time period.
3.3.4. Time frame
The time frame indicated whether a use session took place before (0)
or after (1) the installation of ATT. To ensure readability, we call the
former pre-installation and the latter post-installation smartphone/app use
from here on out. We assigned each smartphone and app use session a
time frame by checking whether the time stamp of the start of the session
was smaller or larger than the time stamp of the installation.
3.3.5. Day
We assigned each use session an integer that represented the day it
took place relative to the installation date of ATT (e.g., -8 for the eighth
day before installation; 5 for the fifth day after installation).
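The time frame (3.3.4) and day (3.3.5) assignments can be sketched as follows. The installation timestamp is hypothetical, and the exact coding of the installation day itself is not specified in the text, so this is one plausible choice.

```python
from datetime import datetime, timezone

# Sketch of the time frame (3.3.4) and relative day (3.3.5) assignment.
# INSTALL is a hypothetical installation timestamp.
INSTALL = datetime(2019, 12, 14, 12, 0, tzinfo=timezone.utc)

def time_frame(session_start):
    """0 = pre-installation, 1 = post-installation."""
    return 0 if session_start < INSTALL else 1

def relative_day(session_start):
    """Signed day offset from the installation date."""
    return (session_start.date() - INSTALL.date()).days

start = datetime(2019, 12, 6, 8, 30, tzinfo=timezone.utc)
print(time_frame(start), relative_day(start))  # 0 -8
```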
3.3.6. App type
We automatically assigned all apps a type according to the Google
Play Store, which was done similarly for iPhone use data before (David
et al., 2018). To achieve this, we used the Python library
Google-Play-Scraper (JoMingyu, 2020). Apps that could not be catego-
rized were assigned the type Other.
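The automatic categorization can be sketched as below. The Google-Play-Scraper call reflects that library's documented `app()` function, which returns a `genre` field; the wrapper names and the offline stub are our own illustration.

```python
# Sketch of the app type assignment via the Google Play Store.
# `play_store_lookup` assumes the Google-Play-Scraper library
# (pip install google-play-scraper); apps that cannot be resolved
# fall back to the type "Other", as in the study.

def categorize(package_name, lookup):
    try:
        return lookup(package_name)
    except Exception:
        return "Other"

def play_store_lookup(package_name):
    from google_play_scraper import app  # network access required
    return app(package_name)["genre"]

# Offline example with a stubbed lookup table:
stub = {"com.whatsapp": "Communication"}.__getitem__
print(categorize("com.whatsapp", stub))         # Communication
print(categorize("com.example.unknown", stub))  # Other
```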
3.4. Data preparation
We programmed a parser in Python that extracted all relevant data
from the .json files and then transformed and merged them into a single
data frame. Each row of this data frame represented a single event on
one device (e.g., moving a specic app to the foreground). Variables
included the Android version of the device, the event type and time
stamp of the event, and the package name (Google, 2019b) of the app
performing it.
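A minimal sketch of such a parser is given below. The JSON field names are assumptions for illustration, since the actual ATT schema is not published.

```python
import json

# Minimal sketch of the event parser: flatten one synchronized .json
# file into event rows. The field names ("eventType", "timestamp",
# "packageName") are illustrative assumptions, not ATT's real schema.

def parse_sync_file(raw, android_version):
    rows = []
    for event in json.loads(raw):
        rows.append({
            "android_version": android_version,
            "event_type": event["eventType"],
            "timestamp": event["timestamp"],
            "package": event.get("packageName"),
        })
    return rows

raw = '[{"eventType": 1, "timestamp": 1576150000, "packageName": "com.whatsapp"}]'
print(parse_sync_file(raw, android_version=9)[0]["package"])  # com.whatsapp
```

In the study, such rows from all synchronization files were then merged into the single data frame described above.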
As data collection took place at the end of the calendar year, some
data were generated during or between the Christmas holidays and New
Year's Eve (December 24–26, December 31–January 1). It is likely that
mobile communication behavior is different during these time spans
(Vandewater & Lee, 2009, p. 10). People may use their smartphones
more than usual for communicating with family and friends; then
again, they might use them less than usual so they can enjoy some
quality time with their peers in person. To be on the safe side and to
account for the possibility that smartphone use might be affected the day
before Christmas, too, we marked all smartphone use sessions that took
place between and including December 23 and January 1. As such, we
were able to investigate and account for possible noise in the data during
this period.
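The holiday flag can be sketched as a simple date check; the inclusive December 23 to January 1 window is taken from the text, while the function name is ours.

```python
from datetime import date

# Mark sessions that start between December 23 and January 1, inclusive,
# so holiday-season noise can be controlled for in the analyses.

def is_holiday_season(d: date) -> bool:
    return (d.month == 12 and d.day >= 23) or (d.month == 1 and d.day == 1)

print(is_holiday_season(date(2019, 12, 24)))  # True
print(is_holiday_season(date(2020, 1, 6)))    # False
```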
We then iteratively aggregated and transformed the data frame such
that each row represented one use session. Our approach was already
applied similarly by Harari et al. (2019). Two data frames were created
this way: the first containing smartphone use sessions, the second con-
taining app use sessions, as defined in the section Measures.
3.4.1. Smartphone use
Data from four participants were excluded from the smartphone use
data set. One did not provide pre-installation data at all. One accounted
for most exceedingly long, seemingly uninterrupted smartphone use
sessions (up to 12 h). Two participated only for (part of) the first day
after having installed ATT.
Finally, we excluded all data from the first and last days provided by
each participant in order to omit incomplete data for these days, espe-
cially with regard to use frequency.
3.4.2. App use
Regarding app use, we assumed that we could use data from all
Android versions, as moving an app to the foreground or background is
always captured without additional implementation (Google, 2019c).
For good measure, we additionally considered the data on screen ac-
tivity and shutdowns/boots captured by ATT itself.
Following this approach, even data generated after the installation of
ATT contained (seemingly) uninterrupted app use sessions that lasted
extremely long (e.g., 12 h). Further investigation showed that in-
terruptions of app uses through screen locks and shutdowns were
probably not always captured properly. Andrews et al. (2015) faced
similar problems. For this reason, we created another version of this data
set that only included data from Android 9 devices. Hence, we could use
existing event types that indicate locks, unlocks, shutdowns and boots
without accessing the checks implemented in ATT for both pre- and
post-installation use.
Even then, there were very long, consistent app uses without in-
terruptions (up to 9 h). Those were probably still instances where in-
ternal Android mechanisms failed to register screen locks. In the end, we
settled for a cutoff value of 5 h, which Andrews et al. (2015) considered
"very long" use. Below that, there were still long use sessions; however,
they took place in apps where long, uninterrupted use sessions are
reasonable, e.g., Pokémon GO, YouTube, or Twitch.
We excluded events regarding some system-related apps and func-
tions as recommended in previous research (Jones, Ferreira, Hosio,
Goncalves, & Kostakos, 2015). We then applied the same trans-
formations and data exclusions already applied to smartphone use, save
the device that accounted for most exceedingly long smartphone use
sessions. Finally, we merged both into a single data frame for analysis
and tagged them with a dedicated variable for distinction.
3.5. Data analysis
RQ 1 deals with the question whether smartphone and app use are
affected by participation in tracking. We investigated this question for
duration and frequency.
We analyzed smartphone and app use duration on the basis of single
use sessions. However, in our data, these were not independent from one
another as each was associated with a specific device. This does not meet the assumptions of regression modeling (Field, Miles, & Field, 2012, p. 957). Multilevel modeling was not applicable either, as a higher number of top-level units (around 30) is recommended for this analysis method
(Hox & McNeish, 2020). Also, while we expected individual use differ-
ences between devices, we were not interested in explaining them in this
study. For this reason, we accounted for all variance due to the differences between devices by adding one dummy variable per device to the multiple regression, resulting in a fixed-effects model. We did the same regarding
app types for analyses of app use duration, as each app use session is not
only tied to an individual, but also an app type. As it turned out, 10.34%
of the data were generated during the time span declared as the holiday
season. In order not to discard valuable data, we decided to control for
the holidays in analyses of duration instead of omitting them.
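As a rough illustration of the dummy-coding step (in Python rather than R, with hypothetical field names), every device except a reference device receives its own 0/1 indicator, so that the regression absorbs stable between-device differences:

```python
# Sketch: one dummy variable per device (minus a reference category),
# turning a multiple regression into a device fixed-effects model.
# Field names are hypothetical; the study's analyses were run in R.

def add_device_dummies(rows, reference):
    """Add a 0/1 indicator column for every device except `reference`."""
    devices = sorted({r["device"] for r in rows} - {reference})
    for r in rows:
        for d in devices:
            r[f"device_{d}"] = 1 if r["device"] == d else 0
    return devices

rows = [
    {"device": "A", "log_duration": 3.1, "post_installation": 1},
    {"device": "B", "log_duration": 2.2, "post_installation": 0},
    {"device": "C", "log_duration": 4.0, "post_installation": 1},
]
dummies = add_device_dummies(rows, reference="A")  # ["B", "C"]
```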
As a second step, we investigated possible reactivity concerning the
frequency of smartphone and app use. Frequency depends on a time
frame of reference, which is why we could not investigate it on the level
of individual use sessions the same way we did with duration. Instead,
we aggregated the average number of smartphone and app use sessions
per day for pre- and post-installation use separately for each participant.
Due to the non-normal distribution of the difference between pre- and
post-installation use frequencies and the low number of observations/
days within the aggregated data set, we performed the non-parametric
Wilcoxon signed-rank test for paired samples, which has less strict as-
sumptions than a corresponding t-test (Field et al., 2012, p. 957). Due to
the necessary aggregations, we excluded all holiday data from analyses
of use frequency and we could not control for individuals and app types.
This led to yet another problem: Some participants generated data that
were assigned to both holidays and non-holidays on the same day, as
days were operationalized in relation to the exact installation instance in
this study, not actual calendar days. Therefore, similarly to the exclusions of the first and last days of data collection per participant, we excluded all corresponding data from frequency analyses so as to avoid biases due to partial data removals per day. For consistency, we also
applied these same exclusions to duration data for visualizations and
descriptive analyses, but not for regression analysis.
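To make the test statistic concrete, the following is a minimal Python sketch of the Wilcoxon signed-rank statistic V (the sum of ranks of the positive paired differences). It drops zero differences and averages tied ranks, but it is illustrative only, not a replacement for the R implementation used in the study (e.g., wilcox.test, which also supplies the p-value):

```python
# Minimal sketch of the Wilcoxon signed-rank statistic V for paired
# pre-/post-installation values. Zero differences are dropped; tied
# absolute differences receive average ranks. Illustrative only.

def wilcoxon_v(pre, post):
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    ranked = sorted(diffs, key=abs)
    ranks = {}
    i = 0
    while i < len(ranked):
        j = i
        while j < len(ranked) and abs(ranked[j]) == abs(ranked[i]):
            j += 1
        avg = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[abs(ranked[k])] = avg
        i = j
    # V: sum of ranks belonging to positive differences
    return sum(ranks[abs(d)] for d in diffs if d > 0)

# e.g., average daily session counts for four participants, pre vs. post
v = wilcoxon_v([10, 12, 8, 30], [12, 11, 8, 25])  # V = 2
```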
RQ 2 asks whether effects found in instant messaging, social media
and gaming apps differed from the general findings of RQ 1. Due to the low sample size, we first checked whether these types were among the app types used the longest and most often, so that sufficient occurrence
was given (see section Results). We employed a similar approach as for
RQ 1, but only considered app use data. To minimize the problem of multiplicity, we calculated false discovery rates (FDR).
RQ 3 is concerned with the persistence of potential reactivity effects.
Considering the sample size, we decided to apply the same tests from RQ 1 to two-day intervals of post-installation use and compare each of them to overall pre-installation use to identify possible patterns of effect changes. We chose two-day intervals as a compromise between statistical power gained through a higher number of observations and the granularity necessary for identifying changes. Again, we controlled for
multiplicity by calculating FDR.
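The FDR correction (q-values) can be sketched via the Benjamini-Hochberg step-up procedure. This Python version is illustrative; in R, p.adjust(method = "fdr") performs the same adjustment:

```python
# Sketch of Benjamini-Hochberg q-values, as used for the FDR
# correction of the per-interval tests. Illustrative only.

def bh_qvalues(pvalues):
    """Return BH-adjusted q-values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    prev = 1.0
    # walk from the largest p-value down, enforcing monotonicity
    for step, i in enumerate(reversed(order)):
        rank = m - step  # 1-based rank of p-value i
        prev = min(prev, pvalues[i] * m / rank)
        q[i] = prev
    return q

qs = bh_qvalues([0.01, 0.04, 0.03, 0.005])
```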
After the exclusion of data from Android versions other than 9 and
data cleansing, data from 12 devices were left, which happens to be a
common sample size in studies on Computer-Human Interaction (CHI)
(Caine, 2016). In total, the data comprised 14,330 smartphone use
sessions and 43,053 app use sessions over a time span of up to 23 days.
This time frame was shown to be more than sufficient for capturing both
typical weekly usage and short, habitual checking behaviors (Wilcock-
son et al., 2018).
For analyses, coding, and typesetting, we used R [Version 4.0.2; R
Core Team (2020)] and the R-packages ggplot2 [Version 3.3.2; Wickham
(2016)], papaja [Version; Aust and Barth (2020)], and tidy-
verse [Version 1.3.0; Wickham et al. (2019)]. All data, analysis code and
a visualization of individual participants' smartphone and app use over
time can be found in the online supplementary material (OSM).
4. Results
On average, participants used their smartphones for 3.47 h (SD = 2.33) and 57.28 times (SD = 40.75) per day, excluding the holiday season. The distribution of session duration was strongly right-skewed (γ_phone = 10.35, γ_apps = 16.68), as the great majority of use sessions was fairly short (Mdn_phone = 46.04 s, Mdn_apps = 11.64 s). Therefore, we used
median values for visualizing central tendencies with regard to average
use session duration per day (see Fig. 1). For all analyses, we applied log
transformation to duration, which resulted in a distribution much more
similar to a normal distribution. This is a technique that was used in
research with similar data before (e.g., van Berkel et al., 2019).
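As a quick illustration of this step (a Python sketch with made-up durations; the paper's analyses used R), a log transform pulls in the long right tail and reduces skewness:

```python
import math

# Illustrative: log-transforming right-skewed session durations.
# Durations are made up; skewness is the standardized third moment.

def skewness(xs):
    n = len(xs)
    mean = sum(xs) / n
    sd = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    return sum(((x - mean) / sd) ** 3 for x in xs) / n

durations = [5, 8, 11, 12, 20, 46, 60, 90, 300, 3600]  # seconds
logged = [math.log(d) for d in durations]
# skewness(durations) is far larger than skewness(logged)
```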
See Fig. 2 for a visualization of smartphone and app use frequencies
per day. On Days 11, 12 and 13, only a single participant provided data
(excluding data from the holiday season). For this reason, we excluded
these days from all analyses of frequency.
Day 5 of app use strongly deviated from all other days concerning
duration as well as frequency. Further investigation showed that this is
due to a single participant who registered both the shortest and, at the
same time, by far the majority of all app use sessions that day (Mdn = 0.03 s, n = 2,094). This was not the case for that same participant's smartphone use. While their average app use duration that day only
deviated from the mean by 1.45 standard deviations, their average app
use frequency deviated by 3.05 standard deviations. For this reason, we
decided to exclude this participant's app use data from Day 5.
See Fig. 3 for an overview of the five app types used the longest and
most frequently. Consistent with previous research (David et al., 2018),
gaming and watching videos took the longest and instant messaging and
social media were accessed most frequently on average.
4.1. RQ 1
RQ 1 deals with the reactivity of participants due to tracking with
regard to smartphone and app use duration and frequency.
Controlling for differences between individuals and the holidays,
results showed that post-installation smartphone use session duration
was signicantly higher than pre-installation smartphone use session
duration, b=0.09, 95% CI [0.03, 0.14], t(14316) = 3.12, p=.002. The
effect was rather weak and corresponded to an increase of about 9%
(note that duration was log-transformed for analyses, hence the trans-
formation and interpretation of the coefcient as percent change). Post-
installation app use session duration, while additionally controlling for
app type, was signicantly lower than pre-installation app use session
duration, b= − 0.04, 95% CI [ − 0.08, 0.00], t(43009) = − 2.22, p=
.027. The effect corresponded to a decrease of 4%.
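Because duration enters the model in logs, a coefficient b translates into a multiplicative change of exp(b). A small Python check, using the coefficients reported above:

```python
import math

# Convert a log-outcome regression coefficient into a percent change.
# Values are the coefficients reported in the results above.

def percent_change(b):
    return (math.exp(b) - 1) * 100

phone = percent_change(0.09)    # about +9.4%, i.e., "about 9%"
apps = percent_change(-0.04)    # about -3.9%, i.e., a decrease of "4%"
```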
Post-installation smartphone use frequency (Mdn = 54.12) did not significantly differ from pre-installation smartphone use frequency (Mdn = 56.46), r = −0.23, p = .266. Neither did post-installation app use frequency (Mdn = 119.60) differ from pre-installation app use frequency (Mdn = 120.20), r = −0.17, p = .376.
4.2. RQ 2
RQ 2 deals with reactivity effects regarding instant messaging, social
media, and gaming apps. We conducted separate analyses for session
duration and frequency, respectively.
Results show that the duration of instant messaging app use increased significantly after the installation of ATT, b = 0.07, 95% CI [0.01, 0.12], t(13753) = 2.23, p = .026, q = .038, which corresponds to an increase of 17%. The effect was positive, as opposed to the overall effect on app use duration. In contrast, social media app use duration decreased significantly, b = −0.18, 95% CI [−0.26, −0.11],
Fig. 1. Median session duration per day relative to the installation date. The
dotted line represents the installation instance of ATT. Error bars represent
median absolute deviation.
Fig. 2. Mean number of sessions per day relative to the installation date. The
dotted line represents the installation instance of ATT. Error bars represent
bootstrapped standard errors of the mean.
Fig. 3. Top ve app types by median duration and mean frequency of use per
device. Only apps that were used by at least half of participants were included,
so as to increase external validity and decrease outlier inuence.
t(9980) = −5.12, p < .001, q < .001, which corresponds to a decrease of 24%. The effect was also negative, but significantly larger than the overall effect on app use duration. We did not find a significant effect of tracking on gaming app use duration, b = 0.13, 95% CI [−0.07, 0.33], t(1570) = 1.30, p = .194, q = .194. None of the three app types were affected by tracking with regard to frequency.
4.3. RQ 3
Finally, RQ 3 asks how long potential reactivity effects persist. See
Table 2 for the results concerning smartphone and app use duration,
Table 3 for smartphone and app use frequency, and Table 4 for instant
messaging and social media app use duration.
Most notably, app use duration first decreased before increasing again after seven days. Smartphone use duration consistently increased after three days, although only two effects were significant. Regarding smartphone and app use frequency, we found consistently negative effects, none of which were significant. Instant messaging app use duration first decreased but then increased after five days. Social media app use duration first decreased but then returned to usual levels after nine days.
5. Discussion
In this study, we investigated whether tracking people's smartphone
use leads to changes in the quantitative metrics, namely duration and
frequency, of that use. Further, we were curious whether changes in use,
if present, affect specic app types in different ways. Lastly, we checked
how long potential changes persist. We developed an Android app that
could access smartphone use data from before and after its installation.
We then conducted a quasi-experiment by tracking 25 people's smartphone use for up to 14 days.
We found that the duration of smartphone use sessions was slightly
higher after ATT was installed. The duration of app use, however, was
lower. Reasons for these rather small and contradictory overall effects
can be seen in the analysis of effect persistence and reactivity effects
regarding instant messaging and social media apps.
During the rst two days, smartphone use duration was not affected
by tracking at all. During some of the following days, it was increased
and during others, it was not affected. In sum, this resulted in the small,
positive overall effect we found. App use duration first decreased during
the rst two days, then increased as compared to pre-installation use
after about seven days. The initial decrease outweighed the later in-
crease, which is why the overall effect we found was negative.
It is likely that the decrease in app use duration shortly after the
installation of ATT was caused by the tracking, which is consistent with previous research on reactivity (e.g., Gittelsohn et al., 1997; Harris, 1982; Wu, 2013). If it had made no difference, there would not have
been a reason for any detectable decrease whatsoever during (and only
during) the rst few days and app use duration would either have stayed
on the same level or already increased shortly after tracking started, just
like smartphone use did. One might argue that day-to-day variations of
use between individuals or the specic day of the week the installation
took place introduced variance that led to effects caused by chance
alone. However, considering that a dozen people provided data for the
analysis and introduced variance, that installation dates varied, and that
we controlled for differences between individuals and considered
q-values, we argue that inter-individual changes should hardly be the
cause of the effects we found.
The fact that the immediate decrease in app use duration was followed by a consistent and significant increase (up to 35%) with low FDR
seven days later supports our assumption. It is, however, unlikely that
this increase was associated with the tracking. Although we controlled
for the inuence of the holiday season, this extraordinary time frame
probably still affected data that were produced longer than one day
before it began.
Table 2
Regression results for time frame predicting (log) duration of post-installation smartphone and app use sessions.

Days    n       b       95% CI            t(8315)   p       q
Smartphone use
1–2     8329    −0.03   [−0.12, 0.06]     −0.61     .542    .542
3–4     8170     0.16   [0.06, 0.25]       3.20     .001    .008
5–6     8150     0.05   [−0.05, 0.15]      1.00     .319    .479
7–8     8140     0.14   [0.04, 0.23]       2.78     .005    .016
9–10    7855     0.05   [−0.08, 0.17]      0.74     .460    .542
11–12   7822     0.13   [0.00, 0.25]       1.99     .047    .094
App use
1–2     24437   −0.17   [−0.23, −0.11]    −5.81     <.001   .000
3–4     24348   −0.09   [−0.15, −0.03]    −3.12     .002    .002
5–6     23535   −0.30   [−0.37, −0.24]    −8.78     <.001   .000
7–8     23443    0.12   [0.05, 0.18]       3.49     <.001   .001
9–10    22589    0.30   [0.21, 0.39]       6.62     <.001   .000
11–12   22895    0.28   [0.20, 0.37]       6.44     <.001   .000

Note. Days represents the post-installation days considered for the analysis. Q-values represent p-values after FDR correction by use subset (smartphone or app).
Table 3
Wilcoxon signed-rank test results for average use session frequency per day between pre- and post-installation use of smartphone and app use.

Days    n    V    95% CI              r        p       q
Smartphone use
1–2     12   39   [−12.62, 25.1]       0.00    1.000   1.000
3–4     12   46   [−8.9, 40.57]       −0.10     .622    .778
5–6     11   43   [−6.6, 12.51]       −0.17     .413    .778
7–8     10   35   [−6.28, 11.85]      −0.15     .492    .778
9–10     6   19   [−8.5, 83.7]        −0.48     .094    .469
App use
1–2     13   58   [−23.83, 41.3]      −0.16     .414    .518
3–4     13   53   [−43.4, 47.75]      −0.09     .635    .635
5–6     12   55   [−21.35, 28.42]     −0.24     .233    .518
7–8     11   44   [−17, 61.03]        −0.19     .365    .518
9–10     6   18   [−15.17, 218.56]    −0.41     .156    .518

Note. Days represents the post-installation days considered for the analysis. Q-values represent p-values after FDR correction by use unit (smartphone or app).
Table 4
Regression results for time frame predicting (log) duration of post-installation instant messaging and social media app use sessions.

Days    n      b       95% CI            t(7996)   p       q
Instant Messaging
1–2     8011   −0.19   [−0.28, −0.09]    −3.81     <.001   .000
3–4     7854   −0.01   [−0.11, 0.09]     −0.15     .880    .880
5–6     7613    0.16   [0.05, 0.26]       2.83     .005    .007
7–8     7657    0.20   [0.09, 0.31]       3.55     <.001   .001
9–10    7448    0.20   [0.06, 0.35]       2.70     .007    .008
11–12   7682    0.27   [0.15, 0.40]       4.25     <.001   .000
Social Media
1–2     5319   −0.27   [−0.37, −0.16]    −5.04     <.001   .000
3–4     5460   −0.21   [−0.31, −0.11]    −4.05     <.001   .000
5–6     4923   −0.29   [−0.43, −0.16]    −4.18     <.001   .000
7–8     5054   −0.17   [−0.30, −0.05]    −2.74     .006    .009
9–10    4794   −0.01   [−0.16, 0.14]     −0.09     .925    .925
11–12   4944    0.02   [−0.17, 0.20]      0.19     .852    .925

Note. Days represents the post-installation days considered for the analysis. Q-values represent p-values after FDR correction.
Smartphone and app use frequency were not affected by ATT. The
loss of statistical power due to the aggregation necessary for frequency
analysis possibly concealed further significant effects. The question remains whether effects on smartphone and app use frequency, if they exist in the population, are consistent or change direction over time
like we found regarding app use duration. Naab and Schnauber (2016)
argue that habitual media use is usually initiated, but not necessarily
performed, unconsciously. Therefore, post-installation app use was
possibly still initiated unconsciously and therefore just as frequently as
pre-installation app use due to habituation. The conscious performance
of app use, however, was stopped sooner than usual as participants knew
that their use was being tracked, resulting in shorter app use duration on
average. On another note, every smartphone use session ultimately
consists of one or multiple app uses. Considering that post-installation
app use duration decreased for some days, app use frequency would
need to increase during the same time frame for smartphone use dura-
tion to stay the same or even increase. This would indicate that people
compensate for lower app use duration with higher frequency when they
assume that their social reputation or privacy with regard to the device
are in danger. These assumptions should be addressed in future research
and might yield interesting results with regard to privacy behavior and
use patterns.
One reason why tracking had more distinguishable effects on app
than smartphone use duration is the number of app use sessions
observed, which is roughly triple the number of smartphone use sessions
and results in higher statistical power. We also argue that the impor-
tance of smartphones in everyday communication attenuates reactivity
effects regarding the duration of their overall use and therefore shows in
the use of specic apps and app types rather than overall device use. We
investigated two of these app types. While instant messaging app use
duration was higher after the installation of ATT, social media use
duration was lower. Analysis of effect persistence showed that the same
patterns already seen in overall app use duration applied to both app
types. Instant messaging app use duration first decreased, but then increased after about five days. Social media app use duration first decreased, but returned to regular levels after about nine days. The
reason for this might again be the holiday season: Conversing with
family members and (close) friends is not only an important part of
preparing for the upcoming holidays for many people, but also a popular
use of instant messengers like WhatsApp (Church & De Oliveira, 2013).
Meanwhile, people also use social media like Facebook to "interact with people they do not regularly see [and] chat with old acquaintances" (Whiting & Williams, 2013). As such, people might just use instant
messaging more than social media apps in the context of an upcoming
holiday that is centered around family and close friends, which is why
instant messaging app use duration suddenly increased as time passed.
The effects on social media apps, though, may therefore represent the
essence of reactivity in this study. This also explains why we found
tracking to affect app use duration negatively overall, but not smart-
phone use duration. As we could control for app types in the analysis of
app use duration, such differences between types due to the holidays
were canceled out and the results emphasize that the initial decrease in
app use duration was indeed caused by reactivity. Earlier research
showed that more common behavior seems to cause less reactivity while
less common behavior causes more (Cousens et al., 1996). Since we did
not analyze rarely used app types, we possibly overlooked other interesting effects. It is also important to note that app types assigned in the Google Play Store do not provide sufficient distinction between app
types. For example, there arguably is an overlap between the types video
and games and entertainment.
To sum it up, the results indicate that on average, the duration of app
use decreases due to tracking for about six days. Smartphone use is not
affected by tracking. One reason might be social desirability. People
probably assume that using their device for longer periods of time
without interruptions might be frowned upon. This especially holds for
heavy users, as research showed that symptoms of smartphone addiction
are positively associated with tendencies to fulfill social desirability (Herrero, Urueña, Torres, & Hidalgo, 2019, p. 86). Hence, they might
end their use sessions slightly earlier than they usually would because
the installation of ATT rendered the issue particularly salient for them.
The effects found could also be a symptom of a lack of trust. Possibly,
participants did not truly believe that participation was anonymous and
no contents or private information were transmitted, resulting in less
app use duration. Generally, the effects we found that could be attrib-
uted to the tracking were rather weak, which speaks for low reactivity
and high convergent validity of smartphone use tracking data. This may
be associated with the absence of interaction (Cousens et al., 1996) and
minimum communication between researchers and participants
(Schwartz et al., 2013; Taneja & Viswanathan, 2014) in our study. Since
we did not perform observation in person and participants were not
notied at all during the whole process after ATT was installed, reac-
tivity could be expected to be low. This speaks for using smartphone use
tracking under these circumstances. Expected gain is also considered a
factor that contributes to the facilitation of reactivity effects (Barnes,
2010). No incentives were provided for participation in our study, which
possibly further decreased reactivity.
5.1. Limitations and future research
The greatest constraint of this study is low sample size. This is due to
the difculty of recruitment for a study that involves passive observa-
tion. Our sample size is comparable to sample sizes in other CHI studies,
which shows that many researchers in this field seem to struggle with this problem. Previous research showed that financial incentives are
associated with higher willingness to participate in passive mobile
tracking studies (Keusch, Struminskaya, Antoun, Couper, & Kreuter,
2019). Future research should therefore consider following that sug-
gestion, keeping in mind possible negative consequences mentioned
before. Due to financial restrictions, we were dependent on people to
participate out of sheer interest in the subject matter. This may not have
mitigated reactivity effects too much, but it certainly impeded their
identication due to low statistical power. While we found reactivity
effects, they were only partially statistically significant and showed high
FDR rates, where applicable. Especially regarding the investigation of
different app types, a larger sample size would allow for better estimation of average use, as app types are not always represented sufficiently.
Another reason for our low sample size might be the requirement to
install the app manually using a file that had to be downloaded, which
possibly prevented less tech-savvy people from participating. This shows
in the fact that about 60 people originally filled out the form on the
project website and received the instruction email. Offering ATT on the
Google Play Store might have reduced the problem. However, we
avoided this approach in order to prevent possible participation in the
project without visiting the project website for information or misusing
the app for fun. The best compromise might be offering it on the Google
Play Store as a closed test (Google, 2020c) or tying participation to a
password sent out via email. Also, it should be noted that the Google
Play Store is subject to strict regulations concerning the sensitive data
that are involved in tracking (Google, 2020d).
Future data collections of this kind should not immediately precede
or even overlap with a holiday season. Even though we found effects that
can reasonably be considered reactivity effects, it was technically
impossible to tell apart the influence of the holiday season from the influence of tracking after some days.
Further, the lack of distinction between app types according to the
Google Play Store may have slightly biased some results with regard to
app types. We advise future research to either categorize apps manually
based on existing research, or double-check the assignments Google Play
Store provides.
Another caveat is the detection of locks, unlocks and shutdowns. As
we showed, one cannot reliably assess recent smartphone and app use
below Android version 9, even when actively tracking locks and unlocks
in the tracking app, let alone pre-installation use, where this is not
possible at all. Therefore, it is most feasible to restrict participation to
devices running Android versions 9 or higher. Also, it seems like Android
10 introduced new opportunities and caveats for tracking apps to come
(Google, 2020a).
Finally, we would like to offer a suggestion for future research
employing smartphone use tracking methods. When interested in
descriptive accounts of smartphone use quantities, researchers may
resort to simply using pre-installation use data right away to get rid of
reactivity completely. But when combining tracking with other methods
like ESM to assess additional information on psychological or situational
variables, or content analysis (e.g., De Vreese et al., 2017), tracking
recent use is indispensable. Our findings therefore encourage the use of
tracking data that were produced at least seven days after installation,
just to be on the safe side. Of course, this number is not unrelated to our
study setting, effectively involving only 12 people and the holiday sea-
son. If anything, decreased app use duration due to tracking was over-
shadowed by increased duration due to the upcoming holiday season.
Accordingly, we advise researchers to wait even longer than seven days
whenever possible.
5.2. Conclusion
In recent years, the smartphone has become a crucial means of
communication and one of the most dominant ways to access digital
content on the Internet. As such, investigating the validity of its mea-
sures is of utmost importance for communication science and many
more elds of research. Smartphone use tracking allows for passive
observation of use behavior in a way that has not been possible for any
other medium before. The greatest advantages lie in the assessment of
past use, as all events are being documented by the operating system at
all times, and the ability to run tracking apps in the background. This
yields the opportunity to juxtapose past and recent use behavior in order
to investigate possible reactivity to the tracking, which has always been
a problem in research using observation methods.
We found that the average duration of app use sessions decreases for
some days due to the participation in tracking. The same applies to
instant messaging and social media apps. Future research is needed to
investigate effects of tracking on the average duration of smartphone use
sessions, other app types, and both smartphone and app use session frequency.
We hope this study motivates other researchers to give thought to the
subject of reactivity concerning smartphone use tracking and conduct
further investigations of possible implications for research on media use,
privacy behavior, and research methods.
Declaration of competing interest
Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi.
Author note
RT designed and conducted the study, the analysis, and wrote the
manuscript. TT was the lead developer of the app. We would like to
thank Prof. Daniela Schlütz from SoSci Panel for distributing our study
among panel participants, Polina Guseva for literature research and
inspiring discussions, and Harald Papp and Dennis Ricci for setting up
and administering our server and database.
Adair, J. G. (1984). The Hawthorne effect: A reconsideration of the methodological
artifact. Journal of Applied Psychology, 69(2), 334345.
Andrews, S., Ellis, D. A., Shaw, H., & Piwek, L. (2015). Beyond self-report: Tools to
compare estimated and real-world smartphone use. PloS One, 10(10), Article
e0139004. Retrieved from.
Aust, F., & Barth, M. (2020). papaja: Create APA manuscripts with R Markdown. Retrieved
van Ballegooijen, W., Ruwaard, J., Karyotaki, E., Ebert, D. D., Smit, J. H., & Riper, H.
(2016). Reactivity to smartphone-based ecological momentary assessment of
depressive symptoms (MoodMonitor): Protocol of a randomised controlled trial.
BMC Psychiatry, 16(1), 49.
Barnes, B. R. (2010). The Hawthorne Effect in community trials in developing countries.
International Journal of Social Research Methodology, 13(4), 357370.
Bayer, J. B., Campbell, S. W., & Ling, R. (2016). Connection cues: Activating the norms
and habits of social connectedness. Communication Theory, 26(2), 128149. https://
van Berkel, N., Goncalves, J., Lov´
en, L., Ferreira, D., Hosio, S., & Kostakos, V. (2019).
Effect of experience sampling schedules on response rate and recall accuracy of
objective self-reports. International Journal of Human-Computer Studies, 125
(December), 118128.
Berthelot, J. M., Nizard, J., & Maugars, Y. (2019). The negative Hawthorne effect:
Explaining pain overexpression. Joint Bone Spine, 86(4), 445449.
Boase, J., & Ling, R. (2013). Measuring mobile phone use: Self-report versus log data.
Journal of Computer-Mediated Communication, 18(4), 508519.
Bouchet, C., Guillemin, F., & Briançon, S. (1996). Nonspecic effects in longitudinal
studies: Impact on quality of life measures. Journal of Clinical Epidemiology, 49(1),
Caine, K. (2016). Local standards for sample size at CHI. Conference on Human Factors in
Computing Systems - Proceedings, 981992.
Chang, F. C., Chiu, C. H., Chen, P. H., Miao, N. F., Chiang, J. T., & Chuang, H. Y. (2018).
Computer/mobile device screen time of children and their eye care behavior: The
roles of risk perception and parenting. Cyberpsychology, Behavior, and Social
Networking, 21(3), 179186.
Christakis, D. A., & Zimmerman, F. J. (2009). Young children and media: Limitations of
current knowledge and future directions for research. American Behavioral Scientist,
52(8), 11771185.
Church, K., & De Oliveira, R. (2013). Whats up with WhatsApp? Comparing mobile
instant messaging behaviors with traditional SMS. In MobileHCI 2013 - proceedings of
the 15th international conference on human-computer interaction with mobile devices and
services (pp. 352361).
Cohen, A. A., & Lemish, D. (2003). Real time and recall measures of mobile phone use:
Some methodological concerns and empirical applications. New Media & Society, 5
(2), 167183.
Cook, D. L. (1962). The Hawthorne Effect in educational research. Phi Delta Kappan, 44
(5), 116122. Retrieved from
Cousens, S., Kanki, B., Toure, S., Diallo, I., & Curtis, V. (1996). Reactivity and repeatability of hygiene behaviour: Structured observations from Burkina Faso. Social Science & Medicine, 43(9), 1299–1308.
Csikszentmihalyi, M. (2014). Flow and the foundations of positive psychology: The collected works of Mihaly Csikszentmihalyi.
David, M. E., Roberts, J. A., & Christenson, B. (2018). Too much of a good thing: Investigating the association between actual smartphone use and individual well-being. International Journal of Human-Computer Interaction, 34(3), 265–275.
De Vreese, C. H., Boukes, M., Schuck, A., Vliegenthart, R., Bos, L., & Lelkes, Y. (2017). Linking survey and media content data: Opportunities, considerations, and pitfalls. Communication Methods and Measures, 11(4), 221–244.
Eckmanns, T., Bessert, J., Behnke, M., Gastmeier, P., & Rüden, H. (2006). Compliance with antiseptic hand rub use in intensive care units: The Hawthorne Effect. Infection Control & Hospital Epidemiology, 27(9), 931–934.
Fang, J., Wen, C., & Prybutok, V. (2014). An assessment of equivalence between paper and social media surveys: The role of social desirability and satisficing. Computers in Human Behavior, 30, 335–343.
Feil, P. H., Grauer, J. S., Gadbury-Amyot, C. C., Kula, K., & McCunniff, M. D. (2002). Intentional use of the Hawthorne effect to improve oral hygiene compliance in orthodontic patients. Journal of Dental Education, 66(10), 1129–1135.
Field, A., Miles, J., & Field, Z. (2012). Discovering statistics using R. Los Angeles; London; New Delhi: Sage.
Furini, M., Mirri, S., Montangero, M., & Prandi, C. (2020). Privacy perception when using smartphone applications. Mobile Networks and Applications, 25(3), 1055–1061.
R. Toth and T. Trifonova
Computers in Human Behavior Reports 4 (2021) 100142
Gerber, A. S., Green, D. P., & Larimer, C. W. (2008). Social pressure and voter turnout: Evidence from a large-scale field experiment. American Political Science Review, 102(1), 33–48.
Gittelsohn, J., Shankar, A. V., West, K. P., Ram, R. M., & Gnywali, T. (1997). Estimating reactivity in direct observation studies of health behaviors. Human Organization, 56(2), 182–189.
Google. (2019a). Introduction to activities.
Google. (2019b). Set the application ID.
Google. (2019c). UsageEvents.Event.
Google. (2020a). Behavior changes: All apps.
Google. (2020b). KeyguardManager.
Google. (2020c). Set up an open, closed or internal test. Retrieved from https://support.
Google. (2020d). User data.
Granberg, D., & Holmberg, S. (1992). The Hawthorne effect in election studies: The impact of survey participation on voting. British Journal of Political Science, 22(2).
Griffioen, N., van Rooij, M., Lichtwarck-Aschoff, A., & Granic, I. (2020). Toward improved methods in social media research. Technology, Mind, and Behavior, 1(1).
Guthrie, G. (2010). Basic research methods: An entry to social science research. New Delhi, India: Sage Publications Pvt. Ltd.
Haddad, N. F., Nation, J. R., & Williams, J. D. (1975). Programmed student achievement: A Hawthorne effect? Research in Higher Education, 3(4), 315–322.
Harari, G. M., Müller, S. R., Stachl, C., Wang, R., Wang, W., Bühner, M., et al. (2019). Sensing sociability: Individual differences in young adults' conversation, calling, texting, and app use behaviors in daily life. Journal of Personality and Social Psychology.
Harris, F. C. (1982). Subject reactivity in direct observational assessment: A review and critical analysis. Clinical Psychology Review, 2(4), 523–538.
Herrero, J., Urueña, A., Torres, A., & Hidalgo, A. (2019). Smartphone addiction: Psychosocial correlates, risky attitudes, and smartphone harm. Journal of Risk Research, 22(1), 81–92.
Hox, J., & McNeish, D. (2020). Small samples in multilevel modeling. Small Sample Size Solutions, 215–225.
Jackson, K. (2018). A brief history of the smartphone. Retrieved from https://sciencenode.
Jensen, J. D., & Hurley, R. J. (2005). Third-person effects and the environment: Social distance, social desirability, and presumed behavior. Journal of Communication, 55(2), 242–256.
JoMingyu. (2020). Google-play-scraper.
Jones, S. L., Ferreira, D., Hosio, S., Goncalves, J., & Kostakos, V. (2015). Revisitation analysis of smartphone app use. In UbiComp 2015 - Proceedings of the 2015 ACM international joint conference on pervasive and ubiquitous computing (pp. 1197–1208).
Kaye, L. K., Orben, A., Ellis, D. A., Hunter, S. C., & Houghton, S. (2020). The conceptual and methodological mayhem of screen time. International Journal of Environmental Research and Public Health, 17(10).
Keusch, F., Struminskaya, B., Antoun, C., Couper, M. P., & Kreuter, F. (2019). Willingness to participate in passive mobile data collection. Public Opinion Quarterly, 83.
Klimmt, C., Hefner, D., Reinecke, L., Rieger, D., & Vorderer, P. (2019). The permanently online and permanently connected mind. In Permanently online, permanently connected.
Krumpal, I. (2013). Determinants of social desirability bias in sensitive surveys: A literature review. Quality and Quantity, 47(4), 2025–2047.
Kypri, K., Langley, J. D., Saunders, J. B., & Cashell-Smith, M. L. (2007). Assessment may conceal therapeutic benefit: Findings from a randomized controlled trial for hazardous drinking. Addiction, 102(1), 62–70.
Lemola, S., Perkinson-Gloor, N., Brand, S., Dewald-Kaufmann, J. F., & Grob, A. (2014). Adolescents' electronic media use at night, sleep disturbance, and depressive symptoms in the smartphone age. Journal of Youth and Adolescence, 44(2), 405–418.
Leonard, K., & Masatu, M. C. (2006). Outpatient process quality evaluation and the Hawthorne Effect. Social Science & Medicine, 63(9), 2330–2340.
Lied, T. R., & Kazandjian, V. A. (1998). A Hawthorne strategy: Implications for performance measurement and improvement. Clinical Performance in Quality Healthcare, 6, 201–204.
Lopez-Fernandez, O., Männikkö, N., Kääriäinen, M., Griffiths, M. D., & Kuss, D. J. (2018). Mobile gaming and problematic smartphone use: A comparative study between Belgium and Finland. Journal of Behavioral Addictions, 7(1), 88–99.
Mangione-Smith, R., Elliott, M. N., McDonald, L., & McGlynn, E. A. (2002). An observational study of antibiotic prescribing behavior and the Hawthorne effect. Health Services Research, 37(6), 1603–1623.
Marty-Dugas, J., Ralph, B. C. W., Oakman, J. M., & Smilek, D. (2018). The relation between smartphone use and everyday inattention. Psychology of Consciousness: Theory, Research, and Practice, 5(1), 46–62.
McCambridge, J., Wilson, A., Attia, J., Weaver, N., & Kypri, K. (2019). Randomized trial seeking to induce the Hawthorne effect found no evidence for any effect on self-reported alcohol consumption online. Journal of Clinical Epidemiology, 108, 102–109.
McCambridge, J., Witton, J., & Elbourne, D. R. (2014). Systematic review of the Hawthorne effect: New concepts are needed to study research participation effects. Journal of Clinical Epidemiology, 67(3), 267–277.
Murray, M., Swan, A. V., Kiryluk, S., & Clarke, G. C. (1988). The Hawthorne effect in the measurement of adolescent smoking. Journal of Epidemiology & Community Health, 42(3), 304–306.
Naab, T. K., Karnowski, V., & Schlütz, D. (2018). Reporting mobile social media use: How survey and experience sampling measures differ. Communication Methods and Measures, 13(2), 126–147.
Naab, T. K., & Schnauber, A. (2016). Habitual initiation of media use and a response-frequency measure for its examination. Media Psychology, 19(1), 126–155.
Newzoo. (2019). Global mobile market report. Newzoo. Retrieved from https://newzoo.
Otten, J. J., Littenberg, B., & Harvey-Berino, J. R. (2010). Relationship between self-report and an objective measure of television-viewing time in adults. Obesity, 18(6).
Parry, D. A., Davidson, B. I., Sewall, C. J. R., Fisher, J. T., Mieczkowski, H., & Quintana, D. S. (2021). A systematic review and meta-analysis of discrepancies between logged and self-reported digital media use. Nature Human Behaviour.
Price, P. C., Jhangiani, R. S., Chiang, I.-C. A., Leighton, D. C., & Cuttler, C. (2017). Research methods in psychology.
R Core Team. (2020). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-pro
Raento, M., Oulasvirta, A., & Eagle, N. (2009). Smartphones: An emerging tool for social scientists. Sociological Methods & Research, 37(3), 426–454.
Rosen, L. D., Mark Carrier, L., Pedroza, J. A., Elias, S., O'Brien, K. M., Lozano, J., et al. (2018). The role of executive functioning and technological anxiety (FOMO) in college course performance as mediated by technology usage and multitasking habits. Psicologia Educativa, 24(1), 14–25.
Scharkow, M. (2016). The accuracy of self-reported internet use: A validation study using client log data. Communication Methods and Measures, 10(1), 13–27.
Schmitz, B., Stanat, P., Sang, F., & Tasche, K. (1996). Reactive effects of a survey on the television viewing behavior of a telemetric television audience panel: A combined time-series and control-group analysis. Evaluation Review, 20(2), 204–229.
Schnauber-Stockmann, A., & Naab, T. K. (2019). The process of forming a mobile media habit: Results of a longitudinal study in a real-world setting. Media Psychology, 22(5).
Schwartz, D., Fischhoff, B., Krishnamurti, T., & Sowell, F. (2013). The Hawthorne effect and energy awareness. Proceedings of the National Academy of Sciences of the United States of America, 110(38), 15242–15246.
Schwarz, N., & Oyserman, D. (2001). Asking questions about behavior: Cognition, communication, and questionnaire construction. American Journal of Evaluation, 22(2), 127–160.
Sonnenberg, B., Riediger, M., Wrzus, C., & Wagner, G. G. (2011). Measuring time use in surveys. SOEPpapers on Multidisciplinary Panel Data Research, (390), 1–30.
SoSci Panel. (2020). Willkommen beim SoSci Panel [Welcome to the SoSci Panel]. Retrieved from https://www.sosc
Taneja, H., & Viswanathan, V. (2014). Still glued to the box? Television viewing explained in a multi-platform age integrating individual and situational predictors. International Journal of Communication, 8, 2134–2159.
Thulin, E., & Vilhelmson, B. (2007). Mobiles everywhere: Youth, the mobile phone, and changes in everyday practice. Young, 15(3), 235–253.
Turkle, S. (2008). Always-On/Always-On-You: The tethered self. In J. E. Katz (Ed.), Handbook of mobile communication studies (pp. 121–138).
Valkenburg, P. M., & Peter, J. (2013). Five challenges for the future of media-effects research. International Journal of Communication, 7(1), 197–215.
Vandewater, E. A., & Lee, S.-J. (2009). Measuring children's media use in the digital age: Issues and challenges. American Behavioral Scientist, 52(8), 1152–1176.
Walker, R., Koh, L., Wollersheim, D., & Liamputtong, P. (2015). Social connectedness and mobile phone use among refugee women in Australia. Health and Social Care in the Community, 23(3), 325–336.
Whiting, A., & Williams, D. (2013). Why people use social media: A uses and gratifications approach. Qualitative Market Research: An International Journal, 16(4).
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New York.
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., et al. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686.
Wilcockson, T. D. W., Ellis, D. A., & Shaw, H. (2018). Determining typical smartphone usage: What data do we need? Cyberpsychology, Behavior, and Social Networking, 21(6), 395–398.
Wu, L. (2013). Social network effects on productivity and job security: Evidence from the adoption of a social networking tool. Information Systems Research, 24(1), 30–51.
... And even those who participate in digital trace data collection might alter their behavior as a result of knowing that they are being observed, a phenomenon known in observational studies as reactivity[1] (Webb et al., 1999). While such artificial change in behavior has the potential to bias results, little research exists on whether reactivity transfers from offline studies with human observers to the online realm where digital trackers are used (for an exception see Toth and Trifonova, 2021). ...
... In our main analysis, we ran the models on the first seven days of data collection in line with earlier research by Toth and Trifonova (2021). As a sensitivity analysis, we reran all the models on the first 31 days to see if longer-term trends are present (see Tables B.II through B.XI in Appendix B). ...
... In line with earlier research (Toth and Trifonova, 2021), we found that the effect of reactivity in digital trace data collection wears off within a week after the installation of the tracker, indicating that individuals who at first reduce sensitive online behavior as a reaction to being observed return back to what could be interpreted as their usual behavior within a few days. One explanation for the rather short-lived effect of reactivity could be that individuals just forget about the fact that they are being observed after a couple of days. ...
Purpose: Digital trace data provide new opportunities to study how individuals act and interact with others online. One advantage of this type of data is that it measures behavior in a less obtrusive way than surveys, potentially reducing measurement error. However, it is well documented that in observational studies, participants' awareness of being observed can change their behavior, especially when the behavior is considered sensitive. Very little is known regarding this effect in the online realm. Against this background, we studied whether people change their online behavior because digital trace data are being collected.
Design/methodology/approach: We analyzed data from a sample of 1,959 members of a German online panel who had consented to the collection of digital trace data about their online browsing and/or mobile app usage. To identify reactivity, we studied change over time in five types of sensitive online behavior.
Findings: We found that the frequency and duration with which individuals engage in sensitive behaviors online gradually increase during the first couple of days after the installation of a tracker, that mainly individuals who extensively engage in sensitive behavior show this pattern of increase after installation, and that this change in behavior is limited to certain types of sensitive online behavior.
Originality/value: There is an increased interest in the use of digital trace data in the social sciences, and our study is one of the first methodological contributions measuring reactivity in digital trace data measurement.
... This can especially affect sensitive behaviours, with participants behaving in a more socially desirable way when observed. Although no experimental research has been conducted yet, preliminary evidence using quasi-experimental data suggests that individuals might not change their behaviour when observed (see Toth & Trifonova, 2021). Changes in behaviours produce measurement errors unless they produce a complete loss of the information needed to compute a non-behavioural measurement, in which case it should be considered as a missing data error. ...
... Strategy to quantify errors: No information was available about whether participants disconnected their trackers or not, nor was there information about the types of content triggering this. Besides, considering that participants had the tracking technologies installed before sampling them, we could not apply quasi-experimental approaches like the ones proposed by Toth and Trifonova (2021). Consequently, we could not quantify (1) whether participants disconnected their meters and (2) the extent to which this could cause a bias in TRI-POL's estimates. ...
Metered data, also called web‐tracking data, are generally collected from a sample of participants who willingly install or configure, onto their devices, technologies that track digital traces left when people go online (e.g., URLs visited). Since metered data allow for the observation of online behaviours unobtrusively, they have been proposed as a useful tool to understand what people do online and what impacts this might have on online and offline phenomena. It is crucial, nevertheless, to understand its limitations. Although some research has explored the potential errors of metered data, a systematic categorisation and conceptualisation of these errors are missing. Inspired by the Total Survey Error, we present a Total Error framework for digital traces collected with Meters (TEM). The TEM framework (1) describes the data generation and the analysis process for metered data and (2) documents the sources of bias and variance that may arise in each step of this process. Using a case study we also show how the TEM can be applied in real life to identify, quantify and reduce metered data errors. Results suggest that metered data might indeed be affected by the error sources identified in our framework and, to some extent, biased. This framework can help improve the quality of both stand‐alone metered data research projects, as well as foster the understanding of how and when survey and metered data can be combined.
... Both data donation and tracking have in common that they require informed consent from the user and provide higher transparency to the user about what data they share with a researcher. This can be problematic for tracking methods, where the prospective nature of data shared can result in reactivity biases, i.e., the user changes their behavior because they know they are being tracked (Toth & Trifonova, 2021). Both methods require user involvement and are subject to sample biases during the tracking and donation process (Boeschoten et al., 2020; Breuer et al., 2022). ...
In social media effects research, the role of specific social media content is understudied, in part attributable to the fact that communication science previously lacked methods to access social media content directly. Digital trace data (DTD) can shed light on textual and audio-visual content of social media use and enable the analysis of content usage on a granular individual level that has been previously unavailable. However, because digital trace data are not specifically designed for research purposes, collection and analysis present several uncertainties. This article is a collaborative effort by scholars to provide an overview of how three methods of digital trace data collection - APIs, data donations, and tracking - can be used in studying the effects of social media content in three important topic areas of communication research: misinformation, algorithmic bias, and well-being. We address the question of how to collect raw social media content data and arrive at meaningful measures with multiple state-of-the-art data collection techniques that can be used to study the effects of social media use on different levels of detail. We conclude with a discussion of best practices for the implementation of each technique, and a comparison of their advantages and disadvantages.
There is widespread public and academic interest in understanding the uses and effects of digital media. Scholars primarily use self-report measures of the quantity or duration of media use as proxies for more objective measures, but the validity of these self-reports remains unclear. Advancements in data collection techniques have produced a collection of studies indexing both self-reported and log-based measures. To assess the alignment between these measures, we conducted a pre-registered meta-analysis of this research. Based on 106 effect sizes, we found that self-reported media use correlates only moderately with logged measurements, that self-reports were rarely an accurate reflection of logged media use and that measures of problematic media use show an even weaker association with usage logs. These findings raise concerns about the validity of findings relying solely on self-reported measures of media use.
Both academic and public interest in social media and their effects have increased dramatically over the last decade. In particular, a plethora of studies have been conducted that aimed to uncover the relationship between social media use and youth well-being, fueled by recent concerns that declines in youth well-being may well be caused by a rise in digital technology use. However, reviews of the field strongly suggest that the picture may not be as clear-cut as previously thought, with some studies suggesting positive effects, and some studies suggesting negative effects on youth well-being. To shed light on this ambiguity, we have conducted a narrative review of 94 social media use and well-being studies. A number of patterns in methodological practices in the field have now become apparent: Self-report measures of general statistics around social media use dominate the field, which furthermore often falls short in terms of ecological validity and sufficient use of experimental designs that would enable causal inference. We go on to discuss why such practices are problematic in some cases, and more importantly, which concrete improvements can be made for future studies that aim to investigate the relationship between social media use and well-being.
Debates concerning the impacts of screen time are widespread. Existing research presents mixed findings, and lacks longitudinal evidence for any causal or long-term effects. We present a critical account of the current shortcomings of the screen time literature. These include poor conceptualisation, the use of non-standardised measures that are predominantly self-report, and issues with measuring screen time over time and context. Based on these issues, we make a series of recommendations as a basis for furthering academic and public debate. These include drawing on a user-focused approach in order to seek the various affordances gained from “screen use”. Within this, we can better understand the way in which these vary across time and context, and make distinction between objective measures of “screen time” compared to those more subjective experiences of uses or affordances, and the differential impacts these may bring.
Our smartphone is full of applications and data that analytically organize, facilitate and describe our lives. We install applications for the most varied reasons, to inform us, to have fun and for work, but, unfortunately, we often install them without reading the terms and conditions of use. The result is that our privacy is increasingly at risk. Considering this scenario, in this paper, we analyze the user’s perception towards privacy while using smartphone applications. In particular, we formulate two different hypotheses: 1) the perception of privacy is influenced by the knowledge of the data used by the installed applications; 2) applications access to much more data than they need. The study is based on two questionnaires (within-subject experiments with 200 volunteers) and on the lists of installed apps (30 volunteers). Results show a widespread abuse of data related to location, personal contacts, camera, Wi-Fi network list, running apps list, and vibration. An in-depth analysis shows that some features are more relevant to certain groups of users (e.g., adults are mainly worried about contacts and Wi-Fi connection lists; iOS users are sensitive to smartphone vibration; female participants are worried about possible misuse of the smartphone camera).
The rising penetration of smartphones now gives researchers the chance to collect data from smartphone users through passive mobile data collection via apps. Examples of passively collected data include geolocation, physical movements, online behavior and browser history, and app usage. However, to passively collect data from smartphones, participants need to agree to download a research app to their smartphone. This leads to concerns about nonconsent and nonparticipation. In the current study, we assess the circumstances under which smartphone users are willing to participate in passive mobile data collection. We surveyed 1,947 members of a German nonprobability online panel who own a smartphone using vignettes that described hypothetical studies where data are automatically collected by a research app on a participant’s smartphone. The vignettes varied the levels of several dimensions of the hypothetical study, and respondents were asked to rate their willingness to participate in such a study. Willingness to participate in passive mobile data collection is strongly influenced by the incentive promised for study participation but also by other study characteristics (sponsor, duration of data collection period, option to switch off the app) as well as respondent characteristics (privacy and security concerns, smartphone experience).
Sociability as a disposition describes a tendency to affiliate with others (vs. be alone). Yet, we know relatively little about how much social behavior people engage in during a typical day. One challenge to documenting social behavior tendencies is the broad number of channels over which socializing can occur, both in-person and through digital media. To examine individual differences in everyday social behavior patterns, here we used smartphone-based mobile sensing methods (MSMs) in four studies (total N = 926) to collect real-world data about young adults' social behaviors across four communication channels: conversations, phone calls, text messages, and use of messaging and social media applications. To examine individual differences, we first focused on establishing between-person variability in daily social behavior, examining stability of and relationships among daily sensed social behavior tendencies. To explore factors that may explain the observed individual differences in sensed social behavior, we then expanded our focus to include other time estimates (e.g., times of the day, days of the week) and personality traits. In doing so, we present the first large-scale descriptive portrait of behavioral sociability patterns, characterizing the degree to which young adults engaged in social behaviors and mapping these behaviors onto self-reported personality dispositions. Our discussion focuses on how the observed sociability patterns compare to previous research on young adults' social behavior. We conclude by pointing to areas for future research aimed at understanding sociability using mobile sensing and other naturalistic observation methods for the assessment of social behavior. (PsycINFO Database Record (c) 2019 APA, all rights reserved).