Conference Paper

Models of User Engagement
Janette Lehmann¹, Mounia Lalmas², Elad Yom-Tov³ and Georges Dupret⁴
¹ Universitat Pompeu Fabra, Barcelona, Spain
janette.lehmann@gmx.de
² Yahoo! Research Barcelona, Spain
mounia@acm.org
³ Yahoo! Research New York, USA
eladyt@yahoo-inc.com
⁴ Yahoo! Labs Sunnyvale, USA
gdupret@yahoo-inc.com
Abstract. Our research goal is to provide a better understanding of how
users engage with online services, and how to measure this engagement.
We should not speak of one main approach to measure user engagement
– e.g. through one fixed set of metrics – because engagement depends
on the online services at hand. Instead, we should be talking of models
of user engagement. As a first step, we analysed a number of online
services, and show that it is possible to derive effectively simple models
of user engagement, for example, accounting for user types and temporal
aspects. This paper provides initial insights into engagement patterns,
allowing for a better understanding of the important characteristics of
how users repeatedly interact with a service or group of services.
Keywords: diversity of user engagement, models, user type, temporal aspect
1 Introduction
User engagement is the quality of the user experience that emphasises the
positive aspects of the interaction, and in particular the phenomena associated
with being captivated by a web application, and so being motivated to use it.
Successful web applications are not just used, they are engaged with; users
invest time, attention, and emotion into them. In a world full of choice where
the fleeting attention of the user becomes a prime resource, it is essential
that technology providers design engaging experiences. So-called engagement
metrics are commonly used to measure web user engagement. These include, for
example, number of unique users, click-through rates, page views, and time
spent on a web site. Although these metrics actually measure web usage, they
are commonly employed as a proxy for online user engagement: the higher and the
more frequent the usage, the more engaged the user. Major web sites and online
services are compared using these and other similar engagement metrics.
User engagement possesses different characteristics depending on the web
application; e.g. how users engage with a mail tool is very different from how
they engage with a news portal. However, the same engagement metrics are
typically used for all types of web application, ignoring the diversity of
experiences. In addition, discussion on the “right” engagement metrics is still
going on, without any consensus on which metrics should be used to measure
which types of engagement. The aim of this paper is to demonstrate the
diversity of user engagement, through the identification and the study of
models of user engagement. To this end, we analysed a large number of online
sites, of various types (ranging from news to e-commerce to social media). We
first show the diversity of engagement for these sites. To identify models of
engagement, we cluster all sites using various criteria (dimensions) of
engagement (e.g. user types, temporal aspects). Our results are two-fold.
First, we can effectively derive models of user engagement, with which we can
associate characteristics of the type of engagement. Second, by using various
criteria, we gain different but complementary insights into the types of
engagement.
The paper is organised as follows. Section 2 provides related work. Section 3
describes the data and engagement metrics used. Section 4 demonstrates the
diversity of user engagement. Section 5 presents the methodology adopted to
identify models of user engagement, and the outcomes. Section 6 looks at
relationships between models, providing further insights into types of
engagement. We finish with our conclusions and thoughts for future work.
2 Related Work
Approaches to measure user engagement can be divided into three main groups:
self-reported engagement, cognitive engagement, and online behaviour metrics.
In the first group, questionnaires and interviews (e.g. [7, 4]) are used to
elicit user engagement attributes or to create user reports and to evaluate
engagement. They can be carried out within a lab setting, or via on-line
mechanisms (including crowd-sourcing). However, these methods have known
drawbacks, e.g. reliance on user subjectivity. The second approach uses
task-based methods (e.g. dual-task [8], follow-on task), and physiological
measures to evaluate the cognitive engagement (e.g. facial expressions, vocal
tone, heart rate) using tools such as eye tracking, heart rate monitoring, and
mouse tracking [3].
Measures in the second group, although objective, are suitable for measuring
only a small number of interaction episodes at close quarters. In contrast, the
web-analytics community has been studying user engagement through online
behaviour metrics that assess users’ depth of engagement with a site. For
instance, [5] describes engagement metrics that indicate whether or not users
consume content slowly and methodically, return to a site, or subscribe to
feeds. Widely used metrics include click-through rates, number of page views,
time spent on a site, how often users return to a site, number of users, and so
on. Only online behaviour metrics are able to collect data from millions of
users. Although these metrics cannot explicitly explain why users engage with a
service, they act as a proxy for online user engagement: the higher and the
more frequent the usage, the more engaged the user. Indeed, two million users
accessing a service daily is a strong indication of high engagement with that
service. Furthermore, varying specific aspects of the service, e.g. navigation
structure, content, functionality, and measuring the effect on engagement
metrics can provide an implicit understanding of why users engage with the
service. Finally, although this group of measures is really accounting for
“site engagement”, we retain the terminology “user engagement” as it is
commonly used by the online industries. We look at models of user engagement
based on this third group of metrics.

Table 1. Engagement metrics used in this paper.

Metric        Description
Popularity (for a given time frame)
  #Users        Number of distinct users.
  #Visits       Number of visits.
  #Clicks       Number of clicks (page views).
Activity
  ClickDepth    Average number of page views per visit.
  DwellTimeA    Average time per visit (dwell time).
Loyalty (for a given time frame)
  ActiveDays    Number of days a user visited the site.
  ReturnRate    Number of times a user visited the site.
  DwellTimeL    Average time a user spends on the site.
3 Metrics and Interaction Data
Engagement metrics. The metrics used in this paper are listed in Table 1.
As our aim is to identify models of user engagement, we restrict ourselves to
a small set of widely reported metrics. We consider three types of engagement
metrics, reflecting popularity, activity, and loyalty. Popularity metrics
measure how much a site is used, e.g. total number of users. The higher
the number, the more popular the corresponding site. How a site is used is
measured with activity metrics, e.g. average number of clicks per visit across
all users. Loyalty metrics are concerned with how often users return to a site.
An example is the return rate, i.e. average number of times users visited a
site⁵. Loyalty and popularity metrics depend on the considered time interval,
e.g. number of weeks considered. A highly engaging site is one with a high
number of visits (popular), where users spend lots of time (active), and return
frequently (loyal). It is however the case, as demonstrated next, that not
all sites, whether popular or not, have both active and loyal users, or vice
versa. It does not mean that user engagement on such sites is lower; it is
simply different. Our conjecture is that user engagement depends on the site
itself.
⁵ A user can return several times to a site during the same day, hence this
metric is different from the number of active days.
Interaction data. This study required a large number of sites, and a record of
user interactions within them. We collected data during July 2011 from a sample
of approximately 2M users who gave their consent to provide browsing data
through the Yahoo! toolbar. These data are represented as tuples (timestamp,
bcookie, url). We restrict ourselves to sites with at least 100 distinct users
per month, and within the US. The latter is because studying the engagement of
sites across the world requires accounting for geographical and cultural
differences, which is beyond the scope of the paper. This resulted in 80 sites,
encompassing a diverse set of sites and services such as news, weather, movies,
mail, etc.
4 Diversity in Engagement
Sites. Figure 1 reports the normalized engagement values for the eight metrics
and the 80 sites under study. All original values $v_i$ of metric $v$ are
translated into an ordinal scale and then normalized ($\mu_v$ is the mean of
the ordinal $v_i$ values, and $\sigma_v$ is the corresponding standard
deviation): $v'_i = (v_i - \mu_v) / \sigma_v$.
The average (ordinal) value of an engagement metric then becomes zero. The
y-axes in Figure 1 order the sites in terms of number of users (#Users).
Finally, MergeUE is the linear combination of #Users, DwellTimeA, and
ActiveDays.
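As an illustration, the ordinal transformation and normalization described
above can be sketched in Python. This is a minimal sketch and not the code used
in the study: the function names are hypothetical, and two details the text
does not specify are assumed here, namely that tied values share their average
rank and that the population standard deviation is used.

```python
from statistics import mean, pstdev


def ordinal_ranks(values):
    """Map raw metric values to ranks 1..n (assumption: ties share the average rank)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return ranks


def normalize_metric(values):
    """Return (rank - mean) / std for each site; the mean of the result is zero."""
    ranks = ordinal_ranks(values)
    mu = mean(ranks)
    sigma = pstdev(ranks)  # assumption: population standard deviation
    if sigma == 0:
        return [0.0] * len(ranks)  # all sites share the same value
    return [(r - mu) / sigma for r in ranks]


# Hypothetical #Users values for three sites.
normalized_users = normalize_metric([10, 1000, 50])
```

Working on ranks rather than raw counts keeps a single very large site from
dominating the scale.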
We can see that sites differ widely in terms of their engagement. Some
sites are very popular (e.g. news sites) whereas others are visited by small
groups of users (e.g. specific interest sites). Visit activity also depends on
the sites, e.g. search sites tend to have a much shorter dwell time than sites
related to entertainment (e.g. games). Loyalty per site differs as well. Media
(news, magazines) and communication (e.g. messenger, mail) sites have many
users returning to them much more regularly than sites containing information
of temporary interest (e.g. buying a car). Loyalty is also influenced by the
frequency with which new content is published (e.g. some sites produce new
content once per week). Finally, using one metric combining the three types
of metrics (MergeUE) also shows that engagement varies across sites.
Metrics. To show that engagement metrics capture different aspects of a site's
engagement, we calculate the pair-wise metric correlations using Kendall tau
(τ) rank correlation on the ordinal values. The resulting average intra-group
correlation is τ = 0.61, i.e. metrics of the same group mostly correlate;
whereas the average inter-group correlation is τ = 0.23, i.e. metrics from
different groups correlate weakly or not at all. This confirms in practice the
intuition we followed when we grouped the metrics.
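The pair-wise comparison can be sketched as follows. This is a minimal
illustration with hypothetical names and toy data: it implements the plain
tau-a variant of Kendall's coefficient (no tie correction), whereas the text
does not state which variant was used.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (len(x) * (len(x) - 1) // 2)


def average_correlations(metrics, groups):
    """Average tau over metric pairs of the same group and of different groups."""
    intra, inter = [], []
    for a, b in combinations(metrics, 2):
        tau = kendall_tau(metrics[a], metrics[b])
        (intra if groups[a] == groups[b] else inter).append(tau)
    return sum(intra) / len(intra), sum(inter) / len(inter)


# Toy ordinal values for four sites (illustrative only, not data from the study).
metrics = {
    "#Users": [1, 2, 3, 4],
    "#Visits": [1, 2, 4, 3],
    "ClickDepth": [4, 3, 2, 1],
}
groups = {"#Users": "popularity", "#Visits": "popularity", "ClickDepth": "activity"}
intra_tau, inter_tau = average_correlations(metrics, groups)
```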
The three popularity engagement metrics show similar engagement type for
all sites, i.e. high number of users implies high number of visits (τ = 0.82),
and vice versa. For the loyalty metrics, high dwell time per user comes from
users having more active days (τ = 0.66), and returning regularly to the site
(τ = 0.62). The correlation between the two activity metrics is lower
(τ = 0.33). There is no correlation between activity metrics and either
popularity or loyalty metrics. High popularity does not entail high activity
(τ = 0.09). Many sites have many users spending little time on them; e.g. a
search site is one where users come, submit a query, get the result, and if
satisfied, leave the site. This results in a low dwell time even though user
expectations were entirely met. The same argument holds for a Q&A site or a
weather site. What matters for such sites is their popularity. Finally, we
observe a moderate correlation (τ = 0.49) between loyalty and popularity
metrics. This is because popular sites are those to which users return
regularly. The same reasoning applies for the other metrics of these two
groups.

Fig. 1. Normalized engagement values per site (y-axes order sites by #Users).
[Panels: #Users, #Visits, #Clicks (popularity); ClickDepth, DwellTimeA
(activity); ActiveDays, ReturnRate, DwellTimeL (loyalty); MergeUE
(combination).]

Fig. 2. User groups (Tourists, Interested, Average, Active, VIP).
[x-axis from 0 to 100; site-type labels shown for the clusters: media (special
events), personal management, media (required occasionally), shopping, social
media, entertainment, media, daily activity, navigation.]
Users. Studies have shown that users may arrive at a site by accident or
through exploration, and simply never return. Other users may visit a site once
a month, for example a credit card site to check their balance. On the other
hand, sites such as mail may be accessed by many users on a daily basis. We
thus looked at how active users are within a month, per site. The number of
days a user visited a site over a month is used for this purpose. We create
five types of user groups⁶:

Group        Number of days with a visit
Tourists     1 day
Interested   2-4 days
Average      5-8 days
Active       9-15 days
VIP          16 or more days
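The grouping can be expressed as a small lookup, sketched below with
hypothetical function names. We assume here that VIP covers 16 or more days,
which is how we read the table together with the footnote on the 16-day
threshold.

```python
from collections import Counter

GROUPS = ["Tourists", "Interested", "Average", "Active", "VIP"]


def user_group(active_days):
    """Assign a user to a group from the number of days with a visit in the month."""
    if active_days < 1:
        raise ValueError("a user in the data has at least one day with a visit")
    if active_days == 1:
        return "Tourists"
    if active_days <= 4:
        return "Interested"
    if active_days <= 8:
        return "Average"
    if active_days <= 15:
        return "Active"
    return "VIP"  # assumption: 16 or more days


def group_proportions(days_per_user):
    """Proportion of each user group for one site, in the order of GROUPS."""
    counts = Counter(user_group(days) for days in days_per_user)
    return [counts[group] / len(days_per_user) for group in GROUPS]


# Hypothetical site with four users.
proportions = group_proportions([1, 1, 3, 20])
```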
The proportion of the user groups for each site is calculated, then sites with
similar proportions of user groups are clustered using k-means. Four clusters
were detected and the cluster centers calculated. Figure 2 displays the four
cluster centers, i.e. the proportion of user groups per cluster. The types of
sites in each cluster are shown, as illustration. We observe that the
proportion of tourist users is high for all sites. The top cluster has the
highest proportion of tourist users; typical sites include special events
(e.g. the Oscars) or those related to configuration. The second cluster from
the top includes sites related to specific information that is occasionally
needed; as such they are not visited regularly within a month. The third
cluster includes sites related to e-commerce and media, which are used on a
regular basis, albeit not daily. Finally, the bottom cluster contains
navigation sites (e.g. landing page) and communication sites (e.g. messenger).
For these sites, the proportion of VIP users is higher than the proportion of
active and average users. The above indicates that the type of users,
e.g. tourist vs. VIP, matters when measuring engagement.
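A minimal sketch of this clustering step is given below, using plain Lloyd
iterations on the per-site proportion vectors. The initialisation (the first k
sites) and the toy vectors are assumptions made for illustration; the text does
not describe how k-means was initialised.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=100):
    """Lloyd's algorithm; returns (cluster index per point, cluster centers)."""
    centers = [list(p) for p in points[:k]]  # assumption: naive initialisation
    assignment = None
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no site changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers


# Toy proportions (Tourists, Interested, Average, Active, VIP) for four sites.
sites = [
    [0.9, 0.1, 0.0, 0.0, 0.0],
    [0.3, 0.2, 0.1, 0.1, 0.3],
    [0.8, 0.2, 0.0, 0.0, 0.0],
    [0.4, 0.1, 0.1, 0.1, 0.3],
]
assignment, centers = k_means(sites, k=2)
```

Each returned center is itself a proportion vector, which is what Figure 2
plots per cluster.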
Time. Here, we show that, depending on the selected time span, different types
of engagement can be observed. We use #Users to show this. Using the
interaction data spanning from February to July 2011, we normalized the daily
number of users per site (#Users) by the total number of users that visited any
of the sites on that day. The time series for each site was decomposed into
three temporal components: periodic, trend and peak, using local polynomial
regression fitting [1]. To detect periodic behaviour we calculated the
correlation between the extracted periodic component and the residual between
the original time series and the trend component. To detect peaks, the periodic
component was removed from the time series and peaks were detected using a
running median.
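The normalization and the peak detection can be sketched as follows. This is a
simplified illustration under stated assumptions: the seasonal-trend
decomposition of [1] is not reproduced, the window size is arbitrary, and the
rule "a day is a peak when it exceeds `factor` times the running median" is our
own hypothetical threshold, since the text does not give one.

```python
from statistics import median


def normalize_daily_users(site_users, total_users):
    """Share of the day's users (across all sites) that visited this site."""
    return [site / total for site, total in zip(site_users, total_users)]


def running_median(series, window):
    """Median over a centred window, truncated at both ends of the series."""
    half = window // 2
    return [
        median(series[max(0, i - half): i + half + 1])
        for i in range(len(series))
    ]


def detect_peaks(series, window=7, factor=2.0):
    """Indices of days whose value exceeds `factor` times the running median."""
    baseline = running_median(series, window)
    return [i for i, (x, b) in enumerate(zip(series, baseline)) if x > factor * b]


# Toy daily series with one spike (illustrative only).
daily = [5, 6, 5, 6, 5, 30, 5, 6, 5, 6, 5]
peaks = detect_peaks(daily, window=5, factor=2.0)
```

A median baseline is barely moved by the spike itself, which is why it suits
peak detection better than a running mean.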
Figure 3 shows graphically the outcomes for four sites (under examples).
Possible reasons for periodic or peak behaviour are given (under influence).
Finally, sites for which neither periodic behaviour nor peaks were found are
given (under counter-example). The engagement pattern can be influenced by
external and internal factors. Communication, navigation and social media sites
tend to be more “periodically used” than media sites. Access to media sites
tends to be influenced by external factors (important news) or the frequency of
publishing new information. Interestingly, sites with a periodic behaviour tend
to have no peaks and sites with peaks tend not to be periodic. Thus accounting
for time is likely to bring important insights when measuring site engagement.
⁶ The terminology and the range of days are based on our experience of how user
engagement is studied in the online industry. For instance, a VIP user is one
that comes on average 4 days per week, so we chose the value 16 days within a
month.

Fig. 3. Engagement over time using #Users (February – July 2011).
[Time series shown for: #safely, #en-maktoob, #games, #oddnews.]
5 Models of User Engagement
The previous section showed differences in site engagement. We now study these
differences to identify patterns (models) of user engagement. The basis for all
studies is a matrix containing data from the 80 sites under study. Each site is
represented by eight engagement metrics. A metric can be further split into
several dimensions based on user and time combinations. The values of each
metric are transformed into an ordinal scale to overcome scaling issues. We
clustered the sites using the kernel k-means algorithm [2], with a Kendall tau
rank correlation kernel [6]. The number of clusters is chosen based on the
eigenvalue distribution of the kernel matrix. After clustering, each cluster
centroid is computed using the average rank of cluster members (for each
metric). To describe the centroids (the models), we refer to the subset of
metrics selected based on the correlations between them and the Kruskal-Wallis
test with Bonferroni correction, which identifies values of metrics that are
statistically significantly different for at least one cluster (compared to
the other clusters).
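To make the clustering step concrete, the sketch below runs a basic kernel
k-means on a Kendall tau kernel matrix. It is an illustration under
assumptions, not the implementation of [2] or [6]: the kernel is the plain
tau-a coefficient, the number of clusters and the initial assignment are passed
in by hand (the eigenvalue-based choice is not shown), and the rank vectors are
toy data.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a between two rank vectors (no tie correction)."""
    score = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        score += (sign > 0) - (sign < 0)
    return score / (len(x) * (len(x) - 1) // 2)


def kernel_matrix(sites):
    n = len(sites)
    return [[kendall_tau(sites[i], sites[j]) for j in range(n)] for i in range(n)]


def kernel_k_means(K, initial_assignment, k, iterations=50):
    """Reassign each site to the cluster whose feature-space mean is closest."""
    n = len(K)
    assignment = list(initial_assignment)
    for _ in range(iterations):
        clusters = [[i for i in range(n) if assignment[i] == c] for c in range(k)]
        new_assignment = []
        for i in range(n):
            best, best_dist = assignment[i], float("inf")
            for c, members in enumerate(clusters):
                if not members:
                    continue
                m = len(members)
                cross = sum(K[i][j] for j in members) / m
                within = sum(K[j][l] for j in members for l in members) / (m * m)
                # Squared distance to the cluster mean, expressed via the kernel.
                dist = K[i][i] - 2 * cross + within
                if dist < best_dist:
                    best, best_dist = c, dist
            new_assignment.append(best)
        if new_assignment == assignment:
            break
        assignment = new_assignment
    return assignment


# Toy rank vectors over four metrics for four sites (illustrative only).
sites = [[1, 2, 3, 4], [1, 2, 4, 3], [4, 3, 2, 1], [4, 3, 1, 2]]
labels = kernel_k_means(kernel_matrix(sites), [0, 0, 0, 1], k=2)
```

Because only kernel values are used, sites are grouped by how similarly they
rank the metrics rather than by the magnitude of any single metric.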
Three sets of models are presented, based on the eight engagement metrics
(general), accounting for user groups (user-based), and capturing temporal
aspects (time-based). Although all dimensions could be used together to derive
one set of models (e.g. using dimensionality reduction to elicit the important
characteristics of each model), generating the three sets separately provides
clear and focused insights into engagement patterns. When presenting each
model, we give illustrative examples of the types of sites belonging to them.
It is not our aim to explain why each site belongs to which model, and the
associated implications.
Fig. 4. General models of engagement – Top panels display the cluster (model)
centers. Bottom panels provide the corresponding model descriptions.
[Panels: popularity [#Users], activity [ClickDepth], activity [DwellTimeA],
loyalty [ReturnRate]; models $m_{g1}$ to $m_{g6}$; legend: average, high (++),
low (--).]
5.1 General Models
We look at models of user engagement, without accounting for user type or
temporal aspect. We refer to them as “general models”. Our eight metrics
generate six “general” models of user engagement, visualized in Figure 4. As
the three popularity metrics exhibit the same effect, only #Users is reported.
The same applies for the loyalty metrics, i.e. only ActiveDays is reported. The
two activity metrics yield different behaviours, hence are both shown.
In model $m_{g1}$, high popularity is the main factor; by contrast, low
popularity characterizes model $m_{g6}$. Media sites providing daily news and
search sites follow model $m_{g1}$; whereas model $m_{g6}$ captures
interest-specific sites. The main factor for model $m_{g2}$ is a high number of
clicks per visit. This model contains e-commerce and configuration
(e.g. profile updating) sites, where the main activity is to click. By
contrast, model $m_{g3}$ describes the engagement of users spending time on the
site, but with few clicks and with low loyalty. The model is followed by
domain-specific media sites of periodic nature, which are therefore not often
accessed. However, when they are accessed, users spend more time consuming
their content⁷. Next, model $m_{g4}$ is characterized by highly loyal users,
who spend little time and perform few actions. Navigational sites (e.g. front
pages) belong to model $m_{g4}$; their role is to direct users to interesting
content in other sites, and what matters is that users come regularly to them.
Finally, model $m_{g5}$ captures sites with no specific engagement patterns.
⁷ Looking further into this, it seems that the design of such sites (compared
to mainstream media sites) leads to this type of engagement, since new content
is typically published on their front page. Thus users are not enticed to reach
additional content (if any) in these sites. This is the sort of reasoning that
becomes possible by looking at models of user engagement, as investigated in
this paper.
Fig. 5. User-based models of engagement – Top panels display the cluster (model)
centers. Bottom panels provide the corresponding model descriptions.
[Panels: popularity [#Users], activity [DwellTimeA], loyalty [ReturnRate]; user
groups: Tourist (T), Interested (I), Average (N), Active (A), VIP (V); models
$m_{u1}$ to $m_{u7}$; legend: average, high (++), low (--), increasing or
decreasing from T to V.]
5.2 User-based Models
We now investigate models of user engagement that account for the five user
groups elicited in Section 4. The eight metrics were split, each into five
dimensions, one for each user group, i.e. VIP to Tourists. This gives 40
engagement values per site. A site without a particular user group gets 0
values for all metrics for that group. We obtain seven “user-based” models
(clusters), visualized in Figure 5. We only report the results for one metric
of each group (#Users, DwellTimeA and ReturnRate), as these are sufficient for
our discussion.
The first two models, model $m_{u1}$ and model $m_{u2}$, are characterised by
high popularity across all user groups. Activity is high across all user groups
for model $m_{u2}$, whereas it increases from Tourist to VIP users for model
$m_{u1}$. Finally, both models are characterised by an increase in loyalty from
Tourist to VIP users. Popular media sites belong to these models. The next two
models, model $m_{u3}$ and model $m_{u4}$, exhibit the same increase in
popularity from Tourist to VIP users. High loyalty across all groups and an
increase in activity from Tourist to VIP users further characterise model
$m_{u3}$. Sites falling in this model include navigation pages (e.g. front
pages). High activity across all user groups apart from VIP and an increasing
loyalty from Tourist to Active users are important features of model $m_{u4}$,
which typically includes game and sport sites. Interestingly, model $m_{u4}$ is
characterised by a low number of VIP users, compared to the three previous
models.
Fig. 6. Time-based models of engagement – Top panels display the cluster (model)
centers. Bottom panels provide the corresponding model descriptions.
[Panels: popularity [#Users], activity [DwellTimeA], loyalty [ReturnRate];
models $m_{t1}$ to $m_{t5}$; legend: average, high (++), weekdays {wd},
weekends {we}.]
Third, model $m_{u5}$ caters for the engagement of Tourist, Interested
and Average users. Loyalty increases going from Tourist to Average users, which
makes sense as loyalty is used to determine the user groups. More interesting
is that activity increases in the same way, whereas popularity decreases.
Shopping and social media sites belong to this model. Finally, model $m_{u6}$
and model $m_{u7}$ are concerned with the low engagement (popularity) of
Interested and Tourist users, and only Tourist users, respectively. They
correspond to sites on very particular interests or of a temporary nature; as
such popularity for these two groups of users is low compared to other models.
Moreover, model $m_{u7}$ indicates that when on site, the activity of Tourist
users is not negligible. By contrast, model $m_{u6}$ highlights a higher
activity of Interested users than Tourist users.
5.3 Time-based Models
We now look at models of user engagement that account for the temporal aspect.
For simplicity, we consider two time dimensions, weekdays and weekends. Each
site becomes associated with fourteen metrics; seven of our engagement metrics
are split into these two time dimensions (ActiveDays is not used, as it has a
different time span). To elicit the differences in engagement on weekdays
vs. weekends, we transformed the absolute engagement values into proportional
ones, e.g. the proportional ReturnRate is
$\mathrm{ReturnRate}_{\mathrm{weekdays}} / (\mathrm{ReturnRate}_{\mathrm{weekdays}} + \mathrm{ReturnRate}_{\mathrm{weekend}})$.
The same methodology as that used for the other types of models was then
applied. This led to the identification of five “time-based” models of
engagement (clusters), shown in Figure 6.
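The proportional transformation is a one-line computation, sketched here with a
hypothetical function name. Returning 0.5 when a site has no value on either
kind of day is our own assumption; the text does not say how such cases were
handled.

```python
def weekday_proportion(weekday_value, weekend_value):
    """Share of a metric that falls on weekdays: wd / (wd + we)."""
    total = weekday_value + weekend_value
    if total == 0:
        return 0.5  # assumption: no activity at all counts as balanced
    return weekday_value / total


# Hypothetical ReturnRate of 3.0 on weekdays and 1.0 on weekends.
proportional_return_rate = weekday_proportion(3.0, 1.0)
```

Values above 0.5 indicate engagement concentrated on weekdays, values below 0.5
on weekends.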
We can see that model $m_{t1}$ and model $m_{t2}$ describe sites with high
popularity on weekends; loyalty is also high on weekends for model $m_{t1}$,
whereas it is high on weekdays for model $m_{t2}$. Both models characterize
sites related to entertainment, weather, shopping and social media. The loyalty
in model $m_{t2}$ is more significant on weekdays, because it contains sites
for daily use, whereas model $m_{t1}$ contains sites relating to hobbies and
special interests. Second, model $m_{t3}$ characterizes sites that are highly
active, and to which users return frequently on weekends. Sites following this
model include event-related media sites (e.g. sport), search and personal data
management (e.g. calendar, address book). Finally, model $m_{t4}$ and model
$m_{t5}$ are similar as they are both characterised by high popularity during
weekdays, and model $m_{t4}$ is further characterised by high activity during
weekdays. The models are followed by sites related to daily and particular news
and software; model $m_{t4}$ exhibits higher activity because it contains sites
used for work issues.
6 Relationship between Models
We checked whether the three groups of models describe different engagement
aspects of the same set of sites or whether they are largely unrelated. We
calculate the similarity between the three groups using the Variation of
Information. The outcome is shown in Table 2 (5.61 is the maximal difference).
We observe the highest (albeit low) similarity between the general and
user-based models. The user- and time-based models differ the most. Overall,
all groups of models are independent, i.e. they characterize different if not
orthogonal aspects of user engagement, even though the matrices used to
generate them are related.
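A minimal sketch of the Variation of Information distance between two
clusterings of the same sites follows; we assume this is the measure meant
here. The function name and toy labels are hypothetical. The base-2 logarithm
is also an assumption, although it is consistent with the stated maximum, since
2·log2(7) ≈ 5.61 for at most seven clusters.

```python
from collections import Counter
from math import log2


def variation_of_information(labels_a, labels_b):
    """VI(A, B) = H(A) + H(B) - 2 I(A; B), in bits; 0 means identical clusterings."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    vi = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        # log(p_ab / p_a) + log(p_ab / p_b), written with counts.
        vi -= p_ab * (log2(n_ab / count_a[a]) + log2(n_ab / count_b[b]))
    return vi


# Toy cluster labels for four sites under two groups of models.
general = [0, 0, 1, 1]
time_based = [0, 1, 0, 1]
distance = variation_of_information(general, time_based)
```

Lower values mean that the two groups of models partition the sites more
similarly.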
We cannot show here all the relationships between each model of each group.
Instead, we discuss two cases. For model $m_{g1}$, a general model
characterizing popular sites, 38% of its sites belong to model $m_{u1}$ (high
popularity and increasing activity and loyalty from Tourist to VIP users), and
31% follow model $m_{u5}$ (no VIP users, decreasing popularity and increasing
activity and loyalty from Tourist to Active users). We now look at the
user-based model $m_{u2}$, characterizing sites with high popularity and
activity in all user groups and an increasing loyalty from Tourist to VIP
users. Sites following this model are split into two time-based models, model
$m_{t2}$ (50%) (high popularity on weekends and high loyalty on weekdays), and
model $m_{t3}$ (50%) (high activity and loyalty on weekends). This comparison
provides different angles into user engagement, allowing us to zoom into
particular areas of interest, e.g. further differentiating the “high loyalty”
associated with model $m_{u2}$ into weekdays vs. weekends.
Table 2. Intersections of the models – cluster similarities (range [0, 5.61]).

          General   User   Time
General   0.00      3.50   4.23
User      3.50      0.00   4.25
Time      4.23      4.25   0.00
7 Conclusions and Future Work
Our aim was to identify models of user engagement. We analysed a large sample
of user interaction data on 80 online sites. We characterised user engagement
in terms of three families of commonly adopted metrics that reflect different
aspects of engagement: popularity, activity and loyalty. We further divided
users according to how often they visit a site. Finally, we investigated
temporal behavioural differences in engagement. Then using simple approaches
(e.g. k-means clustering), we generated three groups of models of user
engagement: general, user-based and time-based. This provided us with different
but complementary insights on user engagement and its diversity. This research
constitutes a first step towards a methodology for deriving a taxonomy of
models of user engagement.
This paper did not study why a site follows one engagement model.
However, while analysing our results, we observed that sites of the same type
(e.g. mainstream media) do not necessarily belong to the same model(s) of
engagement. It would be interesting to understand the reasons for this, e.g. is
it the type of content, the structure of the site, etc? Furthermore, other aspects
of user engagement should be considered. Accounting for user demographics
(e.g. gender, age) and finer-grained temporal aspects (e.g. time of the day) is
likely to bring further insights into modelling engagement. Incorporating
geographical location will bring perspectives related to culture and
language. Finally, we must revisit engagement metrics. Indeed, the description
of models often referred to only some of the metrics employed. A major next
step will be to map the most appropriate metrics to each model of engagement.
Acknowledgements. Janette Lehmann acknowledges support from the Spanish
Ministry of Science through the project TIN2009-14560-C03-01.
References
1. R.B. Cleveland, W.S. Cleveland, J.E. McRae, and I. Terpenning. A seasonal-trend
decomposition procedure based on loess. Journal of Official Statistics, 6:3–73, 1990.
2. I. Dhillon, Y. Guan, and B. Kulis. A unified view of kernel k-means, spectral
clustering and graph cuts. Technical report, 2004.
3. J. Huang, R.W. White, and S.T. Dumais. No clicks, no problem: using cursor
movements to understand and improve search. In CHI, 2011.
4. H. O'Brien, E. Toms, K. Kelloway, and E. Kelley. The development and evaluation
of a survey to measure user engagement. JASIST 61(1):50-69, 2010.
5. E.T. Peterson and J. Carrabis. Measuring the immeasurable: Visitor engagement.
Technical report, Web Analytics Demystified, 2008.
6. S. Sabato, E. Yom-Tov, A. Tsherniak, and S. Rosset. Analyzing system logs: a new
view of what’s important. In USENIX workshop on Tackling computer systems
problems with machine learning techniques, 2007.
7. J. Sauro and J. S. Dumas. Comparison of three one-question, post-task usability
questionnaires. In CHI, 2009.
8. P. Schmutz, S. Heinz, Y. Métrailler, and K. Opwis. Cognitive load in ecommerce
applications: measurement and effects on user satisfaction. Advances in
Human-Computer Interaction, 2009:3:1–3:9, 2009.
... Most studies agree that engagement includes some interaction with a DMHI [18]; however, there is little agreement as to what exactly engagement is, its bounds, and a precise conceptualization of the concept in general (see Yeager and Benight [19] for a full review). Systematic reviews of engagement research concluded that the definition of engagement must go beyond objective measures of use to include subjective measures of attention, interest, and affect [14][15][16]. ...
Article
Full-text available
Background: Worldwide, exposure to potentially traumatic events is extremely common, and many individuals develop posttraumatic stress disorder (PTSD) along with other disorders. Unfortunately, considerable barriers to treatment exist. A promising approach to overcoming treatment barriers is a digital mental health intervention (DMHI). However, engagement with DMHIs is a concern, and theoretically based research in this area is sparse and often inconclusive. Objective: The focus of this study is on the complex issue of DMHI engagement. On the basis of the social cognitive theory framework, the conceptualization of engagement and a theoretically based model of predictors and outcomes were investigated using a DMHI for trauma recovery. Methods: A 6-week longitudinal study with a national sample of survivors of trauma was conducted to measure engagement, predictors of engagement, and mediational pathways to symptom reduction while using a trauma recovery DMHI (time 1: N=915; time 2: N=350; time 3: N=168; and time 4: N=101). Results: Confirmatory factor analysis of the engagement latent constructs of duration, frequency, interest, attention, and affect produced an acceptable model fit (χ22=8.3; P=.02; comparative fit index 0.973; root mean square error of approximation 0.059; 90% CI 0.022-0.103). Using the latent construct, the longitudinal theoretical model demonstrated adequate model fit (comparative fit index 0.929; root mean square error of approximation 0.052; 90% CI 0.040-0.064), indicating that engagement self-efficacy (β=.35; P<.001) and outcome expectations (β=.37; P<.001) were significant predictors of engagement (R2=39%). The overall indirect effect between engagement and PTSD symptom reduction was significant (β=-.065; P<.001; 90% CI -0.071 to -0.058). This relationship was serially mediated by both skill activation self-efficacy (β=.80; P<.001) and trauma coping self-efficacy (β=.40; P<.001), which predicted a reduction in PTSD symptoms (β=-.20; P=.02). 
Conclusions: The results of this study may provide a solid foundation for formalizing the nascent science of engagement. Engagement conceptualization comprised general measures of attention, interest, affect, and use that could be applied to other applications. The longitudinal research model supported 2 theoretically based predictors of engagement: engagement self-efficacy and outcome expectancies. A total of 2 task-specific self-efficacies-skill activation and trauma coping-proved to be significant mediators between engagement and symptom reduction. Taken together, this model can be applied to other DMHIs to understand engagement, as well as predictors and mechanisms of action. Ultimately, this could help improve the design and development of engaging and effective trauma recovery DMHIs.
... Community stability is simply the aggregation of individual measurements of editor loyalty, i.e., their recurrence over time. This is related to user loyalty, commonly measured in many platforms and websites [33]. ...
Article
Full-text available
Wikipedia is an undeniably successful project, with unprecedented numbers of online volunteer contributors. After 2007, researchers started to observe that the number of active editors for the largest Wikipedias declined after rapid initial growth. Years after those announcements, researchers and community activists still need to understand how to measure community health. In this paper, we study patterns of growth, decline and stagnation, and we propose the creation of 6 sets of language-independent indicators that we call “Vital Signs”. Three focus on the general population of active editors creating content: retention, stability, and balance; the other three are related to specific community functions: specialists, administrators, and global community participation. We borrow the analogy from the medical field, as these indicators represent a first step in defining the health status of a community; they can constitute a valuable reference point to foresee and prevent future risks. We present our analysis for eight Wikipedia language editions, and we show that communities are renewing their productive force even with stagnating absolute numbers; we observe a general lack of renewal in positions related to special functions or administratorship. Finally, we evaluate our framework by discussing these indicators with Wikimedia affiliates to support them in promoting the necessary changes to grow the communities.
... For instance, judging only algorithm performance does not identify whether the usage of the dashboard effectively helps complete the intended task, or indicate whether the user perceives the interaction as a positive one. Additionally, we know from HCI that an unpleasant experience will influence whether a user continues to interact with a system/application or moves on to another [80]. The dashboard's main purpose positions it as a fundamentally user facing system, for which measures of algorithm performance are essential, but far from sufficient to model the complexity of the interaction. ...
Article
In the era of 'information overload', effective information provision is essential for enabling rapid response and critical decision making. In making sense of diverse information sources, dashboards have become an indispensable tool, providing fast, effective, adaptable, and personalized access to information for professionals and the general public alike. However, these objectives place heavy requirements on dashboards as information systems in usability and effective design. Understanding these issues is challenging given the absence of consistent and comprehensive approaches to dashboard evaluation. In this article we systematically review literature on dashboard implementation in healthcare, where dashboards have been employed widely, and where there is widespread interest for improving the current state of the art, and subsequently analyse approaches taken towards evaluation. We draw upon consolidated dashboard literature and our own observations to introduce a general definition of dashboards which is more relevant to current trends, together with seven evaluation scenarios - task performance, behaviour change, interaction workflow, perceived engagement, potential utility, algorithm performance and system implementation. These scenarios distinguish different evaluation purposes which we illustrate through measurements, example studies, and common challenges in evaluation study design. We provide a breakdown of each evaluation scenario, and highlight some of the more subtle questions. We demonstrate the use of the proposed framework by a design study guided by this framework. We conclude by comparing this framework with existing literature, outlining a number of active discussion points and a set of dashboard evaluation best practices for the academic, clinical and software development communities alike.
... Thus, there are no scales to compare for the engagement and types of media on Instagram. However, models for user engagement was suggested in Lehmann et al., 2012, and used "popularity, activity, and loyalty" metrics in the websites. Here, there are some measurements we cannot take with Instagram analytics such as loyalty, return to the site. ...
Article
Full-text available
Understanding consumer behavior and decisions on e-commerce are vital. Well-defined consumer behavior and investigating what influences that behavior on an online shopping journey is a key for an online seller. However, having insights on what affects consumer behavior and understanding the relationship among content and user is a complex problem. There are various aspects of social media content in this process that mediates the decisions and behavior of customers. This paper investigates consumer behavior in connection with social media content from the media richness theory perspective. In particular, the changes in the content and its effects on consumer engagement and interaction were analyzed by considering the changes in engagement rates and the number of interactions. For empirical testing, a case study is conducted in a start-up e-commerce company, called Freja Silver. The variations of content have been analyzed and data-driven results have been evaluated.
... Engagement has been captured in prior studies using various metrics, such as number of visits, number of clicks, average number of page views per visit (Geva, Reichman, & Somech, 2017;Lehmann, Lalmas, Yom-Tov, & Dupret, 2012), and number of idea submissions and rating others' submissions (Nguyen et al., 2015). In this study, based on the viewpoint of engagement and the technical features of the microblog platform, there can be three different kinds of engagement: liking, commenting, and sharing. ...
Article
Full-text available
This paper examines how alternative food networks (AFNs) cultivate engagement on a social media platform. Using the method proposed in Kar and Dwivedi (2020) and Berente et al. (2019), we contribute to theory through combining exploratory text analysis with model testing. Using the theoretical lens of relationship cultivation and social media engagement, we collected 55,358 original Weibo posts by 90 farms and other AFN participants in China and used Latent Dirichlet Allocation (LDA) modeling for topic analysis. We then used the literature to map the topics with constructs and developed a theoretical model. To validate the theoretical model, a panel dataset was constructed on Weibo account and year level, with Chinese city-level yearly economic data included as control variables. A fixed effects panel data regression analysis was performed. The empirical results revealed that posts centered on openness/disclosure, sharing of tasks, and knowledge sharing result in positive levels of social media engagement. Posting about irrelevant information and advertising that uses repetitive wording in multiple posts had negative effects on engagement. Our findings suggest that cultivating engagement requires different relationship strategies, and social media platforms should be leveraged according to the context and the purpose of the social cause. Our research is also among the early studies that use both big data analysis of large quantities of textual data and model validation for theoretical insights.
Article
Full-text available
Businesses should implement TikTok as a social media marketing tool. In Indonesia particularly, there are more than 22 million active users. To enhance the social media marketing efficacy on TikTok, there is a need for an empirical study discussing factors driving social media engagement on TikTok. This research aims to investigate the effects of visual complexity and content types on social media engagement (i.e., likes, comments, and shares) on TikTok. A total of 647 posts were collected from 7 business accounts marketing their products to Indonesian consumers. This study employed content analysis, and data were computed using negative binomial regression on SPSS. Results prove that TikTok content with high visual complexity has negative effects on shares. Respond to Comment variable has positive impacts on likes; Product, Respond to Comment, and Review posts have positive effects on comments; Product and Review posts have positive effects on shares. Further, this study also evidences that Respond to Comment and Humor parameters negatively influence shares. Theoretically, this research expands social media marketing literature. Practically, insights from this study can guide brands in creating viral TikTok content that eventually can enhance business performance.
Article
Full-text available
System logs, such as the Windows Event log or the Linux system log, are an important resource for computer system management. We present a method for ranking system log messages by their estimated value to users, and generating a log view that displays the most important messages. The ranking process uses a dataset of system logs from many computer systems to score messages. For better scoring, unsupervised clustering is used to identify sets of systems that behave similarly. We propose a new feature construction scheme that measures the difference in the ranking of messages by frequency, and show that it leads to better clustering results. The expected distribu- tion of messages in a given system is estimated using the resulting clusters, and log messages are scored using this estimation. We show experimental results from tests on xSeries servers. A tool based on the described methods is being used to aid support personnel in the IBM xSeries support center.
Conference Paper
Full-text available
Post-task ratings of difficulty in a usability test have the potential to provide diagnostic information and be an additional measure of user satisfaction. But the ratings need to be reliable as well as easy to use for both respondents and researchers. Three one-question rating types were compared in a study with 26 participants who attempted the same five tasks with two software applications. The types were a Likert scale, a Usability Magnitude Estimation (UME) judgment, and a Subjective Mental Effort Question (SMEQ). All three types could distinguish between the applications with 26 participants, but the Likert and SMEQ types were more sensitive with small sample sizes. Both the Likert and SMEQ types were easy to learn and quick to execute. The online version of the SMEQ question was highly correlated with other measures and had equal sensitivity to the Likert question type. Author Keywords
Article
Full-text available
Guidelines for designing usable interfaces recommend reducing short term memory load. Cognitive load, that is, working memory demands during problem solving, reasoning, or thinking, may affect users' general satisfaction and performance when completing complex tasks. Whereas in design guidelines numerous ways of reducing cognitive load in interactive systems are described, not many attempts have been made to measure cognitive load in Web applications, and few techniques exist. In this study participants' cognitive load was measured while they were engaged in searching for several products in four different online book stores. NASA-TLX and dual-task methodology were used to measure subjective and objective mental workload. The dual-task methodology involved searching for books as the primary task and a visual monitoring task as the secondary task. NASA-TLX scores differed significantly among the shops. Secondary task reaction times showed no significant differences between the four shops. Strong correlations between NASA-TLX, primary task completion time, and general satisfaction suggest that NASA-TLX can be used as a valuable additional measure of efficiency. Furthermore, strong correlations were found between browse/search preference and NASA-TLX as well as between search/browse preference and user satisfaction. Thus we suggest browse/search preference as a promising heuristic assessment method of cognitive load.
Article
Abstract: STL is a filtering procedure for decomposing a time series into trend , seasonal , and remainder components. STL has a simple design that consists of a sequence of applications of the loess smoother; the simplicity allows analysis of the properties of the procedure and ...
Article
Research and Analysis from WebAnalyticsDemystified The Web Analytics Thought Leaders w w w .w eb an alytic sd emystif ied .c o m EXECUTIVE SUMMARY Without a doubt, "engagement" has been one of the hottest buzzwords in digital advertising and marketing in the past 18 months. Forrester Research has written about it, companies founded to measure it, and countless arguments spawned just seeking a reasonable working definition of the term to apply in a meaningful way to the online channel. Unfortunately, despite the intense level of interest in the subject, few real gains have been made towards developi ng a practical and useful measure of engagement that can be applied to billions of dollars of advertising, marketing, and technology investments made annually on the Internet. While solutions exist—notably the Evolution Technology™ developed by this document's co-author Mr. Joseph Carrabis—most are relatively unknown and some are not easily integrated with the widely deployed digital measurement solutions in the marketplace today. Until now.
Article
Recently, a variety of clustering algorithms have been proposed to handle data that is not linearly separable. Spectral clustering and kernel k-means are two such methods that are seemingly quite different. In this paper, we show that a general weighted kernel k-means objective is mathematically equivalent to a weighted graph partitioning objective. Special cases of this graph partitioning objective include ratio cut, normalized cut and ratio association. Our equivalence has important consequences: the weighted kernel k-means algorithm may be used to directly optimize the graph partitioning objectives, and conversely, spectral methods may be used to optimize the weighted kernel k-means objective. Hence, in cases where eigenvector computation is prohibitive, we eliminate the need for any eigenvector computation for graph partitioning. Moreover, we show that the Kernighan-Lin objective can also be incorporated into our framework, leading to an incremental weighted kernel k-means algorithm for local optimization of the objective. We further discuss the issue of convergence of weighted kernel k-means for an arbitrary graph affinity matrix and provide a number of experimental results. Theseresults show that non-spectral methods for graph partitioning are as effective as spectral methods and can be used for problems such as image segmentation in addition to data clustering.
Article
STL is a filtering procedure for decomposing a time series into trend, seasonal, and remainder components. STL has a simple design that consists of a sequence of applications of the loess smoother; the simplicity allows analysis of the properties of the procedure and allows fast computation, even for very long time series and large amounts of trend and seasonal smoothing. Other features of STL are specification of amounts of seasonal and trend smoothing that range, in a nearly continuous way, from a very small amount of smoothing to a very large amount; robust estimates of the trend and seasonal components that are not distorted by aberrant behavior in the data; specification of the period of the seasonal component to any integer multiple of the time sampling interval greater than one; and the ability to decompose time series with missing values.
Conference Paper
Understanding how people interact with search engines is important in improving search quality. Web search engines typically analyze queries and clicked results, but these ac- tions provide limited signals regarding search interaction. Laboratory studies often use richer methods such as gaze tracking, but this is impractical at Web scale. In this paper, we examine mouse cursor behavior on search engine results pages (SERPs), including not only clicks but also cursor movements and hovers over different page regions. We: (i) report an eye-tracking study showing that cursor position is closely related to eye gaze, especially on SERPs ; (ii) pre- sent a scalable approach to capture cursor movements, and an analysis of search result examination behavior evident in these large-scale cursor data ; and (iii) describe two applica- tions (estimating search result relevance and distinguishing good from bad abandonment) that demonstrate the value of capturing cursor data. Our findings help us better under- stand how searchers use cursors on SERPs and can help design more effective search systems. Our scalable cursor tracking method may also be useful in non-search settings.
Article
Facilitating engaging user experiences is essential in the design of interactive systems. To accomplish this, it is necessary to understand the composition of this construct and how to evaluate it. Building on previous work that posited a theory of engagement and identified a core set of attributes that operationalized this construct, we constructed and evaluated a multidimensional scale to measure user engagement. In this paper we describe the development of the scale, as well as two large-scale studies (N=440 and N=802) that were undertaken to assess its reliability and validity in online shopping environments. In the first we used Reliability Analysis and Exploratory Factor Analysis to identify six attributes of engagement: Perceived Usability, Aesthetics, Focused Attention, Felt Involvement, Novelty, and Endurability. In the second we tested the validity of and relationships among those attributes using Structural Equation Modeling. The result of this research is a multidimensional scale that may be used to test the engagement of software applications. In addition, findings indicate that attributes of engagement are highly intertwined, a complex interplay of user-system interaction variables. Notably, Perceived Usability played a mediating role in the relationship between Endurability and Novelty, Aesthetics, Felt Involvement, and Focused Attention.