Impact of Response Latency on User Behaviour in
Mobile Web Search
Ioannis Arapakis
Telefonica Research
Spain
ioannis.arapakis@telefonica.com
Souneil Park
Telefonica Research
Spain
souneil.park@telefonica.com
Martin Pielot
Google
Germany
mpielot@google.com
ABSTRACT
Traditionally, the efficiency and effectiveness of search systems have both been of great interest to the information retrieval community. However, an in-depth analysis of the interaction between the response latency and users' subjective search experience in the mobile setting has been missing so far. To address this gap, we conduct a controlled study that aims to reveal how response latency affects mobile web search. Our preliminary results indicate that mobile web search users are four times more tolerant to response latency than that reported for desktop web search users. However, when exceeding a certain threshold of 7-10 sec, the delays have a sizeable impact and users report feeling significantly more tense, tired, terrible, frustrated and sluggish, all of which contribute to a worse subjective user experience.
CCS CONCEPTS
• Human-centered computing → Human computer interaction (HCI); User studies; Haptic devices.
KEYWORDS
mobile search; response latency; user behaviour; user study
ACM Reference Format:
Ioannis Arapakis, Souneil Park, and Martin Pielot. 2021. Impact of Response
Latency on User Behaviour in Mobile Web Search. In Proceedings of the
2021 ACM SIGIR Conference on Human Information Interaction and Retrieval
(CHIIR ’21), March 14–19, 2021, Canberra, Australia. ACM, New York, NY,
USA, 6 pages. https://doi.org/10.1145/3406522.3446038
1 BACKGROUND AND MOTIVATION
Site performance has been, from an economic standpoint, a major determinant of a website's success [12-14, 16, 18, 23, 26, 31]. That is, a website characterised by relatively high page load times will not only degrade the experience of its users, who are accustomed to sub-second response times, but will also be penalised with respect to its page ranking, since site speed is one of the factors in the ranking algorithm of many commercial search sites.
In a joint conference presentation [15], Bing and Google scientists presented a series of latency experiments. Bing reported that
an added latency of 1.5 seconds to their usual page loading times reduced the queries per user by 1.8%, the revenues per user by 4.3%, and the overall satisfaction by 3.8%. In a similar vein, Google reported that a 400 ms delay resulted in a 0.59% reduction in the number of searches per user and diminished traffic by 20%. What is even more noteworthy is that the slower user experience had a long-term effect on search behaviour. Additionally, Amazon reported [25] that a 100 ms latency would result in a 1% drop in revenue, which amounts to a loss of $745 million per year (considering the company's estimated annual revenue of $74.5 billion). The aforementioned reports [15, 25] point out that a slow website can incur considerable negative effects on Search Engine Optimization (SEO), conversion rates, revenue, and user experience.
Such ndings have motivated a large body of research to in-
vestigate the response time of computer systems (refer to [
11
] for
an overview). In the more specic context of web systems, earlier
work investigated the impact of page load time on web browsing
behaviour [
12
14
,
16
,
18
,
23
,
26
,
31
]. Taylor et al
. [31]
reported page
load times tolerable by users who are seeking information online to
be in the 7-11 seconds range. Despite being outdated, [
23
] also pro-
vides references to studies on identifying the largest page load time
that users can tolerate. A related line of research has investigated
the trade-o between the cost of searching in information seeking.
However, the bulk of these studies were conducted over a decade
ago, when people used primarily desktop computers to browse the
internet and were accustomed to slower download speeds. Thus, it
is not clear what exactly the expectations of today’s users are.
More specifically, in [22] the authors verified several hypotheses (taken from information foraging and search economic theories) about how users' search behaviour should change when faced with delays. Baskaya et al. [5] simulated interactive search sessions in a desktop scenario, where querying effort is low, and a smartphone scenario, which requires high querying effort. They showed that the user effort spent on searching, when coupled with a time constraint, determined the user search behaviour in both use cases. Schurman and Brutlag [27] exposed the users of a commercial search site to varying response time delays and observed the impact on long-term search behaviour. Barreda-Ángeles et al. [4] conducted a controlled study to reveal the physiological effects of response latency on users and discovered that latency effects are present even at small increases in response latency, but this analysis is limited to the desktop setting. Last, Teevan et al. [32] performed a query log analysis on the impact of search latency on user engagement and found that, with increasing response latency, the likelihood that the user will click on the search results decreased.
Nowadays, web traffic is higher on mobile [30] and people use mostly mobile phones for their everyday tasks. In addition, 4G technology is an order of magnitude faster than the fixed lines of a decade ago. Thus, we need to assume that people have adapted to faster internet speeds, and that the expectations of what constitutes "good" internet may have risen. To this end, we revisit the work on understanding the effect of latency on user behaviour in the context of web search. Moreover, since previous work focused primarily on desktop search, we situate our analysis in the mobile setting and consider a number of simulated and increasing cellular network latency values, the effect of which is not well understood.
2 USER STUDY
To demonstrate the impact of response latency on users' search behavior, we carried out a controlled experiment that examined users' interactions with different search sites. To prevent participants from becoming aware of the artificially-induced latency, and this awareness from altering the results, we set up our experiment so that the apparent goal was the subjective evaluation of search sites for answering complex tasks (unbeknownst to the participants, we manipulated the response latency for each question and search site pair). The study was reviewed and approved by a team of legal experts. Additionally, to mitigate the potential effect of brand bias produced by well-known commercial search sites like Google, Bing or Yahoo Search [2], we opted for some less popular options.
2.1 Design
For our study, we used a three-way, mixed design (see Table 1). The repeated-measures independent variables were as follows. First, the search site (with five levels: "DuckDuckGo", "Gibiru", "SearchEncrypt", "StartPage", "Yandex") was controlled by using one of the pre-selected commercial search sites to complete the task. Second, the response latency was controlled by pre-computing ten different latency values (measured in milliseconds: l(3): 337, l(4): 506, l(5): 759, l(6): 1139, l(7): 1709, l(8): 2563, l(9): 3844, l(10): 5767, l(11): 8650, l(12): 12975). Considering that human perception is not linear but exponential (see Weber-Fechner Law [21, 29, 33]), we increased the latency values super-linearly by applying the formula

    l(x) = 1.5^x × 100 (ms),  x ∈ [3, 12]    (1)

The resulting values were then split into two sets of five levels each, {l(3), l(5), l(7), l(9), l(11)} and {l(4), l(6), l(8), l(10), l(12)}, with the members of each set denoted l1 to l5 in increasing order. Because of the relatively large number of trials, we allocated half of our participants to the first set and the other half to the second set. This way, we tested the effect of a wider range of response latencies and reduced the effort needed for completing the study.
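As a quick illustration, the latency levels and the two participant sets can be generated directly from Equation (1). The following Python snippet is a minimal sketch; the variable names are our own and not part of the study materials.

    # Latency levels from Equation (1): l(x) = 1.5^x * 100 ms, for x in 3..12.
    latencies_ms = {x: round(1.5 ** x * 100) for x in range(3, 13)}

    # Split into two interleaved sets of five levels each (odd vs. even exponents),
    # so that each participant group is exposed to a wide range of delays.
    set_a = [latencies_ms[x] for x in range(3, 13, 2)]   # 338, 759, 1709, 3844, 8650
    set_b = [latencies_ms[x] for x in range(4, 13, 2)]   # 506, 1139, 2563, 5767, 12975
    print(set_a, set_b)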
One may argue that such latency values are not informed by realistic time delays produced by any commercial search engine. This may apply to a desktop setting, where the main latency components in a web search scenario are the network latency, search engine latency, and browser latency [2]. However, mobile web browsing has different bottlenecks and resource constraints. Mobile devices are scaled-down versions of desktops: they have limited CPU, poorer network capacities, and lower memory. Hence, it is not unlikely to observe latencies in the range of 5-10 seconds [6, 9, 20, 24, 28] due to the poor performance of the cellular network, the slower computational speeds, or other reasons, making it imperative to consider them in our analysis.
Table 1: Variables of the study design.

Repeated measures
  Websites: DuckDuckGo, Gibiru, SearchEncrypt, StartPage, Yandex
  Latency (ms), set 1: 338, 759, 1709, 3844, 8650; set 2: 506, 1139, 2563, 5767, 12975

Between-group
  Expectation: anticipate connectivity issue / expect usual internet speed
Also, considering the prior work on the effect of 'anticipated time' on users' tolerance [7], we introduced a between-groups independent variable, namely participants' prior expectations about the mobile network performance (two levels: "anticipate connectivity issues", "expect usual internet speed"). To control this variable, we informed half of our participants that we had previously experienced issues with the internet speed, so that they would expect connectivity issues. The dependent variable was the subjective user experience, as captured through self-reports.
2.2 Apparatus
2.2.1 Latency. We used a custom-made mobile phone app to access a starting web page that listed the five search sites used in our study. A hidden settings menu allowed the experimenter to set the artificially-induced latency before each search task. Following the definition of user-perceived latency proposed in [2], we considered as page load time the time difference between "clicking" on a link and the rendering of the corresponding web page content in the user's browser. To keep the latency values fixed, we used Android's WebView as the browser and intercepted all the events where a link was clicked. We then made the browser view element invisible and replaced it with a text saying "loading...". The browser was made visible exactly after the time specified in the settings had expired. We conducted the experiment in a location with a fast and reliable internet connection, so that the website would be fully loaded by the time the web view became visible again. Each participant used their own mobile phone device.
Table 2: Example of a create search task scenario.

After the F1 season opened this year, your niece became really interested in soapbox derby racing. Since her parents are both really busy, you've agreed to help her build an F1 car so that she can enter a local race. The first step is to figure out how to build a car. Identify some basic designs that you might use and create a basic plan for constructing the car.
2.2.2 Search Tasks. Among a number of relevant query collections [3, 8], we used the search tasks proposed by Kelly et al. [19], whose work analyzes and categorizes tasks by their complexity. The tasks follow Anderson and Krathwohl's taxonomy of educational objectives [1], which focuses on the cognitive process dimension and includes six types of cognitive processes: remember, understand, apply, analyze, evaluate and create (with increasing amounts of cognitive effort). Furthermore, the collection spans the domains of health, commerce, entertainment, and science and technology. We opted only for those tasks that fall under the create category (Table 2), which require the searcher to find and compile a list of items, understand their differences and make a recommendation at the end of the search task. The reason is that the create tasks were shown [19] to result in significantly more search queries (M = 4.85, SD = 4.42), URLs visited (M = 14.43, SD = 12.34) and SERP clicks (M = 5.98, SD = 5.02), compared to the other five types. Hence, they were more likely to provide sufficient exposure to the latency stimuli and allow us to observe their effects on the subjective user experience. Finally, we made minor adjustments to the content of the search tasks to keep them regionally relevant (e.g., using the term "F1" instead of "NASCAR" or "football" instead of "baseball").
2.2.3 Experience Sampling. To quantify the impact of response latency on our participants, we administered a post-task questionnaire. At first, we asked them to rate to what extent they were satisfied with the search results. This item served as a distractor, to make it more credible that we were studying the effectiveness of the search sites. Next, we prompted the participants to report their feedback on eight Likert scales that made sense within the context of search sites: 1) three that inquired about the user's affective state, and 2) five from the Questionnaire for User Interaction Satisfaction (QUIS; [17]). More specifically, we asked participants to rate their subjective experience with the search sites on the axes "Bad–Good", "Tense–Calm", "Tired–Awake", "Terrible–Wonderful", "Difficult–Easy", "Frustrating–Satisfying", "Dull–Stimulating", and "Rigid–Flexible". Each of these items used a 7-point Likert scale, where the center response represents the neutral point.
2.3 Participants
There were 30 participants (female = 14, male = 16), aged from 24 to 45 (M = 35.7, SD = 5.3) and free from any obvious physical or sensory impairment. The participants were of mixed nationality and were all proficient with the English language.
2.4 Procedure
At the beginning of the study, the participants were asked to fill out a demographics questionnaire. Then, they were told that the purpose of the experiment was to assess the utility of several search sites for answering complex tasks (without revealing its true purpose). To this end, they would have to evaluate to what extent the search sites allowed them to arrive at a satisfying answer, using their own mobile phone devices. The study consisted of five brief, informational search tasks (Section 2.2.2), where participants were presented with a predefined scenario and were asked to freely issue as many search queries as they wanted, and examine as many search results as they needed, to address the search task objective. To control for order effects, the search task assignment and the sequence of artificially-induced latencies were randomized and then altered via the Latin-square method. This ensured that the search task or the search site would not have a systematic effect on the outcome when grouping the results by the response latency values.
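To illustrate one way such counterbalancing could be implemented, the following Python sketch builds a cyclic 5×5 Latin square and assigns each participant in a rotation group a different row; this is an assumption about the general approach, not the authors' actual procedure or code.

    def latin_square(n: int) -> list[list[int]]:
        """Cyclic n x n Latin square: row r is the base sequence shifted by r."""
        return [[(r + c) % n for c in range(n)] for r in range(n)]

    tasks = ["T1", "T2", "T3", "T4", "T5"]  # the five create-type search tasks

    # Each participant in a group of five receives a different row, so no task
    # is systematically tied to a serial position (and hence to a latency slot).
    for participant, row in enumerate(latin_square(5), start=1):
        print(f"participant {participant}: {[tasks[i] for i in row]}")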
To respect participants' time, we limited the time to answer each question to 5 minutes. We asked the participants to consider only websites and ignore video results, to avoid spending large parts of the allocated time watching videos, as this would limit their exposure to the response latencies. We informed participants that they might not always arrive at a satisfying conclusion during that period of time. We also encouraged them to keep validating their results in case they arrived at a satisfying answer sooner. The rationale was to keep the exposure to the experimental manipulation comparable across all sessions. At the end of each search task, the participants were informed about the true conditions of the study and were asked to fill out a post-task questionnaire (Section 2.2.3).
Figure 1: Boxplot of the 'Frustrating vs. Satisfactory' Likert-scale scores (1-7) depending on the latency level.
3 RESULTS & DISCUSSION
We employ two types of analysis. First, we use the Friedman test as an omnibus test to detect significant differences in the ratings among the response latency conditions (the Friedman test was chosen because the ratings are repeated measures and their distribution did not appear normal). We then model the relationship between the ratings and response latency using ordinal regression. We elaborate on each analysis below.
The Friedman test was combined with the Games-Howell post-hoc test (with a Bonferroni correction) in order to conduct a pairwise comparison of the ratings for all response latency levels. Prior to that, we tested and confirmed that our experimental control of the search site (e.g., "DuckDuckGo", "Gibiru") did not have any adverse effect on participants' ratings. A significant effect of the latency levels was identified for 5 out of the 8 subjective dimensions that we investigated: the affective dimensions "Tense–Calm" (χ2(4) = 7.600, p < .0001) and "Tired–Awake" (χ2(4) = 12.90, p < .01), and the QUIS items "Terrible–Wonderful" (χ2(4) = 10.23, p < .05), "Frustrating–Satisfying" (χ2(4) = 14.82, p < .01) and "Dull–Stimulating" (χ2(4) = 13.36, p < .01). Our post-hoc analysis suggests that users demonstrate a considerable amount of tolerance to response latency. For all five subjective dimensions, we did not observe any significant differences between response latency levels l1 (338 ms, 506 ms) to l4 (3844 ms, 5767 ms). However, the self-reported ratings were significantly lower for the highest response latency level l5 (8650 ms, 12975 ms). For example, when examining the "Frustrating–Satisfying" scale (Fig. 1), the reported ratings for response latency l5 dropped significantly compared to those of l1, l2 and l3.
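As an illustration of the omnibus test (not the authors' actual analysis script), the following Python sketch runs a Friedman test over per-participant ratings at the five latency levels of one set; the ratings matrix is hypothetical toy data.

    import numpy as np
    from scipy.stats import friedmanchisquare

    # Toy ratings: one row per participant, one column per latency level (l1..l5),
    # each cell a 1-7 Likert score on a scale such as "Frustrating-Satisfying".
    ratings = np.array([
        [6, 6, 5, 5, 2],
        [7, 6, 6, 5, 3],
        [5, 6, 5, 4, 2],
        [6, 5, 5, 4, 3],
    ])

    # Each participant is a block; the five latency levels are the repeated measures.
    stat, p_value = friedmanchisquare(*(ratings[:, i] for i in range(ratings.shape[1])))
    print(f"Friedman chi-square = {stat:.2f}, p = {p_value:.4f}")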
Similar to [4], our self-report findings indicate that users' ability to consciously perceive response latency degrades as we progress towards smaller delays. Interestingly, for the mobile setting, the user tolerance threshold for latency appears to be four times greater (7-10 sec) than that reported in [4, 32]. We presume that the observed (high) tolerance could be related to factors such as keyboard layout, typing speed, and greater query formation effort, which may play a role in adjusting users' expectations [5]. However, when exceeding that threshold, users felt significantly more tense, tired, terrible, frustrated and sluggish, all of which contribute to a worse search experience. In addition, we did not find any significant differences in participants' ratings between the search sites or between the prior expectations we set about the mobile network performance. We note that we increased the response latency values exponentially to simulate more accurately the human perception of time. In practice, this means that the observed rating scores do not correspond to equally distanced response latency values, as done in [4, 32].

Figure 2: Estimated probability of the 'Frustrating vs. Satisfactory' Likert-scale scores (1-7) depending on the latency. The red markers represent the response latency values (log2 scale) examined in this study.
Using ordinal regression (fitted with the R package clmm2 [10]), we move beyond identifying the tolerance threshold and model how the ratings change as response latency increases along the continuous time scale, i.e., we estimate the likelihood of each Likert-scale rating as a function of the response latency. For every scale in our questionnaire, we run the model taking the response latency and the prior expectation as the independent variables. As we have repeated measures, the random effect of the subjects is also accounted for in the model.
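For illustration only: the study fitted cumulative link mixed models with R's clmm2 [10]; the Python sketch below fits a plain (fixed-effects) proportional-odds model with statsmodels as a rough stand-in, since it omits the per-subject random intercept. The toy data frame and column names are assumptions.

    import pandas as pd
    from statsmodels.miscmodels.ordinal_model import OrderedModel

    # Toy long-format data: one row per (participant, task), with the 1-7 rating,
    # the injected latency in seconds, and the expectation group (0/1).
    df = pd.DataFrame({
        "rating":      [6, 5, 2, 7, 4, 3, 6, 5, 1, 4, 7, 6, 3, 2, 5],
        "latency_s":   [0.34, 1.71, 8.65, 0.51, 2.56, 12.98, 0.76, 3.84,
                        12.98, 5.77, 0.34, 1.14, 8.65, 12.98, 2.56],
        "expectation": [0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0],
    })
    df["rating"] = pd.Categorical(df["rating"], ordered=True)

    # Proportional-odds (cumulative logit) model of the ordinal rating.
    model = OrderedModel(df["rating"], df[["latency_s", "expectation"]], distr="logit")
    result = model.fit(method="bfgs", disp=False)
    print(result.summary())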
Table 3: Summary of ordinal regression analysis.

'Bad vs. Good'
  Latency:       Estimate -0.1284, Std. Err. 0.0389, z = -3.2977, Pr(>|z|) = 0.0009
  Expectation:   Estimate -0.2927, Std. Err. 0.3065, z = -0.9551, Pr(>|z|) = 0.3395
  Rand. Effects: Var. 0.0626, Std. Dev. 0.2502

'Difficult vs. Easy'
  Latency:       Estimate -0.1192, Std. Err. 0.0389, z = -3.0633, Pr(>|z|) = 0.0002
  Expectation:   Estimate -0.3378, Std. Err. 0.4282, z = -0.9551, Pr(>|z|) = 0.4302
  Rand. Effects: Var. 0.7140, Std. Dev. 0.8450

'Frustrating vs. Satisfying'
  Latency:       Estimate -0.1580, Std. Err. 0.0400, z = -3.9488, Pr(>|z|) = 0.0001
  Expectation:   Estimate -0.3935, Std. Err. 0.3232, z = -1.2176, Pr(>|z|) = 0.2234
  Rand. Effects: Var. 0.1338, Std. Dev. 0.3658
For all the scales, the regression model confirms a very gradual decrease in the likelihood of observing positive ratings as the response latency increases, and a gradual increase for the negative ratings. Table 3 shows the results for the scales that seem most relevant to web search tasks (the coefficients of the response latency for the remaining dimensions are also significant and fall within the range of -.12 to -.15). A latency coefficient of -0.15 indicates that the odds of giving a higher rating (e.g., a rating of 7 versus a rating below 7) are multiplied by exp(-0.15) ≈ 0.86 for every second of latency increase. Similar to our statistical analysis, the prior expectations about the mobile network performance do not seem to have a significant effect. In Fig. 2 we transform the odds ratios into probabilities of each rating and plot the tendency along the time scale for the "Frustrating vs. Satisfying" scale. For example, the 6th column of Fig. 2 shows that the difference in the likelihood of rating 6 between the smallest and the largest latency is only about 20%. Such gradual tendencies of the ratings demonstrate that users are less sensitive to small increases in latency, which is consistent with our previous finding. However, we also note that the likelihood of rating 1 (most negative) grows super-linearly (e.g., while the difference in the likelihood between 2 s and 4 s is around 2%, that between 10 s and 12 s is around 5%), implying that users' tolerance would drop quickly beyond a certain response latency limit.
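For readers who wish to reproduce the transformation behind Fig. 2, the rating probabilities follow directly from the cumulative-logit (proportional-odds) form used by clmm2, sketched below; here θ_k are the fitted thresholds, β the latency coefficient, u_i the random intercept of subject i, and the sign convention is an assumption based on the ordinal package's standard parameterisation.

    P(Y ≤ k | x) = 1 / (1 + exp(-(θ_k - (βx + u_i)))),   k = 1, ..., 6
    P(Y = k | x) = P(Y ≤ k | x) - P(Y ≤ k-1 | x)

Evaluating the second expression at each latency value x (e.g., with u_i set to zero for an average subject) yields the per-rating probability curves of the kind plotted in Fig. 2.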
4 CONCLUSIONS
Nowadays, web traffic is higher on mobile and people use mostly mobile phones for their everyday tasks. However, the page load performance on mobile devices does not match up to its importance: mobile page load times have been found to be an order of magnitude slower [6, 9, 20, 24, 28] compared to loading pages on a desktop browser, often taking tens of seconds to load the landing page. With this in mind, we implemented a study where we examined the interaction between response latency and users' subjective search experience in the mobile setting, and revisited previous work in light of today's adjusted expectations.
Interestingly, our analysis indicates that users are four times more tolerant compared to the known thresholds (1139-1709 ms) reported for the desktop setting, although the two cohorts of studies are not directly comparable. Through modelling the relationship between response latency and web search experience, we also show that the latter degrades quickly when the threshold of 7-10 sec is exceeded, which may explain the high rates of churn and revenue loss reported by several commercial search sites. This finding has several implications for web sites and browser vendors. First, it suggests that optimizations may not improve mobile and desktop web search equally, due to differences in users' expectations. Second, this tolerance zone creates an opportunity for mobile browsing: web page content may be served to each user at adjusted latencies, provided that no degradation in the user experience is expected. This setup not only requires fewer hardware resources on the search engine side, but also compensates for the limited CPU, poorer network capacities, and lower memory of mobile devices. Also, unlike [7], we did not observe similar effects due to the prior expectations about the mobile network performance in our setting.
Last, our work comes with certain limitations. For example, our analysis is bounded by the power of the self-report tools we used and, as shown in Barreda-Ángeles et al. [4], we cannot entirely discount the possibility that there are sizeable unconscious effects. In addition, our study involved a relatively small sample; we reserve the replication of these findings with a larger sample for future work. Furthermore, limiting the search task time to five minutes may have introduced a confounding factor of stress, which might have affected user behaviour. Finally, our findings relate to the complexity level of our search tasks; we intend to investigate the effects of response latency for other types of search tasks, including ones of a simpler nature.
REFERENCES
[1] L. W. Anderson and D. A. Krathwohl. 1999. A taxonomy for learning, teaching and assessing: A revision of Bloom's taxonomy of educational objectives. New York.
[2] Xiao Bai, Ioannis Arapakis, B. Barla Cambazoglu, and Ana Freire. 2017. Understanding and Leveraging the Impact of Response Latency on User Behaviour in Web Search. ACM Trans. Inf. Syst. 36, 2, Article 21 (2017), 42 pages.
[3] Peter Bailey, Alistair Moffat, Falk Scholer, and Paul Thomas. 2016. UQV100: A test collection with query variability. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. 725–728.
[4] Miguel Barreda-Ángeles, Ioannis Arapakis, Xiao Bai, B. Barla Cambazoglu, and Alexandre Pereda-Baños. 2015. Unconscious Physiological Effects of Search Latency on Users and Their Click Behaviour. In Proc. 38th Int'l ACM SIGIR Conf. on Research and Development in Information Retrieval. 203–212.
[5] Feza Baskaya, Heikki Keskustalo, and Kalervo Järvelin. 2012. Time Drives Interaction: Simulating Sessions in Diverse Searching Environments. In Proc. 35th Int'l ACM SIGIR Conf. on Research and Development in Information Retrieval. 105–114.
[6] Enrico Bocchi, Luca De Cicco, and Dario Rossi. 2016. Measuring the Quality of Experience of Web Users. In Proceedings of the 2016 Workshop on QoE-Based Analysis and Management of Data Communication Networks (Internet-QoE '16). Association for Computing Machinery, New York, NY, USA, 37–42.
[7] Anna Bouch, Allan Kuchinsky, and Nina Bhatti. 2000. Quality is in the eye of the beholder: meeting users' requirements for Internet quality of service. In Proceedings of the SIGCHI Conf. on Human Factors in Computing Systems. 297–304.
[8] Chris Buckley and Janet Walz. 1999. The TREC-8 query track. In TREC.
[9] Michael Butkiewicz, Daimeng Wang, Zhe Wu, Harsha V. Madhyastha, and Vyas Sekar. 2015. KLOTSKI: Reprioritizing Web Content to Improve User Experience on Mobile Devices. In Proceedings of the 12th USENIX Conference on Networked Systems Design and Implementation (NSDI'15). USENIX Association, USA, 439–453.
[10] Rune Haubo B. Christensen. [n.d.]. A Tutorial on Fitting Cumulative Link Mixed Models with clmm2 from the ordinal Package.
[11] Jim Dabrowski and Ethan V. Munson. 2011. 40 years of searching for the best computer system response time. Interacting with Computers 23, 5 (2011), 555–564.
[12] Erica S. Davis and Donald A. Hantula. 2001. The effects of download delay on performance and end-user satisfaction in an Internet tutorial. Computers in Human Behavior 17, 3 (2001), 249–268.
[13] Benedict G. C. Dellaert and Barbara E. Kahn. 1999. How tolerable is delay?: Consumers' evaluations of internet web sites after waiting. Journal of Interactive Marketing 13, 1 (1999), 41–54.
[14] Alan R. Dennis and Nolan J. Taylor. 2006. Information foraging on the web: The effects of "acceptable" Internet delays on multi-page information search behavior. Decision Support Systems 42, 2 (2006), 810–824.
[15] Eric Schurman and Jake Brutlag. [n.d.]. The User and Business Impact of Server Delays, Additional Bytes, and HTTP Chunking in Web Search. https://conferences.oreilly.com/velocity/velocity2009/public/schedule/detail/8523
[16] D. F. Galletta, R. Henry, S. McCoy, and P. Polak. 2004. Web Site Delays: How Tolerant are Users? Journal of the Assoc. for Inf. Syst. 5, 1 (2004), 1–28.
[17] P. Harper and Kent Norman. 1993. Improving user satisfaction: The questionnaire for user interaction satisfaction version 5.5. (1993).
[18] Julie A. Jacko, Andrew Sears, and Michael S. Borella. 2000. The effect of network delay and media on user perceptions of web resources. Behaviour & Information Technology 19, 6 (2000), 427–439.
[19] Diane Kelly, Jaime Arguello, Ashlee Edwards, and Wan-ching Wu. 2015. Development and Evaluation of Search Tasks for IIR Experiments Using a Cognitive Complexity Framework. In Proc. 2015 Int'l Conf. on The Theory of Information Retrieval. 101–110.
[20] Conor Kelton, Jihoon Ryoo, Aruna Balasubramanian, and Samir R. Das. 2017. Improving User Perceived Page Load Time Using Gaze. In Proceedings of the 14th USENIX Conference on Networked Systems Design and Implementation (NSDI'17). USENIX Association, USA, 545–559.
[21] D. M. MacKay. 1963. Psychophysics of Perceived Intensity: A Theoretical Basis for Fechner's and Stevens' Laws. Science 139, 3560 (1963), 1213–1216.
[22] David Maxwell and Leif Azzopardi. 2014. Stuck in Traffic: How Temporal Delays Affect Search Behaviour. In Proc. 5th Information Interaction in Context Symposium. 155–164.
[23] Fiona Fui-Hoon Nah. 2004. A study on tolerable waiting time: how long are Web users willing to wait? Behaviour & Information Technology 23, 3 (2004), 153–163.
[24] Javad Nejati and Aruna Balasubramanian. 2016. An In-Depth Study of Mobile Browser Performance. In Proceedings of the 25th International Conference on World Wide Web (WWW '16). International World Wide Web Conferences Steering Committee, 1305–1315.
[25] Steve Olenski. [n.d.]. Why Brands Are Fighting Over Milliseconds. https://www.forbes.com/sites/steveolenski/2016/11/10/why-brands-are-fighting-over-milliseconds/#4bcecd314ad3
[26] Judith Ramsay, Alessandro Barbesi, and Jenny Preece. 1998. A psychological investigation of long retrieval times on the World Wide Web. Interacting with Computers 10, 1 (1998), 77–86.
[27] E. Schurman and J. Brutlag. 2009. Performance related changes and their user impact. In Velocity Web Performance and Operations Conf.
[28] Ashiwan Sivakumar, Shankaranarayanan Puzhavakath Narayanan, Vijay Gopalakrishnan, Seungjoon Lee, Sanjay Rao, and Subhabrata Sen. 2014. PARCEL: Proxy Assisted BRowsing in Cellular Networks for Energy and Latency Reduction. In Proceedings of the 10th ACM International Conference on Emerging Networking Experiments and Technologies (CoNEXT '14). Association for Computing Machinery, New York, NY, USA, 325–336.
[29] John Staddon. 1978. Theory of behavioral power functions. Psychological Review 85 (1978), 305–320.
[30] StatCounter Global Stats. [n.d.]. Desktop vs Mobile vs Tablet Market Share Worldwide. https://gs.statcounter.com/platform-market-share/desktop-mobile-tablet
[31] Nolan J. Taylor, Alan R. Dennis, and Jeff W. Cummings. 2013. Situation normality and the shape of search: The effects of time delays and information presentation on search behavior. Journal of the American Society for Information Science and Technology 64, 5 (2013), 909–928.
[32] Jaime Teevan, Kevyn Collins-Thompson, Ryen W. White, Susan T. Dumais, and Yubin Kim. 2013. Slow Search: Information Retrieval Without Time Constraints. In Proc. Symposium on Human-Computer Interaction and Information Retrieval. Article 1, 10 pages.
[33] Peter A. van der Helm. 2010. Weber-Fechner behavior in symmetry perception? Attention, Perception, & Psychophysics 72, 7 (2010), 1854–1864.