Best practices: Two Web-browser-based methods for stimulus
presentation in behavioral experiments with high-resolution timing
requirements

Pablo Garaizar (University of Deusto, Bilbao, Spain; garaizar@deusto.es)
& Ulf-Dietrich Reips (University of Konstanz, Konstanz, Germany)

© Psychonomic Society, Inc. 2018

Behavior Research Methods
https://doi.org/10.3758/s13428-018-1126-4
Abstract
The Web is a prominent platform for behavioral experiments, for many reasons (relative simplicity, ubiquity, and accessibility,
among others). Over the last few years, many behavioral and social scientists have conducted Internet-based experiments using
standard web technologies, both in native JavaScript and using research-oriented frameworks. At the same time, vendors of
widely used web browsers have been working hard to improve the performance of their software. However, the goals of browser
vendors do not always coincide with behavioral researchers' needs. Whereas vendors want high-performance browsers to
respond almost instantly and to trade off accuracy for speed, researchers have the opposite trade-off goal, wanting their
browser-based experiments to exactly match the experimental design and procedure. In this article, we review and test some
of the best practices suggested by web-browser vendors, based on the features provided by new web standards, in order to
optimize animations for browser-based behavioral experiments with high-resolution timing requirements. Using specialized
hardware, we conducted four studies to determine the accuracy and precision of two different methods. The results using CSS
animations in web browsers (Method 1) with GPU acceleration turned off showed biases that depend on the combination of
browser and operating system. The results of tests on the latest versions of GPU-accelerated web browsers showed no frame loss
in CSS animations. The same happened in many, but not all, of the tests conducted using requestAnimationFrame (Method
2) instead of CSS animations. Unbeknownst to many researchers, vendors of web browsers implement complex technologies that
result in reduced quality of timing. Therefore, behavioral researchers interested in timing-dependent procedures should be
cautious when developing browser-based experiments and should test the accuracy and precision of the whole experimental
setup (web application, web browser, operating system, and hardware).
Keywords Web animations · Experimental software · High-resolution timing · iScience · Browser
Shortly after its inception, the Web was demonstrated to be an
excellent environment to conduct behavioral experiments.
The first Internet-based experiments were conducted in the
mid-1990s, shortly after the World Wide Web had been
invented at CERN in Geneva (Musch & Reips, 2000; Reips,
2012). Conducting studies via the Internet is considered a
second revolution in behavioral and social research, after the
computer revolution in the late 1960s, and subsequently that
method has brought about many advantages over widely used
paper-and-pencil procedures (e.g., automated processes,
heightened precision). The Internet added interactivity via a
worldwide network and brought many benefits to research,
adding a third category to what had traditionally been seen
as a dichotomy between lab and field experiments (Honing
& Reips, 2008; Reips, 2002). Although Internet-based exper-
iments have some inherent limitations, due to a lack of control
and the limits of technology, they also have a number of ad-
vantages over lab and field experiments (Birnbaum, 2004;
Reips, 2002; Schmidt, 1997). Some of the main advantages
are that (1) it is possible to easily collect large behavioral data
sets (see, however, Wolfe, 2017, noting that this is actually not
happening as frequently as one would expect); (2) it is also
possible to recruit large heterogeneous samples and people
with rare characteristics (e.g., people suffering from
sexsomnia and their peers; Mangan & Reips, 2007) from lo-
cations far away; and (3) after an initial investment, the meth-
od is more cost-effective, in terms of time, space, and labor,
than either lab or field research. As compared to paper-and-
pencil research, most of the advantages of computer-mediated
research apply; for example, the benefit that process vari-
ables ("paradata") can be recorded (Stieger & Reips, 2010).
Despite the numerous studies comparing web-based re-
search with laboratory research that have concluded that both
approaches work, there are still doubts about the capabilities
of web browsers for presenting and recording data accurately
(e.g., Schmidt, 2007). Early discussions (Reips, 2000, 2007;
Schmidt, 1997) saw reaction time measurement in Internet-
based experimenting as possible, but clearly pointed out its
limitations. In fact, there is an open debate as to the lack of
temporal precision of experimentation based on computers as
a possible cause to explain the ongoing replication crisis
across the field of psychology (Plant, 2016).
On the other hand, several studies have provided web tech-
nology benchmarks (see van Steenbergen & Bocanegra, 2016,
for a comprehensive list) that help researchers figure out when
the timing of web-based experimentation is acceptable for the
chosen experimental paradigm. Moreover, notable efforts
have been made in recent years to simplify the development
and improve the accuracy of timing in web experiments using
standard web technologies, based on research-oriented frame-
works including jsPsych (de Leeuw, 2015) or lab.js
(Henninger, Mertens, Shevchenko, & Hilbig, 2017).
At the same time, vendors of widely used web browsers
(Google Chrome, Mozilla Firefox, Apple Safari, and
Microsoft Edge, among others) have been working hard to
improve the performance of their software. However, there
are some important discrepancies between the goals of brows-
er vendors and behavioral researchers regarding the desired
features of an ideal web browser. Whereas browser vendors
try their best to provide a faster browser than their competitors
and have as their main goal to increase the responsiveness of
the web applications presented to the user, behavioral re-
searchers foremost need precision and accuracy when present-
ing stimuli and recording user input, and not necessarily
speed. Thus, browser vendors and researchers tend to be at
opposite ends of the desired speed–accuracy trade-off.
Fortunately, some of the technological advances that have
recently been developed in response to browser vendors' needs
have turned out to be aligned with behavioral researchers' needs
as well. Modern web browsers are now provided with frame-
oriented animation timers (i.e., requestAnimationFrame), a com-
prehensive and accurate application programming interface for
audio (Web Audio API), and submillisecond-accurate input
event timestamps (DOMHighResTimeStamp). They are also
provided with submillisecond-accurate timing functions (i.e.,
window.performance.now) in several versions, but a new class
of timing attacks in modern CPUs (e.g., Spectre and Meltdown)
have forced web-browser vendors to reduce the precision of
these timing functions, either by rounding (Scholz, 2018) or
slightly randomizing the value returned (Kyöstilä, 2018). In
the case of Mozilla Firefox, this limitation can be disabled by
modifying the privacy.reduceTimerPrecision configuration
property, which has been enabled by default since version 59.
In the case of Google Chrome, developers decided to reduce the
resolution of performance.now() from 5 to 100 μs and to add
pseudorandom jitter on top.
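The granularity a given browser actually delivers can be probed empirically. The following snippet is a minimal sketch of ours (not from the original study) that busy-waits until performance.now() returns a new value and reports the smallest observed increment; under added jitter, as in recent Chrome versions, the estimate will vary from run to run.

// Minimal sketch: estimate the effective resolution of performance.now()
// by busy-waiting until the returned value changes.
function estimateTimerResolution(samples) {
  var deltas = [];
  for (var i = 0; i < samples; i++) {
    var t0 = performance.now(),
        t1 = t0;
    while (t1 === t0) {  // spin until the clock ticks
      t1 = performance.now();
    }
    deltas.push(t1 - t0);
  }
  return Math.min.apply(null, deltas);  // smallest observed step, in ms
}
console.log('Approximate resolution (ms): ' + estimateTimerResolution(100));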
To explain these new features to application developers,
web-browser vendors have written several best-practice
guidelines emphasizing the underlying concepts related to
web animations in terms of performance (Bamberg, 2018a,
2018b;Lewis,2018). In the next section, we will review those
best practices from a behavioral researcher's perspective.
Best practices for animations in Web-based
experiments
It is important to understand that a browser-based experiment
can be conducted either offline (not via the Internet) or online
(on the Internet); see, for instance, Honing and Reips (2008)
or Reips (2012). Even in web-technology-based experiments
conducted offline, accurate timing requires loading the
experiment's assets (images, styles, audio, video, etc.) into a
participant's browser before the experiment starts. Once load-
ed, the assets will be ready to be rendered by the browser. In
this section, we will analyze these two tasks from the perspec-
tive of a behavioral researcher.
Best practices for loading assets
For controlled timing, web browsers need to download all the
assets, including any media, referenced in the HTML docu-
ment that describes a web page before running it. In most
cases, preloading delays the time until the user can interact
with the web page, so reducing download time becomes a
priority. Consequently, browser vendors are defining new
standards to eliminate unnecessary asset downloads, optimize
file formats, and cache assets, among other measures (see the
HTTP/2 specification for details; Belshe, Peon, & Thomson, 2015).
However, from a behavioral researcher's perspective, there is
no such need for speedy downloading or blocking of web
assets. In most experiments, researchers have to explain to
participants how to proceed, get their informed consent, and
maybe gather some sociodemographic information. This
preexperimental time can be used to download large assets
in the background. Even in the unlikely case that participants
have read the instructions and filled in all required information
before all the assets are downloaded, asking them to wait until
the experiment is ready to be conducted is not a serious prob-
lem. However, not predownloading all assets needed to com-
pose an experiment's stimuli before they are presented to the par-
ticipant can cause serious methodological issues.
There are several techniques to preload web assets. In the
past, web developers used CSS tricks like fetching images as
background images of web components placed outside the
boundaries of the web page or set as hidden. Currently, the
rel="preload" property of the link element in the header of the
HTML document should be the preferred way to preload web
assets (Grigorik & Weiss, 2018). This method should not be
confused with <link rel="prefetch">. The "prefetch" directive
asks the browser to fetch a resource that will probably be
needed for the next navigation. Therefore, the resource will
be fetched with extremely low priority. Conversely, the
"preload" directive tells the web browser to fetch the web
asset as soon as possible because it will be needed in the
current navigation.
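As a minimal sketch (the asset path is a placeholder matching the listings below), the same high-priority hint can also be issued from JavaScript by creating the link element programmatically:

// Sketch: request a high-priority preload from JavaScript instead of
// static HTML markup. The asset path is an illustrative assumption.
var link = document.createElement('link');
link.rel = 'preload';
link.as = 'image';  // resource type hint required for preload
link.href = 'img/numbers/1.png';
document.head.appendChild(link);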
Alternatively, web developers can preload images (or other
web assets) by creating them from scratch in JavaScript. In
Listing 1, we provide an example script of how to create a
set of images and wait until all of them have been completely
downloaded in a JavaScript web application, relying on the
"onload" event of the images. This method works in most
cases, but there are some issues related to "onload": events
not properly being fired have been reported in previous ver-
sions of widely used web browsers (e.g., Google Chrome v50)
and would affect cached images. For this reason, in Listing 2,
we provide a new script of how to actively wait until a set of
images has been completely downloaded in a JavaScript web
application that does not rely on the "onload" event to deter-
mine whether the image has been completely downloaded and
ready to be displayed or not. These examples can be easily
adapted for other kinds of assets (audio, video) if needed, by
querying web elements' properties (e.g., in the case of video
assets, the readyState property).
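For example, a hedged adaptation of the polling approach of Listing 2 to video assets might check readyState against HAVE_ENOUGH_DATA; the names videos and onReady are our assumptions, not part of the original scripts:

// Sketch: poll the readyState of video elements instead of the
// "complete" flag of images. readyState 4 (HAVE_ENOUGH_DATA) means
// playback can proceed without stalling; guaranteeing that the whole
// file is buffered would require inspecting the buffered time ranges.
function isVideoLoaded(video) {
  return video.readyState === 4;
}
function checkVideoLoad(videos, onReady) {
  for (var i = 0; i < videos.length; i++) {
    if (!isVideoLoaded(videos[i])) {
      setTimeout(function () { checkVideoLoad(videos, onReady); }, 50);
      return false;
    }
  }
  onReady();
  return true;
}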
Best practices for rendering web pages
Once the assets needed to conduct the experiment have been
downloaded, the browser is ready to show the experiment.
Showing (or, more technically speaking, rendering) a web
application implies a sequence of tasks that web browsers
have to accomplish in the following order: (1) JavaScript/
CSS (cascading style sheets), (2) style, (3) layout, (4) paint,
and (5) composite. Understanding all of them is crucial to
develop accurate and precise animations for web-based be-
havioral experiments.
However, rendering is only one of the steps web browsers
take when executing a web application, a simple form of which
is a web page. Web applications run in an execution environ-
ment that comprises several important components: (1) a
var images = [],
    total = 24,
    loaded = 0;

for (var i = 0; i < total; i++) {
  images.push(new Image());
  // count each image as it finishes downloading
  images[i].addEventListener('load', function () {
    loaded++;
    if (loaded == total) {
      startExperiment();  // all assets ready: begin the experiment
    }
  }, false);
  images[i].src = 'img/numbers/' + (i + 1) + '.png';
}

Listing 1 JavaScript code to preload a set of images and use the onload event to check that all of them have been downloaded before the experiment begins
// "images" and startExperiment are assumed to be defined as in Listing 1.
function isImageLoaded (img) {
  if (!img.complete) { return false; }
  if (typeof img.naturalWidth != "undefined" && img.naturalWidth == 0) {
    return false;
  }
  return true;
}

function checkLoad () {
  for (var i = 0; i < images.length; i++) {
    if (!isImageLoaded(images[i])) {
      setTimeout(checkLoad, 50);  // poll again in 50 ms
      return false;
    }
  }
  startExperiment();
  return true;
}

Listing 2 JavaScript code to test whether a set of images has been downloaded before the experiment begins by not relying on the onload event
JavaScript execution context shared with all the scripts referenced
in the web application, (2) a browsing context (useful to man-
age the browsing history), (3) an event loop (described later),
and (4) an HTML document, among other components. The
event loop orchestrates what JavaScript code will be executed
and when to run it, manages user interaction and networking,
renders the document, and performs other minor tasks (Mozilla,
2018; WHATWG, 2018). There must be at most one event loop
per group of related similar-origin browsing contexts (i.e., different
web applications running in the same web browser do not share
event loops; each one has its own event loop).
The event loop uses different task queues (i.e., ordered lists
of tasks) to manage its duties: (1) events queue: for managing
user-interface events; (2) parser queue: for parsing HTML; (3)
callbacks queue: for managing asynchronous callbacks (e.g.,
via setTimeout or requestIdleTask timers); (4) resources
queue: for fetching web resources (e.g., images) asynchro-
nously; and (5) document manipulation queue: for reacting
when an element is modified in the web document. During
the whole execution of the web application, the event loop
waits until there is a task in its queues to be processed.
Then, it selects the oldest task on one of the event loop's task
queues and runs it. After that, the event loop updates the ren-
dering of the web application.
Browsers begin the rendering process by interpreting
the JavaScript/CSS code that web developers have coded
to make visual changes in the web page. In some cases,
these visual changes are controlled by a JavaScript code
snippet, whereas in others CSS animations are used to
change the properties of web elements dynamically (JS/
CSS phase). This phase involves (in this order): (1)
dispatching pending user-interface events, (2) running
the resize and scroll steps for the web page, (3) running
CSS animations and sending corresponding events (e.g.,
"animationend"), (4) running full-screen rendering steps,
and (5) running the animation frame callbacks for the web
page. Once the browser knows what must be done, it
figures out which CSS rules it needs to apply to which
web element, and the computed styles are applied to
each element (style phase). Then, the browser is able to
calculate how much space each element will take on the
screen to create the web page layout (layout phase). This
enables the browser to paint the actual pixels of every
visual part (text, colors, images, borders, shadows) of
the elements (paint phase). Modern browsers are able to
paint several overlapping layers independently in order to in-
crease performance. These overlapping layers have to
be drawn in the correct order to render the web page
properly (composite phase).
Considering this rendering sequence as a pipeline, any
change made in one of the phases implies recalculating
the following phases. Therefore, developing web anima-
tions that only require composite changes prevents the
execution of previous phases. In addition to this general
recommendation, some other details should be taken into
account in each phase.
Taking into account the underlying technologies men-
tioned before, Web experiments should rely on CSS ani-
mations whenever suitable in the JavaScript/CSS phase,
for several reasons. First, they do not need JavaScript to
be executed and therefore do not add a new task to the
queues to be executed by the event loop. This not only
reduces the number of tasks that have to be executed, but
also increases the likelihood that input events (i.e., user
responses in the case of a web experiment) are dispatched
as fast as they occur. Second, if the web browser is able to
use GPU-accelerated rendering, some CSS animations can
be managed asynchronously by the browser's GPU pro-
cess, resulting in a performance boost.
However, not all web experiment animations can be de-
fined declaratively using CSS. For the cases in which the
animations needed to present stimuli rely on JavaScript,
avoiding standard timers (i.e., setTimeout, setInterval) in favor
of the requestAnimationFrame timer is a must: Standard
timers are not synchronized with the frame painting process
and can lead to cumulative timing errors in web animations
(e.g., it is impossible to be in sync with a 60-Hz display,
i.e., 16.667 ms per frame, using standard timers, because a
16-ms interval is too short and a 17-ms interval is too long),
whereas requestAnimationFrame was designed to be in per-
fect sync with the frame rate. Moreover, using
requestAnimationFrame in web experiments enables re-
searchers to implement frame counting in order to achieve
single-frame accuracy in most cases (Barnhoorn, Haasnoot,
Bocanegra, & van Steenbergen, 2015). Nevertheless, being
aware of the time needed by the browser to calculate every
frame of the animation is crucial. At this point, we should
consider that JavaScript's call stack is single-threaded, syn-
chronous, and nonblocking. This means that only one piece
of JavaScript code can be executed at a time in a browsing
context; there is no task switching (tasks are carried out to
completion); and web browsers still accept events even
though they might not be dispatched immediately. In such
an execution environment, the JavaScript code of
requestAnimationFrame-based animations must compete with the
rest of the JavaScript tasks waiting for the single execution thread.
Fortunately, newer versions of common browsers allow web
programmers to trace these times in detail using their web
developer toolkits, reducing the problem substantially.
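A minimal sketch of such frame counting follows; the helper names showStimulus and hideStimulus are hypothetical. It presents a stimulus for a fixed number of v-syncs by counting callbacks rather than comparing elapsed milliseconds:

// Sketch: frame counting with requestAnimationFrame. Each callback
// fires once per display refresh, so counting callbacks yields
// single-frame granularity on a stable 60-Hz display.
function presentForFrames(nFrames, showStimulus, hideStimulus) {
  var count = 0;
  function tick() {
    if (count === 0) showStimulus();  // first painted frame
    count++;
    if (count < nFrames) {
      window.requestAnimationFrame(tick);
    } else {
      hideStimulus();  // takes effect on the next repaint
    }
  }
  window.requestAnimationFrame(tick);
}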
The style phase can be optimized by reducing the complexity
of the style sheets (i.e., complexity of selectors, number of
elements implied, or hierarchy of the elements affected by a
style change). Some tools (e.g., detectors of unused CSS rules) can significantly
reduce the complexity of style sheets.
Avoiding layout changes within a loop is the best recom-
mendation regarding the layout phase, because such changes imply
the calculation of lots of layouts that will be discarded immedi-
ately (also known as "layout thrashing"); see the sketch below.
Another important recommendation for this phase is to apply
animations to elements whose position is fixed or absolute,
because it is much easier for the browser to calculate layout
changes in those cases.
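The following sketch (the class name .box is illustrative) contrasts a loop that thrashes the layout with a batched version:

// Reading offsetHeight forces a synchronous layout; interleaving such
// reads with style writes inside a loop triggers one layout per
// iteration ("layout thrashing").
var boxes = document.querySelectorAll('.box');
for (var i = 0; i < boxes.length; i++) {
  boxes[i].style.width = boxes[i].offsetHeight + 'px';  // read, then write
}

// Better: batch all reads first, then all writes, so the layout is
// recalculated only once.
var heights = [];
for (var j = 0; j < boxes.length; j++) {
  heights.push(boxes[j].offsetHeight);
}
for (var k = 0; k < boxes.length; k++) {
  boxes[k].style.width = heights[k] + 'px';
}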
Painting is often the most expensive phase of the pipeline.
Therefore, the recommendation here is to avoid or reduce
painting areas as much as possible. This can be done by dif-
ferent means: using layers, changing the opacity of web ele-
ments, or modifying hidden elements. Finally, the recommen-
dation for the composite phase is to stick to transformations
(position, scale, rotation, skew, matrix) and opacity changes
for the experiment's animations to maximize the likelihood of
being managed asynchronously by the GPU process of the
browser.
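As a brief sketch of this recommendation (the keyframe name and the stimulus element are assumptions for illustration; Listing 3 below shows the technique as actually tested), an opacity-only animation can be declared from JavaScript so that, with GPU acceleration, only the composite phase needs to be re-run:

// Sketch: a compositor-friendly animation that changes only opacity.
// "stimulus" is a hypothetical element containing a preloaded image.
var style = document.createElement('style');
style.textContent =
  '@keyframes flash {' +
  '  0%   { opacity: 0; }' +
  '  50%  { opacity: 1; }' +
  '  100% { opacity: 0; }' +
  '}';
document.head.appendChild(style);
stimulus.style.willChange = 'opacity';  // hint to promote to its own layer
stimulus.style.animation = 'flash 500ms steps(1, end) 1';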
To validate these best practices, we prepared a set of exper-
iments in which we (1) preloaded all assets before an experi-
ment begins, (2) used CSS animations to control the experi-
ment's animations, (3) tried to minimize layout changes, (4)
tried to reduce painting areas, and (5) tried to stick to opacity
changes in animations. In the study presented in the next sec-
tion, we tested the accuracy and precision of the animations
used in these experiments.
Study 1
The goal of the present study was to test the accuracy and
precision of the animations used in a set of experiments that
would try to follow the web-browser vendors' best practices
explained above.
Method
Apparatus and materials Considering the potential inaccura-
cies that can take place when the same device is used to pres-
ent visual content and assess its timing, we decided to use an
external measurement system: the Black Box Toolkit
(BBTK), which is able to register the precise moment at which
the content is shown, with submillisecond accuracy (Plant,
Hammond, & Turner, 2004).
We installed Google Chrome 58 and Mozilla Firefox
54 web browsers on both Microsoft Windows 10 and
Ubuntu Linux 16.04.3 systems, on a laptop with an Intel
i5-6200U chip with 20 GB of RAM and a 120-GB SSD
disk, not connected to the Internet and isolated from ex-
ternal sources of asynchronous events. In this setting, we
ran a web experiment application that showed an anima-
tion of visual items typical for many of the web experi-
ments that will be described below. This web application
uses CSS animations to control the presentation of the
stimuli. Each stimulus is placed in a different layer, and
the CSS animation controls which one is shown through
opacity changes of the layers. The stimuli consisted of 24
different images (i.e., the natural numbers from 1 to 24) in
which odd numbers were placed on a white background
and even numbers on a black background, to facilitate the
detection of changes by the photo-sensors of the BBTK.
The stimuli were preloaded by the experimental software
before the animation started. This set of stimuli and the
web application used in this study are publicly available
via the Open Science Framework: https://osf.io/h7erv/.
Listing 3 shows the setSlideshow function of this web
experiment. In this function, a set of 24 images is
appended to the parent element in a for loop. Before this,
each image is properly configured: (1) opacity is set to
zero (invisible) and the "willChange" property is set to
"opacity," to inform the web browser that this property
will change during the animation; (2) a fixed position is
set, in order to prevent reflows of the web document; (3) a
CSS animation is configured: an "interval" argument de-
fines the duration of the animation, the "steps(1, end)"
timing function defines a nonprogressive (i.e., immediate)
change of opacity, and the play state is set to paused; (4)
CSS animation events ("animationstart," "animationend")
are defined to log the onset and offset times of the stimuli.
Then, all the animations of the images are switched to the
"running" state.
Procedure For each combination of web browser (Google
Chrome, Mozilla Firefox) and operating system (MS
Windows, GNU/Linux), we tested the same web experiment,
which presented a slideshow of the first 24 natural numbers.
Each number was presented during a short interval before the
next one was presented. We tested this series of stimuli with
different presentation intervals for each stimulus: 500, 200,
100, and 50 ms, which correspond to the duration of 30, 12,
six, and three frames in a 60-Hz display. Considering that all
the tests were conducted using this refresh rate, the subframe
deviations of the intervals measured by the photo-sensors of
the BBTK (e.g., 51.344 ms instead of 50 ms) were caused by
difficulties of the LCD displays with handling abrupt changes
of luminosity, and not by the series of stimuli tested.
Therefore, we converted all durations of the stimulus presen-
tations from milliseconds to frames because the main purpose
of this study was to assess the accuracy of the web presenta-
tion software, not the hardware. To reduce the effect of un-
foreseen sources of delays, we tested each configuration three
times.
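The conversion we applied is straightforward; as a sketch (assuming a constant refresh rate):

// Sketch: convert an intended presentation duration in milliseconds
// to a number of frames at a given display refresh rate.
function msToFrames(ms, refreshRate) {
  return Math.round(ms / (1000 / refreshRate));
}
// msToFrames(500, 60) === 30; msToFrames(50, 60) === 3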
Results
The results of the tests conducted on Google Chrome are
shown in Table 1. Each cell in the table represents the number
of "short" or "long" frames during each test (a presentation of
the 24 stimuli). Surprisingly, there is a noticeable difference
between the GNU/Linux and MS Windows setups. The web
application tested works flawlessly on Google Chrome under
GNU/Linux at all intervals, whereas the same web application
presents an unacceptable number of lost frames under MS
Windows. We call those frames "short" that were presented
before they were expected, and those "long" that were pre-
sented after they were expected (note that in many cases, the
sum of short and long frames is near zero, because CSS ani-
mations tend to interpolate all missing frames in an animation,
making the animation last as long as expected). As happens
with experimental software such as E-Prime, with its event
mode timing and cumulative mode timing (Schneider,
Eschman, & Zuccolotto, 2012), researchers can decide how
their experiment will behave when an unexpected delay oc-
curs. In the first case (event mode timing), the delay will
cause a stimulus to be displayed longer than expected.
This will not affect the duration of the next stimulus,
but will affect the total duration of the animation. In the
second case (cumulative mode timing), the delay will
function setSlideshow (element, start, interval) {
  element.style.backgroundImage = 'none';
  for (var i = 0; i < total; i++) {
    images[i].style.opacity = 0;
    images[i].style.willChange = 'opacity';
    images[i].style.position = 'fixed';
    images[i].style.top = '100px';   // units added: unitless lengths are ignored
    images[i].style.left = '100px';
    images[i].style['animation'] = 'show ' + interval + 'ms steps(1,end) ' +
      (start + i * interval) + 'ms 1 normal none paused';
    images[i].addEventListener('animationstart', function (event) {
      console.log('Start at: ' + event.elapsedTime + ' ' + event.timeStamp);
    }, false);
    images[i].addEventListener('animationend', function (event) {
      console.log('End at: ' + event.elapsedTime + ' ' + event.timeStamp);
      element.style.backgroundImage = 'none';
    }, false);
    element.appendChild(images[i]);
  }
  for (var i = 0; i < total; i++) {
    images[i].style['animation-play-state'] = 'running';
  }
}

Listing 3 JavaScript code to configure a slideshow using opacity changes through CSS animations on a set of images
Table 1 Study 1: Short/long frames using CSS animations and opacity changes between layers on Google Chrome 58 and Mozilla Firefox 54

                              Test  N    30 Frames     12 Frames     6 Frames      3 Frames
                                         Short  Long   Short  Long   Short  Long   Short  Long
Google Chrome 58   Windows    1     24   10     10     13     12     13     12     12     11
                              2     24   5      5      12     11     6      6      10     10
                              3     24   11     10     12     11     12     11     11     10
                   Linux      1     24   0      0      0      0      0      0      0      0
                              2     24   0      0      0      0      0      0      0      0
                              3     24   0      0      0      0      0      0      0      0
Mozilla Firefox 54 Windows    1     24   0      1      0      0      0      0      0      0
                              2     24   0      0      0      0      0      0      0      0
                              3     24   0      1      0      0      0      0      0      0
                   Linux      1     24   6      6      5      5      5      5      5      5
                              2     24   5      5      7      5      3      1      4      4
                              3     24   4      4      6      7      4      2      5      4
cause one stimulus to be displayed longer, whereas the
next will be displayed a shorter time than expected, to
make the whole animation meet its duration requirements.
In the tests presented in this study, CSS animations work
like E-Prime's cumulative mode timing. However, it is
possible to use CSS animations to develop experiments
that work in event mode timing by replacing the 24-
keyframe animation used here with 24 animations of one
keyframe each, to be launched successively once the
"animationend" event of the previous animation is
triggered.
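A minimal sketch of this event-mode variant follows; it assumes a one-keyframe "show" animation defined elsewhere, as in Listing 3. The animations are chained on the "animationend" event, so a delay lengthens the total duration instead of shortening the next stimulus:

// Sketch: event-mode timing with CSS animations. Each stimulus runs
// its own one-keyframe animation; the next one starts only when the
// previous "animationend" event fires, so delays do not shorten
// subsequent presentations.
function chainAnimations(images, interval) {
  var i = 0;
  function showNext() {
    if (i >= images.length) return;
    var img = images[i++];
    img.addEventListener('animationend', showNext, false);
    img.style.animation =
      'show ' + interval + 'ms steps(1, end) 0ms 1 normal none running';
  }
  showNext();
}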
The results of the tests conducted on Mozilla Firefox
are also shown in Table 1. There is also a noticeable
difference between GNU/Linux and MS Windows,
but, surprisingly, in the opposite direction from the dif-
ference we found running these tests with Google
Chrome. Therefore, the tested technique (i.e., layer
opacity changes through CSS animations) cannot be
used as a reliable way to generate web experiments with
accurate and precise stimulus presentations in any
multiplatform environment. Consequently, we developed
a new web application for test purposes, based on a
slightly different approach.
Study 2
The goal of Study 2 was to find a good combination of
best practices in the development of web animations
that would be suitable to present stimuli in an accurate
and precise way on both Google Chrome 58 and
Mozilla Firefox 54 under MS Windows and GNU/
Linux operating systems.
In this new web application we also used CSS animations
to control the sequence of stimuli, but instead of creating the
slideshow by placing each stimulus in a separate layer and
using opacity changes to show each of them, we placed all
stimuli in one large single image and used "background-
position" changes to show each of them. This big image con-
taining all of the stimuli, and the corresponding offsets to
show each of them, can easily be generated using tools such
as Glue (https://github.com/jorgebastida/glue) or through
HTML/JavaScript features such as the canvas API. Needless to
say, the image with all stimuli has to be preloaded before the
experiment begins.
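As a sketch of the canvas-based route (the grid layout and cell size are our assumptions; the offsets in Listing 4 imply 640 × 480 cells), a sprite sheet can be composed at runtime from the preloaded stimuli:

// Sketch: compose a sprite sheet from preloaded images with the canvas
// API. The resulting data URL can be used as the background-image of
// the element animated via background-position changes.
function buildSpriteSheet(images, cols, cellWidth, cellHeight) {
  var canvas = document.createElement('canvas');
  canvas.width = cols * cellWidth;
  canvas.height = Math.ceil(images.length / cols) * cellHeight;
  var ctx = canvas.getContext('2d');
  for (var i = 0; i < images.length; i++) {
    ctx.drawImage(images[i],
                  (i % cols) * cellWidth,
                  Math.floor(i / cols) * cellHeight,
                  cellWidth, cellHeight);
  }
  return canvas.toDataURL('image/png');
}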
Method
Apparatus, materials, and procedure As in Study 1, we ran
the same web application with different presentation in-
tervals for each stimulus (30, 12, six, and three frames)
three times for each stimulus × browser × OS combination,
on Google Chrome 58 and Mozilla Firefox 54 under
both GNU/Linux and MS Windows. The BBTK's
photo-sensor was attached to the display of the laptop
used in Study 1. The procedure was identical to that in
Study 1.
Listing 4 shows how this web application defines the
slideshow of stimuli. First, the corresponding background po-
sition for each stimulus in the big picture is defined. Then the
keyframes of the animation are added to a string that will
contain the whole definition of the CSS animation. After that,
the CSS animation keyframes are included in the web docu-
ment's "slideshow" style sheet. Finally, the animation of the
parent element (i.e., the div box that will show all stimuli) is
configured to use the keyframes previously defined. For log-
ging purposes, the "animationstart" and "animationend"
event listeners log the starting and ending time stamps of the
slideshow.
Results
Table 2 summarizes the results obtained for Google Chrome
58 using our test web application. As we can see, despite the
fact that some frames were presented too early in the three-
frame interval under MS Windows, this new approach
outperformed the previous one and was able to present stimuli
in an accurate and precise way in most cases. The same hap-
pened when running our tests in Mozilla Firefox 54. The web
application tested also showed some short frames under MS
Windows in the three-frame interval, and under GNU/Linux
in the 30-frame interval, but it was able to present stimuli
accurately and precisely in most cases.
Therefore, contrary to the best practices suggested by the
web-browser vendors for the development of web animations,
changing background position in an image with all stimuli
(which implies new paint and composite phases)
outperformed changing the opacity of layers (which implies
just redoing the composite phase) in this setup. Slideshows
based on background position changes work properly in both
Google Chrome 58 and Mozilla Firefox 54 under GNU/Linux
and MS Windows. However, to understand the unexpected
results from Study 1, we decided to conduct another study
as a replication with newer browser versions and forced
GPU acceleration.
Study 3
The goal of Study 3 was to find out whether the browser
versions used in Study 1 could have been the cause of the
unexpected results found. With this goal in mind, we repeated
all the tests using the same technique (layer opacity changes
through CSS animations) 10 months later, using the latest
versions of Google Chrome (v.66) and Mozilla Firefox
(v.59) available, under GNU/Linux and MS Windows.
Method
Apparatus, materials, and procedure We ran the same set of
stimuli with different presentation intervals for each stimu-
lus (30, 12, six, and three frames) three times for each
stimulus × browser × OS combination, on Google Chrome 66
and Mozilla Firefox 59, under both GNU/Linux and MS
Windows. The BBTK's photo-sensor was attached to the
display of the laptop used in Study 1. The procedure was
identical to that of Study 1.
function setSlideshow (element, start, interval) {
  var rules = '',
      percs = '',
      images = [],
      order = ['blank', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10',
               '11', '12', '13', '14', '15', '16', '17', '18', '19', '20',
               '21', '22', '23', '24', 'blank'],
      animationName = 'slideshow' + (new Date().getTime()),
      stylesheet = document.getElementById('slideshow');
  // Background offset of each stimulus within the sprite sheet
  images['24'] = '{background-position:0px 0px;}';
  images['23'] = '{background-position:0px -1440px;}';
  images['22'] = '{background-position:-640px -1440px;}';
  images['21'] = '{background-position:-1280px -1440px;}';
  images['20'] = '{background-position:-1920px 0px;}';
  images['19'] = '{background-position:-1920px -960px;}';
  images['18'] = '{background-position:-1920px -1440px;}';
  images['17'] = '{background-position:0px -1920px;}';
  images['16'] = '{background-position:-640px -1920px;}';
  images['15'] = '{background-position:-1280px -1920px;}';
  images['14'] = '{background-position:-1920px -1920px;}';
  images['13'] = '{background-position:-2560px 0px;}';
  images['12'] = '{background-position:-2560px -480px;}';
  images['11'] = '{background-position:-2560px -960px;}';
  images['10'] = '{background-position:-2560px -1440px;}';
  images['9'] = '{background-position:-640px 0px;}';
  images['8'] = '{background-position:0px -480px;}';
  images['7'] = '{background-position:-640px -480px;}';
  images['6'] = '{background-position:-1280px 0px;}';
  images['5'] = '{background-position:-1280px -480px;}';
  images['4'] = '{background-position:0px -960px;}';
  images['3'] = '{background-position:-640px -960px;}';
  images['2'] = '{background-position:-1920px -480px;}';
  images['1'] = '{background-position:-2560px -1920px;}';
  images['blank'] = '{background-position:-1280px -960px;}';
  for (var i = 0, len = order.length; i < len; i++) {
    percs += (i * 100 / len) + '% ' + images[order[i]] + '\n';
  }
  rules += '@keyframes ' + animationName + ' {\n' + percs + '}\n';
  stylesheet.innerHTML = rules;
  element.style['animation'] = animationName + ' ' + (order.length *
    interval) + 'ms steps(1) ' + start + 'ms 1 normal none paused';
  element.addEventListener('animationstart', function (event) {
    console.log('Start at: ' + event.elapsedTime + ' ' + event.timeStamp);
  }, false);
  element.addEventListener('animationend', function (event) {
    console.log('End at: ' + event.elapsedTime + ' ' + event.timeStamp);
    element.style.backgroundImage = 'none';
  }, false);
  element.style['animation-play-state'] = 'running';
}

Listing 4 JavaScript code to configure a slideshow using background position changes through CSS animations on a set of images
In addition to updating the versions of the web browsers,
we also configured them to force the use of GPU accelera-
tion. In the case of Google Chrome, we accessed the
chrome://flags URL in the address bar and enabled the
"Override software rendering list" option. Then we
relaunched Google Chrome and verified that GPU accelera-
tion was enabled by accessing the chrome://gpu URL. In the
case of Mozilla Firefox, we accessed the about:config URL
and changed the "layers.acceleration.force-enabled" property
from "false" to "true."
Results
All tests conducted (24-stimulus animations with three-,
six-, 12-, and 30-frame durations, repeated three times)
resulted in no frame loss on Google Chrome 66 and
Mozilla Firefox 59 under MS Windows and GNU/
Linux. This was a significant improvement over the re-
sults obtained in Study 1.
On the basis of the results of Study 3, we could assume
that the poor results of Study 1 were due to the fact that the
configuration used did not ensure the use of GPU accelera-
tion in animations based on the change in opacity of the
layers. However, these results cannot distinguish whether
the web browser version update or the GPU acceleration
configuration caused the better performance of the tests. To
disentangle these possible causes, we repeated the tests that
had uncovered the timing problems in Study 1 (Mozilla
Firefox 54 under GNU/Linux and Google Chrome 58 under
MS Windows), but forced the use of GPU acceleration in
those configurations.
GPU-accelerated Mozilla Firefox 54 under GNU/
Linux performed accurately in the new tests (no frame
loss). However, GPU-accelerated Google Chrome 58 un-
der MS Windows still missed an unacceptable number
of frames in all tests (see Table 3 for details).
Specifically, every stimulus presented on a white back-
ground lasted one frame longer than expected (i.e., one
long frame), and every stimulus presented on a black
background lasted one frame less than expected (i.e.,
one short frame) in the three-, six-, and 12-frame dura-
tion tests. At first sight, this inaccurate behavior might
look like a BBTK photo-sensor calibration problem.
However, every time we found a significant number of
short or long frames in our tests, we repeated a previ-
ously conducted test that yielded no short or long
frames (e.g., a six-frame duration stimulus using CSS
animations and background-position changes on Google
Chrome 58 under MS Windows) to be sure that our
experimental setup was still properly calibrated.
In the case of the 30-frame duration tests on GPU-
accelerated Google Chrome 58 under MS Windows, on-
ly one test presented this behavior, whereas the other
two lost no frames while presenting the last 16 stimuli.
Therefore, our recommendation for studies in which de-
viations of one frame are not acceptable is not only to
restrict data collection to browsers with enabled GPU
acceleration and to have participants update the
browsers whenever possible to a tested version that
loses no frames, but also to assess the accuracy of the
web technique used to present stimuli accurately on the
exact setup that will be used by participants. However,
accuracies within a one-frame deviation are likely ac-
ceptable in many experiments. Therefore, researchers
should weigh the cost of following these recommenda-
tions in those cases.
Table 2 Study 2: Short/long frames using CSS animations and background-position changes on Google Chrome 58 and Mozilla Firefox 54

                              Test  N    30 Frames     12 Frames     6 Frames      3 Frames
                                         Short  Long   Short  Long   Short  Long   Short  Long
Google Chrome 58   Windows    1     24   0      0      0      0      0      0      3      0
                              2     24   0      0      0      0      0      0      2      0
                              3     24   0      1      0      0      0      0      2      0
                   Linux      1     24   0      0      0      0      0      0      0      0
                              2     24   0      1      0      0      0      0      0      0
                              3     24   0      0      0      0      0      0      0      0
Mozilla Firefox 54 Windows    1     24   0      0      0      0      0      0      1      0
                              2     24   0      0      0      0      0      0      1      0
                              3     24   0      0      0      0      0      0      1      0
                   Linux      1     24   0      0      0      0      0      0      0      0
                              2     24   1      2      0      0      0      0      0      0
                              3     24   1      2      0      0      0      0      0      0
Study 4
The goal of Study 4 was to assess the accuracy of both tech-
niques developed in Studies 1 and 2 (layer opacity changes
and background-position changes), this time using
requestAnimationFrame instead of CSS animations.
Method
Apparatus, materials, and procedure In this study we tested
the two techniques presented in Studies 1 (layer opacity
changes) and 2 (background-position changes) using
requestAnimationFrame to animate the slideshow. Listing
5 shows the animation function, which is scheduled to be
executed in every v-sync (i.e., repaint of the whole screen,
60 times every second at 60 Hz). This function gets a
time stamp from the web browser, to be aware of the
precise moment when requestAnimationFrame started to
execute callbacks (i.e., all the functions requested to be
executed by requestAnimationFrame). By subtracting from
this time stamp the moment the web animation had shown
the previous stimulus, it was possible to estimate the num-
ber of frames the stimulus had been presented and decide
when to present the next one. Note that 5 ms are added to
this numeric expression in order to prevent rounding errors
when calculating the moment when the next stimulus
should be rendered. This Brule of thumb^is a common
recommendation in experiment software user manuals
(e.g., E-Prime; see Schneider et al., 2012), and it allowed
our tests to work properly even with timing sources
rounded to 2 ms, such as in Mozilla Firefox's latest ver-
sions. We have made the web applications used in this
study publicly available at the Open Science Framework:
https://osf.io/h7erv/.
Results
The results of all the tests conducted are shown in Table 4. As
can be seen, there was no frame loss in the tests conducted on
Google Chrome 66 and Mozilla Firefox 59 under MS
Windows. The same happened in the tests conducted on
Mozilla Firefox 59 under GNU/Linux. In the case of Google
Chrome 66 under GNU/Linux, all tests that used background-
image position changes worked flawlessly, but we found
frame loss in tests using layer opacity changes to show the
stimuli. In most cases these tests only missed one frame, but
this combination of web technologies was especially unreli-
able during the 30-frame interval tests.
Conclusions and outlook
Studying the accuracy and precision of browser animations is
of fundamental methodological importance in web-based
Table 3 Study 3: Short/long frames using CSS animations and opacity changes between layers on Google Chrome 58 and Mozilla Firefox 54 with GPU acceleration

                              Test  N    30 Frames     12 Frames     6 Frames      3 Frames
                                         Short  Long   Short  Long   Short  Long   Short  Long
Google Chrome 58   Windows    1     24   4      4      12     12     12     12     12     12
                              2     24   4      4      12     12     12     12     12     12
                              3     24   12     12     12     12     12     12     12     12
Mozilla Firefox 54 Linux      1     24   0      0      0      0      0      0      0      0
                              2     24   0      0      0      0      0      0      0      0
                              3     24   0      0      0      0      0      0      0      0
function animate (timestamp) {
  // keep scheduling callbacks while stimuli remain
  if (i < total) window.requestAnimationFrame(animate);
  var progress = timestamp - start;
  // 5 ms are added to prevent rounding errors with coarse timing sources
  if (progress + 5 >= interval) {
    images[i].style.opacity = 0;    // hide the current stimulus
    i++;
    if (i < total) {                // guard added: avoid indexing past the array
      images[i].style.opacity = 1;  // show the next stimulus
    }
    start = timestamp;
  }
}
Listing 5 JavaScript code to animate a slideshow using layer opacity changes through requestAnimationFrame on a set of images
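Listing 5 relies on shared state initialized elsewhere; the following sketch shows one plausible setup. The variable names mirror the listing, but the surrounding code is our assumption, not the original application:

// Sketch: initialization assumed by Listing 5. preloadedImages is a
// hypothetical array of 24 <img> elements whose opacity starts at 0.
var images = preloadedImages,
    total = images.length,
    interval = 100,  // intended duration per stimulus, in ms
    i = 0,
    start = performance.now();
images[0].style.opacity = 1;  // show the first stimulus
window.requestAnimationFrame(animate);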
research. All static visual stimuli used by researchers in their
experiments can easily be converted to images (sometimes into
scenes made of several distinct image files) using the canvas
element before the experiment begins, in order to preload or
pregenerate the assets needed in the web experiment.
Crucially, the stimulus presentation can then be controlled by
CSS animations, freeing the JavaScript event queue so that
user-generated input events are dispatched promptly and re-
ceive accurate time stamps.
The results of Studies 1 and 2 allow us to realize that
even when a combination of web techniques has proved
to be accurate enough for our experimental paradigm in
the past, it should be tested thoroughly again (using the BBTK
or similar procedures) for the combinations of browsers and
operating systems that may be used by participants.
This can be done ex post for the OS–browser
combinations identified from server log files; see, for in-
stance, Reips and Stieger (2004). Otherwise, researchers
might obtain results that are biased by the choice of tech-
nology, as in the interaction between browsers and oper-
ating systems that we found in Study 1.
Best-practice recommendations by web-browser vendors
encourage researchers to use techniques such as manipulating
layers' opacity to keep all the changes in the composite phase,
but Study 2 shows that background-position changes worked
better in most cases, even if this involved more phases in the
pipeline.
The results of Study 3 showed that enabling GPU acceler-
ation in web browsers can result in a significant improvement
of the accuracy of the presentation of visual stimuli in web-
based experiments. Thus, we recommend checking the status
of this feature before running web-based behavioral experi-
ments with high-resolution timing requirements.
Study 4 showed some limitations of the layer opacity
changes technique using requestAnimationFrame in Google
Chrome under GNU/Linux, but it worked flawlessly in the
rest of the tests under both GNU/Linux and MS Windows.
In light of the results from the studies we have presented
here, we believe that behavioral researchers should be cau-
tious in following browser vendor recommendations when
developing web-based experiments, and rather should adopt
the best practices derived from our empirical tests of the
Table 4 Study 4: Short/long frames using requestAnimationFrame to make layer opacity and background-position changes on Google Chrome 66 and Mozilla Firefox 59

                                                 Test  N    30 Frames     12 Frames     6 Frames      3 Frames
                                                            Short  Long   Short  Long   Short  Long   Short  Long
Background position  Google Chrome 66  Windows   1     24   0      0      0      0      0      0      0      0
                                                 2     24   0      0      0      0      0      0      0      0
                                                 3     24   0      0      0      0      0      0      0      0
                                       Linux     1     24   0      0      0      0      0      0      0      0
                                                 2     24   0      0      0      0      0      0      0      0
                                                 3     24   0      0      0      0      0      0      0      0
                     Mozilla Firefox 59 Windows  1     24   0      0      0      0      0      0      0      0
                                                 2     24   0      0      0      0      0      0      0      0
                                                 3     24   0      0      0      0      0      0      0      0
                                       Linux     1     24   0      0      0      0      0      0      0      0
                                                 2     24   0      0      0      0      0      0      0      0
                                                 3     24   0      0      0      0      0      0      0      0
Layer opacity        Google Chrome 66  Windows   1     24   0      0      0      0      0      0      0      0
                                                 2     24   0      0      0      0      0      0      0      0
                                                 3     24   0      0      0      0      0      0      0      0
                                       Linux     1     24   3      2      1      0      1      0      1      0
                                                 2     24   2      2      1      0      1      0      1      0
                                                 3     24   4      3      1      1      1      0      1      0
                     Mozilla Firefox 59 Windows  1     24   0      0      0      0      0      0      0      0
                                                 2     24   0      0      0      0      0      0      0      0
                                                 3     24   0      0      0      0      0      0      0      0
                                       Linux     1     24   0      0      0      0      0      0      0      0
                                                 2     24   0      0      0      0      0      0      0      0
                                                 3     24   0      0      0      0      0      0      0      0
accuracy and precision of the whole experimental setup (web
application, web browser, operating system, and hardware).
These best practices should also immediately be included in
curricula in psychology and other behavioral and social sci-
ences, as students are often conducting web-based experi-
ments and will be future researchers (Krantz & Reips, 2017).
Because the proposed web techniques had not been assessed
in previous studies on the accuracy of web applications under
high-resolution timing requirements (de Leeuw & Motz, 2016;
Garaizar, Vadillo, & López-de-Ipiña, 2014; Reimers & Stewart,
2015), the studies and detailed guidelines presented in this arti-
cle can help behavioral researchers who take them into account
when developing their web-based experiments.
In the old days of Internet-based experimenting, technology
was simpler. The effects of new technologies were easier to spot
for researchers who began using the Internet. In fact, one of us
(Reips) has long advocated a "low-tech principle" in creating
Internet-based research studies, because, early on, technology
was shown to interfere with participants' behavior in Internet-
based experiments. For example, Schwarz and Reips (2001)
created the very same web experiment both with server-side
(i.e., CGI) and client-side (i.e., JavaScript) technologies and
observed significantly larger and increasing dropout rates in
the latter version. Buchanan and Reips (2001) further
established that technology preferences depend on a partici-
pant's personality and may thus indirectly bias sample compo-
sition, and consequently behavior, in Internet-based research
studies (even though this seems to be less the case for different
operating systems on smartphones; see Götz, Stieger, & Reips,
2017). Modern web browsers have evolved to handle a much
wider range of technologies that, on the one hand, are capable of
achieving much more accuracy and precision in the control of
loading and rendering content than were earlier browsers, but on
the other hand, are increasingly likely to fall victim to insuffi-
cient optimization of complexity. Unbeknownst to many re-
searchers, vendors of web browsers implement a multitude of
technologies that are geared toward the optimization of goals
(e.g., speed) that are not in line with those of science (e.g.,
quality, timing). In the present article we have empirically shown
that this conflict has an effect on display and timing in Internet-
based studies and provided recommendations and scripts that
researchers can and should use to optimize their studies.
Alternatively, and this may be the only general rule of thumb
we are able to offer as an outcome of the empirical investigation
presented here, they might follow the "low-tech principle" as
much as possible, to minimize interference.
Author note Support for this research was provided by the
Departamento de Educación, Universidades e Investigación
of the Basque Government (Grant No. IT1078-16) and by
the Committee on Research at the University of Konstanz.
The authors declare that there was no conflict of interest in
the publication of this study.
References
Bamberg, W. (2018a). Intensive JavaScript. MDN web docs. Retrieved
from https://developer.mozilla.org/en-US/docs/Tools/Performance/
Scenarios/Intensive_JavaScript
Bamberg, W. (2018b). Animating CSS properties. MDN web docs.
Retrieved from https://developer.mozilla.org/en-US/docs/Tools/
Performance/Scenarios/Animating_CSS_properties
Barnhoorn, J. S., Haasnoot, E., Bocanegra, B. R., & van Steenbergen, H.
(2015). QRTEngine: An easy solution for running online reaction
time experiments using Qualtrics. Behavior Research Methods, 47,
918–929. https://doi.org/10.3758/s13428-014-0530-7
Belshe, M., Peon, R., & Thomson, M. (2015). Hypertext Transfer Protocol
Version 2 (HTTP/2). Retrieved from https://http2.github.io/http2-spec/
Birnbaum, M. H. (2004). Human research and data collection via the
Internet. Annual Review of Psychology, 55, 803–832. https://doi.
org/10.1146/annurev.psych.55.090902.141601
Buchanan, T., & Reips, U.-D. (2001). Platform-dependent biases in on-
line research: Do Mac users really think different? In K. J. Jonas, P.
Breuer, B. Schauenburg, & M. Boos (Eds.), Perspectives on Internet
research: Concepts and methods. Available at http://www.uni-
konstanz.de/iscience/reips/pubs/papers/Buchanan_Reips2001.pdf.
Accessed 26 Sept 2018
Garaizar, P., Vadillo, M. A., & López-de-Ipiña, D. (2014). Presentation
accuracy of the web revisited: Animation methods in the HTML5
era. PLoS ONE, 9, e109812. https://doi.org/10.1371/journal.pone.
0109812
Götz, F. M., Stieger, S., & Reips, U.-D. (2017). Users of the main
smartphone operating systems (iOS, Android) differ only little in
personality. PLoS ONE, 12, e0176921. https://doi.org/10.1371/
journal.pone.0176921
Grigorik, I., & Weiss, Y. (2018). W3C Preload API. Retrieved from
https://w3c.github.io/preload/#x2.link-type-preload
Henninger, F., Mertens, U. K., Shevchenko, Y., & Hilbig, B. E. (2017).
lab.js: Browser-based behavioral research. https://doi.org/10.5281/
zenodo.597045
Honing, H., & Reips, U.-D. (2008). Web-based versus lab-based studies:
A response to Kendall (2008). Empirical Musicology Review, 3,
73–77. https://doi.org/10.5167/uzh-4560
Krantz, J., & Reips, U.-D. (2017). The state of web-based research: A
survey and call for inclusion in curricula. Behavior Research
Methods, 49, 1621–1629. https://doi.org/10.3758/s13428-017-0882-x
Kyöstilä, S. (2018). Clamp performance.now() to 100us. Retrieved from
https://chromium-review.googlesource.com/c/chromium/src/+/
853505
de Leeuw, J. R. (2015). jsPsych: A JavaScript library for creating behav-
ioral experiments in a Web browser. Behavior Research Methods,
47, 1–12. https://doi.org/10.3758/s13428-014-0458-y
de Leeuw, J. R., & Motz, B. A. (2016). Psychophysics in a Web browser?
Comparing response times collected with JavaScript and
Psychophysics Toolbox in a visual search task. Behavior Research
Methods, 48, 1–12. https://doi.org/10.3758/s13428-015-0567-2
Lewis, P. (2018). Rendering performance. Retrieved from https://
developers.google.com/web/fundamentals/performance/rendering/
Mangan, M., & Reips, U.-D. (2007). Sleep, sex, and the Web: Surveying
the difficult-to-reach clinical population suffering from sexsomnia.
Behavior Research Methods, 39, 233–236. https://doi.org/10.3758/
BF03193152
Mozilla. (2018). Concurrency model and Event Loop. MDN web docs.
Retrieved from https://developer.mozilla.org/en-US/docs/Web/
JavaScript/EventLoop
Musch, J., & Reips, U.-D. (2000). A brief history of Web experimenting.
In M. H. Birnbaum (Ed.), Psychological experiments on the Internet
(pp. 61–88). San Diego: Academic Press. https://doi.org/10.1016/
B978-012099980-4/50004-6
Plant, R. R. (2016). A reminder on millisecond timing accuracy and
potential replication failure in computer-based psychology experi-
ments: An open letter. Behavior Research Methods,48,408411.
https://doi.org/10.3758/s13428-015-0577-0
Plant, R. R., Hammond, N., & Turner, G. (2004). Self-validating presen-
tation and response timing in cognitive paradigms: How and why?
Behavior Research Methods, Instruments, & Computers,36,291
303. https://doi.org/10.3758/BF03195575
Reimers, S., & Stewart, N. (2015). Presentation and response timing
accuracy in Adobe Flash and HTML5/JavaScript Web experiments.
Behavior Research Methods,47,309327. https://doi.org/10.3758/
s13428-014-0471-1
Reips, U.-D. (2000). The Web experiment method: Advantages, disad-
vantages, and solutions. In M. H. Birnbaum (Ed.), Psychological
experiments on the Internet (pp. 89117).San Diego: Academic
Press. https://doi.org/10.5167/uzh-19760
Reips, U.-D. (2002). Standards for Internet-based experimenting.
Experimental Psychology,49, 243256. https://doi.org/10.1027/
1618-3169.49.4.243
Reips, U.-D. (2007). Reaction times in Internet-based research. Invited
symposium talk at the 37th Meeting of the Society for Computers in
Psychology (SCiP) Conference, St. Louis.
Reips, U.-D. (2012). Using the Internet to collect data. In H. Cooper, P.
M. Camic, R. Gonzalez, D. L. Long, A. Panter, D. Rindskopf, & K.
J. Sher (Eds.), APA handbook of research methods in psychology,
Vol 2: Research designs: Quantitative, qualitative, neuropsycholog-
ical, and biological (pp. 291310). Washington, DC: American
Psychological Association. https://doi.org/10.1037/13620-017
Reips, U.-D., & Stieger, S. (2004). Scientific LogAnalyzer: AWeb-based
tool for analyses of server log files in psychological research.
Behavior Research Methods, Instruments, & Computers,36,304
311. https://doi.org/10.3758/BF03195576
Schmidt, W. C. (1997). World-Wide Web survey research: Benefits, po-
tential problems, and solutions. Behavior Research Methods,
Instruments, & Computers,29,274279. https://doi.org/10.3758/
BF03204826
Schmidt, W. C. (2007). Technical considerations when implementing
online research. In A. Joinson, K. McKenna, T. Postmes, & U.-D.
Reips (Eds.), The Oxford handbook of Internet psychology (pp.
461472). Oxford: Oxford University Press.
Schneider, W., Eschman, A., and Zuccolotto, A. (2012). E-Prime users
guide. Pittsburgh: Psychology Software Tools, Inc.
Scholz, F. (2018). performance.now(). MDN web docs. Retrieved from:
https://developer.mozilla.org/en-US/docs/Web/API/Performance/now
Schwarz, S., & Reips, U.-D. (2001). CGI versus JavaScript: A Web experiment on the reversed hindsight bias. In U.-D. Reips & M. Bosnjak (Eds.), Dimensions of Internet science (pp. 75–90). Lengerich: Pabst.
Stieger, S., & Reips, U.-D. (2010). What are participants doing while filling in an online questionnaire: A paradata collection tool and an empirical study. Computers in Human Behavior, 26, 1488–1495. https://doi.org/10.1016/j.chb.2010.05.013
van Steenbergen, H., & Bocanegra, B. R. (2016). Promises and pitfalls of Web-based experimentation in the advance of replicable psychological science: A reply to Plant (2015). Behavior Research Methods, 48, 1713–1717. https://doi.org/10.3758/s13428-015-0677-x
WHATWG (Apple, Google, Mozilla, Microsoft). (2018). HTML living standard: Event loops. Retrieved from https://html.spec.whatwg.org/multipage/webappapis.html#event-loops
Wolfe, C. R. (2017). Twenty years of Internet-based research at SCiP: A discussion of surviving concepts and new methodologies. Behavior Research Methods, 49, 1615–1620. https://doi.org/10.3758/s13428-017-0858-x
... The precise evaluation of response latencies is of particular interest, especially when investigating behavioral change over time, for example, to study learning and memory [8][9][10]. Unfortunately, several significant methodological shortcomings have been reported [11][12][13][14], which cause considerable obstacles to interpreting reaction time measurements collected online. Additional reported problems include significant time-of-day variations in demographic composition 15, inadequate worker comprehension of task instructions 16, worker inattentiveness 17,18, non-random attrition (i.e., loss of participants over time) 19, and bots and poor data quality 20. ...
... To control these factors, some researchers suggest enforcing GPU hardware acceleration strictly in experiments where single-frame deviations or "dropped frames" are unacceptable 14 . ...
... Contrary to this recommendation, our data suggest that frame rate variability within and across sessions remains a significant source of bias, leading to unreliable data even in within-subject designs. Browser type represents another significant limitation and cause of variability, because it largely remains opaque how browsers in general, and each version update specifically, interact with hardware acceleration status or affect frame rate 12,14. One approach may be to restrict access to the experiment to specified browsers only, but this is itself another source of bias, limits participant diversity, and does not account for unknown interactions between browser version ...
Preprint
Full-text available
This study explored challenges associated with online crowdsourced data collection, particularly focusing on longitudinal tasks with time-sensitive outcomes like response latencies. The research identified two significant sources of bias: technical shortcomings such as low, variable frame rates, and human factors, contributing to high attrition rates. The study also explored potential solutions to these problems, such as enforcing hardware acceleration and defining study-specific frame rate thresholds, as well as pre-screening participants and monitoring hardware performance and task engagement over each experimental session. This study provides valuable insights into improving the quality and reliability of data collected via online crowdsourced platforms and emphasizes the need for researchers to be cognizant of potential pitfalls in online research.
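The two remedies named here, enforcing hardware acceleration and defining study-specific frame rate thresholds, can both be approximated in a few lines of browser JavaScript. The following is a minimal sketch under assumptions of our own (the element id 'stimulus' and the 58-fps threshold are hypothetical); it is not code from the cited preprint:

// Hint the browser to promote the stimulus to its own GPU-composited
// layer; 'will-change' and a no-op 3-D transform are the usual hints.
const stimulus = document.getElementById('stimulus');
stimulus.style.willChange = 'transform, opacity';
stimulus.style.transform = 'translateZ(0)';

// Probe the effective frame rate by counting requestAnimationFrame
// callbacks over a fixed window, then compare against a threshold.
function measureFrameRate(windowMs = 1000) {
  return new Promise((resolve) => {
    let frames = 0;
    const start = performance.now();
    function tick(now) {
      frames += 1;
      if (now - start < windowMs) {
        requestAnimationFrame(tick);
      } else {
        resolve((frames * 1000) / (now - start));
      }
    }
    requestAnimationFrame(tick);
  });
}

measureFrameRate().then((fps) => {
  // Reject or flag sessions below a study-specific threshold,
  // e.g., roughly 58 fps on a nominal 60-Hz display.
  if (fps < 58) console.warn('Frame rate below threshold:', fps.toFixed(1));
});

Such a probe can be run before the task begins and repeated between blocks to monitor performance over the session.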
... One limitation of online experiments relates to the software (Bridges et al., 2020; Pronk et al., 2020). The simplest and by far the most popular way to conduct online experiments is via web-browsers, using the JavaScript (JS) programming language to manipulate and monitor the HTML/CSS graphical interface for (interactive) stimulus display and recording responses (e.g., Grootswagers, 2020; technical mechanisms are extensively detailed by, e.g., Garaizar & Reips, 2019). However, as often noted (e.g., Garaizar & Reips, 2019; Pronk et al., 2020), browsers are not optimized for such specific purposes, and, in particular, they do not necessarily provide the timing precision that is desired in behavioral experiments. ...
... The simplest and by far the most popular way to conduct online experiments is via web-browsers, using the JavaScript (JS) programming language to manipulate and monitor the HTML/CSS graphical interface for (interactive) stimulus display and recording responses (e.g., Grootswagers, 2020; technical mechanisms are extensively detailed by, e.g., Garaizar & Reips, 2019). However, as often noted (e.g., Garaizar & Reips, 2019; Pronk et al., 2020), browsers are not optimized for such specific purposes, and, in particular, they do not necessarily provide the timing precision that is desired in behavioral experiments. Plenty of studies have been devoted to comparing results of online studies and technologies to laboratory-based ones (e.g., Bridges et al., 2020; Crump et al., 2013; De Leeuw & Motz, 2016; Miller et al., 2018; Reimers & Stewart, 2015): By and large, the consensus is that the online alternatives work very well and produce very similar results as laboratory-based ones, but they are still to some extent inferior, and timing precision still leaves room for improvement (Anwyl-Irvine et al., 2020; Bridges et al., 2020; Grootswagers, 2020; Pronk et al., 2020; Reimers & Stewart, 2015). ...
... However, empirical testing is crucial, because the results can contradict what is expected (Garaizar & Reips, 2019), and indeed the RAF's behavior too can differ unexpectedly depending on the circumstances (as demonstrated in the present study, but also as indicated by repeated sanity checks throughout the past years; Lukács, 2018). Relatively few studies have been devoted to empirically comparing different JS methods for online experiments – and these studies focused not on display change timing precision, but primarily on ensuring the correctness of the duration of a given stimulus being displayed on the screen (Gao et al., 2020; Garaizar & Reips, 2019; Garaizar et al., 2014; Pronk et al., 2020). ...
Article
Full-text available
Conducting research via the Internet is a formidable and ever-increasingly popular option for behavioral scientists. However, it is widely acknowledged that web-browsers are not optimized for research: In particular, the timing of display changes (e.g., a stimulus appearing on the screen) still leaves room for improvement. So far, the typically recommended best (or least bad) timing method has been a single requestAnimationFrame (RAF) JavaScript function call within which one would give the display command and obtain the time of that display change. In our Study 1, we assessed two alternatives: calling the RAF twice consecutively, or calling the RAF during a continually ongoing independent loop of recursive RAF calls. While the former showed little or no improvement as compared to single RAF calls, with the latter we significantly and substantially improved overall precision, and achieved practically faultless precision in most practical cases. Our two basic methods for effecting display changes, plain text change and color filling, proved equally efficient. In Study 2, we reassessed the "RAF loop" timing method with image elements in combination with three different display methods: We found that the precision remained high when using either src or background-image changes – while drawing on a canvas element consistently led to comparatively lower precision. We recommend the "RAF loop" display timing method for improved precision in future studies, and src or background-image changes when using image stimuli. We publicly share the easy-to-use code for this method, exactly as employed in our studies.
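The "RAF loop" idea lends itself to a compact illustration. The sketch below is our own simplified reconstruction, not the code shared by the cited authors; the element id 'stim' is hypothetical:

// Keep an independent loop of recursive requestAnimationFrame calls
// running; any queued display change is applied inside the loop and
// stamped with that frame's timestamp.
let pendingChange = null;

function rafLoop(timestamp) {
  if (pendingChange) {
    pendingChange.apply();              // perform the display change
    pendingChange.onShown(timestamp);   // frame-locked display time
    pendingChange = null;
  }
  requestAnimationFrame(rafLoop);       // keep the loop alive
}
requestAnimationFrame(rafLoop);

// Usage: schedule a display change from anywhere in the experiment.
pendingChange = {
  apply: () => { document.getElementById('stim').style.visibility = 'visible'; },
  onShown: (t) => console.log('stimulus frame at', t),
};

Because the loop is already running when the change is queued, the change is applied at the very start of the next frame rather than whenever a freshly registered callback happens to fire.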
... We designed a simple study for the laboratory setting based on the "QButterfly template" for Qualtrics that recorded the user interaction. We chose a more straightforward approach instead of using external devices to register stimuli and trigger events (e.g., a BBTK photodiode and robotic actuator [6]), since we considered the timing precision sufficient for our purposes. (Footnote 2: Concerning images, these problems can be addressed by pre-loading the images before display [14].) We hosted a realistic stimulus with multiple linked webpages based on HTML, CSS, and JavaScript (a fitness tracking website) on Amazon Web Services and simulated user interaction directly on the computer that showed the stimulus. ...
... The toolkit can also be extended for user cases requiring higher resolution timing. For example, best practices for stimulus presentation on the web, such as preloading assets, can be integrated [14]. ...
Preprint
Full-text available
We provide a user-friendly, flexible, and lightweight open-source HCI toolkit (github.com/QButterfly) that allows non-tech-savvy researchers to conduct online user interaction studies using the widespread Qualtrics and LimeSurvey platforms. These platforms already provide rich functionality (e.g., for experiments or usability tests) and therefore lend themselves to an extension to display stimulus web pages and record clickstreams. The toolkit consists of a survey template with embedded JavaScript, a JavaScript library embedded in the HTML web pages, and scripts to analyze the collected data. No special programming skills are required to set up a study or match survey data and user interaction data after data collection. We empirically validated the software in a laboratory and a field study. We conclude that this extension, even in its preliminary version, has the potential to make online user interaction studies (e.g., with crowdsourced participants) accessible to a broader range of researchers.
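Both the preloading mentioned in the excerpts above and the clickstream recording described in this abstract can be sketched briefly in JavaScript. First, a minimal preloading routine (the file names and the startExperiment callback are hypothetical):

// Resolve once an image has been fetched and is ready to display.
function preload(src) {
  return new Promise((resolve, reject) => {
    const img = new Image();
    img.onload = () => resolve(img);
    img.onerror = reject;
    img.src = src;
  });
}

Promise.all(['face1.png', 'face2.png'].map(preload))
  .then(() => startExperiment())        // begin only once stimuli are cached
  .catch(() => console.error('Preloading failed'));

Second, a rough sketch of clickstream logging in the spirit of such a toolkit (our illustration, not the QButterfly code):

// Record every click with a high-resolution timestamp and its target,
// for later matching with the survey data.
const clickLog = [];
document.addEventListener('click', (event) => {
  clickLog.push({
    t: performance.now(),               // ms since page load
    x: event.clientX,
    y: event.clientY,
    target: event.target.tagName,
    id: event.target.id || null,
  });
});
// The log could later be serialized (JSON.stringify(clickLog)) and
// posted back to the survey platform.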
... Therefore, web-based survey methods provide a comprehensive solution. Their potential benefits include reduced costs in time and money [7,8], quicker responses than traditional methods of data collection [9], more interactive or customized formats, and less effort to complete and return the survey [10]. A web survey can also incorporate more complexity to simplify the collection and reporting of data and to obtain larger samples [11]. ...
Article
The purpose of this study is to describe the advantages of social surveys for product development based on market research. The method used was qualitative, with the data drawn from observed phenomena. The results showed that Web-based surveys offer advantages to users as consumers, seen in how the survey team rewards users after they finish surveys, in the features of the web, and in how users' surveys help the companies. The conclusion is that the relationships among companies, survey teams, and users as consumers are mutually beneficial.
... Using participants' smartphones increases the heterogeneity of operating systems and hardware in a study; such heterogeneity may, for example, have an influence on the timing and measurement (Garaizar & Reips, 2019;Kuhlmann et al., 2020). ...
Chapter
Practical guide on how to conduct online experiments
... The task was developed in JavaScript, following recommendations for stimulus presentation (Garaizar & Reips, 2019). The presentation of each of the 72 faces was preceded by an orange fixation cross for 500 ms on a white background. ...
Article
Full-text available
Social anxiety (SA) and depression have been associated with negative interpretation biases of social stimuli. Studies often assess these biases with ambiguous faces, as people with SA and depression tend to interpret such faces negatively. However, the test-retest reliability of this type of task is unknown. Our objectives were to develop a new interpretation bias task with ambiguous faces and analyse its properties in terms of test-retest reliability and in relation to SA, depression, and looming maladaptive style (LMS). Eight hundred sixty-four participants completed a task in which they had to interpret morphed faces as negative or positive on a continuum between happy and angry facial expressions. In addition, they filled out scales on SA, depressive symptoms, and LMS. Eighty-four participants completed the task again after 1–2 months. The test-retest reliability was moderate (r = .57–.69). The data revealed a significant tendency to interpret faces as negative for people with higher SA and depressive symptoms and with higher LMS. Longer response times to interpret the happy faces were positively associated with a higher level of depressive symptoms. The reliability of the present task was moderate. The results highlight associations between the bias interpretation task and SA, depression, and LMS.
... Doubts have been voiced about whether these devices are sufficiently accurate at timing stimuli and measuring response times (RTs) (Plant & Quinlan, 2013; van Steenbergen & Bocanegra, 2016). Studies that have measured stimulus duration via photometry on desktops and laptops found that accuracy varied per combination of physical device, operating system, and browser, henceforth jointly denoted as device (Anwyl-Irvine et al., 2021; Barnhoorn et al., 2015; Bridges et al., 2020; Garaizar et al., 2014; Garaizar & Reips, 2019; Reimers & Stewart, 2015). ...
Article
Full-text available
Research deployed via the internet and administered via smartphones could have access to more diverse samples than lab-based research. Diverse samples could have relatively high variation in their traits and so yield relatively reliable measurements of individual differences in these traits. Several cognitive tasks that originated from the experimental research tradition have been reported to yield relatively low reliabilities (Hedge et al., 2018) in samples with restricted variance (students). This issue could potentially be addressed by smartphone-mediated administration in diverse samples. We formulate several criteria to determine whether a cognitive task is suitable for individual differences research on commodity smartphones: no very brief or precise stimulus timing, relative response times (RTs), a maximum of two response options, and a small number of graphical stimuli. The flanker task meets these criteria. We compared the reliability of individual differences in the flanker effect across samples and devices in a preregistered study. We found no evidence that a more diverse sample yields higher reliabilities. We also found no evidence that commodity smartphones yield lower reliabilities than commodity laptops. Hence, diverse samples might not improve reliability above student samples, but smartphones may well measure individual differences with cognitive tasks reliably. Exploratively, we examined different reliability coefficients, split-half reliabilities, and the development of reliability estimates as a function of task length.
... To ensure accuracy and test-retest reliability in presentation and response recording, researchers should use specialty scripts that require a working knowledge of programming languages such as JavaScript (Anwyl-Irvine et al., 2020; Garaizar & Reips, 2019). However, if researchers are not able to program and code, they should turn to dedicated online research software providers. ...
Article
Conducting organizational research via online surveys and experiments offers a host of advantages over traditional forms of data collection when it comes to sampling for more advanced study designs, while also ensuring data quality. To draw attention to these advantages and encourage researchers to fully leverage them, the present paper is structured into two parts. First, along a structure of commonly used research designs, we showcase select organizational psychology (OP) and organizational behavior (OB) research and explain how the Internet makes it feasible to conduct research not only with larger and more representative samples, but also with more complex research designs than circumstances usually allow in offline settings. Subsequently, because online data collections often also come with some data quality concerns, in the second section, we synthesize the methodological literature to outline three improvement areas and several accompanying strategies for bolstering data quality. Plain Language Summary: These days, many theories from the fields of organizational psychology and organizational behavior are tested online simply because it is easier. The point of this paper is to illustrate the unique advantages of the Internet beyond mere convenience—specifically, how the related technologies offer more than simply the ability to mirror offline studies. Accordingly, our paper first guides readers through examples of more ambitious online survey and experimental research designs within the organizational domain. Second, we address the potential data quality drawbacks of these approaches by outlining three concrete areas of improvement. Each comes with specific recommendations that can ensure higher data quality when conducting organizational survey or experimental research online.
Article
Full-text available
The increasingly widespread use of mobile phone applications (apps) as research tools and cost-effective means of vast data collection raises new methodological challenges. In recent years, it has become a common practice for scientists to design apps that run only on a single operating system, thereby excluding large numbers of users who use a different operating system. However, empirical evidence investigating any selection biases that might result thereof is scarce. Hence, we conducted two studies drawing from a large multi-national (Study 1; N = 1,081) and a German-speaking sample (Study 2; N = 2,438). Study 1 compared iOS and Android users across an array of key personality traits (i.e., well-being, self-esteem, willingness to take risks, optimism, pessimism, Dark Triad, and the Big Five). Focusing on Big Five personality traits in a broader scope, in addition to smartphone users, Study 2 also examined users of the main computer operating systems (i.e., Mac OS, Windows). In both studies, very few significant differences were found, all of which were of small or even tiny effect size, mostly disappearing after sociodemographics had been controlled for. Taken together, minor differences in personality seem to exist, but they are of small to negligible effect size (ranging from OR = 0.919 to 1.344 in Study 1 and ηp² = .005 to .036 in Study 2) and may reflect differences in sociodemographic composition, rather than the operating system of smartphone users. © 2017 Goetz et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Article
Full-text available
The first papers that reported on conducting psychological research on the web were presented at the Society for Computers in Psychology conference 20 years ago, in 1996. Since that time, there has been an explosive increase in the number of studies that use the web for data collection. As such, it seems a good time, 20 years on, to examine the health and adoption of sound practices of research on the web. The number of studies conducted online has increased dramatically. Overall, it seems that the web can be a method for conducting valid psychological studies. However, it is less clear that students and researchers are aware of the nature of web research. While many studies are well conducted, there is also a certain laxness appearing regarding the design and conduct of online studies. This laxness appears both anecdotally to the authors as managers of large sites for posting links to online studies, and in a survey of current researchers. One of the deficiencies discovered is that there is no coherent approach to educating researchers as to the unique features of web research.
Article
Full-text available
In a recent letter, Plant (2015) reminded us that proper calibration of our laboratory experiments is important for the progress of psychological science. Therefore, carefully controlled laboratory studies are argued to be preferred over Web-based experimentation, in which timing is usually more imprecise. Here we argue that there are many situations in which the timing of Web-based experimentation is acceptable and that online experimentation provides a very useful and promising complementary toolbox to available lab-based approaches. We discuss examples in which stimulus calibration or calibration against response criteria is necessary and situations in which this is not critical. We also discuss how online labor markets, such as Amazon's Mechanical Turk, allow researchers to acquire data in more diverse populations and to test theories along more psychological dimensions. Recent methodological advances that have produced more accurate browser-based stimulus presentation are also discussed. In our view, online experimentation is one of the most promising avenues to advance replicable psychological science in the near future.
Article
Full-text available
There is an ongoing 'replication crisis' across the field of psychology in which researchers, funders, and members of the public are questioning the results of some scientific studies and the validity of the data they are based upon. However, few have considered that a growing proportion of research in modern psychology is conducted using a computer. Could the hardware and software, or experiment generator, being used to run the experiment itself be a cause of millisecond timing error and subsequent replication failure? This article serves as a reminder that millisecond timing accuracy in psychology studies remains an important issue and that care needs to be taken to ensure that studies can be replicated on current computer hardware and software.
Article
Full-text available
Behavioral researchers are increasingly using Web-based software such as JavaScript to conduct response time experiments. Although there has been some research on the accuracy and reliability of response time measurements collected using JavaScript, it remains unclear how well this method performs relative to standard laboratory software in psychologically relevant experimental manipulations. Here we present results from a visual search experiment in which we measured response time distributions with both Psychophysics Toolbox (PTB) and JavaScript. We developed a methodology that allowed us to simultaneously run the visual search experiment with both systems, interleaving trials between two independent computers, thus minimizing the effects of factors other than the experimental software. The response times measured by JavaScript were approximately 25 ms longer than those measured by PTB. However, we found no reliable difference in the variability of the distributions related to the software, and both software packages were equally sensitive to changes in the response times as a result of the experimental manipulations. We concluded that JavaScript is a suitable tool for measuring response times in behavioral research.
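The JavaScript side of such a comparison can be reduced to a few lines. Below is a bare-bones sketch of browser-based response time measurement (our illustration, not the code used in the study; the element id 'stim' is hypothetical):

// Stamp stimulus onset with the next animation frame's timestamp and
// compute the response time from a keydown listener.
let onset = null;

function showStimulus(el) {
  el.style.visibility = 'visible';
  requestAnimationFrame((t) => { onset = t; });   // approximate onset
}

document.addEventListener('keydown', (event) => {
  if (onset !== null) {
    const rt = performance.now() - onset;         // RT in milliseconds
    console.log('key', event.key, 'RT', rt.toFixed(1), 'ms');
    onset = null;                                 // accept one response
  }
});

// Example call: showStimulus(document.getElementById('stim'));

A constant offset such as the roughly 25 ms reported above would add to, but not distort, RTs measured this way, which is consistent with the finding that such setups remain sensitive to experimental manipulations.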
Article
Full-text available
Using the Web to run behavioural and social experiments quickly and efficiently has become increasingly popular in recent years, but there is some controversy about the suitability of using the Web for these objectives. Several studies have analysed the accuracy and precision of different web technologies in order to determine their limitations. This paper updates the extant evidence about the presentation accuracy and precision of the Web and extends the study of accuracy and precision in the presentation of multimedia stimuli to HTML5-based solutions, which were previously untested. The accuracy and precision in the presentation of visual content in classic web technologies is acceptable for use in online experiments, although some results suggest that these technologies should be used with caution in certain circumstances. Declarative animations based on CSS are the best alternative when animation intervals are above 50 milliseconds. The performance of procedural web technologies based on the HTML5 standard is similar to that of previous web technologies. These technologies are being progressively adopted by the scientific community and have promising futures, which makes their use advisable over more obsolete technologies.
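A declarative CSS animation of the kind favored here can be driven entirely from JavaScript. A minimal sketch follows (the element id, class name, and 100-ms duration are hypothetical, chosen to respect the 50-ms lower bound suggested above):

// Declare the keyframes once; the browser then schedules the
// interpolation itself instead of relying on script-timed updates.
const style = document.createElement('style');
style.textContent = `
  @keyframes fadeIn {
    from { opacity: 0; }
    to   { opacity: 1; }
  }
  .stimulus-on { animation: fadeIn 100ms linear forwards; }
`;
document.head.appendChild(style);

// Starting the animation is then a single class change.
document.getElementById('stim').classList.add('stimulus-on');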
Article
Full-text available
Performing online behavioral research is gaining increased popularity among researchers in psychological and cognitive science. However, the currently available methods for conducting online reaction time experiments are often complicated and typically require advanced technical skills. In this article, we introduce the Qualtrics Reaction Time Engine (QRTEngine), an open-source JavaScript engine that can be embedded in the online survey development environment Qualtrics. The QRTEngine can be used to easily develop browser-based online reaction time experiments with accurate timing within current browser capabilities, and it requires only minimal programming skills. After introducing the QRTEngine, we briefly discuss how to create and distribute a Stroop task. Next, we describe a study in which we investigated the timing accuracy of the engine under different processor loads using external chronometry. Finally, we show that the QRTEngine can be used to reproduce classic behavioral effects in three reaction time paradigms: a Stroop task, an attentional blink task, and a masked-priming task. These findings demonstrate that QRTEngine can be used as a tool for conducting online behavioral research even when this requires accurate stimulus presentation times.
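For orientation, a toy Stroop trial in plain JavaScript (all names are ours; this is a generic sketch, not QRTEngine code):

// Show a color word in a congruent or incongruent ink color and record
// the keypress and response time for this single trial.
const trial = { word: 'RED', ink: 'blue' };       // incongruent example

const stim = document.createElement('div');
stim.textContent = trial.word;
stim.style.color = trial.ink;
document.body.appendChild(stim);

const trialOnset = performance.now();             // approximate onset
document.addEventListener('keydown', function handler(e) {
  const rt = performance.now() - trialOnset;
  const congruent = trial.word === trial.ink.toUpperCase();
  console.log(e.key, rt.toFixed(1), 'ms', congruent ? 'congruent' : 'incongruent');
  document.removeEventListener('keydown', handler);
});

An engine such as QRTEngine wraps trials like this with preloading, scheduling, and Qualtrics integration.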
Article
This discussion of the symposium 20 Years of Internet-Based Research at SCiP: Surviving Concepts, New Methodologies compares the issues faced by the pioneering Internet-based psychology researchers who presented at the first symposia on the topic, at the 1996 annual meeting of the Society for Computers in Psychology, to the issues facing researchers today. New methodologies unavailable in the early days of Web-based psychological research are discussed, with an emphasis on mobile computing with smartphones that is capitalizing on capabilities such as touch screens and gyro sensors. A persistent issue spanning the decades has been the challenge of conducting scientific research with consumer-grade electronics. In the 1996 symposia on Internet-based research, four advantages were identified: easy access to a geographically unlimited subject population, including subjects from very specific and previously inaccessible target populations; bringing the experiment to the subject; high statistical power through large sample size; and reduced cost. In retrospect, it appears that Internet-based research has largely lived up to this early promise—with the possible exception of sample size, since the public demand for controlled psychology experiments has not always been greater than the supply offered by researchers. There are many reasons for optimism about the future of Internet-based research. However, unless courses and textbooks on psychological research methods begin to give Web-based research the attention it deserves, the future of Internet-based psychological research will remain in doubt.
Article
This article introduces some of the rudimentary underlying concepts of how the Internet works and points out a number of caveats that can influence the quality of collected data. Topics covered include Internet basics, technical problems, programming for the lowest common technology, client configuration issues, server side and data security issues, and the limits of precision. It is hoped that after becoming familiar with the information herein, researchers will be capable of determining whether the research application they are interested in pursuing is fit for the Internet medium, or whether technical issues will pose problems which threaten the validity of the work.