Open-source software for mouse-tracking in Qualtrics to measure category competition


Abstract

Mouse-tracking is a sophisticated tool for measuring rapid, dynamic cognitive processes in real time, particularly in experiments investigating competition between perceptual or cognitive categories. We provide user-friendly, open-source software ( for designing and analyzing such experiments online using the Qualtrics survey platform. The software consists of a Qualtrics template with embedded JavaScript and CSS along with R code to clean, parse, and analyze the data. No special programming skills are required to use this software. As we discuss, this software could be readily modified for use with other online survey platforms that allow the addition of custom JavaScript. We empirically validate the provided software by benchmarking its performance on previously tested stimuli (android robot faces) in a category-competition experiment with realistic crowdsourced data collection.
Behavior Research Methods
Maya B. Mathur1,2 · David B. Reichling3
©The Author(s) 2019
Keywords Experimental design · Mouse-tracking · Response dynamics · Cognition · Qualtrics · R
1 Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA
2 Quantitative Sciences Unit, Stanford University, 1070 Arastradero Road, Palo Alto, CA 94305 USA
3 Oral & Maxillofacial Surgery (retired), University of California at San Francisco, San Francisco

Capturing rapid, dynamic cognitive processes that may lie outside subjective awareness is a key
methodological task in several realms of experimental psychology. One promising method for
gaining insight into these processes is to analyze the trajectories of subjects’ mouse cursors as
they complete experimental tasks (Freeman & Johnson, 2016). For example, in tasks in which
subjects must rapidly categorize stimuli (such as faces) into mutually exclusive, binary categories
(such as “male” and “female”), the trajectories of subjects’ mouse cursors as they attempt to
rapidly select a category button can serve as direct physical manifestations of cognitive
competition between the categories (see Fig. 1 for a hypothetical trial). Stimuli that are
difficult to categorize because they are intermediate between the two categories or are atypical
exemplars of their category, such as gender-atypical faces, tend to produce mouse trajectories
that differ markedly from those produced by stimuli falling clearly into one category (Dale,
Kehoe, & Spivey, 2007; Freeman, Ambady, Rule, & Johnson, 2008; Freeman, Pauker, & Sanchez, 2016).
That is, the trajectories produced when subjects attempt to categorize ambiguous stimuli will
tend to reflect the subjects’ “confusion” and simultaneous or alternating attraction to both
categories; these trajectories typically show more changes of direction and greater divergence
from the most direct possible trajectory between the mouse cursor’s starting and ending
positions. For example, in the hypothetical trial depicted in Figure 1, the subject must attempt
to categorize as “robot” or “human” a stimulus depicting an extremely human-like android robot.
Mouse-tracking has been used to investigate category competition in diverse subdisciplines,
including language processing (Dale & Duran, 2011; Farmer, Anderson, & Spivey, 2007; Spivey,
Grosjean, & Knoblich, 2005), social judgments of white versus black faces (Wojnowicz, Ferguson,
Dale, & Spivey, 2009; Yu, Wang, Wang, & Bastin, 2012), and social game theory (Kieslich & Hilbig,
2014).
Collecting reliable mouse trajectories that are comparable across subjects and trials requires
precise control over the visual layout and timing of the experiment, as we will describe. Perhaps
for this reason, mouse-tracking experiments to date have usually been conducted in person, with
subjects physically present in the lab (with some exceptions, e.g., Freeman et al., 2016). Such
settings allow for a consistent visual presentation of the experiment through the use of existing
mouse-tracking software (Freeman & Ambady, 2010; Kieslich & Henninger, 2017). In contrast,
collecting mouse-tracking data online, for example through crowd-sourcing websites, could allow
for much larger samples, greater demographic diversity (Gosling, Sandy, John, & Potter, 2010), and
the possibility of implementing the same experiment in multiple collaborating labs without
special hardware or software requirements. We are not aware of existing open-source software that
is suitable for these settings, that can accommodate common experimental features such as
presentation of multiple stimuli and randomization, and that ensures a consistent, validated
experimental presentation even when subjects complete the study from their home computers or
other devices.

Fig. 1 Typical outcome measures for category-competition experiments. In this example, a
hypothetical subject’s cursor trajectory suggests initial attraction to the “robot” category, but
in an early change of direction, the subject appears to become more attracted to the “human”
category. There is a final, weak attraction once again to the “robot” category, but the subject
ultimately categorizes the face as “human”. In our implementation, there is a 570-px horizontal
distance between the category buttons and a 472-px vertical distance between the category buttons
and the middle of the Next button.
The present paper therefore provides open-source software enabling reliable and precise design of
mouse-tracking experiments through the widely used software Qualtrics (Provo, UT, last accessed
10-2018), a graphical user interface designed for online data collection that interfaces easily
with crowd-sourcing websites such as Amazon Mechanical Turk. Our software pipeline consists of:
(1) a premade Qualtrics template containing embedded JavaScript and CSS that manages stimulus
presentation, trains subjects on the experimental task, and collects mouse trajectory and time
data; and (2) R code to clean, parse, and analyze the data. We present a validation study
demonstrating consistent data collection even in relatively uncontrolled online settings and
demonstrating that these methods show concurrent validity when benchmarked using previously
tested stimuli.
A basic category-competition experiment
In a standard category-competition experiment, the subject
views a series of stimuli presented sequentially on separate
pages. The subject must categorize each stimulus by
clicking on one of two buttons presented on the left and right
sides of the window (Fig. 1). Stimuli are typically chosen
such that some fall clearly into one of the categories, while
others are ambiguous or difficult to categorize. Ambiguous
stimuli are thought to activate mental representations of both
categories simultaneously, leading to dynamic competition
that manifests in real time as unstable mouse dynamics
(Freeman & Johnson, 2016). That is, because the subject
is continuously or alternately attracted to both categories,
the mouse trajectory may contain frequent direction changes
and may diverge substantially from a direct path from
the start position to the location of the category button
ultimately chosen.
Specifically, past literature (e.g., Freeman et al., 2008) has used several outcome measures to
operationalize category competition through mouse dynamics. More ambiguous stimuli typically
increase the number of times the subject’s mouse changes directions horizontally (x-flips).
Additionally, compared to unambiguous stimuli, ambiguous stimuli tend to produce trajectories
that diverge more from an “ideal trajectory” consisting of a straight line from the subject’s
initial cursor position to the finally chosen radio button (Fig. 1, red dashed line). That is, the
maximum horizontal deviation between the ideal trajectory and the subject’s actual trajectory
(Fig. 1, red solid line), as well as the area between the ideal and actual trajectories (Fig. 1,
pink shading), are typically larger for ambiguous stimuli. Our implementation calculates these
measures using trajectories rescaled to unit length in both the x- and y-dimensions and
calculates the area using Riemann integration. Other outcome measures can include the maximum
speed of the subject’s cursor (ambiguous stimuli tend to produce higher maximum speeds,
reflecting abrupt category shifts; Freeman et al., 2016) and the total reaction time for the
trial (ambiguous stimuli tend to produce longer reaction times). We calculate reaction time as
the time elapsed between the start of the trial, after the page is fully loaded, and the time the
subject clicks on a button to categorize the stimulus. However, both maximum speed and reaction
time have limitations and are perhaps best treated as secondary measures (Freeman et al., 2016).

Table 1 Modifiable JavaScript global variables

Variable                Default  Meaning
howManyPracticeImages   6        The number of practice stimuli (for which no mouse trajectories
                                 will be recorded)
howManyRealImages       10       The number of experimental stimuli (for which mouse trajectories
                                 will be recorded)
maxAnswerTime           5000     The maximum time (ms) that can be spent on a trial. Trials with
                                 longer answer times will receive a “took too long” alert.
maxLatency              700      The maximum time (ms) after trial onset for which the subject can
                                 leave the mouse position unchanged. Trials with longer latencies
                                 will receive a “started too late” alert.
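
The outcome measures described above can be made concrete with a minimal sketch. It is written in JavaScript purely for illustration (the provided pipeline computes these measures in R), and all function names are hypothetical. It assumes that each trajectory is rescaled so that it starts at (0, 0) and ends at (1, 1), that the ideal trajectory is then the line x = y, and that the area is approximated with a left Riemann sum; the authors' exact choices may differ.

```javascript
// Illustrative sketch only; function names are hypothetical, not the authors' R code.

// Rescale a trajectory so it starts at (0, 0) and ends at (1, 1).
// Assumes the first and last points differ in both x and y.
function rescaleToUnit(xs, ys) {
  const last = xs.length - 1;
  const dx = xs[last] - xs[0];
  const dy = ys[last] - ys[0];
  return {
    x: xs.map((x) => (x - xs[0]) / dx),
    y: ys.map((y) => (y - ys[0]) / dy),
  };
}

// Count horizontal direction changes, ignoring samples with no x movement.
function countXFlips(xs) {
  let flips = 0;
  let lastDirection = 0;
  for (let i = 1; i < xs.length; i++) {
    const direction = Math.sign(xs[i] - xs[i - 1]);
    if (direction === 0) continue;
    if (lastDirection !== 0 && direction !== lastDirection) flips++;
    lastDirection = direction;
  }
  return flips;
}

// After rescaling, the ideal straight-line trajectory is x = y, so the
// horizontal deviation at each sample is |x - y|.
function maxXDeviation(traj) {
  return Math.max(...traj.x.map((x, i) => Math.abs(x - traj.y[i])));
}

// Left Riemann sum of the horizontal deviation over the vertical extent.
function areaFromIdeal(traj) {
  let area = 0;
  for (let i = 0; i < traj.x.length - 1; i++) {
    const width = Math.abs(traj.y[i + 1] - traj.y[i]);
    area += Math.abs(traj.x[i] - traj.y[i]) * width;
  }
  return area;
}
```

Because the rescaling divides by the signed start-to-end distance, trials ending on the left and right buttons are mapped onto the same unit square, which is one simple way to make them comparable.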
How to create and analyze an experiment with our software
Our open-source software provides a user-friendly data collection and analysis pipeline for
creating such experiments as follows. All questionnaire and code files are available online (,
along with a detailed READ-ME file that users are strongly encouraged to read before
implementation. First, the user imports into Qualtrics a template questionnaire implementing the
validation study presented below. The key feature is a pair of question “blocks” that present the
stimuli sequentially, in randomized order, via Qualtrics’ “Loop & Merge” feature; other blocks in
the survey, such as one presenting demographic questions, can be added or removed as needed. The
image URLs in the Loop & Merge can simply be edited through the Qualtrics interface to replace
the default stimuli. The first block of the questionnaire shows instructions (Online Supplement).
Then the first Loop & Merge block presents training stimuli to acclimatize the subject to the
experiment, including the alert messages designed to optimize subject behavior for
mouse-tracking, detailed in “Optimizing subject behavior for mouse-tracking” below. The second
Loop & Merge block of experimental stimuli begins data collection by activating mouse-tracking.
The underlying JavaScript that activates mouse-tracking¹ requires no modification except that
global variables specifying the number of training stimuli (howManyPracticeImages, defaulting to
6) and real experimental stimuli (howManyRealImages, defaulting to 10) must be changed to match
the number of user-supplied stimuli. Additional parameters that the user can optionally change
are listed in Table 1. The Qualtrics template also contains (in the “Look and Feel” section
accessible through the Qualtrics user interface) a small snippet of CSS that formats the radio
buttons.² The Qualtrics questionnaire is then ready to collect data.

¹ The JavaScript code is already embedded in the template Qualtrics files, but it is also
available as standalone files (
After data collection, the raw Qualtrics dataset in wide format will contain columns with
continuous records of the subjects’ mouse coordinates (xPos and yPos), the absolute time (ms since
January 1, 1970, 00:00:00 UTC, which is the standard origin time in JavaScript) at which these
coordinates were recorded (t), the times at which each trial began (onReadyTime), and the times at
which the subject chose a category button (buttonClickTime). These variables are recorded as a
single string for each subject with a special character “a” separating the individual
recordings, enabling easy parsing in R or another analysis software. That is, onReadyTime and
buttonClickTime are sampled once per trial, while xPos, yPos, and t are sampled as a triplet
approximately every 16–18 ms.³ Additionally, the user’s browser, browser version, operating
system, and browser resolution are recorded. Table 2 provides details on these variables, along
with additional variables that are collected in the raw Qualtrics data but were not used in the
present

² The CSS code is also available as a standalone file (

³ Specifically, our JavaScript function for recording mouse position (getMousePosition) is
triggered by “mousemove” events issued by the browser. Therefore, the frequency of mousemove
events determines the minimum time interval for measuring mouse position. The current World Wide
Web Consortium standards (“UI events”, 2018) do not specify a frequency at which browsers should
issue mouse events, but at the time of writing, the most common browsers use a de facto standard
60-Hz rate (to match the most common display screen refresh rate). Other factors may also
contribute to the sampling rate, including mouse DPI (the number of positions reported by the
mouse per inch of movement), the system’s USB polling rate (how often the mouse is queried for
data), and potential variable lag due to high demand on CPU resources. However, in practice we
have found that the effect of these factors appears to be minimal, because our median mouse
position interval (17 ms) agrees well with the 60-Hz event reporting interval of 16.7 ms.
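
The recording format described above (one “a”-separated string per variable, with cursor samples taken on mousemove events) can be sketched as follows. This is a minimal illustration under stated assumptions, not the template's actual getMousePosition code: the names createRecorder, parseSeries, and splitIntoTrials are hypothetical, the use of clientX and clientY is an assumption, and samples are assigned to a trial when their timestamp falls between that trial's onReadyTime and buttonClickTime.

```javascript
// Illustrative sketch; names other than xPos, yPos, t, onReadyTime, and
// buttonClickTime are hypothetical.
const DELIMITER = "a";

// Collects cursor samples; `now` is injectable so the sketch runs outside a browser.
function createRecorder(now = Date.now) {
  const samples = { xPos: [], yPos: [], t: [] };
  return {
    // Intended as a "mousemove" handler (assumes clientX/clientY coordinates).
    onMouseMove(event) {
      samples.xPos.push(event.clientX);
      samples.yPos.push(event.clientY);
      samples.t.push(now());
    },
    // One delimiter-separated string per variable.
    serialize() {
      return {
        xPos: samples.xPos.join(DELIMITER),
        yPos: samples.yPos.join(DELIMITER),
        t: samples.t.join(DELIMITER),
      };
    },
  };
}

// Turn a delimiter-separated string back into numbers.
function parseSeries(text) {
  return text.split(DELIMITER).filter((v) => v !== "").map(Number);
}

// Assign each sample to the trial whose [onReadyTime, buttonClickTime]
// window contains its timestamp.
function splitIntoTrials(raw) {
  const x = parseSeries(raw.xPos);
  const y = parseSeries(raw.yPos);
  const t = parseSeries(raw.t);
  const starts = parseSeries(raw.onReadyTime);
  const ends = parseSeries(raw.buttonClickTime);
  return starts.map((start, k) => {
    const trial = { x: [], y: [], t: [] };
    t.forEach((time, i) => {
      if (time >= start && time <= ends[k]) {
        trial.x.push(x[i]);
        trial.y.push(y[i]);
        trial.t.push(time - start); // ms since trial onset
      }
    });
    return trial;
  });
}
```

In a browser, the handler could be attached with document.addEventListener("mousemove", recorder.onMouseMove); the injectable clock is only there so that the sketch can be exercised without a browser.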
The R code in data_prep.R automatically checks
the data for idiosyncratic problems, returning a list of
subjects flagged for possible exclusion, along with reasons
(see “Special considerations for online use” below for
details). The R code then parses the raw data downloaded
from Qualtrics, computes the outcome measures described
above, and returns the dataset in an analysis-ready format.
Specifically, the code first parses the character-separated
strings into a list for each subject, each of which contains
a list for each experimental stimulus. For example, a
particular subject might have the following x-coordinate
lists for the first three stimuli (prior to rescaling the
trajectories to unit length):
 [1] 947 946 946 946 946 944 941 938 936 934 932 927 922 916 910
[16] 908 906 903 899 894 887 880 874 867 859 850 839 829 815 803
[31] 794 786 777 768 758 750 744 736 728 723 719 717 714 709 703
[46] 700 696 692 690 689 687 684 681 680 678 676 675 674 672 670
[61] 669 668 668

 [1] 972 968 964 960 956 951 946 939 927 917 908 900 888 876 862
[16] 847 831 816 801 784 772 763 753 743 733 725 721 715 709 704
[31] 699 696 694 692 689 685 683 682 679 676 675 674 674 673 672
[46] 671 671

 [1] 988 987 986 982 977 972 966 961 953 942 927 910 894 878 866
[16] 849 826 808 792 781 771 761 751 745 741 738 733 729 725 722
[31] 719 715 710 707 704 701 699 696 693 689 686 685 683 682 681
[46] 679 678 676 676 676
In the process, the code accounts for the possibility of
order-randomized Loop & Merge iterates by appropriately
reordering the coordinate and time data. The outcome
measures are computed for each subject and appended
to the wide-format dataset. By default, our analysis code
defines the time variable as the time elapsed from the
beginning of each trial, specifically the time at which the
page was loaded. Note that if the trajectories are to be
directly averaged rather than used to compute the outcome
measures we describe, the times should be standardized to
account for differences in the times elapsed for each trial
(Freeman & Ambady, 2010). This can be accomplished
simply by passing the argument rescale = TRUE to
the function get_subject_lists when parsing the time
data. Additional outcome measures, such as trajectory
curvature (Dale et al., 2007; Kieslich & Hilbig, 2014;
Wojnowicz et al., 2009) or speed profiles throughout a trial
(Freeman et al., 2016), could also be easily calculated from
the raw coordinate data supplied by the provided R scripts.
Finally, the dataset is reshaped into an analysis-friendly long
format, such that there is one row for each trial rather than
for each subject:
  id   cat xflips  xdev   area   speed rxnt
1  1 Robot      0 0.132 0.0599 0.00295 1048
2  2 Robot      0 0.112 0.0577 0.00906  701
3  3 Robot      1 0.225 0.1638 0.00776 1184
4  4 Robot      2 0.266 0.1473 0.00328 2022
5  5 Robot      2 0.254 0.1129 0.00655 1410
6  6 Robot      2 0.254 0.1180 0.01493 1037
(Note that the outcome measures xflips, xdev, and area are computed using rescaled
trajectories, so are unitless.) The code also prints information about
alert messages displayed to subjects, discussed in the
next section. Although analysis methods will differ by
substantive application, we provide an example R file,
analysis.R, which conducts the analyses described in
“Validation study” below.
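
The time standardization mentioned above for directly averaged trajectories (Freeman & Ambady, 2010) can be illustrated with a short sketch. It is an assumption-laden example rather than a description of what rescale = TRUE does internally: here times are rescaled to run from 0 to 1 within a trial, and a coordinate series is linearly interpolated at evenly spaced normalized time steps. The function names and the number of steps are hypothetical, and the sketch assumes strictly increasing timestamps and at least two samples and two steps.

```javascript
// Illustrative sketch; names and the interpolation scheme are assumptions.

// Map timestamps onto [0, 1] within a trial.
function normalizeTime(t) {
  const start = t[0];
  const span = t[t.length - 1] - start;
  return t.map((time) => (time - start) / span);
}

// Linearly interpolate `values` at nSteps evenly spaced normalized times.
function resample(values, t, nSteps) {
  const u = normalizeTime(t);
  const out = [];
  let j = 0; // index of the left-hand sample of the current segment
  for (let k = 0; k < nSteps; k++) {
    const target = k / (nSteps - 1);
    while (j < u.length - 2 && u[j + 1] < target) j++;
    const weight = (target - u[j]) / (u[j + 1] - u[j]);
    out.push(values[j] + weight * (values[j + 1] - values[j]));
  }
  return out;
}
```

After resampling, every trial has the same number of points, so trajectories can be averaged position by position across trials or subjects.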
Methodological details
Optimizing subject behavior for mouse-tracking
If subjects sometimes make their category decisions prior to moving their mouse cursors (that
is, if they wait to begin moving their cursors until they have already made a decision), then
their mouse trajectories may begin too late to capture dynamic category competition (Freeman et
al., 2016). For this reason, at the end of each trial in which the subject took more than 700 ms
(by default) to begin moving the cursor, the questionnaire issues a “started too late” alert
warning the subject to begin moving the cursor faster at the beginning of each trial.
Additionally, to encourage fast decision-making and discourage subjects from taking unscheduled
breaks from the experiment, after any trial in which the subject takes longer than 5000 ms (by
default) to make a category decision, the questionnaire issues a “took too long” alert reminding
the subject to answer more quickly (Freeman et al., 2016). Some investigators choose not to limit
total response time (e.g., Kieslich & Hilbig, 2014), in which case the parameter maxAnswerTime
could simply be set to a very large value, such as 50,000 ms. All alerts are recorded in the
dataset at the time they are triggered, but to avoid disrupting the subject’s behavior during the
trial, they are displayed onscreen only after the subject selects a category button and before
the subject clicks the Next button to proceed to the next trial. The recorded alert data allow
investigators to exclude trials or subjects receiving certain types of alerts if desired. The full
text of all alert messages appears in the Online Supplement.

Table 2 Codebook of mouse-tracking, timing, and computing system variables in raw Qualtrics data

Variable                  Units                            Meaning
xPos                      px                               x-coordinate of cursor relative to upper left-hand corner of browser window
yPos                      px                               y-coordinate of cursor relative to upper left-hand corner of browser window
time                      ms since 1970-01-01 0:00:00 UTC  Time at which each coordinate pair was measured
onLoadTime                ms since 1970-01-01 0:00:00 UTC  Time at which page for each trial started loading
onReadyTime               ms since 1970-01-01 0:00:00 UTC  Time at which the page for each trial was loaded (beginning of trial)
buttonClickTime           ms since 1970-01-01 0:00:00 UTC  Time at which subject made category decision (end of trial)
pageSubmitTime            ms since 1970-01-01 0:00:00 UTC  Time at which subject proceeded to next trial by clicking “Next”
windowWidth               px                               Width of subject’s browser window at beginning of trial
windowHeight              px                               Height of subject’s browser window at beginning of trial
alerts                    N/A                              Alerts received during each trial: 0 = none; 1 = started too early;
                                                           2 = started too late; 3 = surpassed time limit for trial;
                                                           4 = window too small to fully display experiment
latency                   ms                               Time between onReadyTime and first mouse move
stimulusOrder             N/A                              Stimulus URLs for each trial in the order presented
browser_Browser           N/A                              Internet browser
browser_Version           N/A                              Browser version
browser_Operating.System  N/A                              Operating system
browser_Resolution        N/A                              Browser resolution
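
The two timing alerts described in this section can be summarized in a small sketch. It is illustrative only: the function name alertsForTrial and the trial field firstMoveTime are hypothetical, the numeric codes follow the codebook in Table 2, and the defaults mirror maxLatency and maxAnswerTime from Table 1. The real template may organize this logic differently.

```javascript
// Illustrative sketch; alertsForTrial and firstMoveTime are hypothetical names.
const ALERT_CODES = {
  none: 0,
  startedTooEarly: 1,
  startedTooLate: 2,
  tookTooLong: 3,
  windowTooSmall: 4,
};

// Returns the timing-related alert codes for one trial (all times in ms).
function alertsForTrial(trial, settings = { maxLatency: 700, maxAnswerTime: 5000 }) {
  const alerts = [];
  const latency = trial.firstMoveTime - trial.onReadyTime;
  const answerTime = trial.buttonClickTime - trial.onReadyTime;
  if (latency > settings.maxLatency) alerts.push(ALERT_CODES.startedTooLate);
  if (answerTime > settings.maxAnswerTime) alerts.push(ALERT_CODES.tookTooLong);
  return alerts.length > 0 ? alerts : [ALERT_CODES.none];
}
```

Passing a very large maxAnswerTime in the settings object corresponds to the option, noted above, of effectively not limiting total response time.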
Special considerations for online use
As mentioned, allowing subjects to complete the experiment
on their own devices, rather than in a controlled lab setting,
poses several challenges to collecting reliable and precise
mouse-tracking data. For example, the software cannot
precisely position the subject’s cursor at the start of each
trial; browsers do not provide this functionality to preclude
malicious misuse. Furthermore, the experiment interface is
displayed with the same pixel dimensions for every subject
and trial, regardless of the size and resolution of each
subject’s screen, potentially yielding interfaces of somewhat
differing visual sizes for different subjects. Fixing the visual
size, rather than the pixel dimensions, of the experiment
interface across subjects was not feasible because the survey
software does not have reliable access to data on each
subject’s screen size and resolution. Additionally, if subjects
attempt to complete the experiment with a browser
window that is smaller than the size of the experiment
interface (for example, because their devices’ screens are
physically too small), then they might have to scroll in
the middle of each trial, leading to non-continuous mouse
trajectories and erroneous reaction times.
Our JavaScript implementation addresses each of these
possibilities. To ensure that the cursor starts in an
approximately fixed location, the Next button, which is
the necessary ending point for the cursor on every trial, is
positioned in the same location on every trial. Furthermore,
if the subject moves the cursor away from this position
before the next trial begins (i.e., while the page is loading),
the questionnaire issues a “started too early” alert to warn
the subject not to begin moving the cursor before the page
is loaded. During the first training trial, the code checks the
pixel dimensions of the subject’s browser window, and if
the window is smaller than the expected pixel dimensions
of the experiment interface, the questionnaire issues an alert
instructing the subject to increase the window size until the
stimulus image, both radio buttons, and the submit button
are fully visible. On subsequent trials, the subject’s ability to
scroll is disabled, such that subjects using devices with too-
small screens or browser windows will not have access to
the Next button and will thus be unable to proceed through
the experiment.
As mentioned above, the use of fixed pixel dimensions
does not guarantee that the visual distance between the
buttons will be the same for every subject due to the many
possible combinations of different physical dimensions of
computer monitors and different pixel-per-inch resolutions.
In addition, some subjects might use their browser’s zoom
function, changing both the pixel distances and the visual
distances. Therefore, our R analysis code by default rescales
all trajectories to unit length in both the x- and y-dimensions.
However, the validation study described in “Validation study” below
found systematically larger values of the outcome measures for
subjects whose trajectories suggested non-standard pixel scaling
(due, for example, to zooming). These differences persisted even
though the trajectories had been rescaled to unit length. Importantly,
despite these mean differences on the outcome measures,
the key stimulus ambiguity effects were comparable
between subjects with non-standard pixel scaling and
subjects with standard pixel scaling. In practice, then,
investigators might choose to simply adjust analysis models
for covariates indicating whether a subject had non-standard
pixel scaling (operationalized as having unexpectedly large
or small pixel distances between the starting and ending
x-coordinates on any trial) and whether a subject had
ever had a too-small window; this is the approach we
adopt in the validation study. Because the experimental
manipulation is randomized, these idiosyncrasies of the
visual display size effectively introduce “non-differential”
noise in the continuous outcome measures, in which case
the estimate for the effect of stimulus ambiguity remains
unbiased even without adjustment for the scaling and
window size variables (Rothman, Greenland, Lash, et al.,
2008). Thus, estimates should be comparable across
samples with different frequencies of non-standard scaling
and too-small windows. However, adjusting for these
variables as “precision covariates” may improve statistical
power by removing some of the residual variation on the
outcome measures that is due to these visual idiosyncrasies
rather than to stimulus ambiguity. The provided R code
automatically includes these two indicator variables (called
weird.scaling and wts, respectively) in the prepared
long-format dataset. Alternatively, subjects displaying these
idiosyncrasies could simply be excluded.
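
The operationalization of non-standard pixel scaling described above (unexpectedly large or small pixel distances between the starting and ending x-coordinates on any trial) might be sketched as follows. The function name, the expected distance, and the tolerance are all hypothetical; the source does not state the thresholds used by the provided R code, so both are left as parameters.

```javascript
// Illustrative sketch; name and thresholds are assumptions, not the authors' values.

// trialsX: one array of raw x-coordinates (px) per trial for a single subject.
// Returns true if any trial's start-to-end horizontal distance differs from
// expectedDx by more than tolerance.
function hasNonStandardScaling(trialsX, expectedDx, tolerance) {
  return trialsX.some((xs) => {
    const dx = Math.abs(xs[xs.length - 1] - xs[0]);
    return Math.abs(dx - expectedDx) > tolerance;
  });
}
```

For instance, if the Next button were horizontally centered between category buttons that are 570 px apart, one might take roughly 285 px as the expected distance; that centering is itself an assumption made here for illustration.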
As an additional data quality concern in online settings,
it is sometimes possible for automated “bots” to complete
Mechanical Turk tasks, yielding invalid data (Difallah,
Demartini, & Cudré-Mauroux, 2012). Because bots do
not physically use computer mice or trackpads to proceed
through the questionnaire, but rather select buttons directly,
they would not provide any mouse trajectories at all for
our data collection system to erroneously record. If a bot
managed to complete the questionnaire and respond to any
alerts in the process, our data preparation script would
automatically flag its data for exclusion due to missing
Extensions to other survey platforms
This software is tailored to the Qualtrics survey platform.
However, because the specialized functions that manage
the collection of mouse trajectory and timing data are
entirely contained in the JavaScript, this code could be
readily adapted to other online survey platforms or custom
experimental interfaces as long as they are able to: (1)
support addition of custom JavaScript and provide a
JavaScript API with basic functions similar to Qualtrics’; (2)
present multiple stimuli iteratively, while recording their
possibly randomized order; and (3) display the experiment
at fixed pixel dimensions. In short, to use this software on
another platform, an investigator would need to use that
platform’s user interface to adjust the questionnaire display
and flow to imitate our Qualtrics-implemented design and
would need to add our custom JavaScript, replacing the
small number of calls to the Qualtrics API with the relevant
functions for the investigator’s own platform. Additionally,
the values of some JavaScript global variables related to
the display of the experiment, such as minWindowWidth
and minWindowHeight, might require modification. The
JavaScript is thoroughly commented to facilitate such
adaptation and further modification by other users. Finally,
it would also be possible for investigators with experience
coding in HTML to create a simple survey platform,
incorporating our JavaScript code, that could be hosted on
their own servers or used to run subjects in the lab.
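
One way to picture the adaptation described above is to route every platform-specific call through a small adapter object, so that only the adapter changes between platforms. The sketch below is hypothetical throughout: saveField, readField, createInMemoryPlatform, and appendRecording are illustrative names, not part of the provided JavaScript or of any survey platform's API.

```javascript
// Illustrative sketch; all names are hypothetical.

// A stand-in "platform" that stores fields in memory, e.g. for in-lab testing.
// A real adapter would wrap the platform's own data-storage calls instead.
function createInMemoryPlatform() {
  const stored = {};
  return {
    saveField(name, value) {
      stored[name] = String(value);
    },
    readField(name) {
      return stored[name];
    },
  };
}

// Append one recording to an "a"-separated field, whatever the platform.
function appendRecording(platform, name, value) {
  const previous = platform.readField(name);
  platform.saveField(name, previous ? previous + "a" + value : String(value));
}
```

With this separation, the data-collection logic never refers to a particular survey platform directly, which is the property that makes the kind of porting described above a matter of replacing a few calls.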
Our implementation has limitations. Occasional idiosyncrasies (e.g., extremely poor quality
connections, use of proxy servers) can cause losses of coordinate data for some trials or
subjects. Our R code automatically checks for subjects with these data losses and creates a list
of subject IDs that should be excluded, along with reasons for exclusion. The validation study
presented below suggested that these issues affect a small fraction of trials for approximately
10% of subjects when data are collected in an uncontrolled crowdsourcing setting. A conservative
analysis approach, which we adopt in the validation study, could be to exclude every subject with
data losses on any trial.
Additionally, our implementation cannot control subjects’ individual mouse speed settings. That
is, different mice and trackpads may be set to respond with larger or smaller onscreen movements
for any given physical movement of the subject’s hand, and these differences in mouse dynamics
could affect the confusion measures. Because our implementation collects data through an Internet
browser, it is not able to measure subjects’ mouse speeds independently of, for example, their
hand speeds. However, as with the visual idiosyncrasies produced by non-standard pixel scaling or
small browser windows, we would expect differences in mouse speed settings to introduce only
non-differential noise in the outcomes and thus not compromise estimation of stimulus ambiguity
effects (albeit with some loss of statistical power).
Last, although our implementation appears to perform reliably across common browsers (see “Effect
of stimulus ambiguity on mouse trajectories”), it is incompatible with Internet Explorer;
subjects running Internet Explorer will be unable to proceed through the questionnaire, and no
data will be collected. (At present, Internet Explorer has only a 3% share of browser usage
worldwide; “Browser market share worldwide”, 2018.) Finally, subjects with very slow Internet
connections, causing image stimuli to load slowly, may receive a large number of “started too
late” alerts, although their data will otherwise be usable. In practice, subjects with a high
frequency of “started too late” alerts could be excluded if this were of concern.
Validation study
Design and subjects
To validate the provided software, we used it to perform a
simple category confusion experiment using image stimuli
depicting the faces of humanoid robots ranging from very
mechanical to very humanlike. Previous work (e.g., Mathur
& Reichling, 2016; Mathur & Reichling, 2009) suggests
that humanoid robot faces that closely, but imperfectly,
resemble humans—those occupying the “Uncanny Valley”
(Mori, 1970)—can provoke intense feelings of eeriness,
dislike, and distrust in human viewers. One mechanism of
these negative reactions may be that robots occupying the
Uncanny Valley provoke category confusion, which may
itself be aversive (Yamada, Kawabe, & Ihaya, 2013). In
partial support for this hypothesis, Mathur and Reichling
(2016) found that robot faces in the Uncanny Valley elicited
the most category confusion. As a validation, we attempted
to conceptually reproduce Mathur and Reichling (2016)’s
findings using the mouse-tracking software presented here.
From Mathur and Reichling (2016)’s stimuli, we arbitrarily
selected five “unambiguous” faces not occupying the
Uncanny Valley (Fig. 2, row 1) and five “ambiguous”
faces occupying the Uncanny Valley (Fig. 2, row 2).
Given previous findings regarding these faces (Mathur
& Reichling, 2016), we expected mouse trajectories to
indicate greater average confusion for ambiguous faces
vs. unambiguous faces. We analyzed mouse trajectories
from n = 188 United States subjects recruited on Amazon
Mechanical Turk from among users with a prior task
approval rating of at least 95%. We compensated subjects
$0.25 to complete the study and set a time limit of 20
minutes for the entire task to discourage subjects from
taking long breaks from the study. Subjects used the
template Qualtrics questionnaire provided here to categorize
each face as either a “robot” or a “human”. We randomized
the order of stimulus presentation for each subject. A link to
a live demonstration version of the questionnaire is provided
Statistical analysis
We regressed each of the five outcome measures described
in “A basic category-competition experiment” on a binary
indicator for stimulus ambiguity. For ease of interpretation,
we first standardized the four continuous outcome variables
(area, maximum x-deviation, peak speed, and reaction
time); thus, their coefficients represent the average number
of standard deviations by which the outcome measure was
larger for ambiguous versus unambiguous trials. Regression
models were semiparametric generalized estimating equations
(GEE) models with a working exchangeable correlation
structure and robust inference, and the unit of analysis
was trials (1880 observations). We chose the GEE specification
in order to account for arbitrary correlation structures
within subjects and within stimuli, as well as to avoid making
distributional assumptions on the residuals for highly
skewed outcomes such as reaction time. Models for continuous
outcomes used the identity link, while the model for
x-flips used the Poisson link. To account for residual variation
in the visual display size of the experiment as described
in “Special considerations for online use” above, each outcome
model included main effects of indicator variables for
non-standard pixel dimensions and for too-small browser
windows (the variables weird.scaling and wts), as
well as all possible interactions among these nuisance variables
and the stimulus ambiguity indicator. As a sensitivity
analysis, we also performed the analyses excluding all such
subjects (for an analyzed n = 103) rather than adjusting
for the nuisance covariates, yielding nearly identical point
estimates and inference.
Fig. 2 Mouse trajectories for a single subject categorizing unambiguous (top row) versus ambiguous (bottom row) humanoid robot faces.
Trajectories have been rescaled to unit length in both the x- and y-dimensions
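To make the interpretation of the standardized coefficients concrete, the following Python sketch standardizes an outcome, estimates the mean difference between ambiguous and unambiguous trials, and attaches a robust standard error clustered on subject. It is a deliberate simplification and not the analysis code distributed with the software: it assumes a working independence correlation structure (rather than the exchangeable structure used above), a single binary predictor with the identity link, clustering on subject only, and no small-sample correction. Under those assumptions the coefficient is simply the difference in group means. All function and variable names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def standardize(values):
    """Center at the mean and scale by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def ambiguity_effect(y, ambiguous, subject):
    """Mean difference (ambiguous - unambiguous) with a subject-clustered SE.

    Assumes working independence and the identity link, so the coefficient
    on the binary indicator equals the difference in group means.
    """
    y1 = [v for v, a in zip(y, ambiguous) if a]
    y0 = [v for v, a in zip(y, ambiguous) if not a]
    m1, m0 = mean(y1), mean(y0)
    # Each trial's contribution to the estimate is summed within subject, so
    # correlated trials from the same subject are treated as one unit.
    per_subject = defaultdict(float)
    for v, a, s in zip(y, ambiguous, subject):
        per_subject[s] += (v - m1) / len(y1) if a else -(v - m0) / len(y0)
    se = math.sqrt(sum(c * c for c in per_subject.values()))
    return m1 - m0, se


# Toy data: two subjects, each with two ambiguous and two unambiguous trials.
y = [5.0, 5.0, 1.0, 1.0, 3.0, 3.0, 1.0, 1.0]
ambiguous = [1, 1, 0, 0, 1, 1, 0, 0]
subject = ["A", "A", "A", "A", "B", "B", "B", "B"]

beta_raw, se_raw = ambiguity_effect(y, ambiguous, subject)
beta_std, se_std = ambiguity_effect(standardize(y), ambiguous, subject)
```

In this toy example the clustered standard error of the raw difference (about 0.71) is larger than the value of 0.5 obtained by treating all eight trials as independent, which illustrates why within-subject correlation needs to be accounted for.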
Descriptive measures
Table 3 displays demographic characteristics of the analyzed
subjects, as well as their Internet browsers and operating
systems. We collected data on 203 subjects (using an a
priori sample size determination of n = 200) and excluded
15 due to idiosyncratic timing issues, yielding an analyzed
sample size of 188. These exclusion criteria are conservative
in that we excluded all trials for any subject with these
problems on any trial, even if only a small number of trials
were affected. As discussed in “Special considerations
for online use”, our questionnaire also collects data on scaling
and window size idiosyncrasies that do still allow for
normal data collection but that could in principle affect the
confusion measures; of the analyzed subjects, 18 had a too-small
window on at least one trial, and 77 had non-standard
pixel dimensions on at least one trial. No subject’s data
indicated a clear violation of the instructions, so we compensated
all subjects who completed the study on Amazon
Mechanical Turk.
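The subject-level exclusion rule described above (dropping all trials from any subject with a problem on any trial) and the “at least one trial” counts can be sketched in Python as follows. The field names `timing_problem` and `window_too_small` are hypothetical stand-ins chosen for this example and are not the variable names used in the provided dataset or R code.

```python
def subjects_flagged_at_least_once(trials, flag):
    """Set of subjects for whom `flag` is true on at least one trial."""
    return {t["subject"] for t in trials if t[flag]}


def exclude_subjects_with_any_flag(trials, flag):
    """Drop every trial from any subject flagged on at least one trial."""
    flagged = subjects_flagged_at_least_once(trials, flag)
    kept = [t for t in trials if t["subject"] not in flagged]
    return kept, flagged


# Toy data with hypothetical field names.
trials = [
    {"subject": "s1", "timing_problem": False, "window_too_small": False},
    {"subject": "s1", "timing_problem": True, "window_too_small": False},
    {"subject": "s2", "timing_problem": False, "window_too_small": True},
    {"subject": "s2", "timing_problem": False, "window_too_small": False},
    {"subject": "s3", "timing_problem": False, "window_too_small": False},
    {"subject": "s3", "timing_problem": False, "window_too_small": False},
]

# Conservative rule: one bad trial removes all of that subject's trials.
kept, excluded = exclude_subjects_with_any_flag(trials, "timing_problem")

# Among retained subjects, count those with a too-small window at least once.
small_window = subjects_flagged_at_least_once(kept, "window_too_small")
```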
Across all trials, subjects used a median browser window
height of 775 px (25th percentile: 726 px; 75th percentile:
938 px) and a median window width of 1532 px (25th
percentile: 1366 px; 75th percentile: 1846 px). Across
all trials, the median reaction time was 1170 ms (25th
percentile: 859 ms; 75th percentile: 1628 ms). The average
latency (that is, the time elapsed between the beginning of
the trial and the subject’s first mouse movement) was 442
ms (25th percentile: 87 ms; 75th percentile: 640 ms), which
is short enough to suggest that the mouse trajectories would
have captured dynamic competition processes occurring
almost immediately after stimulus presentation. Across all
sampled mouse coordinate pairs, the median sampling interval
was 17 ms (25th percentile: 16 ms; 75th percentile: 18 ms),
or roughly 59 samples per second. To provide some reference for the
frequency of alert messages that can be expected, 68% of
trials received no alerts, and the remaining 32% of trials
received a median of 1 alert (of a maximum of 4).⁴ Table 4
displays the relative frequencies of each alert type among
all alerts received, and Table 5 displays the percent of
subjects receiving each alert type at least once. The fairly
high frequency of alerts is to be expected: as discussed
in “Optimizing subject behavior for mouse-tracking”, the
alerts, particularly those instructing the subject to begin
moving the cursor sooner or to avoid moving it before
the trial is fully loaded, are designed to optimize subject
behavior rather than to indicate invalid data.
⁴ It may appear counterintuitive that a trial could receive all four alerts,
including both “Started too early” and “Started too late”. However,
this can occur if the subject moves the cursor outside the Next button
before the subsequent trial has fully loaded (“Started too early”) but
then, once the subsequent trial is loaded, waits too long to move the
cursor again (“Started too late”). To avoid confusing the subject, in
this situation, only the “Started too early” alert is actually displayed,
though both alerts are recorded in the dataset.
Table 3 Demographics and computing system characteristics for
subjects in validation study
Characteristic                      Value
Total N                             188
Age (mean (SD))                     36.80 (11.73)
Education (n (%))
  Did not graduate high school      2 (1.1)
  Graduated 2-year college          35 (18.6)
  Graduated 4-year college          75 (39.9)
  Graduated high school             54 (28.7)
  Post-graduate degree              22 (11.7)
Female (mean (SD))                  0.52 (0.50)
Race (n (%))
  Black/African American            16 (8.5)
  Caucasian                         144 (76.6)
  Native American                   8 (4.3)
  East Asian                        12 (6.4)
  Hispanic                          14 (7.4)
  Middle Eastern                    4 (2.1)
  Southeast Asian                   3 (1.6)
  South Asian                       2 (1.1)
Browser (n (%))
  Chrome                            153 (81.4)
  Edge                              2 (1.1)
  Firefox                           33 (17.6)
Operating system (n (%))
  Chrome OS                         7 (3.7)
  Linux                             6 (3.2)
  Macintosh                         19 (10.1)
  Windows                           155 (82.4)
Table 4 Summary of all 711 alert messages received in validation
study across all 1880 trials
Alert type                          % of all alerts received
Started too early                   40
Started too late                    31
Surpassed trial time limit          8
Window too small                    21
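The timing summaries reported in this section (latency to first movement and the interval between successive cursor samples) could be computed from per-trial records along the lines of the Python sketch below. The record layout is an assumption made for illustration: each trial is taken to hold a list of timestamps in milliseconds, beginning at trial onset, with matching lists of x and y coordinates. The function names are hypothetical and do not correspond to the provided R code.

```python
from statistics import median


def first_movement_latency(times_ms, xs, ys):
    """Time from trial onset (times_ms[0]) to the first change in position.

    Returns None if the cursor never leaves its starting position.
    """
    start = (xs[0], ys[0])
    for t, x, y in zip(times_ms[1:], xs[1:], ys[1:]):
        if (x, y) != start:
            return t - times_ms[0]
    return None


def sampling_intervals(times_ms):
    """Elapsed time between each pair of consecutive samples."""
    return [b - a for a, b in zip(times_ms, times_ms[1:])]


# Toy trial: the cursor rests for the first three samples, then moves.
times_ms = [0, 17, 33, 50, 67, 84]
xs = [0, 0, 0, 4, 9, 15]
ys = [0, 0, 0, 2, 6, 11]

latency = first_movement_latency(times_ms, xs, ys)
interval = median(sampling_intervals(times_ms))
rate_hz = 1000 / interval  # convert a sampling interval in ms to samples per second
```

A median interval of 17 ms corresponds to 1000 / 17, or about 59 samples per second, which is close to the 60 Hz refresh rate of many displays.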
Effect of stimulus ambiguity on mouse trajectories
As a visual example of the mouse trajectories, Fig. 2
shows unit-scaled trajectories from the fifth subject. For this
subject, ambiguous faces 6, 8, and 9 in particular elicited
mouse trajectories characteristic of substantial category
confusion, evidenced by x-flips and large deviations from
the ideal trajectory. (The reason for the rightward trajectory
for face 7 is that the subject classified this face as “Human”,
whereas all the other faces were classified as “Robot”.)
Figure 3 aggregates outcome data across subjects in violin
plots that display the medians of each standardized outcome
measure for ambiguous versus unambiguous stimuli, as
well as density estimates of their distributions. These
results indicate visually that each measure of confusion
was on average higher for ambiguous versus unambiguous
stimuli. That is, aligning with the predicted results discussed
in “A basic category-competition experiment”, subjects’
cursors appeared to make more horizontal changes of
direction, to make less direct paths, and to reach higher
peak speeds for ambiguous versus unambiguous stimuli,
and furthermore trials for ambiguous stimuli elicited longer
reaction times. Point estimates from the GEE models of the
mean difference for each confusion measure for ambiguous
versus unambiguous stimuli (Fig. 3) were in the predicted
direction for all outcomes (with p < 0.0001 for each).
Collectively, these results suggest that the software and
methods presented here adequately capture confusion
when implemented through realistic crowdsourced data
collection.
Table 5 Percent of subjects (n = 188) receiving each type of alert
message at least once across 10 trials
Alert type                          % of subjects
Started too early                   60
Started too late                    58
Surpassed trial time limit          20
Window too small                    10
Fig. 3 Violin plots showing standardized outcome data for 1880 trials
(188 subjects) for ambiguous versus unambiguous face stimuli. Violin
contours are mirrored kernel density estimates. Horizontal lines within
violins are medians. β̂ = GEE estimate of mean difference (ambiguous
- unambiguous); p = p value for difference estimated by robust GEE
[Figure panels, each with stimulus type (ambiguous vs. unambiguous) on the x-axis:
x-flips (count), β̂ = 1.57; area (std.), β̂ = 2e-04; max x-deviation (std.), β̂ = 0.03;
peak speed (std.), β̂ = 0.01; reaction time (std.), β̂ = 0.01; p < 0.0001 in every panel]
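For readers who want a concrete picture of two of the trajectory-based confusion measures, the Python sketch below counts horizontal changes of direction (x-flips) and computes a maximum horizontal deviation from a straight-line path. The formal definitions belong to “A basic category-competition experiment” and to the provided R code; the versions here are simplified assumptions made for illustration. In particular, the “ideal trajectory” is assumed to be the straight line from the first to the last sample, and samples with no horizontal movement are ignored when counting flips.

```python
def count_x_flips(xs):
    """Count reversals of horizontal direction along a trajectory."""
    flips = 0
    previous = 0  # sign of the most recent nonzero horizontal step
    for a, b in zip(xs, xs[1:]):
        step = b - a
        if step == 0:
            continue  # no horizontal movement between these samples
        direction = 1 if step > 0 else -1
        if previous != 0 and direction != previous:
            flips += 1
        previous = direction
    return flips


def max_x_deviation(xs, ys):
    """Largest horizontal gap between the trajectory and a straight path.

    The straight path runs from the first sample to the last sample
    (an assumed definition of the ideal trajectory).
    """
    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]
    if y1 == y0:
        # Degenerate case: no vertical travel, so compare against the start.
        return max(abs(x - x0) for x in xs)
    deviations = []
    for x, y in zip(xs, ys):
        ideal_x = x0 + (x1 - x0) * (y - y0) / (y1 - y0)
        deviations.append(abs(x - ideal_x))
    return max(deviations)


# A direct trajectory versus one that wavers left and right.
direct_x, direct_y = [0.0, 0.5, 1.0, 1.5, 2.0], [0.0, 1.0, 2.0, 3.0, 4.0]
waver_x, waver_y = [0.0, -1.0, 1.0, 0.0, 2.0], [0.0, 1.0, 2.0, 3.0, 4.0]

direct_flips = count_x_flips(direct_x)
waver_flips = count_x_flips(waver_x)
direct_dev = max_x_deviation(direct_x, direct_y)
waver_dev = max_x_deviation(waver_x, waver_y)
```

The wavering trajectory yields more x-flips and a larger deviation than the direct one, which is the qualitative pattern expected for ambiguous versus unambiguous stimuli.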
Consistency of results across computing systems
As a post hoc secondary analysis to assess the consistency
of these stimulus ambiguity effects across browsers and
operating systems, we refit the regression models including
interaction terms of browser (Firefox vs. Chrome) and of
operating system (Macintosh vs. Windows) with stimulus
ambiguity. The resulting coefficients thus estimate the
differences in the stimulus ambiguity effect on confusion
between browsers or between operating systems. We
excluded subjects who used other, much less common,
browsers and operating systems due to their small sample
sizes, yielding 1720 trials in this analysis. Across the five
outcome models, the browser interaction coefficients ranged
in absolute value from 0.03 to 0.28 with p values from 0.32
to 0.54, and the operating system interaction coefficients
ranged in absolute value from 0.004 to 0.26 with p values
from 0.14 to 0.62. While this validation study was not
specifically powered to detect differences in results
across browsers and operating systems, these results
suggest that any such effects are likely fairly small.
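The meaning of such an interaction coefficient can be illustrated with a small Python sketch. In a model with the identity link that contains only stimulus ambiguity, one binary grouping variable (for example, browser), and their interaction, the interaction coefficient equals the ambiguity effect in one group minus the ambiguity effect in the reference group. The models reported above additionally included both interactions, the nuisance covariates, and GEE-based inference, so this is a simplified analogue rather than a reimplementation. The field names and toy values are hypothetical.

```python
from statistics import mean


def ambiguity_effect(trials, outcome):
    """Mean outcome for ambiguous trials minus mean for unambiguous trials."""
    amb = [t[outcome] for t in trials if t["ambiguous"]]
    unamb = [t[outcome] for t in trials if not t["ambiguous"]]
    return mean(amb) - mean(unamb)


def interaction_estimate(trials, outcome, group_key, group, reference):
    """Difference in the ambiguity effect between `group` and `reference`."""
    in_group = [t for t in trials if t[group_key] == group]
    in_reference = [t for t in trials if t[group_key] == reference]
    return ambiguity_effect(in_group, outcome) - ambiguity_effect(in_reference, outcome)


# Toy standardized reaction times (hypothetical values).
trials = [
    {"browser": "Chrome", "ambiguous": True, "rt_std": 1.0},
    {"browser": "Chrome", "ambiguous": True, "rt_std": 2.0},
    {"browser": "Chrome", "ambiguous": False, "rt_std": 0.5},
    {"browser": "Chrome", "ambiguous": False, "rt_std": 0.5},
    {"browser": "Firefox", "ambiguous": True, "rt_std": 2.0},
    {"browser": "Firefox", "ambiguous": False, "rt_std": 0.75},
]

chrome_effect = ambiguity_effect([t for t in trials if t["browser"] == "Chrome"], "rt_std")
browser_interaction = interaction_estimate(trials, "rt_std", "browser", "Firefox", "Chrome")
```

An interaction estimate near zero indicates that the ambiguity effect is similar in the two groups, which is the pattern the analysis above reports.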
Reproducibility All data, code, and materials required to reproduce the
validation study are publicly available and documented (
Online supplement The Online Supplement, containing the instructions
and alert messages displayed to subjects, is publicly available
Acknowledgments This research was supported by NIH grant R01
CA222147 and by a Harvard University Mind, Brain, & Behavior
grant. The funders had no role in the design, conduct, or reporting of
this research. We thank Jackson Walters for helpful discussions and for
providing open-source JavaScript that helped us develop our software
(Walters, 2018).
Open Access This article is distributed under the terms of the
Creative Commons Attribution 4.0 International License (http://, which permits unrestricted
use, distribution, and reproduction in any medium, provided you give
appropriate credit to the original author(s) and the source, provide a
link to the Creative Commons license, and indicate if changes were made.
References
Browser market share worldwide (2018). browser-market-share/all/worldwide/2018; StatCounter.
Dale, R., & Duran, N. D. (2011). The cognitive dynamics of negated sentence verification. Cognitive Science, 35(5), 983–996.
Dale, R., Kehoe, C., & Spivey, M. J. (2007). Graded motor responses in the time course of categorizing atypical exemplars. Memory & Cognition, 35(1), 15–28.
Difallah, D. E., Demartini, G., & Cudré-Mauroux, P. (2012). Mechanical cheat: Spamming schemes and adversarial techniques on crowdsourcing platforms. In CrowdSearch, (pp. 26–30).
Farmer, T. A., Anderson, S. E., & Spivey, M. J. (2007). Gradiency and visual context in syntactic garden-paths. Journal of Memory and Language, 57(4), 570–595.
Freeman, J. B., & Ambady, N. (2010). MouseTracker: Software for studying real-time mental processing using a computer mouse-tracking method. Behavior Research Methods, 42(1), 226–241.
Freeman, J. B., & Johnson, K. L. (2016). More than meets the eye: Split-second social perception. Trends in Cognitive Sciences, 20(5), 362–374.
Freeman, J. B., Ambady, N., Rule, N. O., & Johnson, K. L. (2008). Will a category cue attract you? Motor output reveals dynamic competition across person construal. Journal of Experimental Psychology: General, 137(4), 673.
Freeman, J. B., Pauker, K., & Sanchez, D. T. (2016). A perceptual pathway to bias: Interracial exposure reduces abrupt shifts in real-time race perception that predict mixed-race bias. Psychological Science, 27(4), 502–517.
Gosling, S. D., Sandy, C. J., John, O. P., & Potter, J. (2010). Wired but not weird: The promise of the Internet in reaching more diverse samples. Behavioral and Brain Sciences, 33(2-3),
Kieslich, P. J., & Henninger, F. (2017). Mousetrap: An integrated, open-source mouse-tracking package. Behavior Research Methods, 49(5), 1652–1667.
Kieslich, P. J., & Hilbig, B. E. (2014). Cognitive conflict in social dilemmas: An analysis of response dynamics. Judgment & Decision Making, 9(6), 510–522.
Mathur, M. B., & Reichling, D. B. (2009). An uncanny game of trust: Social trustworthiness of robots inferred from subtle anthropomorphic facial cues. In 2009 4th ACM/IEEE international conference on human-robot interaction (HRI), (pp. 313–314):
Mathur, M. B., & Reichling, D. B. (2016). Navigating a social world with robot partners: A quantitative cartography of the uncanny valley. Cognition, 146, 22–32.
Mori, M. (1970). The uncanny valley. Energy, 7(4), 33–35.
Rothman, K. J., Greenland, S., Lash, T. L., et al. (2008). Modern epidemiology Vol. 3. Philadelphia: Wolters Kluwer Health/Lippincott Williams & Wilkins.
Spivey, M. J., Grosjean, M., & Knoblich, G. (2005). Continuous attraction toward phonological competitors. Proceedings of the National Academy of Sciences, 102(29), 10393–10398.
UI events (2018). mousemove; W3C Working Draft.
Walters, J. (2018). JavaScript mouse-tracking. GitHub repository.; GitHub.
Wojnowicz, M. T., Ferguson, M. J., Dale, R., & Spivey, M. J. (2009). The self-organization of explicit attitudes. Psychological Science, 20(11), 1428–1435.
Yamada, Y., Kawabe, T., & Ihaya, K. (2013). Categorization difficulty is associated with negative evaluation in the “uncanny valley” phenomenon. Japanese Psychological Research, 55(1), 20–32.
Yu, Z., Wang, F., Wang, D., & Bastin, M. (2012). Beyond reaction times: Incorporating mouse-tracking measures into the implicit association test to examine its underlying process. Social Cognition, 30(3), 289–306.
Publisher’s note Springer Nature remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
... The purpose of this study is to deploy a mousetracking experiment in Qualtrics (based on code provided by Mathur and Reichling, 2019) to assess whether information is processed differently on social media than information presented by itself, and to understand whether those cognitive processes are moderated by health literacy. Mouse-tracking can help reveal whether comprehension and decisions about health information are processed in incremental or distinct stages. ...
... Participants answered demographic information and completed the experiment on Qualtrics (url: The main mouse tracking experiment and code were based on methodology outlined in Mathur and Reichling (2019). During the main experiment, participants were asked to determine as quickly and accurately as possible whether health statements were true or false. ...
Health misinformation is a problem on social media, and more understanding is needed about how users cognitively process it. In this study, participants' accuracy in determining whether 60 health claims were true (e.g., "Vaccines prevent disease outbreaks") or false (e.g., "Vaccines cause disease outbreaks") was assessed. The 60 claims were related to three domains of health risk behavior (i.e., smoking, alcohol and vaccines). Claims were presented as Tweets or as simple text statements. We employed mouse tracking to measure reaction times, whether processing happens in discrete stages, and response uncertainty. We also examined whether health literacy was a moderating variable. The results indicate that information in statements and tweets is evaluated incrementally most of the time, but with overrides happening on some trials. Adequate health literacy scorers were more uncertain when responding to tweets than for statements, but they were more accurate when responding to tweets. Inadequate scorers were more confident on statements than tweets but equally accurate on both. These results have important implications for understanding the underlying cognition needed to combat health misinformation online. Significance Statement Over 70% of the U.S. population has a social media account, and are susceptible to misinformation online, especially those with inadequate health literacy. For example, misinformation about vaccines on Twitter may make users more hesitant to get inoculated. This study examines misinformation related to health risk topics (vaccines, alcohol and tobacco) on social media. We employed mouse tracking to better understand how information presented as a tweet is processed compared to information presented as a simple statement. Public health researchers will benefit from the finding that inadequate health literacy scorers are overconfident when evaluating Tweets. 
Conversely, cognitive psychologists can use the results to better understand how intuition and deliberate decision making interact in real time. The combined results may inform researchers across fields with information to combat the spread of false information online.
... The purpose of this study is to deploy a mouse-tracking experiment in Qualtrics (based on code provided by Mathur and Reichling, 2019) to assess the cognitive processes involved in veracity judgment and decision-making about health misinformation, to assess whether information is processed differently on social media, and to understand whether those cognitive processes (e.g., using the central/peripheral route) differ based on one's health literacy. By carefully controlling presentation of stimuli and measuring cursor movements over time, mouse-tracking can help reveal whether health information is comprehended and decisions made about its veracity are processed in parallel, or if comprehension and veracity decisions happen in two different, distinct stages. ...
... Participants answered demographic information and completed the experiment on Qualtrics ( The main mouse tracking experiment and code was based on methodology outlined in Mathur and Reichling (2019). During the main experiment, participants were asked to determine as quickly and accurately as possible whether health statements were true or false. ...
Health misinformation is a problem on social media sites like Twitter. Health literacy is an important moderating factor for evaluating health information and misinformation. Health literacy evolves and is influenced by social media consumption, but it may also depend on the ability to process information and other underlying cognitive factors. In this study, we determined the health literacy (adequate or inadequate using the Newest Vital Sign questionnaire) for 178 adults. Their accuracy in determining whether 60 health claims were true (e.g., “Alcohol is a drug”) or false (e.g., “Alcohol is not a drug”) was assessed. Claims were formatted as either Tweets or as simple text statements on a computer screen. We also employed mouse tracking to measure reaction times and response uncertainty. The 60 statements were related to cancer risk behavior (i.e., smoking, alcohol and HPV vaccines). The results indicate that tweets affect adequate and inadequate health literacy scorers differently: adequate scorers become more accurate and less confident, but inadequate scorers become less accurate and more confident.
... By this procedure, every trial started with the cursor positioned on the Next button. We adopted this protocol from Mathur & Reichling [25]. ...
... The cursor's position (x, y) during decision making was sampled at 60 Hz, using a JavaScript program made by Mathur & Reichling [25]. All experimental procedures were carried out using Qualtrics ( ...
Full-text available
Humans dislike unequal allocations. Although often conflated, such ‘inequality-averse’ preferences are separable into two elements: egalitarian concern about the variance and maximin concern about the poorest (maximizing the minimum). Recent research has shown that the maximin concern operates more robustly in allocation decisions than the egalitarian concern. However, the real-time cognitive dynamics of allocation decisions are still unknown. Here, we examined participants' choice behaviour with high temporal resolution using a mouse-tracking technique. Participants made a series of allocation choices for others between two options: a ‘non-Utilitarian option’ with both smaller variance and higher minimum pay-off (but a smaller total) compared with the other ‘Utilitarian option’. Choice data confirmed that participants had strong inequality-averse preferences, and when choosing non-utilitarian allocations, participants' mouse movements prior to choices were more strongly determined by the minimum elements of the non-Utilitarian options than the variance elements. Furthermore, a time-series analysis revealed that this dominance emerged at a very early stage of decision making (around 500 ms after the stimulus onset), suggesting that the maximin concern operated as a strong cognitive anchor almost instantaneously. Our results provide the first temporally fine-scale evidence that people weigh the maximin concern over the egalitarian concern in distributive judgements.
... com). The main mouse tracking experiment and code were based on methodology outlined in Mathur and Reichling (2019). During the main experiment, participants were asked to determine as quickly and accurately as possible whether health statements were true or false. ...
Full-text available
Health misinformation is a problem on social media, and more understanding is needed about how users cognitively process it. In this study, participants’ accuracy in determining whether 60 health claims were true (e.g., “Vaccines prevent disease outbreaks”) or false (e.g., “Vaccines cause disease outbreaks”) was assessed. The 60 claims were related to three domains of health risk behavior (i.e., smoking, alcohol and vaccines). Claims were presented as Tweets or as simple text statements. We employed mouse tracking to measure reaction times, whether processing happens in discrete stages, and response uncertainty. We also examined whether health literacy was a moderating variable. The results indicate that information in statements and tweets is evaluated incrementally most of the time, but with overrides happening on some trials. Adequate health literacy scorers were equally certain when responding to tweets and statements, but they were more accurate when responding to tweets. Inadequate scorers were more confident on statements than on tweets but equally accurate on both. These results have important implications for understanding the underlying cognition needed to combat health misinformation online.
... Metrics. En 2019, un logiciel open-source à implémenter dans la plateforme Qualtrics a été développé permettant de réaliser à distance une tâche de MT (Mathur & Reichling, 2019). Ce développement montre l'intérêt grandissement pour la réalisation de la tâche de MT à distance également d'un point de vue académique. ...
Cette thèse propose de nouvelles méthodologies permettant de mesurer de façon indirecte et quantitative les composantes affectives du consommateur. Dans une première série d’études, nous avons observé que des variables socio-affectives influencent la perception de l’espace. Plus particulièrement, des variables comme l’estime de soi et l’anxiété sociale modèrent la façon dont les individus perçoivent une ouverture. Nos résultats suggèrent que ce type de tâche pourrait être, à terme, utilisée pour évaluer l’effet socio-affectif d’un produit porté. Dans une seconde série d’étude, nous avons analysé le mouvement de la souris lorsque des consommateurs devaient réaliser une tâche de catégorisation dichotomique. Cette méthode semble permettre d’identifier et de hiérarchiser certaines caractéristiques relatives à l’identité d’une marque. Ces résultats suggèrent que cette méthode pourrait être, à terme, utilisée afin de prédire les comportements d’achats. En conclusion, ces travaux proposent de nouvelles mesures indirectes, basées sur des variables sensori-motrices, pour l’étude du consommateur.
... Qualtrics© was able to capture data from the participants online, using interactive information forms, which both stores and processes the data (Mathur & Reichling, 2019). The Qualtrics© sampling system is an established, online tool that facilitates gathering and analyzing data (Ginsberg, 2011). ...
Full-text available
Perspectives of higher education administrators on the perceived economic impact of adopting the 2016 proposed Fair Labor Standards Act policy changes on institutions of higher education in Tennessee
... This toolbox is additionally distributed as an R package available on CRAN [23,24]. Recently, Mathur and Reichling [25] also proposed another mouse-tracking JavaScript software tool (working along R code) which is adapted to Qualtrics platform and could be used in category-competition experiments. ...
Full-text available
The present study introduces a new MATLAB toolbox, called MatMouse, suitable for the performance of experimental studies based on mouse movements tracking and analysis. MatMouse supports the implementation of task-based visual search experiments. The proposed toolbox provides specific functions which can be utilized for the experimental building and mouse tracking processes, the analysis of the recorded data in specific metrics, the production of related visualizations, as well as for the generation of statistical grayscale heatmaps which could serve as an objective ground truth product. MatMouse can be executed as a standalone package or integrated in existing MATLAB scripts and/or toolboxes. In order to highlight the functionalities of the introduced toolbox, a complete case study example is presented. MatMouse is freely distributed to the scientific community under the third version of GNU General Public License (GPL v3) on GitHub platform.
Technical Report
Lebensmittelverschwendung stellt ein ökologisches, ökonomisches und ethisches Problem dar. Maßnahmen zur Reduktion der Lebensmittelverschwendung werden in Politik, Wirtschaft und Gesellschaft diskutiert. Die Vermarktung von Suboptimal Food, speziell Obst und Gemüse mit optischen Mängeln, ist Teil dieser Diskussionen. Mit dem Forschungsvorhaben "Marketing von Suboptimal Food im Öko-Handel" wurde das Ziel verfolgt Kaufbarrieren für Suboptimal Food von Öko-Konsument*innen zu identifizieren sowie Maßnahmen zur Verkaufsförderung zu diskutieren und exemplarisch praktisch zu erproben. Die Ergebnisse des Projekts lassen ein grundsätzliches Marktpotential für Suboptimal Food im Öko-Handel erkennen. Öko-Konsument*innen verfügen über ein hohes Problembewusstsein für Lebensmittelverschwendung und äußern selten ausgeprägte Qualitätsbedenken gegenüber Suboptimal Food. Statt von optischen Auffälligkeiten auf die innere Qualität zu schließen wird dies als Zeichen von Natürlichkeit und biologischer Produktion verstanden. Klare Präferenzen für unterschiedliche Formen von Suboptimalität werden nicht deutlich. Preisreduktionen zeigen in den Befragungen eine akzeptanzsteigernde Wirkung und die exemplarisch ermittelten durchschnittlich geforderten Preisreduktionen liegen zwischen 20 % und 30 %. Die Zahlungsbereitschaft für Suboptimal Food wird durch Umweltbewusstsein, Kaufintensität von Bio-Lebensmitteln und Kaufhäufigkeit von suboptimalem Obst und Gemüse positiv beeinflusst. Die Verkaufstests im Öko-Handel zeigen, dass Produkte mit geringfügigen Beeinträchtigungen der Optik sehr gut und ohne Preisnachlass von den Kund*innen angenommen werden. Bei eindeutigen optischen Mängeln bleiben die Produkte trotz Preisreduktion unverkäuflich. Die getesteten zwei Kommunikationsstrategien konnten den Absatz suboptimaler Produkte leicht steigern, wobei kein Unterschied zwischen den Strategien erkennbar ist.
Full-text available
In this rapidly digitizing world, it is becoming ever more important to understand people’s online behaviors in both scientific and consumer research settings. A cost-effective way to gain a deeper understanding of these behaviors is to examine mouse movement patterns. This research explores the feasibility of inferring personality traits from these mouse movement features (i.e., pauses, fixations, cursor speed, clicks) on a simple image choice task. We compare the results of standard univariate (OLS regression, bivariate correlations) and three forms of multivariate partial least squares (PLS) analyses. This work also examines whether mouse movements can predict task attentiveness, and how these might be related to personality traits. Results of the PLS analyses showed significant associations between a linear combination of personality traits (high Conscientiousness, Agreeableness, and Openness, and low Neuroticism) and several mouse movements associated with slower, more deliberate responding (less unnecessary clicks, more fixations). Additionally, several click-related mouse features were associated with attentiveness to the task. Importantly, as the image choice task itself is not intended to assess personality in any way, our results validate the feasibility of using mouse movements to infer internal traits across experimental contexts, particularly when examined using multivariate analyses and a multiverse approach.
Previous work reveals that political orientation is a relevant social identity for many people and that the desire to conform to political ingroup norms can drive belief and behavior change. Because pro-environmental behaviors are viewed as stereotypically liberal in the US, American conservatives may be less likely to engage in pro-environmental behavior, particularly when political identity and normative information are made salient. In four studies, we examine whether heightening the salience of political identity and providing information that one is conforming to or failing to conform to political group norms influences engagement in a pro-environmental behavior (recycling). Study 1 showed that undergraduates falsely believed that liberal students at their university recycled more than conservatives. In turn, while liberal and moderate students' self-reported recycling behavior was predicted by their perceptions of liberals' (but not conservatives') behavior, conservative students' behavior was predicted by perceptions of other conservatives' (but not liberals’) behavior. Studies 2–4 use a novel computerized recycling task and mouse-tracking software to examine whether, among politically conservative Americans, receiving feedback that their recycling behavior is inconsistent with stereotypic ingroup norms modifies behavior and motivates individuals to “recycle” less in the computerized task. In Studies 2 (university student sample) and 3 (preregistered; MTurk worker sample), those who received this feedback adjusted their automatic, but not deliberate responses, although patterns differed slightly between studies. However, in Study 4 (preregistered; MTurk worker sample), this effect was not found. Collectively, these studies suggest that inaccurate meta-beliefs may drive political polarization with respect to pro-environmental behavior, but inconsistencies in results across studies leave open questions about how they do so. 
This research also contributes to the literature by introducing new methodologies to study pro-environmental decision-making processes.
Full-text available
Mouse-tracking - the analysis of mouse movements in computerized experiments - is becoming increasingly popular in the cognitive sciences. Mouse movements are taken as an indicator of commitment to or conflict between choice options during the decision process. Using mouse-tracking, researchers have gained insight into the temporal development of cognitive processes across a growing number of psychological domains. In the current article, we present software that offers easy and convenient means of recording and analyzing mouse movements in computerized laboratory experiments. In particular, we introduce and demonstrate the mousetrap plugin that adds mouse-tracking to OpenSesame, a popular general-purpose graphical experiment builder. By integrating with this existing experimental software, mousetrap allows for the creation of mouse-tracking studies through a graphical interface, without requiring programming skills. Thus, researchers can benefit from the core features of a validated software package and the many extensions available for it (e.g., the integration with auxiliary hardware such as eye-tracking, or the support of interactive experiments). In addition, the recorded data can be imported directly into the statistical programming language R using the mousetrap package, which greatly facilitates analysis. Mousetrap is cross-platform, open-source and available free of charge from .
In two national samples, we examined the influence of interracial exposure in one’s local environment on the dynamic process underlying race perception and its evaluative consequences. Using a mouse-tracking paradigm, we found in Study 1 that White individuals with low interracial exposure exhibited a unique effect of abrupt, unstable White-Black category shifting during real-time perception of mixed-race faces, consistent with predictions from a neural-dynamic model of social categorization and computational simulations. In Study 2, this shifting effect was replicated and shown to predict a trust bias against mixed-race individuals and to mediate the effect of low interracial exposure on that trust bias. Taken together, the findings demonstrate that interracial exposure shapes the dynamics through which racial categories activate and resolve during real-time perceptions, and these initial perceptual dynamics, in turn, may help drive evaluative biases against mixed-race individuals. Thus, lower-level perceptual aspects of encounters with racial ambiguity may serve as a foundation for mixed-race prejudice.
Crowdsourcing is becoming a valuable method for companies and researchers to complete scores of micro-tasks by means of open calls on dedicated online platforms. Crowdsourcing results remain unreliable, however, as those platforms neither convey much information about the workers' identity nor ensure the quality of the work done. Instead, it is the responsibility of the requester to filter out bad workers and poorly accomplished tasks, and to aggregate worker results in order to obtain a final outcome. In this paper, we first review techniques currently used to detect spammers and malicious workers, whether they are bots or humans randomly or semi-randomly completing tasks; then, we describe the limitations of existing techniques by proposing approaches that individuals, or groups of individuals, could use to attack a task on existing crowdsourcing platforms. We focus on crowdsourcing relevance judgements for search results as a concrete application of our techniques.
Android robots are entering human social life. However, human-robot interactions may be complicated by a hypothetical Uncanny Valley (UV) in which imperfect human-likeness provokes dislike. Previous investigations using unnaturally blended images reported inconsistent UV effects. We demonstrate a UV in subjects' explicit ratings of likability for a large, objectively chosen sample of 80 real-world robot faces and a complementary controlled set of edited faces. An "investment game" showed that the UV penetrated even more deeply to influence subjects' implicit decisions concerning robots' social trustworthiness, and that these fundamental social decisions depend on subtle cues of facial expression that are also used to judge humans. Preliminary evidence suggests category confusion may occur in the UV but does not mediate the likability effect. These findings suggest that while classic elements of human social psychology govern human-robot social interaction, robust UV effects pose a formidable android-specific problem.
Recently, it has been suggested that people are spontaneously inclined to cooperate in social dilemmas, whereas defection requires effortful deliberation. From this assumption, we derive that defection should entail more cognitive conflict than cooperation. To test this hypothesis, the current study presents a first application of the response dynamics paradigm (i.e., mouse-tracking) to social dilemmas. In a fully incentivized lab experiment, mouse movements were tracked while participants played simple two-person social dilemma games with two options (cooperation and defection). Building on previous research, curvature of mouse movements was taken as an indicator of cognitive conflict. In line with the hypothesis of less cognitive conflict in cooperation, response trajectories were more curved (towards the non-chosen option) when individuals defected than when they cooperated. In other words, the cooperative option exerted more "pull" on mouse movements in case of defection than the non-cooperative option (defection) did in case of cooperation. This effect was robust across different types of social dilemmas and occurred even in the prisoner's dilemma, where defection was predominant on the choice level. Additionally, the effect was stronger for dispositional cooperators as measured by the Honesty-Humility factor of the HEXACO personality model. As such, variation in the effect across individuals could be accounted for through cooperativeness.
Although the Implicit Association Test (IAT) has been widely used as an implicit measure over the past decade, research into its underlying mechanism remains woefully insufficient, partly due to the limitation of reaction time measures it uses. In two experiments, we modified the procedures of flower-insect IAT and two implicit self-esteem IATs by instructing participants to respond with a computer mouse instead of pressing keys. Analysis of motor trajectories showed that, although participants chose the correct response button in most of the trials, their mouse movement was continuously attracted toward the alternative response button, suggesting that both response representations are partially and simultaneously activated during the process. Furthermore, analysis of velocity profiles indicated that mouse movements toward the correct response button were slower in incompatible trials than in compatible trials, especially for attribute stimuli. Theoretical and methodological implications of these results are discussed.
We explored the influence of negation on cognitive dynamics, measured using mouse-movement trajectories, to test the classic notion that negation acts as an operator on linguistic processing. In three experiments, participants verified the truth or falsity of simple statements, and we tracked the computer-mouse trajectories of their responses. Sentences expressing these facts sometimes contained a negation. Such negated statements could be true (e.g., "elephants are not small") or false (e.g., "elephants are not large"). In the first experiment, as predicted by the classic notion of negation, we found that negation caused more discreteness in the mouse trajectory of a response. The second experiment induced a simple context for these statements, yet negation still increased discreteness in trajectories. A third experiment enhanced the pragmatic context of sentences, and the discreteness was substantially diminished, with one primary measure no longer significantly showing increased discreteness at all. Traditional linguistic theories predict rapid shifts in cognitive dynamics occur due to the nature of negation: It is an operator that reverses the truth or falsity of an interpretation. We argue that these results support both propositional and contextual accounts of negation present in the literature, suggesting that contextual factors are crucial for determining the kind of cognitive dynamics displayed. We conclude by drawing broader lessons about theories of cognition from the case of negation.
Recent research suggests that visual perception of social categories is shaped not only by facial features but also by higher-order social cognitive processes (e.g., stereotypes, attitudes, goals). Building on neural computational models of social perception, we outline a perspective of how multiple bottom-up visual cues are flexibly integrated with a range of top-down processes to form perceptions, and we identify a set of key brain regions involved. During this integration, 'hidden' social category activations are often triggered which temporarily impact perception without manifesting in explicit perceptual judgments. Importantly, these hidden impacts and other aspects of the perceptual process predict downstream social consequences - from politicians' electoral success to several evaluative biases - independently of the outcomes of that process.
Human observers often experience strongly negative impressions of human-like objects falling within a particular range of visual similarity to real humans (the “uncanny valley” phenomenon). We hypothesized that negative impressions in the uncanny valley phenomenon are related to a difficulty in object categorization. We produced stimulus images by morphing two of each of real, stuffed, and cartoon human face images (Experiment 1). Observers were asked to categorize each of these images into one of the two categories and to evaluate the likability of the image. The results revealed that the longest latency, the highest ambiguity in categorization, and the lowest likability score co-occurred at consistent morphing percentages. Similar results were obtained even when we employed stimulus images that were created by morphing two of each of real, stuffed, and cartoon dog images (Experiment 2). However, the effect of categorization difficulty on evaluation was weak when two real human faces were morphed (Experiment 3). These results suggest that the difficulty in categorizing an object as either of two dissimilar categories is linked to negative evaluation regardless of whether the object is human-related or not.
Modern android robots have begun to penetrate the social realm of humans. This study quantitatively probed the impact of anthropomorphic robot appearance on human social interpretation of robot facial expression. The "Uncanny Valley" theory describing the disturbing effect of imperfect human likenesses has been a dominant influence in discussions of human-robot social interaction, but measuring its effect on human social interactions with robots has been problematic. The present study addresses this issue by examining social responses of human participants to a series of digitally composed pictures of realistic robot faces that span a range from mechanical to human in appearance. Our first experiment provides evidence that an Uncanny Valley effect on social attractiveness is indeed a practical concern in the design of robots meant to interact socially with the lay public. In the second experiment, we employed game-theory research methods to measure the effect of subtle facial expressions in robots on human judgments of their trustworthiness as social counterparts. Our application of game-theory research methods to the study of human-robot interactions provides a model for such empirical measurement of humans' social responses to android robots.