In recent years, machine learning has developed rapidly, enabling the development of applications with high levels of recognition accuracy relating to the use of speech and images. However, other types of data to which these models can be applied have not yet been explored as thoroughly. Labelling is an indispensable stage of data pre-processing that can be particularly challenging, especially when applied to single or multi-model real-time sensor data collection approaches. Currently, real-time sensor data labelling is an unwieldy process, with a limited range of tools available and poor performance characteristics, which can lead to the performance of the machine learning models being compromised. In this paper, we introduce new techniques for labelling at the point of collection coupled with a pilot study and a systematic performance comparison of two popular types of deep neural networks running on five custom built devices and a comparative mobile app (68.5–89% accuracy within-device GRU model, 92.8% highest LSTM model accuracy). These devices are designed to enable real-time labelling with various buttons, slide potentiometer and force sensors. This exploratory work illustrates several key features that inform the design of data collection tools that can help researchers select and apply appropriate labelling techniques to their work. We also identify common bottlenecks in each architecture and provide field tested guidelines to assist in building adaptive, high-performance edge solutions.
LabelSens: enabling real-time sensor data labelling at the point
of collection using an artificial intelligence-based approach
Kieran Woodward · Eiman Kanjo · Andreas Oikonomou · Alan Chamberlain
Received: 1 May 2020 / Accepted: 18 June 2020
© The Author(s) 2020
Keywords Labelling methods · Data · Machine learning · Artificial intelligence · AI · Multi-modal recognition · Pervasive
computing · Tangible computing · Internet of things · IoT · HCI
1 Introduction
Deep neural networks (DNNs) are attracting more and more
attention and are commonly seen as a breakthrough in the
advance of artificial intelligence, demonstrating DNNs' potential
to be used to accurately classify sensory data. However, in
order to train DNNs, vast quantities of data must first be col-
lected and labelled. This data can include videos, images,
audio, physical activity-related data, temperature and air qual-
ity, inevitably resulting in huge datasets containing data relat-
ing to all types of actions and behaviours. Labelling such data
is not a trivial task, especially when the premise of such sys-
tems is to enable real-time machine learning, such as
recognising emotions or security threats. So far, most of the
attention has been focused on the processing power of edge
computing devices [1,2], and little attention has been paid to
how to obtain clean and efficient labelled data to train models.
When collecting data "in the wild", in the real world outside
the confines of the research lab [4], a participant could be
doing anything from driving a car to eating in a restaurant.
Labelling can be either automatic or manual, which can be
particularly challenging when people are engaged in physical
activities. Taking this into account, the nature of each activity
needs to be considered, both at a UX and user interface design
level, as well as for data sources and providers, and at the
application level.
* Eiman Kanjo
Nottingham Trent University, Nottingham, UK
University of Nottingham, Nottingham, UK
Published online: 27 June 2020
Personal and Ubiquitous Computing (2020) 24:709–722
It is crucial to label sensor data in real time because, unlike
images and audio, it is not usually possible to label the data
offline using the raw data itself without the real-time context.
In pervasive sensing, there are three data collection methods [4]:
(1) passive data sensing, which uses smartphones or other sensors to
record unlabelled data in the background [4] and is often used to
collect weather-related data [5], health data [6,7] and environmental
data [8]; (2) active data sensing, which enables users to label the
data in real time through self-reporting and is often used to report
well-being or physical activity; and (3) hybrid data sensing, which
combines both passive and active data collection, as it involves users
actively labelling the passive sensor data that is recorded in the
background [9].
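As an illustration of hybrid data sensing, the following minimal Python sketch attaches each background sensor sample to the most recent label entered by the user. The function name `attach_labels`, the timestamps and the label strings are hypothetical and are not taken from the studies cited; the carry-forward rule is one simple assumption about how sparse labels could be paired with a continuous stream.

```python
from bisect import bisect_right


def attach_labels(sample_times, label_events, default=None):
    """Give each passive sensor sample the most recent user-entered label.

    sample_times: sorted timestamps (seconds) of background sensor samples.
    label_events: sorted (timestamp, label) pairs recorded by the user.
    Samples that precede the first label receive `default`.
    """
    event_times = [t for t, _ in label_events]
    labelled = []
    for t in sample_times:
        # Index of the last label event at or before this sample.
        idx = bisect_right(event_times, t) - 1
        label = label_events[idx][1] if idx >= 0 else default
        labelled.append((t, label))
    return labelled


# Hypothetical data: samples every 0.5 s, two label events from the user.
samples = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
events = [(0.4, "walking"), (1.6, "upstairs")]
labelled = attach_labels(samples, events)
```

Carrying the last label forward keeps the passive stream continuous while requiring only occasional input from the user; samples recorded before the first label stay unlabelled.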
The choice of a labelling approach depends on the com-
plexity of a problem, the required training data, the size of a
data science team and the financial and time resources a com-
pany can allocate to implement a project. The best approach to
label data often fundamentally depends on the data and source
type being recorded, e.g. sensor data can utilise mobile phone
applications to collect labelled data whereas labelling images
and audio may utilise post-processing techniques to implicitly
crowdsource the labels such as Google's reCAPTCHA [10].
The labelling rate of sensor data can also dictate which
approach to choose as data that frequently changes may re-
quire a higher labelling rate along with a more convenient
labelling approach. The sample size is another factor that
can dictate a labelling approach; the labelling of images can
be automated or crowdsourced whereas a large sample size of
sensor data requires recruiting many participants, for what
could be an extended period of time. Crowdsourcing labels
using web-based applications is often employed for images
and audio data tagging, as it is most commonly processed
offline [11]. This is not possible with time-series data which
has to be labelled online in real time at the point of collection
due to the nature of the data. Outsourcing the labelling of
image, video and audio data to private companies is also
gaining popularity although this is also not possible for sensor
data as activities cannot be deduced from the raw data, mean-
ing real-time labelling techniques must be developed [12].
Tangible user interfaces (TUIs) [13] present significant op-
portunities for the real-time labelling of sensor data. Tangible
interfaces are physical interfaces that enable users to interact
with digital information. These interfaces can be embedded
with a variety of sensors including those which are not com-
monplace in smartphones such as physiological sensors (elec-
trodermal activity (EDA), heart rate variability (HRV)) and
environmental sensors (barometric pressure, ultraviolet
(UV)) enabling the collection of in situ data for all sensors.
TUIs can vary in size and shape but contain ample space to
include the necessary sensors in addition to a real-time label-
ling technique.
To address the above challenges, we introduce LabelSens,
a new framework for labelling sensor data at the point of
collection. Our approach helps developers in adopting label-
ling techniques that can achieve higher performance levels. In
this paper, we present five prototypes utilising different tangi-
ble labelling mechanisms and provide a comprehensive per-
formance comparison and analysis of these prototypes. In par-
ticular, two popular deep learning networks were tested: long
short-term memory (LSTM) and gated recurrent unit (GRU).
Both were used to classify human generated, physiological
activity data collected from 10 users.
Activity recognition is an established field; however, the
methods used to label the sensor data collected are greatly
underexplored. Researchers often manually label the activity
participants undertake [14], which typically prevents the collection
of in situ data as it requires the researcher to continuously
video participants' activities so that the data might be
labelled offline. Previous research has utilised smartphone
applications to enable users to self-label their current activity
using onscreen buttons [15]. However, it is not possible to use
smartphones to collect data when additional sensors that are
not embedded within smartphones are required, e.g. EDA or
UV. A combination of a smartphone application (for labelling)
and TUIs (for sensory data collection) could be used, but this
increases the complexity of the system by forcing users to use
2 devices; it also requires a continuous, stable wireless
connection between the 2 devices.
Little research has been conducted to evaluate the feasibil-
ity and performance of other real-time labelling techniques
that would be suitable for edge devices. Looking beyond the
data collection stage, we also start to examine the classifica-
tion accuracy of different labelling techniques.
In this paper our contribution is two-fold; firstly, we intro-
duce a novel mechanism to label sensory data on edge com-
puting and TUIs while conducting a pilot study to collect
training data for machine learning algorithms, and secondly
we present a systematic way to assess the performance of
these labelling mechanisms. Our results show that using
LabelSens can be an effective method for collecting labelled
data. The remainder of the paper is organized as follows:
Section 2 presents related work, while Section 3 introduces
our experimental methods. Labelling rate results are
presented in Section 4. In Section 5, we present the algorithms
used in the research; this is followed by the discussion in
Section 6. Potential applications and future work are further
explored and discussed in Section 7, while Section 8 concludes
the paper.
2 Background
2.1 Data Labelling
As we have already seen, there are numerous approaches to
labelling which vary depending on the data being collected.
Sensor data is most commonly labelled using a hybrid ap-
proach where the sensor data is recorded continuously, while
the user occasionally records a label against all or part of the
previously recorded data. The labelling of human activities
increasingly relies on hybrid data collection techniques using
smartphones to continuously record accelerometer data as
well as enabling users to self-report their current activity [15].
Smartphone applications are becoming increasingly popular
to label sensor data as they provide a familiar, always acces-
sible interface for users, although recently the use of new
smartphone labelling techniques such as NFC and the use of
volume buttons have proved to be an intuitive and popular
approach when using an alternative approach is not possible
[4]. Active learning [16] can be used to label data, needing few
labelled training instances as the machine learning algorithm
chooses the data from which it learns. Active learning could
be beneficial for data where it is challenging to crowdsource
labels, such as raw sensor data that is not sufficiently labelled
[17]. Legion:AR [12] used the power of crowdsourcing com-
bined with active learning to label human activities. Active
learning was used to automate the labelling process where it
was paired with real-time human labellers to label the data that
could not be correctly labelled automatically. However, this
approach requires cameras to continually record users in order
that the unlabelled activities can be tagged offline. This may
be feasible in specific scenarios, such as the workplace, but
would not be plausible in the wild. Another method to
crowdsource human activities requires users to record short
video clips of themselves performing different actions at home
[18]. While crowdsourcing videos can result in ample data, it
only allows for video data to be captured with no other sensor
feeds and relies on the willingness of people to perform dif-
ferent activities on video.
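To make the idea of the algorithm choosing the data from which it learns more concrete, the sketch below shows least-confidence uncertainty sampling, one common active learning query strategy. It is an assumption made for illustration: the text does not state which query strategy [16] or Legion:AR [12] uses, and the function name `least_confident` and the probability values are hypothetical.

```python
def least_confident(probabilities, budget):
    """Return indices of the `budget` samples the model is least sure about.

    probabilities: one list of class probabilities per unlabelled sample.
    A low maximum probability means the model has no clear favourite class,
    so a human label for that sample is assumed to be most informative.
    """
    confidence = [max(p) for p in probabilities]
    ranked = sorted(range(len(probabilities)), key=lambda i: confidence[i])
    return ranked[:budget]


# Hypothetical model outputs for four unlabelled sensor windows
# (three activity classes each).
pool_probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.55, 0.40, 0.05],
    [0.34, 0.33, 0.33],
]
to_label = least_confident(pool_probs, budget=2)
```

Only the selected windows would be passed to a human labeller, which is how such a scheme can reduce the number of labelled training instances that are needed.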
As we have noted, the techniques used to label data vary de-
pending on the data type as images can be labelled offline using an
automated process based on click-through data, greatly reducing
the effort required to create a labelled dataset [19]. Additionally,
online tools have been developed that enable users to highlight and
label objects within images. The use of an online tool allowed
people from around the world to help label objects within images
which is simply not possible with sensor data [20].
Labelling audio data uses an approach similar to that of
images, as spoken words are often labelled "in-house" by
linguistic experts or may be crowdsourced. There are many
forms of audio labelling including genre classification, vocal
transcription and labelling various sounds within the audio,
e.g. labelling where bird calls begin and finish. One labelling
solution primarily focused on the artwork albums (recorded
music), text reviews and audio tracks to label over 30,000
albums in relation to one of 250 labels provided, using deep
learning to provide a related multi-label genre classification
[21]. While labelling sounds can be crowdsourced, encourag-
ing individuals to correctly label data can be a challenging task
as it can be tedious. To increase compliance and engagement
during labelling, previous research has developed games such
as Moodswings [22] and TagATune [23] where players label
different sounds. TagATune demonstrates the ability to en-
gage users in labelling data as 10 out of 11 players said they
were likely to play the game again.
Textual data from social media websites can be automati-
cally labelled using the hashtags and emojis contained within
posts as these often describe the contents of the post; however,
this can result in noisy data [24]. Alternatively, text can be
manually labelled but this is a labour-intensive process. One
solution to this problem has involved training a machine learn-
ing model using a manually labelled dataset and then combin-
ing this with noisy emoticon data to refine the model through
smoothing [25]. This method of combining labelled and noisy
data outperformed models trained using just one data type.
2.2 Human machine interaction
The real-time labelling of sensor data is a more challenging
proposition and often relies on the physical interaction with
tangible interfaces. Recent advances in pervasive technologies
have allowed engineers to transform bulky and inconvenient
monitors into relatively small, comfortable and ergonomic
research tools. Emoball [26] has been designed to enable users
to self-label their mood by squeezing an electronic ball. While
this device only allows users to report a limited number of
emotions, participants said it was simple to use and liked the
novel interaction approach. An alternative method to label
mood was explored using a cube with a face representing a
different emotion on each side of the cube
[27]. Users simply moved the cube to display the face that
most represented their mood providing a simple, intuitive
way for people to label data albeit limited by the number of
faces on the cube. Mood TUI [28] goes beyond self-reporting
to a hybrid approach in order for users to record their emotions
and collect relevant data from the user's smartphone including
location and physiological data such as heart rate. Participants
found the use of TUIs very exciting, demonstrating the poten-
tial to increase the usability and engagement of labelling, but
thus far, they have not been widely utilised outside of self-
reporting emotions.
Numerous methods used to self-report emotions have
been explored including touch, motion and buttons. These
interaction techniques have paved the way for unique interac-
tions with devices, but limited sensor data has been recorded,
and the accuracy of the techniques has not been evaluated as
previous research has not used data collected for machine
learning but purely as a method for individuals to self-report
their well-being.
Sometimes it is not physically possible to interact with
physical devices to label sensor data, such as when an indi-
vidual is driving. A solution to this problem has been the use
of the participant's voice, for example, to label potholes in the
road [29]. When labelling rapidly changing data, such as road
conditions, it can be difficult to label the data at the exact time
when a label needs to be added, so techniques may be used to
analyse sensor data windows near the label to allow the exact
pothole readings to be correctly labelled. Techniques such as
these are vital to ensure that the sensor data is correctly la-
belled as incorrectly labelled data will result in inaccurate
machine learning models that will not be able to correctly
classify any future data. However, vocal labelling is not practical
if the device is to be used in public for tasks that use
sensitive label data, such as data relating to emotional well-being,
or if there is a considerable amount of background noise in the
environment that would interfere with recognition levels.
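One way to analyse the sensor data window near a label can be sketched as follows. The sketch moves a user's label to the largest reading within a short window around the moment the label was given. This peak-picking rule, the window length and the readings are assumptions made for illustration; the source does not describe the exact method used in [29].

```python
def align_label_to_peak(timestamps, magnitudes, label_time, window=2.0):
    """Return the timestamp of the largest reading near the label.

    timestamps, magnitudes: parallel lists describing the sensor stream.
    label_time: when the user gave the label (seconds).
    window: hypothetical search radius in seconds around the label.
    """
    candidates = [
        (m, t)
        for t, m in zip(timestamps, magnitudes)
        if abs(t - label_time) <= window
    ]
    if not candidates:
        # No readings nearby, so keep the user's own timestamp.
        return label_time
    # max() compares magnitude first, so the peak reading wins.
    return max(candidates)[1]


# Hypothetical accelerometer magnitudes with a spike at t = 1.0 s;
# the user only managed to give the label at t = 2.0 s.
times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
accel = [1.0, 1.1, 4.8, 1.2, 1.0, 0.9, 1.0]
aligned = align_label_to_peak(times, accel, label_time=2.0, window=1.5)
```

Searching on both sides of the label allows for users who label slightly early as well as slightly late.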
Table 1 shows the current labelling approaches
used including in-house labelling and crowdsourced labelling,
requiring user activities to be video recorded enabling offline
manual data labelling. Similarly, automatic labelling can use
large amounts of labelled video or sensor data to enable future
data to be automatically labelled, dramatically reducing the
time required to label but also reducing the accuracy in which
the data is labelled. Alternatively, Generative Adversarial
Networks (GAN) can be used to automatically generate fur-
ther labelled data, but a vast labelled dataset is initially re-
quired, and the synthetic data labels may be highly inaccurate.
In comparison, labelling at the point of collection is highly
accurate because it is done in real time; it is also cost-effective
and time-effective, and it enables in situ data to be collected. Thus
far, however, labelling at the point of collection has had limited
use, mainly in smartphone applications.
There are numerous scenarios where labelling sensor
data at the point of collection would result in the most effec-
tive and accurate data, but there is currently no established
framework to accomplish this. When providing participants
with tangible interfaces to collect a wide array of sensory data,
embedding a labelling method directly into the device sim-
plifies the labelling process and allows for numerous sensors
that are not embedded within smartphones to be utilised. This
concept creates a simple, tangible, easy to use method to label
sensor data in real time and in situ, aiming to improve the
quantity and reliability of labelled data and therefore increas-
ing the accuracy of machine learning models which might be
Overall, there are numerous possibilities for text, audio and
images to be labelled offline, unlike raw sensor data which, as
we have previously noted, must be labelled in real time. TUIs
have previously been used to self-report, but the data is often
not collected to train machine learning models, which has
meant the accuracy and validity of the labelling techniques
has never been evaluated. Human activity recognition has
been well-researched, but the techniques to label the data have
always either involved offline labelling or a mobile phone
application which limits the availability of sensors. The use
of tangible interfaces containing different labelling methods in
addition to a wide range of sensors has not been considered
but could aid the real-time collection of labelled data. This
research aims to explore the impact that different labelling
techniques embedded within TUIs have on the accuracy of
labelling, label rate, usability and classification performance.
3 LabelSens framework
3.1 Configuration and system architecture
Labelling at the point of data collection provides many bene-
fits, which include lower associated costs, reduced time (on-
task) and the ability to label data in situ. TUIs present many
opportunities to embed unique physical labelling techniques
that may be easier to use than comparative virtual labelling
techniques used to collect in situ labelled data. In addition,
TUIs provide ideal interfaces to directly embed a multitude
of sensors, negating the need for participants to carry the
sensors as well as a separate labelling mechanism.
Table 1 Comparison of frequently used labelling techniques
Labelling technique | Data | Related work | Description | Accuracy | Time | Cost
Human in-house labelling | Video | Activity recognition [14] | Labelling carried out by in-house trained team | High | Long | Low
Crowdsource labelling | Video | reCAPTCHA | Labelling carried out by external third parties (not trained) | Low | Long | High
Labelling at the point of collection | Mobile | Mobile app [30][31] | Labelling carried out by the user in situ and in real time | High | Short | Low
Automatic | Sensor/video | Fujitsu [32] | Generating time-series data automatically from a previous extended data collection period | Low | Short | Low
Synthetic data | Sensor/video | GAN [33] | Generating synthetic labelled dataset with similar attributes recently using Generative Adversarial Networks | Very low | Short | Low
TUIs can vary in shape and size ranging from small wear-
ables to physical devices that are designed to be frequently
interacted with such as stress balls embedding force sensitive
resistors to measure touch. This enables a wide array of op-
portunities to embed sensors within a variety of objects and
combined with machine learning classifiers that could be used
to infer behaviour change, emotions, movement and more.
However, before machine learning models can be trained, a
vast amount of labelled data is first required. By embedding a
labelling technique along with the sensors within TUIs, it
ensures the sensor data and label are both being collected in
real time aiming to improve data collection rates and accuracy.
Figure 1 demonstrates the concept of the LabelSens framework,
pairing time-series sensor data with a physical labelling
technique inside a TUI to collect in situ labelled sensor data.
3.2 Labelling mechanisms
To understand the feasibility of labelling techniques for TUIs,
we propose a range of alternative approaches to traditional
labelling techniques. In this section, we present five new
prototypes that each contain a unique labelling technique and will
be used to label human activity (walking, climbing downstairs
and climbing upstairs) along with a comparative mobile app:
• Two adjacent buttons (press one button for climbing upstairs,
press the other button for climbing downstairs and
press both buttons simultaneously to record walking)
• Two opposite buttons (press one button for climbing upstairs,
press the other button for climbing downstairs and
press both buttons simultaneously to record walking)
• Three buttons (one button each for climbing upstairs,
climbing downstairs and walking)
• Force sensitive resistor to measure touch (light touch for
walking, medium touch for climbing downstairs, hard
touch for climbing upstairs)
• Slide potentiometer (slide to the left for climbing downstairs,
slide to the middle for walking and slide to the right
for climbing upstairs)
• An Android mobile application provided on a Google
Pixel 3 smartphone with 3 virtual buttons to label walking,
climbing downstairs and climbing upstairs
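The mappings listed above can be sketched in Python as follows. The label assignments follow the list, but the ADC range (0 to 1023), the force thresholds, the equal-thirds split of the slider and the assumption that the left end of the slider gives the lowest reading are all hypothetical values chosen for illustration, as the paper does not report them. The devices themselves run on an Arduino Nano, so this is a model of the logic rather than the authors' firmware.

```python
WALKING, DOWNSTAIRS, UPSTAIRS = "walking", "downstairs", "upstairs"


def two_button_label(up_pressed, down_pressed):
    """Two-button devices: both buttons together record walking."""
    if up_pressed and down_pressed:
        return WALKING
    if up_pressed:
        return UPSTAIRS
    if down_pressed:
        return DOWNSTAIRS
    return None  # nothing pressed, no label recorded


def force_label(reading, light=100, medium=400, hard=700):
    """Force sensor: light, medium and hard touch (thresholds are assumed)."""
    if reading >= hard:
        return UPSTAIRS
    if reading >= medium:
        return DOWNSTAIRS
    if reading >= light:
        return WALKING
    return None  # below the light threshold counts as no touch


def slider_label(reading, full_scale=1023):
    """Slide potentiometer: left, middle and right thirds (split is assumed)."""
    third = full_scale / 3
    if reading < third:
        return DOWNSTAIRS
    if reading < 2 * third:
        return WALKING
    return UPSTAIRS
```

For example, `force_label(500)` returns `"downstairs"` under these assumed thresholds, and a slider reading of 512 falls in the middle third and returns `"walking"`.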
Each TUI is a 6 cm × 6 cm × 6 cm 3D-printed cube that
contains a labelling technique combined with the required
sensor and microcontroller. The size of the TUI could be
reduced depending on the labelling technique used and the
sensors required, but here all interfaces were the same size to
reduce bias. The embedded electronics include:
• Arduino Nano microcontroller, chosen for its small size, for being
open source and for being compatible with a range of sensors.
• Inertial measurement unit (IMU) to record motion data.
An IMU with 9 degrees of freedom has been used as it
integrates three sensors (an accelerometer, a magnetometer and
a gyroscope) to provide better accuracy, adding additional
• Micro SD card reader to locally record the IMU sensor
data along with the user-inputted label.
The buttons and slide potentiometer enable users to easily
visualise the activity they are labelling; when using the touch
sensor, it is difficult to distinguish between the three levels of
force. To visualise the selected label, a multicoloured LED has
also been incorporated into the device that changes from green
to yellow to red when the device is touched with low, medium
and high force. Figure 2 shows the electronic circuit and the
developed TUI for the three buttons and slider labelling interfaces.
The mobile application was developed for the Android
operating system and was tested using a Google Pixel 3. The
application consisted of three virtual buttons in the centre of
the screen labelled downstairs, walking and upstairs; when a
button is pressed, the text at the top of the screen changes to
show the currently selected label. Finally, at the bottom of the
screen are two additional virtual buttons to begin and end
the recording of data. The sensor data along with its label is
then saved to a CSV file stored on the phone's internal storage.
A challenge when developing the mobile app for data labelling
was the frequency of the data, as the gyroscopic data had a
significantly lower frequency than the accelerometer data, resulting
in a reduction of the data sampling frequency.
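The effect of the slower gyroscope stream on the overall sampling frequency can be illustrated with a short sketch. Here each gyroscope reading is paired with the nearest accelerometer reading, so the merged stream runs at the gyroscope's rate. This nearest-neighbour pairing is an assumption; the paper does not describe how the app combined the two streams, and the timestamps (in milliseconds) and reading names are hypothetical.

```python
from bisect import bisect_left


def merge_to_slower_stream(accel, gyro):
    """Pair every gyroscope reading with the nearest accelerometer reading.

    accel, gyro: sorted lists of (timestamp_ms, reading); accel must be
    non-empty. The result has one row per gyroscope reading, so the merged
    data is sampled at the lower of the two frequencies.
    """
    accel_times = [t for t, _ in accel]
    merged = []
    for t, gyro_reading in gyro:
        i = bisect_left(accel_times, t)
        # The nearest accelerometer sample is either just before or at/after t.
        neighbours = [j for j in (i - 1, i) if 0 <= j < len(accel)]
        nearest = min(neighbours, key=lambda j: abs(accel_times[j] - t))
        merged.append((t, accel[nearest][1], gyro_reading))
    return merged


# Hypothetical streams: accelerometer every 20 ms, gyroscope less often.
accel = [(0, "a0"), (20, "a1"), (40, "a2"), (60, "a3"), (80, "a4")]
gyro = [(5, "g0"), (45, "g1"), (80, "g2")]
merged = merge_to_slower_stream(accel, gyro)
```

In this example five accelerometer readings are reduced to three merged rows, which mirrors the reduction in sampling frequency described above.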
Fig. 1 LabelSens framework: real-time sensor data fused with a label
We envision TUIs being used to label a maximum of 5
classes to ensure users are not overwhelmed and can
sufficiently label at all times. Additional buttons could be
added, e.g. 1 button each for up to 5 classes, but as we are
only classifying 3 activities, the impact of having a varying
number of buttons (2 or 3) can be explored. This novel ap-
proach to in situ labelling provides an easy to use interface that
facilitates the collection of real-time labelled data. The mobile
app presents easier opportunities to include more labels, but
users may still be overwhelmed by numerous virtual buttons.
Overall, the five prototypes demonstrate the variety of label-
ling techniques that can be used in comparison to traditional
app-based or offline labelling.
3.3 Experimental setup
An experiment was designed and conducted that explored the
feasibility of the different self-labelling techniques in the
interfaces. This pilot study involved ten participants who were
initially shown a set route to follow to ensure sufficient data
was collected for all three activities. Participants were
instructed that the label should only be recorded when com-
mencing a new activity, and if an incorrect label is recorded,
then the correct label should be recorded as soon as possible to
simulate real-world labelling. Each participant then used all of
the interfaces containing the 5 different labelling techniques
and the mobile app for 3 min each while undertaking 3 activities:
walking, climbing upstairs and climbing downstairs.
Ideally the labelling system should be unobtrusive, such
that the process of labelling the data does not alter or otherwise
affect the data being collected. Therefore, participants were
not accompanied during the data collection period to realisti-
cally simulate in situ data collection which is the fundamental
purpose of these interfaces. No issues arose during the data
collection with each user understanding how to use each of the
interfaces and successfully collecting data from all devices.
The three activities allowed each participant to experience
the different labelling techniques as well as collect sensor
data which can be used to examine the accuracy and
performance of each labelling technique.
Fig. 2 Example of two electronic circuits and developed tangible interface with three buttons and slider labelling interfaces
4 Examining labelling rate
The maximum labelling rate of the devices is a key factor in
deciding a labelling technique as some forms of sensor data
can frequently change requiring a new label to be recorded
multiple times every minute. To measure the maximum rate at
which it is possible to label data, each interface was used
continuously for 2 min to record as many label changes as
possible. Figure 3 shows the total number
of times each label was recorded on each of the devices.
The devices with only 2 buttons show the lowest data rate
for each of the three labels because of a delay that was re-
quired to prevent mislabelling when simultaneously clicking
both buttons to record the third label. The delay ensures that if
a user releases one button slightly before the other when press-
ing both buttons to record the third label, the third label will
still be recorded rather than the label for the button released
last. The app shows a higher labelling rate than the devices
with two buttons but is not significantly greater due to the
difficulty in pressing virtual buttons that can easily be missed
compared with physical buttons.
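The delay used on the two-button devices can be modelled as a short grace period between the two button releases. In the sketch below, releases that fall within the grace period are treated as a simultaneous press and recorded as the third label. The 0.3 s value, the function name and the assignment of button A to upstairs and button B to downstairs are assumptions, since the paper does not report the length of the delay or the firmware logic.

```python
def resolve_two_button_press(release_a, release_b, grace=0.3):
    """Decide which label a two-button interaction should record.

    release_a, release_b: release time in seconds of button A (upstairs)
    and button B (downstairs), or None if that button was not pressed.
    grace: hypothetical delay within which two releases count as one press.
    """
    if release_a is not None and release_b is not None:
        if abs(release_a - release_b) <= grace:
            # Released almost together: record the third label.
            return "walking"
        # Released far apart: treat as separate presses, keep the later one.
        return "upstairs" if release_a > release_b else "downstairs"
    if release_a is not None:
        return "upstairs"
    if release_b is not None:
        return "downstairs"
    return None


# Button A is released 0.1 s before button B: still recorded as walking.
label = resolve_two_button_press(1.0, 1.1)
```

Waiting for the grace period before committing a label is also what limits the maximum labelling rate of these devices, because every press has to be held back until the period has elapsed.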
Three buttons show significantly more data recorded al-
though very little data was recorded for one of the buttons
possibly due to the third button being more difficult to reach
as each button is located on a different face of the cube. The
touch sensor recorded a high labelling rate for all three labels
because, to reach label 2 (high setting) by pressing the sensor, the
user must first record labels 0 and 1 as they increase the force
exerted on the sensor. The slider shows high labelling rates
for label 0 and label 2 but not label 1 because it is simple to
slide the slider from one end to the other, but the slider was
rarely located in the middle of the device long enough for the
label to be recorded. This shows that the touch and slider
techniques make it easy to label the extreme values, but intermediary
values are more challenging to frequently label. If all labels
need to be frequently labelled, then buttons may be the best
labelling technique, although the position of the buttons can
greatly impact the ease with which labelling can occur.
It is also vital to compare the number of times the label
changed over the 2-min period to evaluate how simple it is
to change label for each technique. Figure 4 shows that the
slider recorded the most label changes overall because of how
simple it is to navigate between the labels, followed by two
opposite buttons, which is surprising given its low labelling
rate. This demonstrates that while the use of buttons does not
result in the highest labelling rate, it is simple to switch
between the different labels, and buttons should be used when
the label will change frequently. Touch, three buttons, the
mobile app and two adjacent buttons all performed similarly
well, showing there is little difference in accessing all of
the labels when using these devices.
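The distinction between the number of labels recorded and the number of label changes can be made concrete with a short sketch. The function name `count_label_changes` and the example sequence are illustrative assumptions.

```python
def count_label_changes(labels):
    """Count how often consecutive recorded labels differ."""
    return sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)


recorded = [0, 0, 1, 1, 2, 0, 0]          # seven labels recorded
changes = count_label_changes(recorded)   # but only three transitions
```

A device can therefore record many labels while changing label rarely, or the reverse, which is why the two measures are reported separately.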
Once all the participants had used each device to label while
walking, climbing downstairs and climbing upstairs, the data
was extracted, enabling comparisons to be established as shown
in Fig. 4. The rate at which labels were changed from one label
to another during the collection of activity data shows that
the three buttons recorded the fewest in situ labelling changes
for all users, while the two opposite buttons had the highest
overall rate of in situ labelling changes, albeit much lower
than the maximum rate of labelling changes, demonstrating that
fewer buttons increased ease of use. Labelling via touch had a
consistently high rate of labelling changes for users, but this
again could be due to the requirement of looping through all of
the labels to reach the desired label. The mobile app achieved
a slightly higher rate than three buttons and slider but not as
high as the two buttons or touch. Overall, the slider and the
three buttons produced the lowest rate of label changes during
data collection, showing these labelling techniques should not
be utilised with data that requires frequent labelling changes
because of the difficulty in accessing all three labels.
Fig. 3 Maximum labelling rate for each label per device
Figure 5 shows the total number of in situ recorded labels
from all participants for each of the devices. Touch and slider
have the highest total number of labels recorded because, when
using these labelling techniques, each label must be cycled
through to change the label. Two opposite buttons had the
smallest number of labels, which is to be expected as a delay
had to be added after a button is pressed to prevent incorrect
labelling. Because of the delay, it was expected that the two
adjacent buttons would similarly have a low data rate, but it
achieved a higher rate than three buttons, possibly because
of the difficulty of accessing the three different buttons on
different faces of the cube. This shows the position of the
buttons has a greater impact on the number of labels recorded
than the number of labelling interfaces embedded into the
device. The comparative mobile app performed better than
the buttoned devices but not as well as the slider or touch
interfaces, demonstrating the benefit of TUIs when a high
labelling rate is required.
While all interfaces recorded more walking labels than any
other label, as expected due to the set route having more
walking than stairs, the app had the fewest downstairs labels
recorded, demonstrating the difficulty in accessing virtual
buttons, in the same way as physical buttons, where a button's
position can have a major impact on its ease of use. Similarly,
two adjacent buttons had a smaller proportion of upstairs and
downstairs labels, which is surprising as these labels are the
easiest to access (by clicking a single button) compared with
labelling walking, which required both buttons to be pressed
simultaneously. It is also likely that touch and slider have
more downstairs labels than upstairs labels because downstairs
must first be cycled through to reach either the walking or
upstairs label.
5 Algorithms
In order to identify the three activities from the sensor data
collected, deep neural networks were used to develop three
predictive models. The performance of the three supervised
deep learning algorithms was tested in classifying the sensor
data into three activity classes. A multilayer recurrent neural
network (RNN) [34] with long short-term memory (LSTM)
[35], a multilayer RNN with gated recurrent unit (GRU) [36]
and a multilayer RNN with a stacked LSTM-GRU were selected
due to their high performance and capabilities in classifying
time-series data.
Fig. 4 Comparison of total maximum label changes per
Fig. 5 Total number of recorded in situ labels for each device
It is vital to utilise LSTM or GRU cells in RNNs when working
with sequential data such as human activity, as they capture
long-term dependencies and mitigate the vanishing gradient
problem. Recently, GRU cells have become increasingly popular
due to their simpler design, which uses only two gates, a
reset gate and an update gate, rather than the three gates used
by an LSTM: a forget gate, an input gate and an output gate.
The use of GRU cells can significantly reduce the time
required to train models because of their simpler structure,
which exposes the full hidden content to the next cell. GRU
models have also been shown to outperform LSTM networks when
there is a smaller training dataset, but LSTM models remember
longer sequences than GRU models, outperforming them
in tasks requiring modelling long-distance relations [36–39].
Figure 6 shows the differences between the LSTM and GRU
cells. Meanwhile, the stacked model explores whether
combining the LSTM and GRU cells within a single network
improves or degrades performance in comparison with the
base models. Batch normalisation was used on all models to
normalise the inputs of each layer so that they have a mean of
0 and a standard deviation of 1; this enables the models to
train more quickly, allows for higher learning rates and makes
the weights easier to initialise [40].
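To make the two-gate structure of the GRU concrete, the sketch below implements a single GRU step for a scalar input and a scalar hidden state. It follows the standard formulation in [36]; the weight names in the dictionary `w` are assumptions chosen for the example, and a real network would use weight matrices rather than scalars.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h_prev, w):
    """One GRU step with scalar input x and scalar hidden state h_prev.

    w holds illustrative scalar weights: input weights (w*), recurrent
    weights (u*) and biases (b*) for the update gate z, the reset gate r
    and the candidate state h.
    """
    z = sigmoid(w["wz"] * x + w["uz"] * h_prev + w["bz"])  # update gate
    r = sigmoid(w["wr"] * x + w["ur"] * h_prev + w["br"])  # reset gate
    # The reset gate scales how much of the previous state feeds the candidate.
    h_cand = math.tanh(w["wh"] * x + w["uh"] * (r * h_prev) + w["bh"])
    # The update gate blends the previous state with the candidate; the
    # result is passed on in full, as there is no separate output gate.
    return (1.0 - z) * h_prev + z * h_cand


zero_weights = {k: 0.0 for k in
                ("wz", "uz", "bz", "wr", "ur", "br", "wh", "uh", "bh")}
h_next = gru_step(1.0, 0.8, zero_weights)
```

With all weights set to zero both gates evaluate to 0.5 and the candidate state is 0, so the new state is simply half of the previous one.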
The dataset collected from each of the five interfaces and
the mobile app was used to train the three models over 10
epochs with 10-fold cross-validation. The initial learning rate
of the models was set to 0.0025 with a batch size of 32. The
data sequences used during training have a length of T = 100
with an overlap of 20. Figure 7 shows the accuracy of each
model. The stacked LSTM-GRU showed little difference in
accuracy compared with the LSTM. Meanwhile, the GRU
outperformed the LSTM and stacked models for most labelling
techniques, with the exception of two adjacent buttons, where
the LSTM network achieved the highest accuracy of all the
labelling techniques at 92.8%. The overall GRU accuracies
ranged between 68.5 and 89%, demonstrating the impact
different labelling techniques have on a dataset and thus the
accuracy of a classification model.
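The segmentation into training sequences can be sketched as below. The sketch assumes that "an overlap of 20" means consecutive windows share 20 samples, giving a step of 80 samples; the paper does not spell this out, and the function name `sliding_windows` is hypothetical.

```python
def sliding_windows(samples, length=100, overlap=20):
    """Split a sequence into fixed-length windows sharing `overlap` samples."""
    step = length - overlap  # assumed interpretation of the overlap
    last_start = len(samples) - length
    return [samples[i:i + length] for i in range(0, last_start + 1, step)]


stream = list(range(260))          # stand-in for one sensor channel
windows = sliding_windows(stream)  # windows start at samples 0, 80 and 160
```

Samples at the end of the stream that do not fill a complete window are dropped in this sketch.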
The two adjacent buttons labelling technique achieved the
highest accuracy of all the devices, which is unexpected due to
its complex nature where 2 buttons represent 3 labels. The
second most accurate device, touch, was also unexpected
due to the more complex interaction required of pressing the
device using varying levels of force to record the different
labels. It is possible that the more complex action forced users
to have a greater focus on labelling their activity, resulting in
more accurate labelling. This, however, may not be sustained if
the device were to be used for several days. Even though the
three buttons and slider labelling techniques resulted in the
lowest rate of label changes, they achieve consistently high
accuracies in the three trained models. This demonstrates that
although it may be more difficult to collect fast-changing data
with these techniques, the collected data is reliable and
capable of producing accurate classification models. The mobile
app again performed moderately, achieving 77.8% accuracy,
which, although not as high as touch, two adjacent buttons
or three buttons, is greater than slider and two opposite
buttons.
Figure 8 shows the accuracy and loss of the combined user
test data for all of the labelling interfaces during each epoch
when trained using the RNN with GRU. The loss for each of
the models gradually decreases, but the loss for the touch and
slider decreases significantly, as would be expected due to
these interfaces achieving the highest overall accuracy.
It is possible that the datasets may contain potential biases;
for example, if one user was particularly poor at labelling with
one device, it may significantly impact the quality of the
training dataset. To evaluate potential bias, the GRU model was
trained using the data from five users using each interface as
shown in Fig. 9.
There are extremely wide variations in the model accuracy,
ranging from 33.3 to 97.1%. Two opposite buttons and three
buttons demonstrate the widest variation in model accuracy,
with accuracies reduced to 42.9% for user 1 using two opposite
buttons and 33.3% for user 1 using three buttons. As the
lowest accuracies were all produced by the same user, it
indicates that this user experienced more difficulty using the
interfaces than the other users. However, two opposite buttons
also demonstrated poor accuracy (42.9%) when trialled by
user 5; thus, it is shown that this interface results in poorer
data collection, as the data from the same user achieved
consistently high accuracies for all other interfaces, ranging
between 74.2 and 87.5%. When comparing average model
accuracy for each user, it shows some users can produce
significantly better data collection and therefore model
accuracy; for example, the overall accuracy across all
interfaces for user 2 was 85.4%. The mobile app, two adjacent
buttons, touch and slider all achieved high levels of accuracy
when tested with each user, demonstrating the reliability of
those interfaces to consistently collect accurately labelled
data. The touch interface achieved the highest overall accuracy
at 97.1% when trained using data collected by user 4, although
the data from the other interfaces collected by user 4 did not
result in as high an accuracy, demonstrating that user
preference and familiarity with an interface play an important
role in the quality of data collected.
Fig. 6 Comparison of LSTM (left) and GRU (right) cells
Classification accuracy alone does not provide an informed
overview of the most beneficial labelling technique. The F1
score, the harmonic mean of the precision and recall, for each
label and device has been calculated, as shown in Table 2.
Overall, the walking label has consistently higher precision
and recall compared with the upstairs label, which has the
lowest F1 scores. The mobile app demonstrates good precision
and recall when classifying upstairs but extremely poor
precision and recall when classifying downstairs, potentially
due to more mislabelling occurring when labelling climbing
downstairs. The slider, two adjacent buttons and touch show
the highest F1 scores, which demonstrates their consistency as
a useful labelling technique. Even though three buttons had a
higher accuracy than slider, its F1 score is extremely low when
labelling upstairs, demonstrating its unreliability in
classifying this class.
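As a reminder of how a per-label F1 score is obtained from predictions, a minimal sketch follows. The function name `f1_per_label` and the toy label sequences are illustrative assumptions and are not data from the study.

```python
def f1_per_label(y_true, y_pred):
    """Return {label: F1}, the harmonic mean of precision and recall per label."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


y_true = ["walk", "walk", "up", "down", "up", "walk"]
y_pred = ["walk", "walk", "walk", "down", "up", "up"]
scores = f1_per_label(y_true, y_pred)
```

In this toy example the overall accuracy is 4/6, yet the F1 score for "up" is only 0.5, which shows how a per-label score can expose a weak class that accuracy alone hides.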
Cochran's Q test was performed to evaluate the three different
models (L = 3) for each labelling technique, providing a
chi-square value and Bonferroni-adjusted p value as shown in
Table 3. Cochran's Q test is used to test the hypothesis that
there is no difference between the classification accuracies
across multiple classifiers; under this hypothesis the
statistic is distributed as chi-square with L − 1 degrees of
freedom. Cochran's Q test is similar to one-way repeated
measures ANOVA and Friedman's test but for dichotomous data, as
the classification will either be correct or incorrect, and can
be applied across more than two groups, unlike McNemar's test
[41].
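A minimal sketch of Cochran's Q statistic is given below, using the standard formula for a table of correct (1) and incorrect (0) predictions with one row per test sample and one column per classifier. The function names and the toy table are assumptions; the p value helper uses the closed form of the chi-square survival function, which is valid only for 2 degrees of freedom (three classifiers), and it applies no Bonferroni adjustment.

```python
import math


def cochran_q(correct):
    """Return (Q, degrees of freedom) for a 0/1 table: rows = samples,
    columns = classifiers. Undefined if every row is all 0s or all 1s."""
    k = len(correct[0])
    col_totals = [sum(row[j] for row in correct) for j in range(k)]
    row_totals = [sum(row) for row in correct]
    total = sum(row_totals)
    numerator = (k - 1) * (k * sum(g * g for g in col_totals) - total ** 2)
    denominator = k * total - sum(r * r for r in row_totals)
    return numerator / denominator, k - 1


def chi2_sf_df2(q):
    """Upper-tail probability of a chi-square variable with exactly 2 df."""
    return math.exp(-q / 2.0)


table = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [1, 1, 0], [0, 0, 0], [1, 0, 1]]
q, df = cochran_q(table)   # Q = 3.5 with 2 degrees of freedom
p = chi2_sf_df2(q)
```

For reference, evaluating the same closed form at Q = 7.167 gives approximately 0.028.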
Assuming a significance level of α = 0.05, Cochran's Q test
shows that for touch, two adjacent buttons, three buttons and
the mobile app the null hypothesis can be rejected (p < 0.05 in
Table 3), indicating a significant difference in accuracy
between the three classifiers on those datasets. For the
remaining labelling techniques, slider and two opposite
buttons, the null hypothesis fails to be rejected, showing no
significant difference between the classifiers on those
datasets. The F test was also performed to compare the three
classifiers as it is regarded as analogous to Cochran's Q test.
Assuming the same level of significance, the F test rejects
the null hypothesis for the same four labelling techniques and
fails to reject it for slider and two opposite buttons,
confirming Cochran's results.
Fig. 7 Comparison of deep learning techniques on the combined
data collected from each device
Fig. 8 Comparison of training accuracy and loss when using
GRU on the total data collected for each device
Cochran's Q test is an omnibus test: it indicates whether the
three models differ overall but does not show where any
differences lie. To compare the models pairwise on the two
opposite buttons and slider datasets, McNemar's test was
performed to compare the predictive accuracy of each model
using the 2 datasets.
Table 4 shows the resulting p values when McNemar's test
was performed. None of the pairwise p values falls below 0.05
for either dataset, so no pair of models differs significantly,
which is consistent with the Cochran's Q p values in Table 3
for these two labelling techniques (0.282 and 0.498). The
smallest p values are between the GRU and the stacked network
for two opposite buttons (0.125) and between the GRU and the
LSTM for the slider (0.286). Taken together with Table 3, this
indicates that whether the network architecture makes a
significant difference to the models' accuracy depends on the
labelling technique used to collect the data.
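A pairwise comparison with McNemar's test can be sketched as follows. The sketch uses the chi-square form with a continuity correction, which is an assumption as the paper does not state which variant was used; the function name `mcnemar` and the toy prediction vectors are also hypothetical.

```python
import math


def mcnemar(a_correct, b_correct):
    """McNemar's test on two models' per-sample correctness flags (0/1).

    Returns (statistic, p value) using the continuity-corrected
    chi-square approximation with 1 degree of freedom.
    """
    pairs = list(zip(a_correct, b_correct))
    b = sum(1 for x, y in pairs if x and not y)   # only model A correct
    c = sum(1 for x, y in pairs if y and not x)   # only model B correct
    if b + c == 0:
        return 0.0, 1.0  # the models never disagree
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Upper tail of chi-square with 1 df equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


model_a = [1] * 10 + [0] * 2 + [1] * 5
model_b = [0] * 10 + [1] * 2 + [1] * 5
stat, p_value = mcnemar(model_a, model_b)
```

Only the samples on which the two models disagree influence the result; the five samples both models classify correctly play no part in the statistic.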
6 Discussion
To ensure the effectiveness of the labelling techniques, it is
also vital to gain users' preference. Fifty users were asked
which labelling technique they preferred. Figure 10 shows
the results from the 50 users, with 22% preferring the three
buttons as it was simple to understand and use due to there
being one label per button, although this labelling technique
did not result in accurate models. Similarly, 22% of people
preferred two adjacent buttons, with the mobile app following,
which is surprising as the majority of people are familiar with
mobile apps, so it would be expected to be the most popular.
The users found three buttons and two adjacent buttons to be
simpler to operate than the mobile app due to the physical
buttons being quicker and easier to press than the virtual
buttons on the app, which were often missed. Two opposite
buttons followed, again possibly due to the simplicity and
familiarity of using buttons to label data. The slider was well
received, but the granular control made the middle label more
difficult to access, meaning careful consideration had to be
made to ensure actions were being correctly labelled. Finally,
the fewest people preferred the touch-based labelling
technique due to the complexity of having to touch with varying
levels of pressure to correctly label the data. However,
touch did result in highly accurate models, showing that while
the increased attention required is not preferred, it does
ensure accurate data labelling, but this may not be sustained
over long periods.
Fig. 9 Model accuracy when individually trained on 5 users' data

Table 2 F1 score for each label when trained using each device
                        Downstairs  Walking  Upstairs
Slider                  0.7         0.82     0.69
Two adjacent buttons    0.82        0.91     0.75
Touch                   0.69        0.94     0.83
Three buttons           0.59        0.8      0.3
Two opposite buttons    0.58        0.75     0.42
App                     0.23        0.60     0.82

Table 3 Cochran's test and F test comparing classification models
                        Cochran's Q  p value  F test  F test p
Slider                  1.4          0.498    0.699   0.498
Two adjacent buttons    7.167        0.028    3.76    0.026
Touch                   7.457        0.025    3.729   0.025
Three buttons           6.143        0.046    3.136   0.046
Two opposite buttons    2.533        0.282    1.277   0.285
App                     13.241       0.001    6.852   0.001

Table 4 McNemar's test comparing 2 opposite buttons and slider
          Two opposite buttons       Slider
          GRU    LSTM   Stacked      GRU    LSTM   Stacked
GRU       NA     0.228  0.125        NA     0.286  0.596
LSTM      0.228  NA     0.546        0.286  NA     0.845
Stacked   0.125  0.546  NA           0.596  0.845  NA
While the user preference of a given labelling technique
does not correlate with the accuracy achieved for each method,
it shows the benefits of using buttons as they are well
received by users and also achieve high accuracy. A lower
number of buttons than labels was jointly preferred by the
users and achieved the highest accuracy, but the number of
buttons must remain equal to the number of labels to ensure
the users do not experience confusion when labelling. The
position of the buttons has also been shown to impact user
preference. In terms of labelling rate and model accuracy, two
adjacent buttons were preferred by users and resulted in
24.3% higher model accuracy than two opposite buttons,
which had a higher total number of recorded in situ labels
but a lower labelling rate. It is imperative to balance user
preference with the rate at which the data needs to be labelled
and the accuracy required from the model when selecting an
appropriate labelling technique.
Novel labelling methods including the slider and touch
displayed their own strengths and weaknesses. Labelling
using touch resulted in high model accuracy and labelling rate
but was the least favoured by users. If accurate labelling is
required for only short periods, labelling via touch could be
ideal. The slider was liked by the users and had the highest
labelling rate but achieved the second worst accuracy of all
the devices at 73.4%, showing the slider is best for continually
changing or granular data that would be more difficult to label
with buttons.
Surprisingly the mobile app was not the most popular la-
belling technique even though all participants were more fa-
miliar with apps than the other interfaces. The data collected
from the mobile app shows it achieved a moderate labelling
rate and model accuracy despite participants' familiarity. A
possible reason why the mobile app did not result in the most
accurate data is that virtual buttons can be easier to miss than
physical interfaces. However, when used in real world envi-
ronments, apps are easier to deploy, but solely using an app
does not allow for any additional sensors that are not embed-
ded within the smartphone to be used. Apps possess many
benefits when used to label motion data including ease of
access, but when additional sensors are required, using apps
for purely labelling is not recommended over physical label-
ling techniques.
One of the most significant challenges encountered was the
inconsistent quality of labelled data, as when collecting in situ
data to train machine learning models, it is not possible to
ensure all users are successfully labelling their actions. For
example, the wide variation in labelling rates was most likely
due to users not following the set route as they were unaccom-
panied during the labelling process to better replicate in-situ
data collection.
Additionally, as users had to repeat the experiment five
times to enable them to use each device, their labelling rate
may change as they become more familiar with the experiment.
To combat this, users were provided with the devices in
varying orders preventing the same device from being used by
all users at the same stage of the experiment.
Overall, when labelled in situ sensor data is required, the
use of physical labelling interfaces should be considered as
they have demonstrated their ability to improve labelling rate,
accuracy and user preference in comparison with mobile apps,
which are most commonly used to label sensor data.
7 Applications and future work
AI-powered edge computing has numerous potential applica-
tions as it is not always possible to label real-time data using a
smartphone application. Common uses for tangible labelling
techniques include times when users may be engaged in other
activities such as labelling while physically active.
Additionally, tangible labelling techniques are required in
cases where specialist sensors are required to collect labelled
data such as physiological sensors used to label mental well-
being or environmental sensors to measure pollution. The la-
belling techniques discussed provide new opportunities to la-
bel real-time sensor data that has traditionally been challeng-
ing to label. This data can then be used to train models, pos-
sibly on the device using edge computing to classify sensor
data in real time.
Fig. 10 Comparison of 50 users' labelling preference
In the future, these labelling techniques could be evaluated
on other data types including the use of more specialist sensors
to further prove their effectiveness. Additionally, longer data
collection trials could be conducted as while this experiment
was short, it demonstrates the requirements of using tangible
labelling techniques to improve labelling rate and overall
model accuracy.
8 Conclusion
Tangible user interfaces are ideal interfaces for data col-
lection and running real-time machine learning classifiers,
but first real-world-labelled data must be collected.
Images, video and audio data can all be labelled offline,
but this is not possible with time-series sensor data. To
address this issue and collect in situ labelled sensor data,
five different labelling techniques have been embedded
into TUIs including two opposite buttons, two adjacent
buttons, three buttons, slider, touch and a comparative
mobile application. The interfaces were used by the par-
ticipants to label three physical activities enabling the
performance of each technique to be evaluated. It is vital
to compare the different labelling techniques as machine
learning models can only be as accurate as the labelled
data they are trained on. During this pilot study, participants
used six labelling interfaces to collect data that was
used to train various RNNs. The results demonstrate that
while a touch interface results in a high labelling rate and
high model accuracy, it is the least favoured by the users
due to the high level of attention required to use the de-
vice. The mobile app was popular with users due to its
familiarity but only achieved the fourth highest accuracy.
The slider resulted in high user preference and labelling
rate but poor model accuracy, while two adjacent buttons
achieved both high user preference and the highest model
accuracy showing it is the most beneficial technique for
this data collection.
Overall, this exploratory work demonstrates embedding
labelling techniques within TUIs addresses many of the chal-
lenges facing the collection of in situ, time-series sensor data.
When collecting labelled data, the nature of the data, labelling
rate, duration of data collection and user preference all need to
be considered to ensure the most effective labelling technique
is used. This will increase the reliability of in situ labelled
datasets and enable the development of more accurate ma-
chine learning classifiers.
Open Access This article is licensed under a Creative Commons
Attribution 4.0 International License, which permits use, sharing, adap-
tation, distribution and reproduction in any medium or format, as long as
you give appropriate credit to the original author(s) and the source,
provide a link to the Creative Commons licence, and indicate if changes
were made. The images or other third party material in this article are
included in the article's Creative Commons licence, unless indicated
otherwise in a credit line to the material. If material is not included in the
article's Creative Commons licence and your intended use is not permitted
by statutory regulation or exceeds the permitted use, you will need to
obtain permission directly from the copyright holder. To view a copy of
this licence, visit
References
1. Nvidia (2019) NVIDIA Jetson Nano Developer Kit | NVIDIA
2. Google (2019) Google Coral.
3. Lara ÓD, Labrador MA (2013) A survey on human activity recog-
nition using wearable sensors. IEEE Commun Surv Tutorials 15:
4. Younis EMG, Kanjo E, Chamberlain A (2019) Designing and
evaluating mobile self-reporting techniques: crowdsourcing for
citizen science. Pers Ubiquitous Comput:1–10
5. Kwan V, Hagen G, Noel M, Dobson K, Yeates K (2017)
Healthcare at your fingertips: the professional ethics of smartphone
health-monitoring applications. Ethics Behav 27:615–631. https://
6. Kanjo E, Younis EMG, Sherkat N (2018) Towards unravelling the
relationship between on-body, environmental and emotion data
using sensor information fusion approach. Inf Fusion 40:18–31
7. Al-barrak L, Kanjo E, Younis EMG (2017) NeuroPlace: categoriz-
ing urban places according to mental states. PLoS One 12:
8. Kanjo E (2010) NoiseSPY: a real-time mobile phone platform for
urban noise monitoring and mapping. Mob Networks Appl 15:562
9. Kanjo E, Kuss DJ, Ang CS (2017) NotiMind: utilizing responses to
smart phone notifications as affective sensors. IEEE Access 5:
10. Google (2019) reCAPTCHA: Easy on Humans, Hard on Bots. Accessed 8
Apr 2019
11. Vaughan JW (2019) Making better use of the crowd: how
crowdsourcing can advance machine learning research. JMLR 18
12. Lasecki WS, Song YC, Kautz H, Bigham JP (2012) Real-time
crowd labeling for deployable activity recognition
13. Ullmer B, Ishii H (2000) Emerging frameworks for tangible user
interfaces. IBM Syst J 39:915–931.
14. Tapia EM, Intille SS, Haskell W, Larson K, Wright J, King A,
Friedman R (2007) Real-time recognition of physical activities
and their intensities using wireless accelerometers and a heart rate
monitor. In: Proceedings - International Symposium on Wearable
Computers, ISWC
15. Kwapisz JR, Weiss GM, Moore SA (2010) Activity recognition
using cell phone accelerometers
16. Settles B (2010) Active learning literature survey. Univ Wisconsin,
17. Hunh T, Schiele B (2006) Towards less supervision in activity
recognition from wearable sensors. In: Proceedings - International
Symposium on Wearable Computers, ISWC
18. Sigurdsson GA, Varol G, Wang X, Farhadi A, Laptev I, Gupta A
(2016) Hollywood in homes: crowdsourcing data collection for
activity understanding. Springer, Cham, pp 510–526
19. Tsikrika T, Diou C, de Vries AP, Delopoulos A (2009) Image
annotation using clickthrough data
20. Russell BC, Torralba A, Murphy KP, Freeman WT (2008)
LabelMe: a database and web-based tool for image annotation.
Int J Comput Vis 77:157–173.
21. Oramas S, Nieto O, Barbieri F, Serra X (2017) Multi-label music
genre classification from audio, Text, and Images Using Deep
22. Kim YE, Schmidt E, Emelle L (2008) MoodSwings: a collaborative
game for music mood label collection. In: ISMIR 2008 - 9th
International Conference on Music Information Retrieval
23. Law ELM, Ahn L Von, Dannenberg RB, Crawford M (2007)
Tagatune: a game for music and sound annotation. In:
Proceedings of the 8th International Conference on Music
Information Retrieval, ISMIR 2007
24. Davidov D, Tsur O, Rappoport A (2010) Enhanced sentiment
learning using Twitter Hashtags and smileys
25. Liu K-L, Li W-J, Guo M (2012) Emoticon smoothed language
models for twitter sentiment analysis. Twenty-Sixth AAAI Conf
Artif Intell
26. Bravo J, Hervás R, Villarreal V (2015) Ambient intelligence for
health first international conference, AmIHEALTH 2015 Puerto
Varas, Chile, December 14, 2015 proceedings. Lect notes
Comput Sci (including Subser Lect notes Artif Intell Lect notes
bioinformatics) 9456:189–200.
27. Sarzotti F, Lombardi I, Rapp A, Marcengo A, Cena F (2015)
Engaging users in self-reporting their data: a tangible Interface for
quantified self. Springer, Cham, pp 518–527
28. Sarzotti F (2018) Self-monitoring of emotions and mood using a
tangible approach. Computers 7:7
29. Tai Y, Chan C, Hsu JY (2010) Automatic road anomaly detection
using smart mobile device. 2010 15th Conf Artif Intell Appl
30. Kanjo E, Younis EMG, Ang CS (2018) Deep learning analysis of
Mobile physiological, Environmental and Location Sensor Data for
Emotion Detection J Inf Fusion 133
31. Kanjo E, Younis EMG, Sherkat N (2018) Towards unravelling the
relationship between on-body, environmental and emotion data
using sensor information fusion approach. Inf Fusion 40:18–31.
32. Fujitsu (2019) Fujitsu Develops Automatic Labeling Technology to
Accelerate AI Use of Time-Series Data - Fujitsu Global. https://
33. Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D,
Ozair S, Courville A, Bengio Y (2014) Generative adversarial
34. Pearlmutter (1989) Learning state space trajectories in recurrent
neural networks. In: International Joint Conference on Neural
Networks. IEEE, pp 365–372 vol.2
35. Hochreiter S, Schmidhuber J (1997) Long short-term memory.
Neural Comput 9:1735–1780.
36. Chung J, Gulcehre C, Cho K, Bengio Y (2014) Empirical
evaluation of gated recurrent neural networks on sequence modeling
37. Jozefowicz R, Zaremba W, Sutskever I (2015) An empirical
exploration of recurrent network architectures. Proc. 32nd Int. Conf. Int.
Conf. Mach. Learn. - Vol. 37 2342–2350
38. Kaiser Ł, Sutskever I (2015) Neural GPUs learn algorithms
39. Yin W, Kann K, Yu M, Schütze H (2017) Comparative study of
CNN and RNN for natural language processing
40. Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep
network training by reducing internal covariate shift. In: 32nd
international conference on machine learning, ICML 2015.
International machine learning society (IMLS), pp 448–456
41. Demšar J (2006) Statistical comparisons of classifiers over multiple
data sets. J Mach Learn Res 7:1–30
Publisher's note Springer Nature remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
... State-of-the-art neural network classifiers could and should achieve cutting edge applications. The required data labelling cost ([37,41]) and reliability ([18,2]) of these networks, however, limits them from achieving their full potential. In this study, we specifically address the demand for neural training that output class-labels are available for back-propagated learning [13, Chapter 5]. On the one hand, this input-output design results in high training-dataset preparation cost when concerning manual labelling. ...
... As to say, X_U contains the same and different classes to X_L. The difference between semi-supervised novel class detection and SsLAC is not in their training criteria, but rather their required outcome. Semi-supervised novel class detectors are binary classifiers aimed to separate the different novel classes in X_U from the known classes in X_L (i.e. ...
... Also including the pseudo-labelled samples produced by the generator in the supervised criteria would train the required open set novel class detection as described above. The final generator and discriminator loss functions of our proposed open-SsLAC method are given below, noting that X_P is now equal to the output of the generator, X_P := G(z). Loss-min ...
The ability to (a) train off partially labelled datasets and (b) ensure resulting networks separate data outside the domain of interest hugely expands the practical and cost-effective applicability of neural network classifiers. We design a classifier based off generative adversarial networks (GANs) that trains off a practical and cost-saving semi-supervised criteria which, specifically, allows novel classes within the unlabelled training set. Furthermore, we ensure the resulting classifier is capable of absolute novel class detection, be these from the semi-supervised unlabelled training set or a so-called open set. Results are both state-of-the-art and a first of its kind. We argue this technique greatly decreases training cost in respect to labelling while greatly improving the reliability of classifications.
... A unified dataset was created, including all the frames in a file. Next, each activity was labeled, following the labelling technique [10], adding the information corresponding to the activity to which the sample belongs. The raw data was processed using sliding windows [11]. ...
Conference Paper
Human Activity Recognition (HAR) plays an important role in behavior analysis, video surveillance, gesture recognition, gait analysis, and posture recognition. Given the recent progress of Artificial Intelligence (AI) applied to HAR, the data from wearable sensors can be treated as time series from which movement events can be classified with high accuracy. In this study, a dataset of raw sensor data served as input to four different deep learning networks (DNN, CNN, LSTM, and CNN-LSTM). Differences in accuracy and learning time were then compared and evaluated for each model. An analysis of HAR was made based on an attempt to classify three activities: walking, sit-to-stand, and squatting. We also compared the performance of two different sensor data types: 3-axis linear acceleration measured from two inertial measurement units (IMUs) versus 3D acceleration of two retro-reflective markers from the high-end optoelectronic motion capture system (MOCAP). The dataset created from observations of ten subjects was preprocessed with labelling and sliding windows and then used as input to the four frameworks. The results indicate that, for HAR prediction, linear accelerations estimated using IMUs are as reliable as those measured using the MOCAP system. Also, the use of the hybrid CNN-LSTM framework for both methods resulted in higher accuracy (99%).
... Furthermore, WT must be coupled to artificial intelligence technologies as standard methods of data processing for successful outcomes [18-20]. Methods proposed for processing data from wearable devices include machine learning in varied forms, with algorithms related to unsupervised and supervised learning [18, 21-23]. Microfluidic paper-based analytical devices (μPADS) are the most convenient alternative to conventional devices due to low manufacturing cost, ease of use, and disposability. ...
The technology of electrochemical flexible nanosensors is growing rapidly due to the innumerable advantages that these devices offer, such as quick detection of pathogens or biomarkers, high sensitivity of detection, and accessibility of the tools. With the advances in data sciences, it is possible to connect these sensors with the Internet of Things (IoT) and make use of artificial intelligence systems as a diagnostic tool. Also, these sensors avoid the costs of large equipment and complex laboratories for detection. These systems have evolved effective methods for diagnosing diseases, detecting pathogens, and improving treatment efficacy, consequently improving patient quality of life. This chapter gives a brief overview of flexible electrochemical smart nanosensors, including definitions, smart materials, and advances in this field regarding preventive and personalized medical devices.
... Language is considered one of the greatest accomplishments of humankind and has quickened its advancement. Thus, it is not unexpected that there is a lot of work being done to incorporate language into the field of AI as Natural Language Processing (NLP) [21-23]. Common techniques used in NLP include Named Entity Identification, Aspect Mining, SA, Script Summarization and Subject Modelling [24]. This paper plots the way toward demonstrating at dynamic level, and afterward, concocts a solid model for confirmation the practicality. ...
Abstract. World Wide Web (WWW) has become a monstrous wellspring of client delivered content and opinionative data. Utilizing web-based social networking, for example, Twitter, Facebook, LinkedIn and so forth client share their points of view, feelings in a helpful way, where a great many individuals express their perspectives in their everyday connection, which can be their conclusions and assessments about specific thing. These routinely creating unique data are, beyond question, a staggeringly rich wellspring of information for such an essential administration process. To mechanize the examination of such information, the region of Sentiment Analysis (SA) has developed. It targets recognizing opinionative data in the Web and gathering them as demonstrated by their furthest point. This is achieved by introspecting the various data uploaded by the user. These data are categorized under different emotions such as social awareness, curiosity, emotions, creative, advertising and so forth., with the assistance of the keywords used in the data uploaded. In that conduct dissect we are utilizing content based separating, Collaborative Filtering and Natural Language pre-processing calculation. Relies on this calculation we will order the users.
... 1. User availability, incentivisation and willingness to participate in longitudinal studies (or increasing study drop-outs beyond the first few months) [5] 2. Privacy, ethics and data protection issues [6], [7] 3. Data integrity and accuracy [8] 4. Costs and availability of monitoring devices [9], [10] 5. Requirement to set up the device and extract the data by expert personnel needing specialized equipment [11] 6. Time consuming nature of real-time self labelling [12] In order to address these well reported problems, Transfer Learning (TL) is often used by training a base model using labelled data from a different domain and transferring the learned knowledge to the new target domain [13], [14]. Pre-trained models are often used to encompass methods that discover shared characteristics between prior tasks and a target task [14]. ...
The quantification of emotional states is an important step to understanding wellbeing. Time series data from multiple modalities such as physiological and motion sensor data have proven to be integral for measuring and quantifying emotions. Monitoring emotional trajectories over long periods of time inherits some critical limitations in relation to the size of the training data. This shortcoming may hinder the development of reliable and accurate machine learning models. To address this problem, this paper proposes a framework to tackle the limitation in performing emotional state recognition on multiple multimodal datasets: 1) encoding multivariate time series data into coloured images; 2) leveraging pre-trained object recognition models to apply a Transfer Learning (TL) approach using the images from step 1; 3) utilising a 1D Convolutional Neural Network (CNN) to perform emotion classification from physiological data; 4) concatenating the pre-trained TL model with the 1D CNN. Furthermore, the possibility of performing TL to infer stress from physiological data is explored by initially training a 1D CNN using a large physical activity dataset and then applying the learned knowledge to the target dataset. We demonstrate that model performance when inferring real-world wellbeing rated on a 5-point Likert scale can be enhanced using our framework, resulting in up to 98.5% accuracy, outperforming a conventional CNN by 4.5%. Subject-independent models using the same approach resulted in an average of 72.3% accuracy (SD 0.038). The proposed CNN-TL-based methodology may overcome problems with small training datasets, thus improving on the performance of conventional deep learning methods.
... The size of the cube is 6cm × 6cm × 6cm making it handheld and ideal to embed all of the necessary sensors. Two buttons (1 green and 1 red) were also embedded within the interfaces to enable the realtime labelling of positive and negative states of wellbeing respectively [36]. Multiple physiological sensors are included within the interfaces to measure wellbeing. ...
The ability to unobtrusively measure mental wellbeing states using non-invasive sensors has the potential to greatly improve mental wellbeing by alleviating the effects of high stress levels. Multiple sensors, such as electrodermal activity, heart rate and accelerometers, embedded within tangible devices pave the way to continuously and non-invasively monitor wellbeing in real-world environments. On the other hand, fidgeting tools enable repetitive interaction methods that may help to tap into individual's psychological need to feel occupied and engaged; hence potentially reducing stress. In this paper, we present the design, implementation, and deployment of Tangible Fidgeting Interfaces (TFIs) in the form of computerised iFidgetCubes. iFidgetCubes embed non-invasive sensors along with fidgeting mechanisms to aid relaxation and ease restlessness. We take advantage of our labeling techniques at the point of collection to implement multiple subject-independent deep learning classifiers to infer wellbeing. The obtained performance demonstrates that these new forms of tangible interfaces combined with deep learning classifiers have the potential to accurately infer wellbeing in addition to providing fidgeting tools.
... Furthermore, even if the data is labelled by experts it might not always reflect the true internal state of the user. A hybrid approach of self-reporting and continuous data collection would enable more accurately labelled data to be collected but this relies on users continuously reporting their well-being [140]. ...
Mental health problems are on the rise globally and strain national health systems worldwide. Mental disorders are closely associated with fear of stigma, structural barriers such as financial burden, and lack of available services and resources which often prohibit the delivery of frequent clinical advice and monitoring. Technologies for mental well-being exhibit a range of attractive properties which facilitate the delivery of state of the art clinical monitoring. This review article provides an overview of traditional techniques followed by their technological alternatives, sensing devices, behaviour changing tools, and feedback interfaces. The challenges presented by these technologies are then discussed with data collection, privacy, and battery life being some of the key issues which need to be carefully considered for the successful deployment of mental health toolkits. Finally, the opportunities this growing research area presents are discussed including the use of portable tangible interfaces combining sensing and feedback technologies. Capitalising on the data these ubiquitous devices can record, state of the art machine learning algorithms can lead to the development of robust clinical decision support tools towards diagnosis and improvement of mental well-being delivery in real-time.
... However, although this is an early study in this area, the data was recorded prior and did not report in real-time which would have given a deeper understanding into the overall impact. The use of self-reporting to record wellbeing is becoming increasingly popular [23] especially through mobile systems such as Mappiness [13] and WiMO [15] because of their ability to link the individual's emotion to a particular location. ...
The growth of mobile sensor technologies has made it possible for city councils to understand people's behaviour in urban spaces, which could help to reduce stress around the city. We present a quantitative approach to convey a collective sense of urban places. The data was collected at a high level of granularity, navigating the space around a highly popular urban environment. We capture people's behaviour by leveraging continuous multi-model sensor data from environmental and physiological sensors. The data is also tagged with self-report, location coordinates as well as the duration in different environments. The approach leverages an exploratory data visualisation along with geometrical and spatial data analysis algorithms, allowing spatial and temporal comparisons of data clusters in relation to people's behaviour. Deriving and quantifying such meaning allows us to observe how mobile sensing unveils the emotional characteristics of places from such crowd-contributed content.
Conference Paper
With the ever-increasing elderly population, human activity trackers can help monitor the daily physical activities performed by the elderly in order to contribute towards improvements in independent living and quality of life. Historically, research has explored activity recognition using only a single inertial measurement unit (IMU) and has only explored simple human activities such as walking, but this does not benefit the monitoring of the elderly population, wherein the complexity in the details of changes is not captured. This work proposes a multi-sensor approach measuring acceleration and quaternion values to recognise both simple and complex daily living activities using a deep learning approach. We compare and evaluate the performance of using 1, 3 and 5 on-body IMU sensors to train CNN and LSTM networks with both acceleration and quaternion values. The results show that the adoption of the quaternion values from 5 on-body sensors using the LSTM model outperforms all other models (F1-score=0.9606). This high performance provides many opportunities for the accurate monitoring of complex daily living activities.
Artificial intelligence (AI) has been gaining significant attention in various fields to reduce costs, increase revenue, and improve customer satisfaction. AI can be particularly beneficial in enhancing decision-making processes for complex and ill-structured problems that lack transparency and have unclear goals. Most AI algorithms require labeled datasets to learn the problem characteristics, draw decision boundaries, and generalize. However, most datasets collected to solve complex and ill-structured problems do not have labels. Additionally, most AI algorithms are opaque and not easily interpretable, making it hard for decision-makers to obtain model insights for developing effective solution strategies. To this end, we examine existing AI paradigms, mainly symbolic AI (SAI) guided by human domain knowledge and data-driven AI (DAI) guided by data. We propose an approach called informed AI (IAI) by integrating human domain knowledge into AI to develop effective and reliable data labeling and model explainability processes. We demonstrate and validate the use of IAI by applying it to a social media dataset comprised of conversations between customers and customer support agents to construct a solution – IAI defect explorer (I-AIDE). I-AIDE is utilized to identify product defects and extract the voice of customers to help managers make decisions to improve quality and enhance customer satisfaction.
In recent years, mobile phone technology has taken tremendous leaps and bounds to enable all types of sensing applications and interaction methods, including mobile journaling and self-reporting to add metadata and to label sensor data streams. Mobile self-report techniques are used to record user ratings of their experiences during structured studies, instead of traditional paper-based surveys. These techniques can be timely and convenient when data are collected “in the wild”. This paper proposes three new viable methods for mobile self-reporting projects and in real-life settings such as recording weather information or urban noise mapping. These techniques are Volume Buttons control, NFC-on-Body, and NFC-on-Wall. This work also provides an experimental and comparative analysis of various self-report techniques regarding user preferences and submission rates based on a series of user experiments. The statistical analysis of our data showed that pressing screen buttons and screen touch allowed for higher labelling rates, while Volume Buttons proved to be more valuable when users engaged in other activities, e.g. while walking. Similarly, based on participants' preferences, we found that NFC labelling was also an easy and intuitive technique when used in the context of self-reporting and place-tagging. Our hope is that by reviewing current self-reporting interfaces and user requirements, we will be able to enable new forms of self-reporting technologies that were not possible before.
The detection and monitoring of emotions are important in various applications, e.g. to enable naturalistic and personalised human-robot interaction. Emotion detection often requires modelling of various data inputs from multiple modalities, including physiological signals (e.g. EEG and GSR), environmental data (e.g. audio and weather), videos (e.g. for capturing facial expressions and gestures) and more recently motion and location data. Many traditional machine learning algorithms have been utilised to capture the diversity of multimodal data at the sensor and feature levels for human emotion classification. While the feature engineering processes often embedded in these algorithms are beneficial for emotion modelling, they inherit some critical limitations which may hinder the development of reliable and accurate models. In this work, we adopt a deep learning approach for emotion classification through an iterative process of adding and removing a large number of sensor signals from different modalities. Our dataset was collected in a real-world study from smartphones and wearable devices. It merges local interaction of three sensor modalities (on-body, environmental and location) into a global model that represents signal dynamics along with the temporal relationships of each modality. Our approach employs a series of learning algorithms including a hybrid approach using Convolutional Neural Network and Long Short-term Memory Recurrent Neural Network (CNN-LSTM) on the raw sensor data, eliminating the need for manual feature extraction and engineering. The results show that the adoption of deep-learning approaches is effective in human emotion classification when a large number of sensor inputs is utilised (average accuracy 95% and F-measure = 95%) and the hybrid models outperform a traditional fully connected deep neural network (average accuracy 73% and F-measure = 73%). Furthermore, the hybrid models outperform previously developed ensemble algorithms that utilise feature engineering to train the model (average accuracy 83% and F-measure = 82%).
Nowadays Personal Informatics (PI) devices are used for sensing and saving personal data, everywhere and at any time, helping people improve their lives by highlighting areas of good and bad performances and providing a general awareness of different levels of conduct. However, not all these data are suitable to be automatically collected. This is especially true for emotions and mood. Moreover, users without experience in self-tracking may have a misperception of PI applications’ limits and potentialities. We believe that current PI tools are not designed with enough understanding of such users’ needs, desires, and problems they may encounter in their everyday lives. We designed and prototyped the Mood TUI (Tangible User Interface), a PI tool that supports the self-reporting of mood data using a tangible interface. The platform is able to gather six different mood states and it was tested through several participatory design sessions in a secondary/high school. The proposed solution allows gathering mood values in an amusing, simple, and appealing way. Users appreciated the prototypes, suggesting several possible improvements as well as ideas on how to use the prototype in similar or totally different contexts, and giving us hints for future research.
Urban spaces have a great impact on people’s emotions and behaviour. There are a number of factors that impact our brain responses to a space. This paper presents a novel urban place recommendation approach that is based on modelling in-situ EEG data. The research leverages newly affordable Electroencephalogram (EEG) headsets, which have the capability to sense mental states such as meditation and attention levels. These emerging devices have been utilized in understanding how human brains are affected by the surrounding built environments and natural spaces. In this paper, mobile EEG headsets have been used to detect mental states at different types of urban places. By analysing and modelling brain activity data, we were able to classify three different places according to the mental state signature of the users, and create an association map to guide and recommend people to therapeutic places that lessen brain fatigue and increase mental rejuvenation. Our mental states classifier has achieved an accuracy of 90.8%. NeuroPlace breaks new ground not only as a mobile ubiquitous brain monitoring system for urban computing, but also as a system that can advise urban planners on the impact of specific urban planning policies and structures. We present and discuss the challenges in making our initial prototype more practical, robust, and reliable as part of our on-going research. In addition, we present some enabling applications using the proposed architecture.
Music genres make it possible to categorize musical items that share common characteristics. Although these categories are not mutually exclusive, most related research is traditionally focused on classifying tracks into a single class. Furthermore, these categories (e.g., Pop, Rock) tend to be too broad for certain applications. In this work we aim to expand this task by categorizing musical items into multiple and fine-grained labels, using three different data modalities: audio, text, and images. To this end we present MuMu, a new dataset of more than 31k albums classified into 250 genre classes. For every album we have collected the cover image, text reviews, and audio tracks. Additionally, we propose an approach for multi-label genre classification based on the combination of feature embeddings learned with state-of-the-art deep learning methodologies. Experiments show major differences between modalities, which not only introduce new baselines for multi-label genre classification, but also suggest that combining them yields improved results.
Today's mobile phone users are faced with large numbers of notifications on social media, ranging from new followers on Twitter and emails to messages received from WhatsApp and Facebook. These digital alerts continuously disrupt activities through instant calls for attention. This paper examines closely the way everyday users interact with notifications and their impact on users' emotion. Fifty users were recruited to download our application NotiMind and use it over a five-week period. Users' phones collected thousands of social and system notifications along with affect data collected via self-reported PANAS tests three times a day. Results showed a noticeable correlation between positive affective measures and keyboard activities. When large numbers of Post and Remove notifications occur, a corresponding increase in negative affective measures is detected. Our predictive model has achieved a good accuracy level using three different classifiers "in the wild" (F-measure 74-78% within-subject model, 72-76% global model). Our findings show that it is possible to automatically predict when people are experiencing positive, neutral or negative affective states based on interactions with notifications. We also show how our findings open the door to a wide range of applications in relation to emotion awareness on social and mobile communication.
Over the past few years, there has been a noticeable advancement in environmental models and information fusion systems taking advantage of the recent developments in sensor and mobile technologies. However, little attention has been paid so far to quantifying the relationship between environment changes and their impact on our bodies in real-life settings. In this paper, we identify a data-driven approach based on direct and continuous sensor data to assess the impact of the surrounding environment on physiological changes and emotion. We aim at investigating the potential of fusing on-body physiological signals, environmental sensory data and on-line self-report emotion measures in order to achieve the following objectives: 1) model the short-term impact of the ambient environment on the human body, 2) predict emotions based on on-body sensors and environmental data. To achieve this, we have conducted a real-world study ‘in the wild’ with on-body and mobile sensors. Data was collected from participants walking around Nottingham city centre, in order to develop analytical and predictive models. Multiple regression, after allowing for possible confounders, showed a noticeable correlation between noise exposure and heart rate. Similarly, UV and environmental noise have been shown to have a noticeable effect on changes in ElectroDermal Activity (EDA). Air pressure demonstrated the greatest contribution towards the detected changes in body temperature and motion. Also, a significant correlation was found between air pressure and heart rate. Finally, decision fusion of the classification results from different modalities is performed. To the best of our knowledge this work presents the first attempt at fusing and modelling data from environmental and physiological sources collected from sensors in a real-world setting.
Deep neural networks (DNN) have revolutionized the field of natural language processing (NLP). Convolutional neural network (CNN) and recurrent neural network (RNN), the two main types of DNN architectures, are widely explored to handle various NLP tasks. CNN is supposed to be good at extracting position-invariant features and RNN at modeling units in sequence. The state of the art on many NLP tasks often switches due to the battle between CNNs and RNNs. This work is the first systematic comparison of CNN and RNN on a wide range of representative NLP tasks, aiming to give basic guidance for DNN selection.
This survey provides a comprehensive overview of the landscape of crowdsourcing research, targeted at the machine learning community. We begin with an overview of the ways in which crowdsourcing can be used to advance machine learning research, focusing on four application areas: 1) data generation, 2) evaluation and debugging of models, 3) hybrid intelligence systems that leverage the complementary strengths of humans and machines to expand the capabilities of AI, and 4) crowdsourced behavioral experiments that improve our understanding of how humans interact with machine learning systems and technology more broadly. We next review the extensive literature on the behavior of crowdworkers themselves. This research, which explores the prevalence of dishonesty among crowdworkers, how workers respond to both monetary incentives and intrinsic forms of motivation, and how crowdworkers interact with each other, has immediate implications that we distill into best practices that researchers should follow when using crowdsourcing in their own research. We conclude with a discussion of additional tips and best practices that are crucial to the success of any project that uses crowdsourcing, but rarely mentioned in the literature.