Web-Scale Workflow
Editor: Schahram Dustdar • dustdar@dsg.tuwien.ac.at
Quality Control in Crowdsourcing Systems: Issues and Directions

Mohammad Allahbakhsh, Boualem Benatallah, and Aleksandar Ignjatovic • University of New South Wales
Hamid Reza Motahari-Nezhad • Hewlett-Packard Laboratories
Elisa Bertino • Purdue University
Schahram Dustdar • Vienna University of Technology

As a new distributed computing model, crowdsourcing lets people leverage the crowd’s intelligence and wisdom toward solving problems. This article proposes a framework for characterizing various dimensions of quality control in crowdsourcing systems, a critical issue. The authors briefly review existing quality-control approaches, identify open issues, and look to future research directions.
Crowdsourcing has emerged as an effective way to perform tasks that are easy for humans but remain difficult for computers.1,2 For instance, Amazon Mechanical Turk
(MTurk; www.mturk.com) provides on-demand
access to task forces for micro-tasks such as
image recognition and language translation.
Several organizations, including DARPA and
various world health and relief agencies, are
using platforms such as MTurk, CrowdFlower
(http://crowdflower.com), and Ushahidi (http://
ushahidi.com) to crowdsource work through
multiple channels, including SMS, email, Twitter,
and the World Wide Web. As Internet and mobile
technologies continue to advance, crowdsourcing
can help organizations increase productivity,
leverage an external (skilled) workforce in addi-
tion to a core workforce, reduce training costs,
and improve core and support processes for both
public and private sectors.
On the other hand, the people who contrib-
ute to crowdsourcing might have different lev-
els of skills and expertise that are sometimes insufficient for doing certain tasks.3 They might
also have various and even biased interests and
incentives.1,4 Indeed, in recent years, crowd-
sourcing systems have been widely subject to
malicious activities such as collusive cam-
paigns to support people or products, and fake
reviews posted to online markets.5 Additionally,
ill-defined crowdsourcing tasks that don’t pro-
vide workers with enough information about
the tasks and their requirements can also lead
to low-quality contributions from the crowd.6
Addressing these issues requires a fundamental understanding both of the factors that affect quality and of the quality-control approaches used in crowdsourcing systems.
Categorizing Quality Control
To crowdsource a task, its owner, also called the
requester, submits the task to a crowdsourcing plat-
form. People who can accomplish the task, called
workers, can choose to work on it and devise solu-
tions. Workers then submit these contributions to
the requester via the crowdsourcing platform.
The requester assesses the posted contributions’ quality and might reward those workers whose contributions have been accepted. This reward can be monetary, material, psychological, and so on.7 A task’s outcome can be one or more individual contributions or a combination of accepted ones. The requester should choose contributions that reach his or her accepted level of quality for the outcome.
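To make this exchange concrete, the following minimal sketch models it in Python. The types and fields are hypothetical, ours rather than any platform’s API: workers submit contributions, the requester assesses each one against an acceptance threshold, rewards the accepted ones, and the outcome collects the accepted contributions.

```python
# Minimal sketch (hypothetical types, not any platform's API) of the
# requester/worker exchange: workers submit contributions, the requester
# accepts and rewards those meeting a quality threshold, and the task
# outcome combines the accepted contributions.
from dataclasses import dataclass, field

@dataclass
class Contribution:
    worker_id: str
    content: str
    quality: float = 0.0      # score the requester assigns on assessment
    accepted: bool = False
    reward: float = 0.0       # monetary here; could be points, badges, and so on

@dataclass
class Task:
    description: str
    required_quality: float   # requester's acceptance threshold
    reward_per_contribution: float
    contributions: list = field(default_factory=list)

    def assess(self, contribution: Contribution, score: float) -> None:
        """Requester-side assessment: accept and reward if the score meets the bar."""
        contribution.quality = score
        if score >= self.required_quality:
            contribution.accepted = True
            contribution.reward = self.reward_per_contribution

    def outcome(self) -> list:
        """The task outcome: one or more accepted contributions."""
        return [c for c in self.contributions if c.accepted]

task = Task("Tag 10 images", required_quality=0.8, reward_per_contribution=0.05)
c = Contribution(worker_id="w42", content="cat, dog, tree")
task.contributions.append(c)
task.assess(c, score=0.9)                 # accepted and rewarded
print(len(task.outcome()), c.reward)      # 1 0.05
```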
Quality is a subjective issue in
general. Some efforts have proposed
models and metrics to quantita-
tively and objectively assess quality
along different dimensions of a soft-
ware system, such as reliability, accu-
racy, relevancy, completeness, and
consistency.8 In this survey, we adopt Crosby’s definition of quality as a guide to identify quality-control attributes, including dimensions and factors.9 This definition emphasizes “conformance to requirements” as a guiding principle to define quality-control models. In other words, we define the quality of outcomes of a crowdsourced task as “the extent to which the provided outcome fulfills the requirements of the requester.”
The overall outcome quality depends on the definition of the task that’s being crowdsourced and the contributing workers’ attributes.1,2 We characterize quality in crowdsourcing systems along two main dimensions: worker profiles and task design. We propose a taxonomy for quality in crowdsourcing systems, as Figure 1 illustrates.
Worker Profiles
The quality of a crowdsourced task’s
outcome can be affected by workers’
abilities and quality.2 As Figure 1a
shows, a worker’s quality is charac-
terized by his or her reputation and
expertise. Note that these attributes
are correlated: a worker with high
expertise is expected to have a high
reputation as well. We distinguish
them because reputation is more gen-
eral in nature. In addition to workers’
expertise (which is reflected in the
quality of their contributions), we
might compute reputation based on
several other parameters, such as
the worker’s timeliness or the qual-
ity of evaluators. Also, reputation is
a public, community-wide metric,
but expertise is task-dependent. For
example, a Java expert with a high
reputation score might not be qualified to undertake a SQL-related task.
Reputation. The trust relationship between a requester and a particular worker reflects the probability that the
requester expects to receive a quality
contribution from the worker. At the
community level, because members
might have no experience or direct
interactions with other members,
they can rely on reputation to indi-
cate the community-wide judgment
on a given worker’s capabilities.10
Reputation scores are mainly built
on community members’ feedback
about workers’ activities in the sys-
tem.11 Sometimes, this feedback is
explicit — that is, community mem-
bers explicitly cast feedback on a
worker’s quality or contributions by,
for instance, rating or ranking the
content the worker has created. In
other cases, feedback is cast implicitly,
as in Wikipedia, when subsequent edi-
tors preserve the changes a particular
worker has made.
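To make the idea concrete, here is a minimal sketch, entirely our own simplification, of how such a reputation score might be computed. It assumes explicit feedback arrives as 1-5 star ratings and that an implicit signal is available as the fraction of a worker’s edits that later editors preserved; the function name and weights are hypothetical.

```python
# Minimal sketch (assumptions: 1-5 star explicit ratings; implicit signal is
# the fraction of a worker's edits preserved by later editors, as in the
# Wikipedia-style signal mentioned above; names and weights are hypothetical).
def reputation(explicit_ratings, preserved_ratio=None, implicit_weight=0.3):
    """Return a reputation score in [0, 1]."""
    if not explicit_ratings:
        explicit_score = 0.5            # neutral prior for unknown workers
    else:
        # Map the average 1-5 star rating onto [0, 1].
        explicit_score = (sum(explicit_ratings) / len(explicit_ratings) - 1) / 4
    if preserved_ratio is None:
        return explicit_score
    # Blend explicit feedback with the implicit retention signal.
    return (1 - implicit_weight) * explicit_score + implicit_weight * preserved_ratio

# Example: a worker with mostly good ratings whose edits are usually kept.
print(reputation([5, 4, 5, 3], preserved_ratio=0.9))   # about 0.84
```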
[Figure 1. Taxonomy of quality in crowdsourcing systems. We characterize quality along two main dimensions: (a) the worker’s profile, characterized by reputation and expertise, and (b) task design, characterized by definition, user interface, granularity, and compensation policy.]

Expertise. A worker’s expertise demonstrates how capable he or she is at doing particular tasks.4 Two types of indicators point to worker
expertise: credentials and experi-
ence. Credentials are documents or
evidence from which the requesters
or crowdsourcing platform can
assess a worker’s capabilities as regards
a particular crowdsourced task. Infor-
mation such as academic certificates
or degrees, spoken languages, or
geographical regions that a worker
is familiar with can be credentials.
Experience refers to knowledge and
skills a worker has gained while work-
ing in the system as well as through
support and training. For instance,
in systems such as MTurk and Stack Overflow, workers can improve their
skills and capabilities over time with
shepherding and support.12
Task Design
Task design is the model under which
the requester describes his or her
task; it consists of several compo-
nents. When the requester designates
a task, he or she provides some infor-
mation for workers. The requester
might put a few criteria in place to
ensure that only eligible people can
do the task, or specify the evaluation and compensation policies. We identify four important factors that contribute to quality as regards this dimension (see Figure 1b): task definition, user interface, granularity,
and compensation policy.
Task definition. The task definition is the information the requester gives potential workers regarding the crowdsourced task. A main element is a short description of the task explaining its nature, time limitations, and so on.6 A second element is the qualification requirements for performing the task. These specify the eligibility criteria by which the requester will evaluate workers before accepting their participation. For example, in MTurk, requesters can specify that only workers with a specified percentage of accepted works (for example, larger than 90 percent) can participate, or that only those workers living in the US can take part in a particular survey. Previous studies show that the quality of the provided definition (such as its clarity or the instructions’ usefulness) for a task affects outcome quality.6
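The following minimal sketch illustrates the kind of eligibility check that such qualification requirements imply. The worker record, field names, and thresholds are hypothetical; this is not the MTurk API, just a sketch of the filtering logic a requester specifies.

```python
# Minimal sketch (hypothetical worker record and thresholds; not the MTurk API)
# of an eligibility check like "approval rate above 90 percent and US-based".
def is_eligible(worker, min_approval_rate=0.90, allowed_countries=("US",)):
    """Return True if the worker meets the task's qualification requirements."""
    approval_rate = worker.get("approved", 0) / max(worker.get("submitted", 1), 1)
    return (approval_rate > min_approval_rate
            and worker.get("country") in allowed_countries)

worker = {"approved": 460, "submitted": 500, "country": "US"}
print(is_eligible(worker))   # True: 92 percent approval and US-based
```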
User interface. The task UI refers
to the interface through which the
workers access and contribute to
the task. This can be a Web UI, an
API, or any other kind of UI. A user-
friendly interface can attract more
workers and increase the chance of
a high-quality outcome. A simple
interface, such as one with nonverifiable questions, makes it easier for
deceptive workers to exploit the sys-
tem.1 On the other hand, an unnec-
essarily complicated interface will
discourage honest workers and could
lead to delays.
Granularity. We can divide tasks into
two broad t ypes: simple and complex.
Simple tasks are the self-contained,
appropriately short tasks that usu-
ally need little expertise to be
solved, such as tagging or describ-
ing.13 Complex tasks usually need
to be broken down into simpler sub-
tasks. Solving a complex task (such
as writing an article) might require
more time, costs, and expertise, so
fewer people will be interested or
qualied to perform it. Crowds solve
the subtasks, and their contributions
are consolidated to build the final answer.13 A complex task workflow defines how these simple subtasks are chained together to build the overall task.14 This workflow can be iterative, parallel, or a combination.14,15 Designing workflows for complex tasks greatly affects outcome quality.1,2,5 For instance, one study demonstrated that designing a poor outline for an essay that the crowd will write can result in a low-quality
essay.13 Improving the quality of an
outline using crowd contributions
increases the corresponding written
essay’s quality.
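The sketch below illustrates one such workflow shape, a parallel decomposition followed by a consolidation step, in the spirit of the partition-and-consolidate workflows described above.13 The helper names are hypothetical, and the crowd calls are stubbed rather than wired to a real platform.

```python
# Minimal sketch (hypothetical helper names; crowd calls are stubbed) of a
# complex-task workflow: partition the task into simple subtasks, crowdsource
# them in parallel, then consolidate the contributions into the final answer.
from concurrent.futures import ThreadPoolExecutor

def crowdsource(subtask_description):
    # Stub: a real system would post the subtask to a crowdsourcing platform
    # and wait for an accepted contribution.
    return f"<crowd answer for: {subtask_description}>"

def run_complex_task(topic, outline):
    # Parallel phase: one simple writing subtask per outline section.
    with ThreadPoolExecutor() as pool:
        sections = list(pool.map(
            crowdsource,
            [f"Write a paragraph on '{item}' for an article about {topic}"
             for item in outline]))
    # Consolidation phase: a further subtask merges the parts into the outcome.
    return crowdsource("Combine these paragraphs into one coherent article: "
                       + " ".join(sections))

print(run_complex_task("crowdsourcing quality", ["definition", "challenges"]))
```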
Incentives and compensation policy.
Choosing suitable incentives and
a compensation policy can affect
the crowd’s performance as well as
outcome quality.7,12 Knowing about
evaluation and compensation poli-
cies helps workers align their work
based on these criteria and produce
contributions with higher quality.12
We broadly categorize incentives into
two types: intrinsic incentives, such
as personal enthusiasm or altruism,
and extrinsic incentives, such as
monetar y reward. Intrinsic incen-
tives in conjunction with extrinsic
ones can motivate honest users
to participate in the task. Moreover,
in some cases, the intrinsic incen-
tives’ positive effect on the out-
come’s quality is more significant
than the impact of the extrinsic
incentives.16
Looking at monetary rewards,
which are common, the reward
amount attracts more workers and
affects how fast they accomplish the
task, but increasing the amount
doesn’t necessarily increase outcome quality.16 Some research also shows
that the payment method might have
a bigger impact on outcome quality
than the payment amount itself.6,16
For example, in a requested task
that requires finding 10 words in a
puzzle, paying per puzzle will lead
to more solved puzzles than paying
per word.13
Quality-Control Approaches
Researchers and practitioners have
proposed several quality-control
approaches that fall under the afore-
mentioned quality dimensions and
factors. We broadly classify exist-
ing approaches into two categories:
design-time (see Table 1) and run-
time (see Table 2). These two cat-
egories aren’t mutually exclusive. A
task can employ both approaches to
maximize the possibility of receiv-
ing high-quality outcomes.
At design time, the requesters can
leverage techniques for preparing a
well-designed task and just allow a
suitable crowd to contribute to the task.
Although these techniques increase
the possibility of receiving high-
quality contributions from the crowd,
there is still a need to control the
quality of contributions at runtime.
Even high-quality workers might
submit low-quality contributions
because of mistakes or misunder-
standing. Therefore, requesters must
still put in place runtime quality-
control approaches when the task is
running as well as when the crowd
contributions are being collected
and probably aggregated to build the
final task answer. We discuss both
design-time and runtime approaches
in more detail in the Web appendix at http://doi.ieeecomputersociety.org/10.1109/MIC.2013.20.

Table 1. Existing quality-control design-time approaches.
Effective task preparation (defensive design): provides an unambiguous description of the task; task design is defensive, that is, cheating isn’t easier than doing the task; defines evaluation and compensation criteria. Sample applications: references 1, 3, 6, and 12.
Worker selection (open to all): allows everybody to contribute to the task. Sample applications: ESP Game, Threadless.com.
Worker selection (reputation-based): lets only workers with prespecified reputation levels contribute to the task. Sample applications: MTurk, Stack Overflow, reference 4.
Worker selection (credential-based): allows only workers with prespecified credentials to do the task. Sample applications: Wikipedia, Stack Overflow, reference 4.

Table 2. Existing quality-control runtime approaches.
Expert review: domain experts check contribution quality. Sample applications: academic conferences and journals, Wikipedia, reference 3.
Output agreement: if workers independently and simultaneously provide the same description for an input, they are deemed correct. Sample application: ESP Game.
Input agreement: independent workers receive an input and describe it to each other; if they all decide it’s the same input, it’s accepted as a quality answer. Sample application: Tag-A-Tune.
Ground truth: compares answers with a gold standard, such as known answers or common-sense facts, to check quality. Sample applications: CrowdFlower, MTurk.
Majority consensus: the judgment of a majority of reviewers on the contribution’s quality is accepted as its real quality. Sample applications: TurKit, Threadless.com, MTurk.
Contributor evaluation: assesses a contribution based on the contributor’s quality. Sample applications: Wikipedia, Stack Overflow, MTurk.
Real-time support: provides shepherding and support to workers in real time to help them increase contribution quality. Sample application: reference 12.
Workflow management: designs a suitable workflow for a complex task; the workflow is monitored to control quality, cost, and so on, on the fly. Sample applications: references 13, 14.
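As a simple illustration of two of the runtime approaches in Table 2, the following sketch, our own simplification with hypothetical data shapes, checks a worker’s answers against gold questions (ground truth) and takes the most common answer over redundant contributions (majority consensus).

```python
# Minimal sketch (hypothetical data shapes) of two runtime quality-control
# approaches from Table 2: a ground-truth check against gold questions and
# majority consensus over redundant contributions to the same question.
from collections import Counter

def passes_gold_check(worker_answers, gold, min_accuracy=0.8):
    """Ground truth: compare a worker's answers on gold questions to known answers."""
    correct = sum(1 for q, a in gold.items() if worker_answers.get(q) == a)
    return correct / len(gold) >= min_accuracy

def majority_answer(contributions):
    """Majority consensus: the most common answer is taken as the accepted one."""
    answer, _count = Counter(contributions).most_common(1)[0]
    return answer

gold = {"q1": "cat", "q2": "dog"}
print(passes_gold_check({"q1": "cat", "q2": "dog", "q3": "fish"}, gold))  # True
print(majority_answer(["cat", "cat", "dog"]))                             # "cat"
```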
Although researchers have proposed and used several quality-control approaches so far, many open issues and challenges remain for defining, measuring, and man-
aging quality in crowdsourcing sys-
tems, and these issues require further
research and investigation.
One serious limitation of exist-
ing approaches is their reliance
on primitives and hard-wired
quality-control techniques. These
approaches are typically embedded
in their host systems, and requesters
can’t customize them based on their
specific requirements. Defining new approaches is another challenge that requesters struggle with. Although some tools that rely on current crowdsourcing systems, such as TurKit, let users define some quality-control processes, using these tools requires programming skills in languages such as Java or C++.
Endowing crowdsourcing ser-
vices with customizable, rich, and
robust quality-control techniques
is key to crowdsourcing platforms’
wide-ranging success — whether it’s
supporting micro and commod-
ity tasks or high-value processes
(such as business processes or intel-
ligence data gathering). Requesters
can achieve this functionality using
a generic quality-control framework
that lets them define new quality-
control approaches and reuse or
customize existing ones. Such a
framework should also be capable
of being seamlessly integrated
with existing crowdsourcing platforms to let requesters benefit from both crowdsourcing and quality-
control systems simultaneously.
Building such a framework can
be an interesting future direction
for research in the crowdsourcing
arena.
Another major limitation of
existing quality-control approaches
comes from the subjective nature
of quality, particularly in crowd-
sourcing systems. The quality of a
task’s outcome might depend on sev-
eral parameters, such as requesters’
requirements, task properties, crowd
interests and incentives, and costs.
Currently, quality-control techniques
are domain-specific — that is, a tech-
nique that performs well for some
tasks might perform poorly on new
and different ones. For instance,
approaches that are suitable for checking a written essay’s quality are different from those used to control quality in
an image-processing task. Finding a
suitable approach based on a particu-
lar situation is a challenge that needs
more investigation.
One solution to this limitation is
a recommender system, which gives
requesters a list of adequate quality-
control approaches. Such a recom-
mender could use machine learning
techniques to provide more precise
recommendations. It should offer
the requester a list of approaches
that best suit the situation based
on the requester’s profile (social
relations, history, interests, and so
on), the task’s type and attributes,
the history of the existing crowd,
and the quality requirements of
the task, along with many more
options. Designing such a system can be a suitable direction for further study.
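The following sketch is one deliberately simplified reading of such a recommender: each quality-control approach advertises the task attributes it typically fits, and approaches are ranked by how well those attributes match the task at hand. The feature encoding and names are entirely hypothetical.

```python
# Minimal sketch (entirely hypothetical feature encoding) of the kind of
# recommender envisioned above: rank quality-control approaches by how well
# their typical applicability matches the task's attributes.
def recommend(task_features, approaches, top_k=3):
    """task_features and each approach's 'fits' map attribute -> weight."""
    def match(approach):
        return sum(task_features.get(attr, 0) * w
                   for attr, w in approach["fits"].items())
    return sorted(approaches, key=match, reverse=True)[:top_k]

approaches = [
    {"name": "ground truth", "fits": {"objective_answers": 1.0}},
    {"name": "expert review", "fits": {"creative_output": 1.0}},
    {"name": "majority consensus",
     "fits": {"objective_answers": 0.7, "cheap_redundancy": 0.5}},
]
task = {"objective_answers": 1.0, "cheap_redundancy": 1.0}
print([a["name"] for a in recommend(task, approaches, top_k=2)])
# ['majority consensus', 'ground truth']
```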
Thanks to Web 2.0 technologies
and the rapid growth of mobile com-
puting in the form of smartphones,
tablets, and so on, a tremendous
amount of human computation power
is available for accomplishing jobs
almost for free. On the other hand,
artificial intelligence and machine
learning are fast-growing areas in
computer science. We envision that,
in the near future, combining the
crowd and machines to solve prob-
lems will be easily feasible.17 This
will raise some interesting research
challenges. Topics such as machine
versus human trustworthiness,
workflow design for such tasks, and conflict resolution between human
and machine judgments will all need
to be addressed.
Moreover, people are at the core
of crowdsourcing systems. How-
ever, they’re also distributed among
separated online communities, and a
requester can’t easily employ crowds
from several communities. We fore-
see that this will be simplified in the
near future via service composition
middleware. Building such middle-
ware will require addressing several
issues, including how to share people-
quality indicators such as reputation
and expertise between different com-
munities and how to build a public
global picture for each individual
based on his or her available his-
tory of activities in different possi-
ble crowd communities. Addressing
these issues will be another interest-
ing research direction for crowd-
sourcing systems.
References
1. A. Kittur, E. Chi, and B. Suh, “Crowdsourcing User Studies with Mechanical Turk,” Proc. 26th Ann. SIGCHI Conf. Human Factors in Computing Systems, ACM, 2008, pp. 453–456.
2. R. Khazankin, S. Daniel, and S. Dustdar, “Predicting QoS in Scheduled Crowdsourcing,” Advanced Information Systems Eng., vol. 7328, J. Ralyté et al., eds., Springer, 2012, pp. 460–472.
3. A.J. Quinn and B.B. Bederson, “Human Computation: A Survey and Taxonomy of a Growing Field,” Proc. 2011 Ann. Conf. Human Factors in Computing Systems, ACM, 2011, pp. 1403–1412.
4. D. Schall, F. Skopik, and S. Dustdar, “Expert Discovery and Interactions in Mixed Service-Oriented Systems,” IEEE Trans. Services Computing, vol. 5, no. 2, 2012, pp. 233–245.
5. G. Wang et al., “Serf and Turf: Crowdturfing for Fun and Profit,” Proc. 21st Int’l Conf. World Wide Web, ACM, 2012, pp. 679–688.
6. J.J. Chen, N. Menezes, and A. Bradley, “Opportunities for Crowdsourcing Research on Amazon Mechanical Turk,” Proc. CHI 2011 Workshop Crowdsourcing and Human Computation, 2011; http://crowdresearch.org/chi2011-workshop/papers/chen-jenny.pdf.
7. O. Scekic, H. Truong, and S. Dustdar, “Modeling Rewards and Incentive Mechanisms for Social BPM,” Business Process Management, vol. 7481, A. Barros et al., eds., Springer, 2012, pp. 150–155.
8. E. Agichtein et al., “Finding High-Quality Content in Social Media,” Proc. Int’l Conf. Web Search and Web Data Mining, ACM, 2008, pp. 183–194.
9. P. Crosby, Quality Is Free, McGraw-Hill, 1979.
10. A. Jøsang, R. Ismail, and C. Boyd, “A Survey of Trust and Reputation Systems for Online Service Provision,” Decision Support Systems, vol. 43, no. 2, 2007, pp. 618–644.
11. L.D. Alfaro et al., “Reputation Systems for Open Collaboration,” Comm. ACM, vol. 54, no. 8, 2011, pp. 81–87.
12. S.P. Dow et al., “Shepherding the Crowd Yields Better Work,” Proc. 2012 ACM Conf. Computer Supported Cooperative Work (CSCW 12), ACM, 2012, pp. 1013–1022.
13. A. Kittur et al., “CrowdForge: Crowdsourcing Complex Work,” Proc. 24th Ann. ACM Symp. User Interface Software and Technology, ACM, 2011, pp. 43–52.
14. A. Kulkarni, M. Can, and B. Hartmann, “Collaboratively Crowdsourcing Workflows with Turkomatic,” Proc. 2012 ACM Conf. Computer Supported Cooperative Work (CSCW 12), ACM, 2012, pp. 1003–1012.
15. G. Little et al., “TurKit: Human Computation Algorithms on Mechanical Turk,” Proc. 23rd Ann. ACM Symp. User Interface Software and Technology, ACM, 2010, pp. 57–66.
16. W. Mason and D.J. Watts, “Financial Incentives and the ‘Performance of Crowds,’” SIGKDD Explorations Newsletter, vol. 11, 2010, pp. 100–108.
17. H.L. Truong, S. Dustdar, and K. Bhattacharya, “Programming Hybrid Services in the Cloud,” Service-Oriented Computing, vol. 7636, C. Liu et al., eds., Springer, 2012, pp. 96–110.
Mohammad Allahbakhsh is a PhD candidate in the School of Computer Science and Engineering at the University of New South Wales, Australia. His research focuses on quality control in crowdsourcing systems. Allahbakhsh has an MS in software engineering from Ferdowsi University of Mashhad. He’s a student member of IEEE. Contact him at mallahbakhsh@cse.unsw.edu.au.

Boualem Benatallah is a professor of computer science at the University of New South Wales, Australia. His research interests include system and data integration, process modeling, and service-oriented architectures. Benatallah has a PhD in computer science from Grenoble University, France. He’s a member of IEEE. Contact him at boualem@cse.unsw.edu.au.

Aleksandar Ignjatovic is a senior lecturer in the School of Computer Science and Engineering at the University of New South Wales, Australia. His current research interests include applications of mathematical logic to computational complexity theory, sampling theory, and online communities. Ignjatovic has a PhD in mathematical logic from the University of California, Berkeley. Contact him at ignjat@cse.unsw.edu.au.

Hamid Reza Motahari-Nezhad is a research scientist at Hewlett-Packard Laboratories in Palo Alto, California. His research interests include business process management, social computing, and service-oriented computing. Motahari-Nezhad has a PhD in computer science and engineering from the University of New South Wales, Australia. He’s a member of the IEEE Computer Society. Contact him at hamid.motahari@hp.com.

Elisa Bertino is a professor of computer science at Purdue University and serves as research director for the Center for Education and Research in Information Assurance and Security (CERIAS). Her main research interests include security, privacy, digital identity management systems, database systems, distributed systems, and multimedia systems. She’s a fellow of IEEE and ACM and has been named a Golden Core Member for her service to the IEEE Computer Society. Contact her at bertino@cs.purdue.edu.

Schahram Dustdar is a full professor of computer science and head of the Distributed Systems Group, Institute of Information Systems, at the Vienna University of Technology. His research interests include service-oriented architectures and computing, cloud and elastic computing, complex and adaptive systems, and context-aware computing. Dustdar is an ACM Distinguished Scientist (2009) and IBM Faculty Award recipient (2012). Contact him at dustdar@dsg.tuwien.ac.at.