2016 Collective Intelligence Conference, NYU Stern School of Business, June 1-3, 2016, New York, NY 10012
The Impact of Task Design on Accuracy, Completeness
and Discovery in Surveillance-based Crowdsourcing
ROMAN LUKYANENKO, Florida International University
JEFFREY PARSONS and YOLANDA F. WIERSMA, Memorial University of Newfoundland
Crowdsourcing is becoming a major driver of commerce, product development, science, healthcare and
public policy, and the range of applications powered by crowds continues to expand. In this paper we focus on
a major kind of crowdsourcing termed surveillance-focused, in which “organizations harness human
perceptive and information-gathering abilities to make sense of the environment in which
organizations operate” [Lukyanenko and Parsons 2015]. Examples of this type of crowdsourcing
include community mapping, crisis management, and online citizen science. In this crowdsourcing
approach, an organization typically develops an online platform where users can provide information
on some observed or experienced phenomena of interest to the organization. An early example of
surveillance-focused crowdsourcing is MedWatch, established by the US Food and Drug
Administration in 1993 to collect citizens’ reports on adverse reactions and quality problems associated
with drugs and medical devices [Kessler et al. 1993]. Other examples include eBird.org (reports of birds),
CitySourced.com and FixMyStreet (crime, graffiti, potholes), and iSpotNature.org (plants, animals).
A major challenge in crowdsourcing is ensuring that data are of acceptable quality to be useful in
analysis and to inform decision making. This is a difficult task, as data contributors in these projects
are typically unpaid volunteers, often with variable levels of domain expertise and motivations for
participating and contributing content. Concerns about quality continue to inhibit wider adoption of
crowdsourcing. In addition to data quality, other challenges include the lack of established principles
for task design, the prevalence of difficult-to-use protocols, and the difficulty of motivating volunteers
to contribute.
In this research, we focus on a general approach to improving data quality in surveillance-focused crowdsourcing.
We propose the alignment between task design and human capabilities as a key factor shaping the
quality of user-generated content. We further explore an important alignment strategy that considers
the role of conceptual modeling in shaping tasks and its effects on quality.
Specifically, we claim that task design based on predetermined information needs leads to specifying
in advance the classes of phenomena about which information is to be kept (a class-based approach),
resulting in a fixed set of choices for information contributors. As crowds may be incapable of fulfilling
these needs, data quality in surveillance-based crowdsourcing is expected to suffer. In contrast, we
advocate approaching task design in a use-agnostic way informed by our instance-based model of
crowdsourcing that emphasizes recording information about instances and their attributes,
independent of fixed classification [Lukyanenko et al. 2014]. We hypothesize that, compared with a
use-driven, fixed-choice design, a use-agnostic open-choice design leads to greater information
completeness and information accuracy, and increases the opportunities for unanticipated discoveries
by crowd members.
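To make the contrast concrete, the sketch below illustrates, in Python, how the two task designs differ in the data contributors are asked to supply. The class list, field names, and example attributes are illustrative assumptions made for this sketch only; they are not the schemas or interfaces used in our studies.

    from dataclasses import dataclass, field
    from typing import List, Optional

    # Class-based design: the sponsor fixes the admissible classes in advance,
    # based on predetermined information needs; contributors must choose one.
    CLASS_CHOICES = ["American Robin", "Blue Jay", "Herring Gull", "Unknown"]

    @dataclass
    class ClassBasedReport:
        observer_id: str
        chosen_class: str  # must come from CLASS_CHOICES

        def __post_init__(self):
            if self.chosen_class not in CLASS_CHOICES:
                raise ValueError("class-based design rejects anything outside the fixed list")

    # Instance-based design: the contributor describes the observed instance with
    # whatever attributes they can report; classification, if needed, happens later.
    @dataclass
    class InstanceBasedReport:
        observer_id: str
        attributes: List[str] = field(default_factory=list)  # e.g., ["black wings", "orange beak"]
        free_text_label: Optional[str] = None                # optional, unconstrained label

    # A contributor unfamiliar with bird species can still submit a useful
    # instance-based report, whereas the class-based form would force "Unknown".
    report = InstanceBasedReport("user42", ["small", "black wings", "orange beak"])

In this hypothetical sketch, the instance-based structure accepts any attributes the contributor can confidently report, while the class-based structure discards everything that does not map onto a predefined class.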
We conducted two experiments to test our hypothesized relationships. In the first, we used a field
experiment in the context of citizen science in biology. We implemented both a class-based and an
instance-based database structure and data collection interface, and randomly assigned contributors
to one of the two versions. We measured the effect of the treatment on both the quantity and novelty
of contributions and found that users assigned to the instance-based condition provided more
observations (390) than users assigned to the class-based condition (87). Users in the instance-based
condition also reported more new classes of organisms (e.g., 119 described at the species level) than
those in the class-based condition (e.g., 7 described at the species level). One of the reported organisms
led to a confirmed biological discovery [Fielden et al. 2015], with others pending verification.
In the second study, we replicated the class-based and instance-based designs in a laboratory setting
(n = 108). In this controlled environment, we tested the impact of class-based versus instance-based
designs on data accuracy, data completeness, ease of use and intention to use the corresponding
system. Both accuracy and completeness were higher in the instance-based condition, while we found
no significant differences in ease of use or intention to use (potentially indicating that the differences
in quality were independent of the user interface for the two rival designs). Importantly, we
manipulated familiarity and found that high accuracy in the class-based version was attained only for
highly familiar organisms, and was extremely low (almost 0%) for somewhat familiar and unfamiliar
ones. Similarly, completeness in the class-based system was high only for the highly familiar condition
and declined for the other two conditions. In contrast, both accuracy and completeness were extremely
high (close to 100%) in the instance-based system across all familiarity conditions.
Collectively, the findings support the proposed hypotheses and contribute to a deeper understanding
of task design as a major factor in increasing information accuracy and completeness in
crowdsourcing. This paper builds on an earlier study that focused on accuracy [Lukyanenko et al. 2014],
and examines the impact of conceptual modeling on accuracy, completeness and discoveries. Exploring
modeling choices as an antecedent of data quality contributes beyond crowdsourcing to general
information systems use. Despite extensive research on quality and its centrality to organizational
decision making, relatively little is known about what causes low-quality data, resulting in "a
significant gap in …research" [Petter et al. 2013, p. 30]. Furthermore, among the various factors known
to influence quality, such as training, instructions, and input controls, issues related to modeling have
not been considered. This is unfortunate given that organizations may exercise weak authority over crowds
(e.g., precluding effective training), but remain in control of development and task design of the
system they sponsor.
REFERENCES
Miles A. Fielden et al. 2015. Aedes japonicus japonicus (Diptera: Culicidae) arrives at the most
easterly point in North America. Can. Entomol. 147, 6 (2015), 737–740.
David A. Kessler et al. 1993. Introducing MEDWatch: a new approach to reporting medication and
device adverse effects and product problems. JAMA 269, 21 (1993), 2765–2768.
Roman Lukyanenko and Jeffrey Parsons. 2015. Beyond Task-Based Crowdsourcing Database
Research. In AAAI Conference on Human Computation & Crowdsourcing (AAAI HCOMP). San
Diego, CA, USA, 1–2.
Roman Lukyanenko, Jeffrey Parsons, and Yolanda Wiersma. 2014. The IQ of the Crowd:
Understanding and Improving Information Quality in Structured User-generated Content.
Inf. Syst. Res. 25, 4 (2014), 669–689.
Stacie Petter, William DeLone, and Ephraim R. McLean. 2013. Information Systems Success: The
Quest for the Independent Variables. J. Manag. Inf. Syst. 29, 4 (2013), 7–62.