2016 Collective Intelligence Conference, NYU Stern School of Business, June 1–3, 2016, New York, NY 10012
The Impact of Task Design on Accuracy, Completeness
and Discovery in Surveillance-based Crowdsourcing
ROMAN LUKYANENKO, Florida International University
JEFFREY PARSONS and YOLANDA F. WIERSMA, Memorial University of Newfoundland
Crowdsourcing is becoming a major driver of commerce, product development, science, healthcare and
public policy, with applications powered by crowds continuously expanding. In this paper we focus on
a major kind of crowdsourcing termed surveillance-focused, in which organizations "harness human
perceptive and information-gathering abilities to make sense of the environment in which
organizations operate" [Lukyanenko and Parsons 2015]. Examples of this type of crowdsourcing
include community mapping, crisis management, and online citizen science. In this crowdsourcing
approach, an organization typically develops an online platform where users can provide information
on some observed or experienced phenomena of interest to the organization. An early example of
surveillance-focused crowdsourcing is MedWatch, established by the US Food and Drug
Administration in 1993 to collect citizens' reports on adverse reactions and quality problems associated
with drugs and medical devices [Kessler et al. 1993]. Other examples include eBird.org (reports of birds),
CitySourced.com and FixMyStreet (crime, graffiti, pot holes), and iSpotNature.org (plants, animals).
A major challenge in crowdsourcing is ensuring that data are of acceptable quality to be useful in
analysis and to inform decision making. This is a difficult task, as data contributors in these projects
are typically unpaid volunteers, often with variable levels of domain expertise and motivations for
participating and contributing content. Concerns about quality continue to inhibit wider adoption of
crowdsourcing. In addition to data quality, other challenges include a lack of established principles
for task design, the prevalence of difficult-to-use protocols, and difficulty motivating volunteers to
contribute.
In this research, we focus on a general solution to data quality in surveillance-focused crowdsourcing.
We propose the alignment between task design and human capabilities as a key factor shaping the
quality of user-generated content. We further explore an important alignment strategy that considers
the role of conceptual modeling in shaping tasks and its effects on quality.
Specifically, we claim that task design based on predetermined information needs leads to specifying
in advance the classes of phenomena about which information is to be kept (a class-based approach),
resulting in a fixed set of choices for information contributors. As crowds may be incapable of fulfilling
these needs, data quality in surveillance-based crowdsourcing is expected to suffer. In contrast, we
advocate approaching task design in a use-agnostic way informed by our instance-based model of
crowdsourcing that emphasizes recording information about instances and their attributes,
independent of fixed classification [Lukyanenko et al. 2014]. We hypothesize that, compared with a
use-driven, fixed-choice design, a use-agnostic, open-choice design leads to greater information
completeness and information accuracy, and increases the opportunities for unanticipated discoveries
by crowd members.
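To make the contrast concrete, the following sketch (in Python, using hypothetical class names and attributes that are illustrative only and not taken from the study materials) shows how a class-based, fixed-choice design differs from an instance-based, open-choice design for a single sighting report.

    # Illustrative sketch only; not the authors' implementation. Class names
    # and attributes are hypothetical examples.
    from dataclasses import dataclass, field
    from typing import List

    # Class-based, fixed-choice design: the sponsor's information needs are
    # fixed in advance as a set of allowed classes.
    ALLOWED_CLASSES = {"American Robin", "Blue Jay", "Herring Gull"}

    @dataclass
    class ClassBasedReport:
        species: str  # must be one of ALLOWED_CLASSES

        def __post_init__(self):
            if self.species not in ALLOWED_CLASSES:
                # A contributor who observes something outside the fixed list
                # must guess, misclassify, or abandon the report.
                raise ValueError(f"{self.species!r} is not an allowed class")

    # Instance-based, open-choice design: the report captures the instance and
    # whatever attributes the contributor actually perceived; classification
    # (at any level of generality) is optional and can be refined by experts later.
    @dataclass
    class InstanceBasedReport:
        attributes: List[str] = field(default_factory=list)
        suggested_classes: List[str] = field(default_factory=list)

    # Example: an unfamiliar organism can still be reported faithfully.
    sighting = InstanceBasedReport(
        attributes=["striped abdomen", "long hind legs", "seen near standing water"],
        suggested_classes=["mosquito"],  # a general-level guess is fine
    )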
We conducted two experiments to test our hypothesized relationships. The first was a field
experiment in the context of citizen science in biology. We implemented both a class-based and an
instance-based database structure and data collection interface, and randomly assigned contributors
to one of the two versions. We measured the effect of the treatment on both the quantity and novelty
of contributions and found that users assigned to the instance-based condition provided more
observations (390) than users assigned to the class-based condition (87). Users in the instance-based
condition also reported more new classes of organisms (e.g., 119 described at the species level) than
those in the class-based condition (e.g., 7 described at the species level). One of the reported organisms
led to a confirmed biological discovery [Fielden et al. 2015], with others pending verification.
In the second study, we replicated the class-based and instance-based designs in a laboratory setting
(n = 108). In this controlled environment, we tested the impact of class-based versus instance-based
designs on data accuracy, data completeness, ease of use and intention to use the corresponding
system. Both accuracy and completeness were higher in the instance-based condition, while we found
no significant differences in ease of use or intention to use (suggesting that the differences in quality
were not driven by the user interfaces of the two rival designs). Importantly, we manipulated
familiarity and found that high accuracy in the class-based version was attained only for highly
familiar organisms and was extremely low (almost 0%) for somewhat familiar and unfamiliar ones.
Similarly, completeness in the class-based system was high only in the highly familiar condition and
declined in the other two conditions. In contrast, both accuracy and completeness were extremely
high (close to 100%) in the instance-based system across all familiarity conditions.
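As a rough illustration of how such outcomes can be scored against expert reference data (the exact operationalization used in the study is not detailed here, so the reference sets and formulas below are assumptions for exposition only), accuracy can be read as the share of reported attributes that experts confirm, and completeness as the share of expert-expected attributes that the report covers.

    # Illustrative scoring sketch; these are assumed definitions, not the
    # measures reported in the paper.
    def accuracy(reported: set, expert_confirmed: set) -> float:
        """Share of reported attributes that experts confirm as correct."""
        return len(reported & expert_confirmed) / len(reported) if reported else 0.0

    def completeness(reported: set, expert_expected: set) -> float:
        """Share of expert-expected attributes captured in the report."""
        return len(reported & expert_expected) / len(expert_expected) if expert_expected else 0.0

    # Hypothetical example: a report on an unfamiliar organism.
    expert = {"striped legs", "dark scales", "daytime biter"}
    report = {"striped legs", "daytime biter"}
    print(accuracy(report, expert))      # 1.0  (everything reported is correct)
    print(completeness(report, expert))  # 0.666... (two of three expected attributes)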
Collectively, the findings support the proposed hypotheses and contribute to a deeper understanding
of task design as a major factor in increasing information accuracy and completeness in
crowdsourcing. This paper builds on an earlier study that focused on accuracy [Lukyanenko et al. 2014],
and examines the impact of conceptual modeling on accuracy, completeness and discoveries. Exploring
modeling choices as an antecedent of data quality contributes beyond crowdsourcing to general
information systems use. Despite extensive research on quality and its centrality to organizational
decision making, relatively little is known about what causes low-quality data, resulting in "a
significant gap in research" [Petter et al. 2013, p. 30]. Furthermore, among the various factors that
influence quality, such as training, instructions, and input controls, issues related to modeling have
not been considered. This is unfortunate given that organizations may exercise weak authority over crowds
(e.g., precluding effective training), but remain in control of development and task design of the
system they sponsor.
REFERENCES
Miles A. Fielden et al. 2015. Aedes japonicus japonicus (Diptera: Culicidae) arrives at the most
easterly point in North America. Can. Entomol. 147, 6 (2015), 737–740.
David A. Kessler et al. 1993. Introducing MEDWatch: a new approach to reporting medication and
device adverse effects and product problems. JAMA 269, 21 (1993), 2765–2768.
Roman Lukyanenko and Jeffrey Parsons. 2015. Beyond Task-Based Crowdsourcing Database
Research. In AAAI Conference on Human Computation & Crowdsourcing (AAAI HCOMP). San
Diego, CA, USA, 1–2.
Roman Lukyanenko, Jeffrey Parsons, and Yolanda Wiersma. 2014. The IQ of the Crowd:
Understanding and Improving Information Quality in Structured User-generated Content.
Inf. Syst. Res. 25, 4 (2014), 669–689.
Stacie Petter, William DeLone, and Ephraim R. McLean. 2013. Information Systems Success: The
Quest for the Independent Variables. J. Manag. Inf. Syst. 29, 4 (2013), 7–62.