Cite as:
Lukyanenko, R., Parsons, J., and Wiersma, Y. (2011). Citizen Science 2.0: Overcoming the
Participation and Data Quality Tradeoff. DTMD 2011: The Difference that makes a Difference
Workshop. Milton Keynes, United Kingdom.
Citizen Science 2.0: Overcoming the Participation and Data Quality Tradeoff
Roman Lukyanenko1, Jeffrey Parsons2, Yolanda Wiersma3
1 Faculty of Business Administration, Memorial University, roman.lukyanenko@mun.ca
2 Faculty of Business Administration, Memorial University, jeffreyp@mun.ca
3 Department of Biology, Memorial University, ywiersma@mun.ca
Citizen science enables non-scientists to participate in research, creating opportunities to collect and
analyze data in ways that are not possible for individual researchers. Applications of citizen science are
growing: volunteers are now engaged in a variety of scientific projects, from folding proteins to finding
interstellar dust and from identifying birds to classifying galaxies. A central challenge in harnessing the
power of human contributors is developing a technological environment capable of facilitating
participation and accommodating individual perspectives of volunteers. Despite the potential for online
engagement with citizen science, we argue that the prevailing assumptions and practices underlying
data collection in these projects inhibit participation and data quality. We outline the root of these
problems and suggest a solution designed to increase participation and data quality simultaneously.
The success of citizen science projects depends on the willingness of volunteers to report
information, as well as the technological ability to represent such data faithfully. For example, popular
projects such as eBird (ebird.org) or iSpot (ispot.org.uk) encourage volunteers to provide detailed
information about their plant or animal sightings. In these and similar projects, citizen scientists who
wish to be involved must know how to classify species. However, positive species identification can be
done reasonably well only by participants with a substantial level of domain knowledge. Those non-
experts who are uncertain may choose not to participate or make incorrect guesses about observed
species. The result is what appears to be an unavoidable tradeoff between data quality and level of
participation.
Furthermore, the focus on classification necessarily undermines the data quality of citizen science
projects. Reality is infinitely diverse and each sighting is unique. By abstracting from this diversity,
classification limits the potential richness of communicated information for the sake of simplicity and
economy. And while classification does not preclude humans from retaining details of an object,
recording individual information as a member of some class in a computer system means that
everything that does not “fit” into a preexisting class definition will escape structured storage. This may
not be a concern for highly structured, “closed” domains, but it can undermine valuable indigenous
observations of citizen scientists. The prevailing “closed world” data storage paradigm is not only
contrary to the spirit of citizen science, but may severely limit its potential gains (see Lukyanenko &
Parsons, 2011a, 2011b, 2012; Parsons & Lukyanenko, 2011; Parsons, Lukyanenko, & Wiersma,
2011).
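The "closed world" problem described above can be illustrated with a minimal sketch. The code below is a hypothetical example, not any project's actual implementation; the names (VALID_SPECIES, store_sighting_closed) and the species list are invented for illustration. It shows how a storage design built around predefined classes forces a report to match an existing class label, discarding everything else the observer noticed:

```python
# Hypothetical sketch of a "closed world" design: a sighting can only be
# stored as an instance of a predefined species class, so any report that
# does not fit an existing class definition escapes structured storage.

VALID_SPECIES = {"American Robin", "Blue Jay", "Black-capped Chickadee"}

def store_sighting_closed(species, location):
    """Accept a report only if it names a known species; any detail the
    observer noticed beyond the class label is discarded."""
    if species not in VALID_SPECIES:
        return None  # the uncertain non-expert's observation is lost
    return {"species": species, "location": location}

# An expert's classification fits the schema...
print(store_sighting_closed("Blue Jay", "St. John's"))
# ...but a non-expert's genuine (and potentially valuable) observation
# cannot be recorded at all.
print(store_sighting_closed("small grey bird with a red patch", "St. John's"))
```

Under this design, the non-expert's only options are to guess a class (hurting data quality) or to give up (hurting participation), which is exactly the tradeoff at issue.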
We contend that it is possible to create an environment that allows broader participation by
people with all levels of domain expertise while improving the quality of collected information. To
achieve this dual objective, the way information is collected and stored needs to be changed. We
propose an approach to data collection and storage that does not require users to identify and classify
observed phenomena. Instead, they should be given the option to record observable attributes
associated with the information they are contributing. The attribute-centered solution offers a new
approach to data collection that removes a psychological and technological barrier to data quality and
participation. The outcome is an improved ability to harness human ingenuity which, coupled with
better data quality, will increase the relevance of citizen contributions to scientific research.
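The attribute-centered alternative can be sketched in the same style. Again this is an illustrative example under our own assumptions, not a reference implementation; the function name and attribute vocabulary are hypothetical. The key design choice is that a sighting is stored as whatever attributes the observer can report, with a class label treated as optional rather than required:

```python
# Hypothetical sketch of attribute-centered storage: a sighting is a
# collection of observed attributes, and classification is optional.

def store_sighting_open(attributes, species=None):
    """Record whatever the observer can report. A species label, when the
    observer is confident enough to provide one, is simply added to the
    record rather than being a precondition for storage."""
    record = {"attributes": list(attributes)}
    if species is not None:
        record["species"] = species
    return record

# A non-expert contributes usable structured data without classifying...
novice = store_sighting_open(["small", "grey", "red patch on head"])
# ...while an expert's identification is still captured in full.
expert = store_sighting_open(["perched on a conifer"], species="Blue Jay")
print(novice)
print(expert)
```

Because nothing is rejected for failing to match a predefined class, the novice's report is retained in structured form and can later be classified (by experts or algorithms) from its attributes, removing the barrier between participation and data quality.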
REFERENCES
Lukyanenko, R., & Parsons, J. (2011a). Information Loss in the Era of User-Generated Data. In pre-ICIS
SIG IQ (pp. 16). Shanghai, China.
Lukyanenko, R., & Parsons, J. (2011b). Rethinking data quality as an outcome of conceptual modeling
choices (pp. 116). Adelaide, Australia.
Lukyanenko, R., & Parsons, J. (2012). Conceptual modeling principles for crowdsourcing. In Proceedings
of the 1st international workshop on Multimodal crowd sensing (pp. 36). Maui, Hawaii, USA:
ACM. Retrieved from http://doi.acm.org/10.1145/2390034.2390038
Parsons, J., & Lukyanenko, R. (2011). Reconceptualizing Data Quality as an Outcome of Conceptual
Modeling Choices. In Tenth Symposium on Research in Systems Analysis and Design.
Bloomington, IN.
Parsons, J., Lukyanenko, R., & Wiersma, Y. (2011). Easier citizen science is better. Nature, 471(7336), 37.