Published under License Creative Commons Attribution Non-commercial. Citation: Lukyanenko
R, J Parsons, YF Wiersma. 2011. Citizen science 2.0: data management principles to harness
the power of the crowd. Lecture Notes in Computer Science 6629:465-473.
Citizen Science 2.0: Data Management Principles to
Harness the Power of the Crowd
Roman Lukyanenko1, Jeffrey Parsons, Yolanda Wiersma2
1 Faculty of Business Administration, Memorial University, St. John’s, Canada
2 Department of Biology, Memorial University, St. John’s, Canada
{roman.lukyanenko, jeffreyp, ywiersma}@mun.ca
Abstract. Citizen science refers to voluntary participation by the general public
in scientific endeavors. Although citizen science has a long tradition, the rise of
online communities and user-generated web content has the potential to greatly
expand its scope and contributions. Citizens spread across a large area will
collect more information than an individual researcher can. Because citizen
scientists tend to make observations about areas they know well, data are likely
to be very detailed. Although the potential for engaging citizen scientists is
extensive, there are challenges as well. In this paper we consider one such
challenge: creating an environment in which non-experts in a scientific
domain can provide appropriate and accurate data regarding their observations.
We describe the problem in the context of a research project that includes the
development of a website to collect citizen-generated data on the distribution of
plants and animals in a geographic region. We propose an approach that can
improve the quantity and quality of data collected in such projects by
organizing data using instance-based data structures. Potential implications of
this approach are discussed and plans for future research to validate the design
are described.
Keywords: design, citizen science, management, database design, conceptual
modeling, data quality.
1 Introduction
Citizen science is a term used to describe the voluntary participation of amateur
scientists in scientific endeavors [1]. Humans are increasingly regarded as effective
sensors of their environment [2] and the potential for using information collected by
individuals is continuously expanding [3]. Citizen science has a long tradition. During
the Victorian era many wealthy individuals engaged in natural history as a hobby, and
made contributions to the understanding of species distributions and behavior as a
result. With the development of the Internet, it has become easier for ordinary people
to participate and contribute large amounts of information. Yet, given the expertise
and language gap between scientists and ordinary people, information transfer in
citizen science projects is not straightforward. While citizen scientists can offer
insights and generate new ideas [4], their lack of training and expertise results in
inconsistent and incorrect data [5,6,7]. In particular, where direct elicitation of
people’s opinions is required, we can expect lower scientific accuracy of data as wider
audiences with lesser expertise get engaged. This research attempts to address this
problem by suggesting data management principles that maximize the quantity and
quality of information collected from non-experts.
There are many advantages to harnessing citizen scientists. Participants spread
across a large area will collect more information than an individual researcher can.
Because citizen scientists tend to make observations about areas they know well, data
are likely to be very detailed. An additional advantage is the potential longevity of
such data; some citizen science programs (e.g., the Audubon Christmas Bird Count)
have been in existence for over 100 years, resulting in data sets extending over long
periods, thus enabling analysis of trends. Coupled with the availability of relatively
inexpensive photo and video equipment, harnessing the power of ordinary people to
provide data and observations about the natural world can lead to major advances in
the natural sciences, as well as assist in vital areas of wildlife conservation and
emergency management in the event of natural disasters (such as the Gulf of Mexico
oil spill).
Although the potential for engaging citizen scientists is extensive, there are
challenges as well. In this paper we describe one such challenge: creating an online
environment in which non-experts in a scientific domain can provide appropriate and
accurate data regarding their observations. We describe the problem in the context of
a research project that includes the development of a website and database to collect
citizen-generated data on the distribution of plants and animals in a geographic
region. We propose an approach to improving the quantity and quality of data
collected in such projects by using instance-based data structures [8]. Potential
implications of this approach are discussed and plans for future research to validate
the design are described.
2 The Challenge: Facilitating Participation
The success of a citizen science project depends on the willingness and ability of
members of the general public to voluntarily observe and report information. In many
cases, this in turn requires some level of scientific knowledge by participants. For
example, the website of the Cornell Lab of Ornithology, eBird (www.ebird.com), draws
on the enthusiasm of avid birders to provide detailed information about bird sightings.
The Cornell Lab is an international leader in ornithological research, and eBird is an
exemplar of a successful online citizen science project. However, engagement of the
lay public with eBird may be limited by the application domain. Citizen scientists
who wish to upload bird sightings need to be familiar with bird taxonomy and
identification. The bird checklist provided in the online interface assumes the user has
already made a positive identification (i.e., identified the species) and knows to which
taxonomic group the bird belongs. This is acceptable for a reasonably experienced
citizen scientist, but the rank beginner ([7] provides a taxonomy of expertise levels
among citizen scientists) may not be able to participate, or may provide data of poor
quality as a result of his/her inability to make a positive identification [9]. Thus,
useful participation may be limited to more experienced amateurs.
The issue of quality and reliability of user-supplied data in citizen science projects
has attracted much attention in recent research [5]. Although the literature is limited
(given the relative recency of Web 2.0 applications), the implied assumption of much
of the work to date is that there exists an inherent trade-off between data quality and
the level of participation (data quantity). Experts are considered to be the source of
the most accurate volunteered information [7], but there are fewer expert amateurs
than beginners available to participate.
The common method of increasing data quality considered in the literature is
training and educating the volunteers. For example, data inconsistency may result
from volunteers’ lack of experience, inadequate guidelines, and insufficient training
[4], rolled up into larger monitoring projects [6]. Training, while generally
desirable, may not always be possible, especially for low-budget projects.
A typical way to increase quality is through expert verification, an approach that
has been used for by-catch and beached bird observation [12] and for unusual
observations on eBird [19]. However, with the size of data sets increasing [6],
individual verification becomes unrealistic and in many ways is contrary to the spirit
of citizen science.
Another line of research suggests social networking as key to increasing data
quality. Some research has proposed a trust and reputation model for classifying
knowledge using the social networking practice of peer evaluation of content [13].
This approach is the basis for iSpot, a website that exploits a user reputation
mechanism to determine accuracy of observations [14]. The reputation-trust model
adapted from well-developed trust research in e-commerce [e.g., 15,16,17] has been
applied to the context of citizen science [18]. While the social networking approach
appears promising, it has a number of limitations. Although it has been compared to
the scientific peer review process [13], social networking is useful only for popular
citizen science projects with large numbers of users. Web sites with a small number of
users may not have sufficient user activity per observation to ensure rigorous peer
review. In addition, as even a very popular website cannot guarantee that every
observation will receive equal scrutiny, this metaphor of scientific review is not fully
justified. Furthermore, users with high reputation who are considered experts in some
domain may still provide inaccurate data in other domains. Most importantly, social
networking may fail to harness the potential of an individual non-expert, as in the
absence of domain knowledge such volunteers may feel too intimidated to express
their opinion (consider the description of a type 'neophyte' [7]). Finally, the social
networking approach lacks generality, as it relies on a particular technology, and may
exclude many citizen science projects that do not currently employ a social
networking model.
Notwithstanding the value of the above approaches, we argue that it is possible to
increase the quality of data generated by an individual volunteer by minimizing
subject information that has a high likelihood of being inaccurate. Requiring
volunteers to make a (potentially inaccurate) positive identification of natural history
phenomena implies that the observer has some knowledge of traditional scientific
taxonomy. We argue that an alternative to classifying observations according to a
fixed taxonomy is to allow volunteers to provide information about observations and
that this will increase the general success of citizen-scientist projects.
3 A Proposed Solution: Attribute-based Data Collection
A traditional approach to citizen participation in scientific data collection works well
(i.e., makes it possible to collect accurate data from a broad constituency) only if the
participants are capable of classifying observed phenomena accurately. For example,
accurate classification of observed plants and animals by species requires that
participants understand the distinguishing characteristics of species. We contend that
imposing this requirement on participation, as in projects such as eBird, imposes a
severe and unnecessary restriction on the level of participation that can be realized in
citizen science projects.
To combat this limitation, we propose an approach to data collection and storage
that does not require users to classify observed phenomena. Instead, they record any
attributes associated with the observation. We illustrate the approach in the context of
NLNature, an ongoing citizen scientist-based project to collect data about the flora
and fauna of Newfoundland & Labrador (www.nlnature.com). Our proposal is based
on the instance-based data model (IBDM) [8] and our application of the model has
implications both for interface design and for database design. Working within the
framework of the IBDM, we extend the model to address issues of identifying
phenomena, and suggest how the model offers a solution to the challenges of a typical
citizen scientist project.
The IBDM is based on ontological and cognitive principles [8, 20]. Ontologically,
every thing possesses a unique set of properties. Classes are formed based on the
principle that one can classify things based on a subset of their observed properties,
and make inferences about unobserved properties the instance possesses by virtue of
belonging to the class [21]. Since an instance can possess very many properties, it can
belong to a very large number of potential classes, depending on the context.
By shifting the focus from a predefined classification to the thing (instance) and its
attributes (see Fig. 1) we do not need to model a domain a priori in terms of the
classes of interest. It is sufficient to ensure that the application has a comprehensive
collection of instances, and each instance contains a set of well-defined attributes.
When required, a user can assemble a dynamic classification based on the collection
of attributes that are of interest at a given moment. For example, if an attribute such as
“behavior” is of interest, then at least two classes can be constructed based on values:
animals that are nocturnal (active at night) vs. diurnal (active during the day). The
same system can also use attributes that connect each species with a biological
taxonomy to reproduce scientific biological classification. Thus, the instance-based
model is capable of achieving the objectives of a traditional classification without the
inherent limitations.
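The idea of assembling a classification on demand can be illustrated with a minimal sketch. It assumes each instance is stored simply as an identifier with a set of attribute values; the identifiers, attribute names, and the function classify_by are hypothetical and are not taken from the NLNature implementation.

```python
from collections import defaultdict

# Hypothetical instances: no class is stored, only attributes and their values.
instances = {
    "obs-1": {"behavior": "nocturnal", "covering": "fur"},
    "obs-2": {"behavior": "diurnal", "covering": "feathers"},
    "obs-3": {"behavior": "nocturnal", "covering": "feathers"},
}


def classify_by(instances, attribute):
    """Build classes on demand by grouping instances on one attribute's value."""
    classes = defaultdict(list)
    for instance_id, attributes in instances.items():
        # Instances that lack the attribute are simply left out of this view.
        if attribute in attributes:
            classes[attributes[attribute]].append(instance_id)
    return dict(classes)


by_behavior = classify_by(instances, "behavior")
by_covering = classify_by(instances, "covering")
```

The same stored instances yield a nocturnal/diurnal classification in one call and a fur/feathers classification in the next, without any change to how the instances are stored.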
Fig. 1. Traditional vs. attribute-based information (Image source: Wikimedia Commons).
We posit that attribute-based design will enable potential citizen scientists to
provide data efficiently and effectively, thereby increasing their participation in data
gathering. We propose a data collection interface designed based on the primacy of a
phenomenon and its attributes over classification of the phenomenon. A user is asked
to identify those attributes (e.g., size, color, appearance, behavior, location, sound) of
an observed plant or animal. In principle, the primary scientific object of an
observation (the species observed) can be identified by an expert after the observation
is recorded, provided that the user reports enough attributes to produce a positive
identification. This contrasts with traditional approaches requiring a priori
classification (e.g., requiring users to select from a checklist of species), which are
usable only by more expert volunteers. Once several attributes are selected, the
system will match them with pre-existing sets of identifying attributes for species, and
either infer a species or ask for additional attributes that could also be automatically
inferred from those previously supplied.
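The matching step described above might be sketched as follows. This is an illustration under stated assumptions, not the project's actual matching logic: the species names and attribute sets are invented, match_species is a hypothetical helper, and a species is treated as a candidate only when every reported attribute appears in its identifying set.

```python
def match_species(observed, definitions):
    """Return (candidate species, attributes worth asking about next)."""
    observed = set(observed)
    # A species stays a candidate if it has every attribute the user reported.
    candidates = {
        name: attrs for name, attrs in definitions.items() if observed <= attrs
    }
    if len(candidates) <= 1:
        # Either a single species was inferred or nothing matched.
        return sorted(candidates), []
    # Attributes shared by all candidates cannot tell them apart,
    # so only the remaining ones are suggested to the user.
    shared = set.intersection(*candidates.values())
    suggestions = set().union(*candidates.values()) - shared - observed
    return sorted(candidates), sorted(suggestions)


# Hypothetical identifying attribute sets, for illustration only.
definitions = {
    "species A (bird)": {"was flying", "has feathers", "red breast"},
    "species B (bird)": {"was flying", "has feathers", "black cap"},
    "species C (bat)": {"was flying", "seen at night", "has fur"},
}

candidates, ask_next = match_species({"was flying", "has feathers"}, definitions)
```

Here two species remain after the first two attributes, so the sketch proposes the attributes that distinguish them; reporting "seen at night" alone would narrow the candidates to a single species.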
Although the final attribute set resulting from an observation can potentially match
multiple species, this proposed solution offers a realistic compromise. Non-experts do
not always know the phenomenon that was observed. It is more realistic to expect a
volunteer to remember some features of an unknown species than to expect a precise
classification and identification. The key activity of identification therefore shifts
from designing a perfect classification to facilitating effective attribute management.
The more the system can guide the choice of attributes, the higher inferential value
such records hold, and the easier it is to classify observations.
4 Attribute-based Database Design
Database structure can be either a major inhibitor or a facilitator of system evolution
[8, 20, 22]. Traditionally, database design results in a representation of the application
domain as a set of related classes (translated to tables in a relational database). In
addition, once the database structure is established it is assumed to be relatively static,
allowing other application elements, such as program code, to be created based on the
static structure. Altering the database structure once a system is built is costly. Thus,
traditional database design is subject to the inherent limitations of a rigid classification
[8].
The collection of user-supplied information based on attributes of observations
suggests the need for a database structure that supports variability in the data collected
from observers, including failure to classify an observation. Support for flexible
attribute collection can be implemented using a traditional relational database, as
illustrated in Fig. 2. We propose storing attributes in a generic table “Attributes” that
contains attribute name and a unique identifier. A separate Attributes-Relationships
table links one attribute to another and creates relationships between attributes. The
table contains the primary key from the parent attribute and a primary key from the
child attribute, thus making many-to-many relationships possible. For example, if the
user selects the attribute “was flying” then “lives in water” will be automatically
removed from the interface, and the system will respond by presenting a new set of
potential attributes that can be inferred from “was flying” (e.g., “has feathers”, “seen
at night”, “six legs”). The choice of the first attribute narrows the observation to a bird
(subsequent attributes could focus on feather color, beak size, habitat, etc.), the
second to a bat, and the third to a flying insect.
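A minimal sketch of the two tables, using Python's built-in sqlite3 module, may make the mechanism concrete. The table and column names (attributes, attribute_relationships, parent_id, child_id) and the helper functions are assumptions made for illustration; the paper specifies only that the relationship table holds the keys of a parent and a child attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attributes (
    attribute_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL UNIQUE
);
CREATE TABLE attribute_relationships (
    parent_id INTEGER NOT NULL REFERENCES attributes(attribute_id),
    child_id  INTEGER NOT NULL REFERENCES attributes(attribute_id),
    PRIMARY KEY (parent_id, child_id)   -- composite key allows many-to-many links
);
""")

names = ["was flying", "lives in water", "has feathers", "seen at night", "six legs"]
conn.executemany("INSERT INTO attributes (name) VALUES (?)", [(n,) for n in names])


def link(parent, child):
    """Record that `child` may be offered once `parent` has been selected."""
    conn.execute(
        """INSERT INTO attribute_relationships (parent_id, child_id)
           SELECT p.attribute_id, c.attribute_id
           FROM attributes p, attributes c
           WHERE p.name = ? AND c.name = ?""",
        (parent, child),
    )


for child in ("has feathers", "seen at night", "six legs"):
    link("was flying", child)


def follow_up_attributes(selected):
    """Attributes to present after the user selects `selected`."""
    rows = conn.execute(
        """SELECT c.name
           FROM attribute_relationships r
           JOIN attributes p ON p.attribute_id = r.parent_id
           JOIN attributes c ON c.attribute_id = r.child_id
           WHERE p.name = ?
           ORDER BY c.name""",
        (selected,),
    )
    return [name for (name,) in rows]


offered = follow_up_attributes("was flying")
```

In this sketch “lives in water” drops out of the interface simply because no relationship row links it to “was flying”; an explicit exclusion table would be an equally plausible design.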
Fig. 2. ER diagram showing instance-based data structure for a typical citizen scientist project.
In order to match selected attributes against a class-defining set, class blueprints
for each species need to be maintained. This is achieved by a table of Species
Definitions that links species with their attributes via a one-to-many relationship. For
example, boreal felt lichen will link to the following attributes: fuzzy white fringe
around the edges, grayish-brown when dry, has red dots, leafy, slate-blue when moist.
User-observed properties are then matched against the class definitions to infer class
membership. If necessary, new class definitions can be added or existing ones altered
during the operational phase of the enterprise system without having to change the
database schema. Finally, we provide tables that join objects and attributes to store the
details of the observations. These tables store events in the system. Each table
includes primary keys from the attributes and objects tables, attribute values, and
date/time of the attribute creation/change. By recording the date of attribute
creation/change, the system can document events that happen to the same
phenomenon. This approach addresses a persistent issue of database design
adaptation to organizational change that a traditional approach with its reliance on
rigid classification struggles to resolve [8, 22].
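The species definitions and observation tables could be sketched along the following lines, again with Python's sqlite3 module. All table, column, and function names are hypothetical, attribute names are stored as text rather than as keys into an attributes table to keep the example short, and “hypothetical lichen X” is invented; only the boreal felt lichen attributes come from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE species_definitions (   -- class blueprints, one row per attribute
    species   TEXT NOT NULL,
    attribute TEXT NOT NULL,
    PRIMARY KEY (species, attribute)
);
CREATE TABLE object_attributes (     -- observation events
    object_id   INTEGER NOT NULL,
    attribute   TEXT NOT NULL,
    value       TEXT,
    recorded_at TEXT NOT NULL        -- date/time of attribute creation/change
);
""")


def define_species(species, attributes):
    """Add or extend a class definition; the schema itself is untouched."""
    conn.executemany(
        "INSERT OR IGNORE INTO species_definitions VALUES (?, ?)",
        [(species, a) for a in attributes],
    )


def record(object_id, attribute, recorded_at, value=None):
    """Store one observed attribute of an object as a timestamped event."""
    conn.execute(
        "INSERT INTO object_attributes VALUES (?, ?, ?, ?)",
        (object_id, attribute, value, recorded_at),
    )


def infer_species(object_id):
    """Rank species by how many of the object's attributes they define."""
    return conn.execute(
        """SELECT d.species, COUNT(*) AS matched
           FROM species_definitions d
           JOIN (SELECT DISTINCT attribute FROM object_attributes
                 WHERE object_id = ?) o ON o.attribute = d.attribute
           GROUP BY d.species
           ORDER BY matched DESC, d.species""",
        (object_id,),
    ).fetchall()


define_species("boreal felt lichen", [
    "fuzzy white fringe around the edges", "grayish-brown when dry",
    "has red dots", "leafy", "slate-blue when moist",
])
define_species("hypothetical lichen X", ["leafy", "yellow-green when dry"])

record(1, "leafy", "2011-06-01T10:00")
record(1, "has red dots", "2011-06-01T10:02")
record(1, "slate-blue when moist", "2011-06-03T08:30")

ranking = infer_species(1)
```

Because a new species is just additional rows in species_definitions, calling define_species during operation changes what infer_species returns without altering the database schema.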
5 Implications for Data Gathering
The attribute-based system proposed for this citizen scientist project has the potential
to increase participation rates (and, hence, data quantity). Unlike natural history
websites that only present taxonomic checklists and assume a basic level of expertise
from citizen scientists, the system proposed here allows for the full spectrum of
volunteer contributors [7] to participate. We believe that this will provide a means of
validating user-supplied data within the user community, particularly if users supply
additional information with their observations (e.g., photographs) that can be
reviewed by experts when necessary.
Many citizen science projects provide inventory data across space and time.
Although there will be biases within the data (for example, to areas where there is
high human population density and to more charismatic or easily observable species),
the data do have the benefit of indicating long-term trends. For the scientific
community, the biggest value is that such data sets are generated by many eyes on
the ground; thus, there is a higher likelihood of rare or unusual species being
detected or for early detection of new trends. Hence, it is important to have a usable
system that promotes a broad and consistent level of participation. Some potential
uses of data collected this way might be unanticipated. For example, long term data
can be useful to identify benchmark conditions in the event of a natural or
anthropogenic disaster (e.g., the Gulf oil spill) and can guide restoration strategies.
This research explores general ways of facilitating information transfer between
users with different levels of domain expertise within the context of a citizen science
project. Information systems are increasingly being used to collect data from ordinary
people (e.g. personal health records [23]). While a number of factors are considered to
influence information quality (e.g., [24]), little attention is given to the role of data
structures in ensuring quality of collected information.
6 Limitations and Future Research
Internet technologies open new opportunities for citizen science. Yet the knowledge
requirements implied by rigid data structures constrain effective participation of
novices and thereby limit the potential outreach of citizen science projects. A
successful implementation of the approach proposed in this paper can facilitate
development of citizen-scientist initiatives. We believe it also has broader
applications based on user-generated content, and promises to be a practical solution
to an important design problem in citizen science.
The foundation of our proposed approach to improving the quantity and quality of
data in citizen science projects is the IBDM [8]. The primary theoretical assumption of
the IBDM, that the existence of things and properties (attributes) precedes classification,
has generally [cf. 25] been supported in ontological [26,27] and cognitive research
[28]. However, while attributes are building blocks of classification [29], not all
classes can be efficiently expressed as sets of common attributes (e.g., radial
categories [30,31,32]). Moreover, many superordinate categories, such as furniture,
animal, and vehicle, tend to be abstract and reflect some rules or functions rather than
observable attributes [33,34]. While this appears to limit the scope of our model, we
believe that for practical reasons little information in citizen science projects will be
expressed in terms of higher-level categories. Indeed, humans prefer to avoid
superordinate categories when they think of individual objects [35].
Classification is a ubiquitous activity and an attribute-centered approach to
knowledge management needs to be tested to determine its technological, economic,
scientific and business utility. We are currently designing empirical studies to
measure the practical impact of the above approach on data collection and storage,
user participation and satisfaction, data quality, and usefulness to scientists. The
experiment will also test the overall effectiveness and feasibility of applying the
IBDM to empower citizen scientists.
References
1. Silvertown, J.: A new dawn for citizen science. Trends in Ecology and Evolution 24 467-
471 (2009)
2. Goodchild, M.: Citizens as sensors: the world of volunteered geography. GeoJournal 69
211-221 (2007)
3. Hand, E.: People power. Nature 466 685-687 (2010)
4. Foster-Smith, J., Evans, S.M.: The value of marine ecological data collected by volunteers.
Biological Conservation 113 199-213 (2003)
5. Flanagin, A., Metzger, M.: The credibility of volunteered geographic information.
GeoJournal 72 137-148 (2008)
6. Wiersma, Y.F.: Birding 2.0: citizen science and effective monitoring in the Web 2.0 world.
Avian Conservation and Ecology 5 13 (2010)
7. Coleman, D.J., Georgiadou, Y., Labonte, J.: Volunteered geographic information: The
nature and motivation of producers. International Journal of Spatial Data Infrastructures
Research 4 332-358 (2009)
8. Parsons, J., Wand, Y.: Emancipating instances from the tyranny of classes in information
modeling. ACM Transactions on Database Systems 25 228 (2000)
9. Parsons, J., Lukyanenko, R., Wiersma, Y.: Easier citizen science is better. Nature 471 37
(2011)
10. Dickinson, J.L., Zuckerberg, B., Bonter, D.N.: Citizen science as an ecological research
tool: challenges and benefits. Annual Review of Ecology, Evolution, and Systematics 41
112-149 (2010)
11. Aaron, W.E.G., Tudor, M.T., Haegen, W.M.V.: The Reliability of Citizen Science: A Case
Study of Oregon White Oak Stand Surveys. Wildlife Society Bulletin 34 1425-1429 (2006)
12. Hamel, N.J., Burger, A.E., Charleton, K., Davidson, P., Lee, S., Bertram, D.F., Parrish,
J.K.: Bycatch and beached birds: Assessing mortality impacts in coastal net fisheries using
marine bird strandings. Marine Ornithology (2009)
13. Bishr, M., Mantelas, L.: A trust and reputation model for filtering and classifying
knowledge about urban growth. GeoJournal 72 229-237 (2008)
14. Silvertown, J.: Taxonomy: include social networking. Nature 467 788-788 (2010)
15. Komiak, S.Y.X., Benbasat, I.: The effects of personalization and familiarity on trust and
adoption of recommendation agents. MIS Quarterly 30 941-960 (2006)
16. Gefen, D., Karahanna, E., Straub, D.W.: Trust and TAM in online shopping: An integrated
model. MIS Quarterly 27 51-90 (2003)
17. Palvia, P.: The role of trust in e-commerce relational exchange: A unified model.
Information & Management 46 213-220 (2009)
18. Alabri, A., Hunter, J.: Enhancing the quality and trust of citizen science data. IEEE
eScience 2010, Brisbane, Australia (2010)
19. Sullivan, B.L., Wood, C.L., Iliff, M.J., Bonney, R.E., Fink, D., Kelling, S.: eBird: A
citizen-based bird observation network in the biological sciences. Biological Conservation
142 2282-2292 (2009)
20. Parsons, J., Su, J.: Analysis of data structures to support the instance-based data model.
International Conference on Design Science Research in Information Systems and
Technology (DESRIST) (2006) 107-130
21. Parsons, J., Wand, Y.: A question of class. Nature 455 1040-1041 (2008)
22. Allen, B.R., Boynton, A.C.: Information architecture: In search of efficient flexibility. MIS
Quarterly 15 435-445 (1991)
23. Agarwal, R., Angst, C.M.: Technology-Enabled Transformations in U.S. Health Care:
Early Findings on Personal Health Records and Individual Use. In: Galletta, D., Zhang, P.
(eds.): Human-Computer Interaction and Management Information Systems: Applications.
M.E. Sharpe, Inc, Armonk, NY (2006)
24. Nicolaou, A.I., McKnight, D.H.: Perceived information quality in data exchanges: Effects
on risk, trust, and intention to use. Information Systems Research 17 332-351 (2006)
25. Grill-Spector, K., Kanwisher, N.: Visual recognition. Psychological Science 16 152-160
(2005)
26. Bunge, M.A.: Treatise on Basic Philosophy: The furniture of the world. Reidel, Dordrecht;
Boston (1977)
27. Wand, Y., Weber, R.: An ontological model of an information system. IEEE Transactions
on Software Engineering 16 1282-1292 (1990)
28. Bowers, J.S., Jones, K.W.: Detecting objects is easier than categorizing them. Quarterly
Journal of Experimental Psychology 61 552-557 (2008)
29. Wand, Y., Monarchi, D.E., Parsons, J., Woo, C.C.: Theoretical foundations for conceptual
modeling in information systems development. Decision Support Systems 15 285-304
(1995)
30. Raccoon, L.S.B., Puppydog, P.O.P.: A middle-out concept of hierarchy (or the problem of
feeding the animals). SIGSOFT Softw. Eng. Notes 23 111-119 (1998)
31. Young, J.J., Williams, P.F.: Sorting and comparing: Standard-setting and "ethical"
categories. Critical Perspectives on Accounting 21 509-521 (2010)
32. Lakoff, G.: Women, fire, and dangerous things: what categories reveal about the mind.
University of Chicago Press, Chicago (1987)
33. Rosch, E., Mervis, C.B., Gray, W.D., Johnson, D.M., Boyesbraem, P.: Basic objects in
natural categories. Cognitive Psychology 8 382-439 (1976)
34. Murphy, G.L., Wisniewski, E.J.: Categorizing objects in isolation and in scenes - What a
superordinate is good for. Journal of Experimental Psychology: Learning 15 572-586
(1989)
35. Rorissa, A.: User-generated descriptions of individual images versus labels of groups of
images: A comparison using basic level theory. Information Processing & Management 44
1741-1753 (2008)
... To further our understanding of their impact and to better comprehend environmental chemicals in general, additional information detailing their use type and associated chemical classes is urgently needed. In this context, substance databases (DBs) play an essential role as information sources for research, governmental institutions, regulation, citizens' science, and companies alike ( [3]; see also Appendix A Database Compendium References). To date, an unintelligibly broad range of DBs is publicly available (see Appendix A Database Compendium References). ...
Article
Full-text available
With an ever-increasing production and registration of chemical substances, obtaining reliable and up to date information on their use types (UT) and chemical class (CC) is of crucial importance. We evaluated the current status of open access chemical substance databases (DBs) regarding UT and CC information using the “Meta-analysis of the Global Impact of Chemicals” (MAGIC) graph as a benchmark. A decision tree-based selection process was used to choose the most suitable out of 96 databases. To compare the DB content for 100 weighted, randomly selected chemical substances, an extensive quantitative and qualitative analysis was performed. It was found that four DBs yielded more qualitative and quantitative UT and CC results than the current MAGIC graph: The European Bioinformatics Institute DB, ChemSpider, the English Wikipedia page, and the National Center for Biotechnology Information (NCBI). The NCBI, along with its subsidiary DBs PubChem and Medical Subject Headings (MeSH), showed the best performance according to the defined criteria. To analyse large datasets, harmonisation of the available information might be beneficial, as the available DBs mostly aggregate information without harmonising them.
... The case study of Roy et al. (2012) found that the environmental/biodiversity fields are more in demand. Lukyanenko, Parsons, and Wiersma (2011) studied citizen-generated data on the distribution of plants and animals to improve the quantity and quality of data with data management principles. The study pointed out that it is necessary to pay attention to data management to ensure the data quality when ordinary people use the information system more to collect data. ...
Article
Full-text available
There has been considerable growth in citizen science in academic contributions—researches by the paradigms of different disciplines and by the activities of citizens when undertaking data collecting, data processing, and data analyzing for disseminating results. These researches have proved the importance of data management practices—urgent to carry out the data life cycle. This study aims to analyze the scientific data contribution of citizen science under the data life cycle approach. It investigates 1,020 citizen science projects within the DataONE life cycle framework, which includes data management plan, data collection, data quality assurance, data documentation, data discovery, data integration, data preservation, and data analysis. As the major finding, the result of this study shows that the data management plan is developed with the leading of universities, which are the host of the majority of citizen science projects. The processes of data collection, data quality assurance, data documentation, data preservation, and data analysis are well organized with the systematic tool in the Information and Communications Technology (ICT) age; meanwhile the citizen science projects are cumulative. Data discovery has mostly linked with SciStarter (citizen science community site) and Facebook (social media). In data integration, it is found that most of the projects integrate with global observation. Finally, the study provides the process and procedure of citizen science data management in an effort to contribute the scientific data and the design of data life cycle to academic and governmental works.
... Despite community initiatives lying in the target domain of ONSNs, as of yet, these platforms do not offer any specific tool support for their ideation, organization or implementation beyond generic communication capabilities . In a local and community context, crowdsourcing has presented itself as a suitable approach for mobilizing a local group of individuals, for example in case of participative urban design (Mueller et al., 2018), urban planning (Seltzer and Mahmoudi, 2012) or citizen science (Lukyanenko et al., 2011). Defined as the outsourcing of a task previously performed by a designated agent to a large group of individuals via an open call (Howe, 2006), crowdsourcing has been demonstrated to be able to produce innovative ideas and to enable collaborative problem-solving (Hammon and Hippner, 2012). ...
Conference Paper
Full-text available
The social connectedness of a community, characterized by aspects such as social support, social trust and civic engagement, plays an important role in determining the well-being of its inhabitants. Neighborhood activism and volunteering through community initiatives can improve this social connected-ness. Online neighborhood social networks (ONSNs) afford users functionality for social interaction, information sharing as well as peer-support and aim to improve community connectedness with platforms such as Nextdoor exhibiting rapid growth in recent years. However, as of yet, ONSNs do not provide specific tool support for implementing community initiatives beyond generic communication capabilities. We propose crowdsourcing as a suitable approach for mobilizing neighbors to ideate, participate in and collaboratively implement community initiatives on ONSNs. Using a design science research approach, we develop design goals and design principles for crowd-sourced community initiatives based on literature and empirical data from two case neighborhoods. We instantiate these design principles into a proof-of-concept artifact in the context of an existing ONSN. Based on our evaluation, we derive implications for establishing crowd-sourced community initiatives on ONSNs. We contribute to research on crowdsourcing and ONSNs with nascent design knowledge which guides researchers and practitioners in designing crowd-based artifacts in the context of local communities.
Article
Full-text available
Official data are not sufficient for monitoring the United Nations Sustainable Development Goals (SDGs): they do not reach remote locations or marginalized populations and can be manipulated by governments. Citizen science data (CSD), defined as data that citizens voluntarily gather by employing a wide range of technologies and methodologies, could help to tackle these problems and ultimately improve SDG monitoring. However, the link between CSD and the SDGs is still understudied. This article aims to develop an empirical understanding of the CSD-SDG link by focusing on the perspective of projects which employ CSD. Specifically, the article presents primary and secondary qualitative data collected on 30 of these projects and an explorative comparative case study analysis. It finds that projects which use CSD recognize that the SDGs can provide a valuable framework and legitimacy, as well as attract funding, visibility, and partnerships. But, at the same time, the article reveals that these projects also encounter several barriers with respect to the SDGs: a widespread lack of knowledge of the goals, combined with frustration and political resistance towards the UN, may deter these projects from contributing their data to the SDG monitoring apparatus. Supplementary information: The online version contains supplementary material available at 10.1007/s11625-021-01001-1.
Chapter
This chapter summarized state-of-the-art data sources and sourcing methods of agro-geoinformatics. The data mainly comes from four sources: satellite, airborne, and in-situ sensors, and human reports. Overall, the satellite datasets have the best spatial and temporal coverages. The airborne and in-situ datasets are mostly project-specific or site-specific. Human reports provide brief descriptions using concise terms and numbers to answer basic questions. The data from various sources are often overlapped spatially, temporally, spectrally, and/or thematically and can be combined to obtain comprehensive understanding of the crop fields. Data sourcing also has three major options: conventional, cloud-based, and crowdsourcing. Conventional sourcing depends on human surveyors, is often labor-intensive, and has very tedious administrative processes. Cloud based approach simplifies the collection and distribution of big amount of collected data. The cutting-edge crowdsourcing approach largely lowers the cost of data gathering and retrieval. The future development is towards Internet-based, mobile friendly, big data, low-cost, robustness, and high-performance data distribution.
Chapter
Full-text available
Crowdsourcing promises to expand organizational knowledge and “sensor” networks dramatically, making it possible to engage ordinary people in large-scale data collection, often at much lower cost than that of traditional approaches to gathering data. A major challenge in crowdsourcing is ensuring that the data that crowds provide is of sufficient quality to be usable in organizational decision-making and analysis. We refer to this challenge as the Problem of Crowd Information Quality (Crowd IQ). We need to increase quality while giving contributors the flexibility to contribute data based on their individual perceptions. The design science research project produced several artifacts, including a citizen science information system (NLNature), design principles (guidelines) for the development of crowdsourcing projects, and an instance-based crowdsourcing design theory. We also made several methodological contributions related to the process of design science research and behavioral research in information systems. Over the course of the project, we addressed several challenges in designing crowdsourcing systems, formulating design principles, and conducting rigorous design science research. Specifically, we showed that: design choices can have a sizable impact in the real world; it can be unclear how to implement design principles; and design features that are unrelated to design principles can confound efforts to evaluate artifacts. During the project, we also experienced challenges for which no adequate solution was found, reaffirming that design is an iterative process.
Chapter
This chapter discusses the design and deployment of a citizen science application to inventory iconic medicinal non-timber forest products in the wild, such as black cohosh, ramps, and Bloodroot. The application is called PlantShoe (a pun on ‘Gumshoe’) and is used on mobile devices to collect data in the field about forest medicinal plants and their growing conditions. The users’ data is fed into a database, which they can manage, study, and share. Plantshoe data is a part of a larger regional community and consortium which is collecting information about the ecology and distribution of medicinal forest plants. Such analyses can help forest farmers and wild stewards in their processes of site selection and management of these valuable botanicals. We describe our usability engineering in the development of the PlantShoe application and enumerate key design tradeoffs we encountered. Thus, the design decisions and results of PlantShoe provide rich material for the design of future technology on the trail.
Article
Dominant forms of contemporary big-data based digital citizen science do not question the institutional divide between qualified experts and lay-persons. In our paper, we turn to the historical case of a large-scale amateur project on biogeographical birdwatching in the late nineteenth and early twentieth century to show that networked amateur research (that produces a large set of data) can operate in a more autonomous mode. This mode depends on certain cultural values, the constitution of specific knowledge objects, and the design of self-governed infrastructures. We conclude by arguing that the contemporary quest for autonomous citizen science is part of a broader discourse on the autonomy of scientific research in general. Just as the actors in our historical case positioned themselves against the elitism of gentlemen scientists, avant-garde groups of the twenty first century like biohackers and civic tech enthusiasts position themselves against the system of professional science—while “digital citizen science” remains to oscillate between claims for autonomy and realities of heteronomy, constantly reaffirming the classic lay-expert divide.
Article
Full-text available
In most of the world's coastal fisheries, bycatch of marine birds is rarely monitored, and thus the impact on populations is poorly known. We used marine bird strandings to assess the impact of entanglement in Pacific Northwest coastal net salmon fisheries. We compared the magnitude and species composition of fisheries-associated strandings (FAS) to baseline data collected at beaches monitored by citizen-science programs in Washington State and British Columbia, and to seabirds salvaged from gillnets during observer programs. Carcass encounter rates were 16.4 carcasses/km [95% confidence interval (CI): 11.2 to 21.7] for FAS and 1.00 carcasses/km (95% CI: 0.87 to 1.14) for baseline data. Declines in fisheries effort were associated with decreasing FAS, although declines in at-sea seabird abundance may also be at play. Common Murres Uria aalge comprised most of the carcasses in both the FAS (86%) and bycatch studies (71%). Although the total count of murre FAS represented a small fraction (1.3%-6.6%) of baseline mortality accumulated for the Salish Sea over the same period, murre FAS added 0.2%-2.9% to annual mortality rates. Considering the effects of other natural and anthropogenic mortality agents on murres in the region, this species might benefit from further protection. Given the complexity of salmon fisheries management and the ubiquitous distribution of seabirds in the Salish Sea, we recommend the comprehensive adoption of gillnet gear modification to reduce seabird bycatch, a solution that may prove to be beneficial for the vitality of seabird populations and of the fishing industry.
Article
Full-text available
The amateur birding community has a long and proud tradition of contributing to bird surveys and bird atlases. Coordinated activities such as Breeding Bird Atlases and the Christmas Bird Count are examples "of citizen" science projects. With the advent of technology, Web 2.0 sites such as eBird have been developed to facilitate online sharing of data and thus increase the potential for real-time monitoring. However, as recently articulated in an editorial in this journal and elsewhere, monitoring is best served when based on a priori hypotheses. Harnessing citizen scientists to collect data following a hypotheticodeductive approach carries challenges. Moreover, the use of citizen science in scientific and monitoring studies has raised issues of data accuracy and quality. These issues are compounded when data collection moves into the Web 2.0 world. An examination of the literature from social geography on the concept of "citizen" sensors and volunteered geographic information (VGI) yields thoughtful reflections on the challenges of data quality/data accuracy when applying information from citizen sensors to research and management questions. VGI has been harnessed in a number of contexts, including for environmental and ecological monitoring activities. Here, I argue that conceptualizing a monitoring project as an experiment following the scientific method can further contribute to the use of VGI. I show how principles of experimental design can be applied to monitoring projects to better control for data quality of VGI. This includes suggestions for how citizen sensors can be harnessed to address issues of experimental controls and how to design monitoring projects to increase randomization and replication of sampled data, hence increasing scientific reliability and statistical power.
Article
Full-text available
New technologies are rapidly changing the way we collect, archive, analyze, and share scientific data. For example, over the next several years it is estimated that more than one billion autonomous sensors will be deployed over large spatial and temporal scales, and will gather vast quantities of data. Networks of human observers play a major role in gathering scientific data, and whether in astronomy, meteorology, or observations of nature, they continue to contribute significantly. In this paper we present an innovative use of the Internet and information technologies that better enhances the opportunity for citizens to contribute their observations to science and the conservation of bird populations. eBird is building a web-enabled community of bird watchers who collect, manage, and store their observations in a globally accessible unified database. Through its development as a tool that addresses the needs of the birding community, eBird sustains and grows participation. Birders, scientists, and conservationists are using eBird data worldwide to better understand avian biological patterns and the environmental and anthropogenic factors that influence them. Developing and shaping this network over time, eBird has created a near real-time avian data resource producing millions of observations per year.
Article
In this Introduction' we shall sketch the business of ontology, or metaphysics, and shall locate it on the map of learning. This has to be done because there are many ways of construing the word 'ontology' and because of the bad reputation metaphysics has suffered until recently - a well deserved one in most cases. 1. ONTOLOGICAL PROBLEMS Ontological (or metaphysical) views are answers to ontological ques­ tions. And ontological (or metaphysical) questions are questions with an extremely wide scope, such as 'Is the world material or ideal - or perhaps neutral?" 'Is there radical novelty, and if so how does it come about?', 'Is there objective chance or just an appearance of such due to human ignorance?', 'How is the mental related to the physical?', 'Is a community anything but the set of its members?', and 'Are there laws of history?'. Just as religion was born from helplessness, ideology from conflict, and technology from the need to master the environment, so metaphysics - just like theoretical science - was probably begotten by the awe and bewilderment at the boundless variety and apparent chaos of the phenomenal world, i. e. the sum total of human experience. Like the scientist, the metaphysician looked and looks for unity in diversity, for pattern in disorder, for structure in the amorphous heap of phenomena - and in some cases even for some sense, direction or finality in reality as a whole.
Article
Conceptual modelling in information systems development is the creation of an enterprise model for the purpose of designing the information system. It is an important aspect of systems analysis. The value of a conceptual modelling language (CML) lies in its ability to capture the relevant knowledge about a domain. To determine which constructs should be included in a CML it would be beneficial to use some theoretical guidelines. However, this is usually not done. The purpose of this paper is to promote the idea that theories related to human knowledge can be used as foundations for conceptual modelling in systems development. We suggest the use of ontology, concept theory, and speech act theory. These approaches were chosen because: (1) they deal with important and different aspects relevant to conceptual modelling and (2) they have already been used in the context of systems analysis. For each approach we discuss: the rationale for its use, its principles, its application to conceptual modelling, and its limitations. We also demonstrate the concepts of the three approaches by analysing an example. The analysis also serves to show how each approach deals with different aspects of modelling.
Article
In the context of personalization technologies, such as Web-based product-brokering recommendation agents (RAs) in electronic commerce, existing technology acceptance theories need to be expanded to take into account not only the cognitive beliefs leading to adoption behavior, but also the affect elicited by the personalized nature of the technology. This study takes a trust-centered, cognitive and emotional balanced perspective to study RA adoption. Grounded on the theory of reasoned action, the IT adoption literature, and the trust literature, this study theoretically articulates and empirically examines the effects of perceived personalization and familiarity on cognitive trust and emotional trust in an RA, and the impact of cognitive trust and emotional trust on the intention to adopt the RA either as a decision aid or as a delegated agent. An experiment was conducted using two commercial RAs. PLS analysis results provide empirical support for the proposed theoretical perspective. Perceived personalization significantly increases customers' intention to adopt by increasing cognitive trust and emotional trust. Emotional trust plays an important role beyond cognitive trust in determining customers' intention to adopt. Emotional trust fully mediates the impact of cognitive trust on the intention to adopt the RA as a delegated agent, while it only partially mediates the impact of cognitive trust on the intention to adopt the RA as a decision aid. Familiarity increases the intention to adopt through cognitive trust and emotional trust.
Article
The instance-based data model has been proposed as a way of conceptualizing data that alleviates many problems associated with existing class-based data models, such as relational and object-oriented approaches. In this paper, we review the instance-based approach and propose alternative data structures to support its implementation. We analyze the complexity of operations under three of these structures and identify the suitability of different structures under different query and update regimes. We also briefly discuss two implementations of this model using two data structures, and outline the results of an empirical test of the performance of each system under a variety of query and update operations.
Article
The Financial Accounting Standards Board (FASB) describes its public interest function as “…developing standards that result in accounting for similar transactions and circumstances in a like manner and different transactions and circumstances…in a different manner (Facts about FASB).” This statement implies that rule-makers possess an expertise that makes analogizing transactions or circumstances to other transactions or circumstances unproblematic. In this paper we utilize two instances of standard-setting, SFAS 123R and SFAS 143, to demonstrate from FASB's analogic reasoning in these cases that similarity and dissimilarity are not so easily ascertained. A judgment about similarity invariably involves ignoring some perspectives of similarity that would lead to substantially different conclusions about the appropriate accounting. We also illustrate via the two examples the inherent value judgments that underlie the conclusions reached by FASB and how these value judgments raise questions about the ethics of the current standard-setting process.