Article

Factors Influencing the Performance of Human Computation: An Empirical Study in Web Application Testing


Abstract

Human computation tackles many problems that are difficult to automate. However, involving human solvers in the computation process leads to unstable performance, especially when the solvers are recruited via the Internet. Considering both human attributes and the computational situation, this paper proposes a research model of the factors affecting the effectiveness and efficiency of human computation algorithms, together with seven proxies for those attributes and the computational situation. To test the proposed model, we collected data from a Web application testing system implemented using a crowdsourcing approach: 87 volunteers were recruited via a social network, and their user-session data were gathered. The results show that more human solvers, lower solver ability, higher engagement, more personal bias, and lower task difficulty lead to shorter completion times. Higher solver ability, higher engagement, more personal bias, and lower task difficulty lead to more accurate output, but the number of solvers shows no significant correlation with the correctness of the human computation algorithm.

Article
Full-text available
As a new distributed computing model, crowdsourcing lets people leverage the crowd's intelligence and wisdom toward solving problems. This article proposes a framework for characterizing various dimensions of quality control in crowdsourcing systems, a critical issue. The authors briefly review existing quality-control approaches, identify open issues, and look to future research directions. In the Web extra, the authors discuss both design-time and runtime approaches in more detail.
Article
Full-text available
New technologies are rapidly changing the way we collect, archive, analyze, and share scientific data. For example, over the next several years it is estimated that more than one billion autonomous sensors will be deployed over large spatial and temporal scales, and will gather vast quantities of data. Networks of human observers play a major role in gathering scientific data, and whether in astronomy, meteorology, or observations of nature, they continue to contribute significantly. In this paper we present an innovative use of the Internet and information technologies that better enhances the opportunity for citizens to contribute their observations to science and the conservation of bird populations. eBird is building a web-enabled community of bird watchers who collect, manage, and store their observations in a globally accessible unified database. Through its development as a tool that addresses the needs of the birding community, eBird sustains and grows participation. Birders, scientists, and conservationists are using eBird data worldwide to better understand avian biological patterns and the environmental and anthropogenic factors that influence them. Developing and shaping this network over time, eBird has created a near real-time avian data resource producing millions of observations per year.
Article
Full-text available
Online labor marketplaces offer the potential to automate tasks too difficult for computers, but they don't always provide accurate results. MobileWorks is a crowd platform that departs from the marketplace model to provide robust, high-accuracy results using three new techniques. A dynamic work-routing system identifies expertise in the crowd and ensures that posted work completes within a bounded time at fair wages. A peer-management system helps prevent wrong answers. Last, social interaction techniques let the best workers manage and teach other crowd members. This process allows the crowd to collaboratively learn how to solve new tasks.
Article
Full-text available
Geospatial tagging (geotagging) is an emerging and very promising application that can help users find a wide variety of location-specific information, and thereby facilitate the development of advanced location-based services. Conventional geotagging systems share some limitations, such as the use of a two-phase operating model and the tendency to tag popular objects with simple contexts. To address these problems, a number of geotagging systems based on the concept of 'Games with a Purpose' (GWAP) have been developed recently. In this study, we use analysis to investigate these new systems. Based on our analysis results, we design three metrics to evaluate system performance and develop five task assignment algorithms for GWAP-based systems. Using a comprehensive set of simulations under both synthetic and realistic mobility scenarios, we find that the Least-Throughput-First Assignment algorithm (LTFA) is the most effective approach because it achieves competitive system utility while its computational complexity remains moderate. We also find that, to improve system utility, it is better to assign as many tasks as possible in each round. However, because players may feel annoyed if too many tasks are assigned at the same time, it is recommended that multiple tasks be assigned one by one in each round in order to achieve higher system utility. In this extended version, we have refined our analysis in modeling GWAP-based geospatial tagging systems, re-evaluated the five task assignment algorithms based on the new analytical model, and included a more comprehensive set of evaluations with different numbers of tasks per assignment and different buffer sizes per LOI. Moreover, we have updated the literature review of this study and incorporated the comments and suggestions of the conference attendees. Hence, this manuscript is a much more thorough and authoritative presentation of our study of GWAP-based geospatial tagging systems.
Article
Full-text available
Within the last few years, researchers have shown a renewed interest in "interest". Especially in the field of educational psychology many studies have been conducted to analyze how learning and achievement are influenced by motivational and cognitive factors, which are connected with individual and/or situational interests. In this paper, results from empirical research will be presented besides theoretical considerations concerning the interest-construct. Interest has typically been studied as an independent variable. Dependent variables have been either some aspects of learning outcome (knowledge structure, academic achievement) or hypothetical mediators, which probably can be used to explain the interest effects (e.g., learning strategies, attention, emotional experiences). There is also a growing number of studies which try to explore the conditions of interest development within educational settings. Future lines of research will be discussed in light of the demands of educational theory and practice.
Article
Full-text available
We conducted a field study to test the applicability of the job characteristics model (JCM) in volunteer organizations and examine the impact of job characteristics on volunteer motivation, satisfaction and intent to quit, as well as test a measure of volunteer performance. One hundred and twenty-four volunteers completed measures of job characteristics, motivation, satisfaction, and intent to quit. Supervisors rated volunteer task performance and organizational citizenship behaviors (OCB). Results showed that job characteristics were related to volunteers’ autonomous motivation, satisfaction and performance. Autonomous motivation acted as a mediator in the relationship between job characteristics and satisfaction. The theoretical and practical implications of these findings are discussed.
Conference Paper
Full-text available
Distributing labeling tasks among hundreds or thousands of annotators is an increasingly important method for annotating large datasets. We present a method for estimating the underlying value (e.g. the class) of each image from (noisy) annotations provided by multiple annotators. Our method is based on a model of the image formation and annotation process. Each image has different characteristics that are represented in an abstract Euclidean space. Each annotator is modeled as a multidimensional entity with variables representing competence, expertise and bias. This allows the model to discover and represent groups of annotators that have different sets of skills and knowledge, as well as groups of images that differ qualitatively. We find that our model predicts ground truth labels on both synthetic and real data more accurately than state of the art methods. Experiments also show that our model, starting from a set of binary labels, may discover rich information, such as different "schools of thought" amongst the annotators, and can group together images belonging to separate categories.
Article
Full-text available
Given the judgments of multiple voters regarding some issue, it is generally assumed that the best way to arrive at some collective judgment is by following the majority. We consider here the now common case in which each voter expresses some (binary) judgment regarding each of a multiplicity of independent issues and assume that each voter has some fixed (unknown) probability of making a correct judgment for any given issue. We leverage the fact that multiple votes by each voter are known in order to demonstrate, both analytically and empirically, that a method based on maximum likelihood estimation is superior to the simple majority rule for arriving at true collective judgments.
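The estimator described above can be sketched concretely. The following Python snippet is a minimal, hypothetical illustration of maximum-likelihood-style aggregation (an EM-like iteration, not the authors' exact estimator): voter accuracies are estimated from agreement with the current labels, and issues are then re-labeled by accuracy-weighted voting.

```python
import math

def ml_labels(votes, n_iter=10):
    """EM-style sketch of maximum-likelihood label aggregation.
    `votes` maps voter -> {issue: 0 or 1}; every voter is assumed
    to have judged every issue."""
    voters = list(votes)
    issues = sorted({i for v in votes.values() for i in v})
    # initialise with the simple majority rule
    labels = {i: round(sum(votes[v][i] for v in voters) / len(voters))
              for i in issues}
    for _ in range(n_iter):
        # estimate each voter's accuracy as agreement with current labels
        # (Laplace-smoothed so the log-odds below stay finite)
        acc = {v: (sum(votes[v][i] == labels[i] for i in issues) + 1)
                  / (len(issues) + 2) for v in voters}
        # re-label each issue by a log-odds-weighted vote
        for i in issues:
            score = sum((1 if votes[v][i] == 1 else -1)
                        * math.log(acc[v] / (1 - acc[v])) for v in voters)
            labels[i] = 1 if score > 0 else 0
    return labels
```

Because an unreliable voter receives a log-odds weight near zero (and a systematically wrong voter a negative one), this weighted vote can outperform the simple majority rule when voter reliabilities differ.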
Article
The internal program representation chosen for a software development environment plays a critical role in the nature of that environment. A form should facilitate implementation and contribute to the responsiveness of the environment to the user. The program dependence graph (PDG) may be a suitable internal form. It allows programs to be sliced in linear time for debugging and for use by language-directed editors. The slices obtained are more accurate than those obtained with existing methods because I/O is accounted for correctly and irrelevant statements on multi-statement lines are not displayed. The PDG may be interpreted in a data driven fashion or may have highly optimized (including vectorized) code produced from it. It is amenable to incremental data flow analysis, improving response time to the user in an interactive environment and facilitating debugging through data flow anomaly detection. It may also offer a good basis for software complexity metrics, adding to the completeness of an environment based on it.
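To make the slicing idea concrete, here is a small hypothetical sketch in Python: the PDG is represented as a map from each statement to the statements it is data- or control-dependent on, and a backward slice is a plain reachability walk, which visits each node and edge at most once and so runs in linear time.

```python
def backward_slice(pdg, criterion):
    """Collect every statement the slicing criterion transitively
    depends on by walking dependence edges backwards."""
    seen, stack = set(), [criterion]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(pdg.get(node, ()))
    return seen

# Hypothetical dependence edges for a tiny summation program:
#   1: n = read()    2: i = 1; s = 0    3: while i <= n:
#   4:   s = s + i   5:   i = i + 1     6: print(s)
pdg = {1: set(), 2: set(), 3: {1, 2, 5}, 4: {2, 3, 5}, 5: {2, 3}, 6: {2, 4}}
```

Slicing on statement 6 pulls in the whole loop, whereas slicing on statement 5 omits statement 4, since the accumulator never feeds the loop counter.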
Article
The ESP game belongs to the genre called Games With A Purpose (GWAP), which leverage people's desire to be entertained and also outsource certain steps of the computational process to humans. The games have shown promise in solving a variety of problems, which computer computation has been unable to resolve completely thus far. In this study, we consider generalized ESP games with two objectives. First, we propose an analytical model for computing the utility of generalized ESP games, where the number of players, the consensus threshold, and the stopping condition are variable. We show that our model can accurately predict the stopping condition that will yield the optimal utility of a generalized ESP game under a specific game setting. A service provider can therefore utilize the model to ensure that the hosted generalized ESP games produce high-quality labels efficiently. Second, we propose a metric, called system gain, for evaluating the performance of ESP-like GWAP systems, and also use analysis to study the properties of generalized ESP games. We believe that GWAP systems should be designed and played with strategies. To this end, we implement an optimal puzzle selection strategy (OPSA) based on our analysis. Using a comprehensive set of simulations, we demonstrate that the proposed OPSA approach can effectively improve the system gain of generalized ESP games, as long as the number of puzzles in the system is sufficiently large.
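As a rough illustration of the kind of quantity such a utility model reasons about, the following Python sketch simulates one generalized ESP match under assumed parameters; the player model, label distribution, and stopping rule here are hypothetical simplifications, not the paper's analytical model.

```python
import random

def rounds_to_consensus(n_players, threshold, label_probs, rng,
                        max_rounds=1000):
    """Simulate one generalized ESP match: each round, every player enters
    a label drawn from label_probs (label -> probability); play stops once
    any label has been entered by at least `threshold` players in total."""
    labels = list(label_probs)
    weights = list(label_probs.values())
    counts = {}
    for r in range(1, max_rounds + 1):
        for _ in range(n_players):
            label = rng.choices(labels, weights=weights)[0]
            counts[label] = counts.get(label, 0) + 1
            if counts[label] >= threshold:
                return r, label
    return max_rounds, None
```

Averaging the returned round count over many simulated matches gives an empirical estimate of how the consensus threshold trades label quality against throughput.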
Article
Although microtask platforms are desirable for their speed, scalability, and low cost, task performance varies greatly. Many researchers have focused on improving the quality of the work performed on such platforms. Priming uses implicit mechanisms to induce observable changes in behavior. Although priming has been effective in the laboratory, its use hasn't been explored extensively in software design, perhaps because the effects are often short-lived. In the context of microtask crowdsourcing environments, however, where tasks are short and circumscribed, temporary priming effects can lead to significant performance gains.
Article
Crowdsourcing in the form of human-based electronic services (people services) provides a powerful way of outsourcing tasks to a large crowd of remote workers over the Internet. Research has shown that multiple redundant results delivered by different workers can be aggregated in order to achieve a reliable result. However, basic implementations of this approach are rather inefficient, as they multiply the effort for task execution and cannot guarantee a certain quality level. In this paper, we address these challenges by elaborating on a statistical approach for quality management of people services that we previously proposed. The approach combines elements of statistical quality management with dynamic group decisions. We present a comprehensive statistical model that enhances our original work and makes it more transparent. We also provide an extensible toolkit that implements our model and facilitates its application to real-time experiments as well as to simulations. A quantitative analysis based on an optical character recognition (OCR) scenario confirms the efficiency and reach of our model.
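The cost of redundancy can be made concrete with a small calculation. The sketch below is an illustrative binomial model under an independent-workers assumption, not the paper's statistical model: it computes how large a group must be before a majority vote reaches a target accuracy.

```python
from math import comb

def majority_correct_prob(p, n):
    """Probability that a strict majority of n independent workers, each
    correct with probability p, returns the right answer (n odd: no ties)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

def workers_needed(p, target, max_n=99):
    """Smallest odd group size whose majority vote reaches `target` accuracy."""
    for n in range(1, max_n + 1, 2):
        if majority_correct_prob(p, n) >= target:
            return n
    return None
```

With individually 80%-accurate workers, reaching 99% aggregate accuracy already takes 13 redundant executions per task, which illustrates why naive redundancy is expensive.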
Article
This paper explores the effects of leadership style and job characteristics on job performance, examines the mediating effect of organizational commitment between leadership style, job characteristics, and job performance, and offers management suggestions based on the research findings. Questionnaires were administered to the accountants of county/city governments in Taiwan, using convenience sampling. The statistical analyses yield the following findings. First, the more idealized influence the county/city mayor exhibits, the stronger the problem-solving ability of the associated accountants; likewise, the better the mayor's management by exception, the stronger the accountants' problem-solving ability and, in turn, the greater their passion for innovation. Second, greater job autonomy, job importance, and job diversity are associated with stronger problem-solving ability among the accountants, and higher job autonomy with a greater passion for innovation. A mediating effect of organizational commitment was found between transformational leadership and job performance, indicating that organizational commitment can act as a mediator between transactional leadership and job performance.
Article
Crowdsourcing comprises a variety of creative contests, and its success is closely related to the quantity and quality of solvers. The research model of factors influencing the quantity and quality of solvers with respect to contest arrangement attributes and market competition situation has been developed in this paper, and the model has been tested with data from a crowdsourcing website in China. The results show that higher awards, easier tasks, longer duration and lower competition intensity lead to a higher number of solvers. Higher awards, longer duration and higher difficulty level of tasks lead to higher ability level of winners, but competition intensity and market price for other competing projects do not show significant correlation with the ability level of winners.
Article
This paper studies differences in job satisfaction and intrinsic work motivation between employees with different characteristics. Based on a study of the literature, assumptions regarding these differences are developed and tested on data from a survey in the Nordic countries, in which 9,623 employees from randomly selected households participated. Among the findings are that Danish workers are the most satisfied and that there is no difference between the genders with respect to job satisfaction in the Nordic countries.
Article
The movement of workers to act in a desired manner has always consumed the thoughts of managers. In many ways, this goal has been pursued through incentive programs, corporate pep talks, and other types of conditional administrative policy. However, as workers adjust their behaviour in response to one of the aforementioned stimuli, is job satisfaction actualized? Instilling satisfaction in workers is a crucial task of management. Satisfaction creates confidence, loyalty, and ultimately improved quality in the output of the employed. Satisfaction, though, is not the simple result of an incentive program; employees will most likely not take any more pride in their work even if they win the weekend getaway for having the highest sales. This paper reviews the literature of motivational theorists and draws from their approaches to job satisfaction and the role of motivation within job satisfaction. The theories of Frederick Herzberg and Edwin Locke are presented chronologically to show how Locke's theory was a response to Herzberg's. By understanding these theories, managers can focus on strategies for creating job satisfaction. This is followed by a brief examination of Kenneth Blanchard and Paul Hersey's theory of leadership within management and how this art is changing over time.
Article
This paper considers factors that influenced long-term engagement in an online community. It draws on a case study of a six-year online collaboration amongst a group of European teachers. The email interchange between these teachers was, with their agreement, saved and used as research data; intensive group interviews were also conducted, involving all the participants in the online group. Dynamic motivation models and Csikszentmihalyi's concept of motivational flow were used to create a theoretical framework to analyse the email interchanges and group interview data so as to build an understanding of the teachers' commitment to the online community. As a result of the analysis, the existing theoretical models were modified to produce a new combined model, which incorporates ideas grounded in the participants' reasons for their sustained engagement in the collaboration. This combined model could provide a basis for planning future virtual interactions and for assessing their sustainability.
Article
Workflow design is often an effort of distributed and heterogeneous teams, thus making tool support for collaboration a necessity. We present a novel concept of collaborative workflow design which combines cooperation and workflow model analysis. Workflow analysis is simplified using workflow metrics, which help identifying problematic aspects of the workflow model. Our findings are implemented in a collaborative workflow design system, which is easily accessible on the Web, but provides a desktop-like user experience.
Article
In the performance evaluation literature, combinations of variables such as age, gender, experience, observation time, and interpersonal affect have been widely considered in determining employee performance, but no investigation has examined the influence of workplace conditions on job performance. This study reports the effects of job characteristics (physical effort and job grade) and working conditions (environmental conditions and hazards), in addition to experience and education level, on task performance and contextual performance. A total of 154 employees in 18 teams at a medium-sized metal company participated in this study. Seven criteria for task performance and 16 for contextual performance were used to measure employee performance. The results showed substantial relationships between employee performance and both job grade and environmental conditions. Poor workplace conditions (physical effort, environmental conditions, and hazards) reduce employee performance on indicators including following organizational rules, quality, cooperating with coworkers to solve task problems, concentrating on tasks, creativity, and absenteeism.

Relevance to industry: Unpleasant working conditions in workshops affect each of the job performance indicators differently. This study highlights that training programs designed to enhance the job performance of employees working under poor workplace conditions should focus on organizational rules concerning occupational health and safety.
Article
User interfaces are redesigned for various purposes, such as adapting interfaces or meeting new requirements during software creation processes. In the context of learning systems, the aim of interface redesign is to let students create their own interface corresponding to the abstract concept to be learned, which is then reflected in the interface they design. In this article we present an approach to interface redesign in a cooperative learning scenario for cryptographic protocols. We describe an iterative workflow using two different pieces of software for the creation and redesign of interfaces and distributed simulation, and we evaluate this approach.
Article
Consider a human facing a variety of physically or cognitively demanding tasks, who then performs these tasks in a sequence that has been determined either strategically or arbitrarily. Just as the exhaustive scheduling literature has repeatedly demonstrated the significant impact that scheduling decisions have on system performance, the human factors literature suggests that task sequencing decisions have a profound impact on human performance and well-being. The latter claim is justified almost exclusively by empirical methods. The alternative of applying classic scheduling theory to sequencing decisions involving human tasks was proposed over a decade ago. However, these pioneering frameworks did not delineate a mathematical basis for incorporating human behavior into the machine scheduling paradigm. The purpose of this paper is to establish a framework for scheduling human tasks that accounts for physical and/or cognitive human characteristics and behaviors. The framework is constructed by surveying the human factors literature in an effort to identify human characteristics that are relevant to task sequencing, and by reviewing emerging areas of the scheduling literature that are auspicious with respect to modeling these human characteristics in a scheduling context. Interdisciplinary research opportunities in scheduling and human factors are also discussed.

Relevance to industry: This paper is inspired by the physical and cognitive challenges associated with semi-automated order picking in warehouses. While pick schedules are often designed in practice based on metrics such as maximizing throughput and meeting delivery schedules, this paper describes a framework for task sequencing that accounts for the worker's risk and aims to maximize the worker's productivity. The proposed task-sequencing framework is general and relevant to any working environment characterized by demanding tasks, task variety, and objectives related to productivity and safety.
Conference Paper
The rapid growth of human computation within research and industry has produced many novel ideas aimed at organizing web users to do great things. However, the growth is not adequately supported by a framework with which to understand each new system in the context of the old. We classify human computation systems to help identify parallels between different systems and reveal "holes" in the existing work as opportunities for new research. Since human computation is often confused with "crowdsourcing" and other terms, we explore the position of human computation with respect to these related topics.
Book
This talk is about harnessing human brainpower to solve problems that computers cannot. Although computers have advanced dramatically over the last 50 years, they still do not possess basic conceptual intelligence or perceptual capabilities that most humans take for granted. By leveraging human abilities in a novel way, I solve large-scale computational problems and collect data to teach computers basic human talents. To this end, I treat human brains as processors in a distributed system, each performing a small part of a massive computation.
Article
Games with a purpose (GWAP) focus on improving artificial-intelligence algorithms. The ESP Game is a GWAP in which people provide meaningful, accurate labels for images on the Web as a side effect of playing the game. Other GWAPs include Peekaboom, which locates objects within images, and Phetch, which annotates images with descriptive paragraphs. The ESP Game, introduced in 2003, and its successors represent the first seamless integration of game play and computation. The Open Mind Initiative is a worldwide research endeavor developing intelligent software by leveraging human skills to train computers; it collects information from regular Internet users and feeds it to machine-learning algorithms. The GWAP approach is characterized by three motivating factors: an increasing proportion of the world's population has access to the Internet; certain tasks are impossible for computers but easy for humans; and people spend a lot of time playing games on computers.
Article
Since the concept of crowdsourcing is relatively new, many potential participants have questions about the AMT marketplace. For example, a common set of questions that come up in an 'introduction to crowdsourcing and AMT' session are the following: What type of tasks can be completed in the marketplace? How much does it cost? How fast can I get results back? How big is the AMT marketplace? The answers to these questions remain largely anecdotal and based on personal observations and experiences. To better understand what types of tasks are being completed today using crowdsourcing techniques, we started collecting data about the AMT marketplace. We present a preliminary analysis of the dataset and provide directions for interesting future research.
Article
CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart) are widespread security measures on the World Wide Web that prevent automated programs from abusing online services. They do so by asking humans to perform a task that computers cannot yet perform, such as deciphering distorted characters. Our research explored whether such human effort can be channeled into a useful purpose: helping to digitize old printed material by asking users to decipher scanned words from books that computerized optical character recognition failed to recognize. We showed that this method can transcribe text with a word accuracy exceeding 99%, matching the guarantee of professional human transcribers. Our apparatus is deployed in more than 40,000 Web sites and has transcribed over 440 million words.