Conference Paper

Add-Remove-Confirm: Crowdsourcing synset cleansing

... This section presents two crowdsourcing procedures for cleansing and enriching lexical resources. The add-remove-confirm crowdsourcing procedure is intended for refining the lexicalization of concepts in electronic thesauri and consists of stages for adding missing words, removing extraneous words, and confirming the proposed changes [26]. The genus-species-match procedure is intended for building semantic relations between concepts: participants relate concepts in genus-species pairs, after which the confirmed pairs are matched [27]. ...
Article
Full-text available
Recently, microtask crowdsourcing has become a popular approach for addressing various data mining problems. Crowdsourcing workflows for approaching such problems are composed of several data processing stages which require a consistent representation to make the work reproducible. This paper is devoted to the problem of reproducibility and formalization of the microtask crowdsourcing process. A computational model for microtask crowdsourcing based on an extended relational model and a dataflow computational model has been proposed. The proposed collaborative dataflow computational model is designed for processing the input data sources by executing annotation stages and automatic synchronization stages simultaneously. Data processing stages and connections between them are expressed by using collaborative computation workflows represented as loosely connected directed acyclic graphs. A synchronous algorithm for executing such workflows has been described. The computational model has been evaluated by applying it to two tasks from the computational linguistics field: refining concept lexicalization in electronic thesauri and establishing hierarchical relations between such concepts. The “Add–Remove–Confirm” procedure is designed for adding missing lexemes to the concepts while removing extraneous ones. The “Genus–Species–Match” procedure is designed for establishing “is-a” relations between the concepts provided with the corresponding word pairs. The experiments involving both volunteers from popular online social networks and paid workers from crowdsourcing marketplaces confirm the applicability of these procedures for enhancing lexical resources.
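To make the workflow formalism concrete, here is a minimal sketch of a synchronous executor for a stage DAG, assuming Python 3.9+; the stage functions are stubs standing in for real annotation and synchronization stages, and all names are illustrative rather than taken from the paper.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Illustrative stage functions: in a real deployment, annotation stages
# would post microtasks to a crowdsourcing platform and wait for answers,
# while synchronization stages would aggregate the collected labels.
def add_stage(synset):      # workers propose missing lexemes
    return synset | {"candidates": ["proposed_word"]}

def remove_stage(synset):   # workers flag extraneous lexemes
    return synset | {"flagged": []}

def confirm_stage(synset):  # workers confirm the proposed changes
    return synset | {"confirmed": True}

# The workflow is a DAG: each stage lists the stages it depends on.
workflow = {"add": set(), "remove": {"add"}, "confirm": {"remove"}}
stages = {"add": add_stage, "remove": remove_stage, "confirm": confirm_stage}

def run(synset, workflow, stages):
    """Synchronously execute the stages in topological order."""
    data = synset
    for name in TopologicalSorter(workflow).static_order():
        data = stages[name](data)
    return data

print(run({"words": {"car", "auto"}}, workflow, stages))
```

The topological order guarantees each stage only runs once the stages it depends on have produced their output, which is what makes the execution reproducible.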
Conference Paper
Sense inventory induction is a topical problem of deriving a set of synsets representing concepts using various automatic or human-assisted methods. There might be, and actually are, mistakes in such synsets. Here we are focused on the problem of eliminating potentially duplicate synsets having exactly two words in common, as the broader intersection is known to be successfully addressed by heuristics. We exploit the phenomenon of lexical substitutions and microtask-based crowdsourcing for aligning the synsets to the individual word senses. We also present an open source mobile application implementing our approach. Our experiments on the Russian language show that the approach scales well and dramatically reduces the number of duplicate synsets in the inventory.
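The two-words-in-common criterion is straightforward to express; the following sketch (our illustration, with toy Russian synsets, not code from the paper) enumerates the candidate duplicate pairs that would then be routed to crowd workers:

```python
from itertools import combinations

# Toy synsets: each is a set of words; identifiers are illustrative.
synsets = {
    "s1": {"машина", "автомобиль", "тачка"},
    "s2": {"машина", "автомобиль", "механизм"},
    "s3": {"тачка", "тележка"},
}

# Candidate duplicates: pairs whose intersection has exactly two words.
# Broader intersections are assumed to be handled by separate heuristics.
candidates = [
    (a, b)
    for a, b in combinations(synsets, 2)
    if len(synsets[a] & synsets[b]) == 2
]
print(candidates)  # [('s1', 's2')]
```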
Article
Full-text available
The paper gives an overview of the Russian Semantic Similarity Evaluation (RUSSE) shared task held in conjunction with the Dialogue 2015 conference. There exist a lot of comparative studies on semantic similarity, yet no analysis of such measures was ever performed for the Russian language. Exploring this problem for the Russian language is even more interesting, because this language has features, such as rich morphology and free word order, which make it significantly different from English, German, and other well-studied languages. We attempt to bridge this gap by proposing a shared task on the semantic similarity of Russian nouns. Our key contribution is an evaluation methodology based on four novel benchmark datasets for the Russian language. Our analysis of the 105 submissions from 19 teams reveals that successful approaches for English, such as distributional and skip-gram models, are directly applicable to Russian as well. On the one hand, the best results in the contest were obtained by sophisticated supervised models that combine evidence from different sources. On the other hand, completely unsupervised approaches, such as a skip-gram model estimated on a large-scale corpus, were able to score among the top 5 systems.
Article
Full-text available
Crowdsourcing is an established approach for producing and analyzing data that can be represented as a human-assisted computation system. This paper presents a crowdsourcing engine that makes it possible to run a highly customizable hosted crowdsourcing platform controlling the entire annotation process including such elements as task allocation, worker ranking and result aggregation. The approach and the implementation have been described, and the conducted experiment shows promising preliminary results.
Conference Paper
Full-text available
The YARN (Yet Another RussNet) project, started in 2013, aims at creating a large open thesaurus for Russian using crowdsourcing. This paper describes the synset assembly interface developed within the project: the motivation behind it, its design, usage scenarios, implementation details, and first experimental results.
Article
Full-text available
This article describes the creation and application of the Turk Bootstrap Word Sense Inventory for 397 frequent nouns, which is a publicly available resource for lexical substitution. This resource was acquired using Amazon Mechanical Turk. In a bootstrapping process with massive collaborative input, substitutions for target words in context are elicited and clustered by sense; then, more contexts are collected. Contexts that cannot be assigned to a current target word’s sense inventory re-enter the bootstrapping loop and get a supply of substitutions. This process yields a sense inventory with its granularity determined by substitutions as opposed to psychologically motivated concepts. It comes with a large number of sense-annotated target word contexts. Evaluation on data quality shows that the process is robust against noise from the crowd, produces a less fine-grained inventory than WordNet and provides a rich body of high precision substitution data at low cost. Using the data to train a system for lexical substitutions, we show that amount and quality of the data is sufficient for producing high quality substitutions automatically. In this system, co-occurrence cluster features are employed as a means to cheaply model topicality.
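A schematic rendering of the bootstrapping loop described above; the crowd elicitation is stubbed and the clustering is reduced to seeding one sense per unassigned context, so this is a shape-of-the-algorithm sketch only, not the resource's actual pipeline.

```python
def assign_sense(subs, inventory):
    """Return the index of a sense sharing a substitution, else None."""
    for i, sense in enumerate(inventory):
        if sense & subs:
            return i
    return None

def bootstrap(contexts, elicit, rounds=3):
    inventory = []            # each sense: a set of substitution words
    pending = list(contexts)  # contexts without an assigned sense yet
    for _ in range(rounds):
        if not pending:
            break
        unassigned = []
        for ctx in pending:
            subs = set(elicit(ctx))          # crowd-provided substitutes
            if assign_sense(subs, inventory) is None:
                unassigned.append((ctx, subs))
        # Unassigned contexts re-enter the loop: their substitutions
        # seed new senses that extend the inventory before the next round.
        for _, subs in unassigned:
            if assign_sense(subs, inventory) is None:
                inventory.append(subs)
        pending = [ctx for ctx, _ in unassigned]
    return inventory

# Toy run: every context yields the same substitutes, so one sense emerges.
print(bootstrap(["ctx1", "ctx2"], elicit=lambda ctx: {"car", "auto"}))
```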
Conference Paper
Full-text available
Paid crowd work offers remarkable opportunities for improving productivity, social mobility, and the global economy by engaging a geographically distributed workforce to complete complex tasks on demand and at scale. But it is also possible that crowd work will fail to achieve its potential, focusing on assembly-line piecework. Can we foresee a future crowd workplace in which we would want our children to participate? This paper frames the major challenges that stand in the way of this goal. Drawing on theory from organizational behavior and distributed computing, as well as direct feedback from workers, we outline a framework that will enable crowd work that is complex, collaborative, and sustainable. The framework lays out research challenges in twelve major areas: workflow, task assignment, hierarchy, real-time response, synchronous collaboration, quality control, crowds guiding AIs, AIs guiding crowds, platforms, job design, reputation, and motivation.
Conference Paper
Full-text available
It is becoming clear that traditional evaluation measures used in Computational Linguistics (including Error Rates, Accuracy, Recall, Precision and F-measure) are of limited value for unbiased evaluation of systems, and are not meaningful for comparison of algorithms unless both the dataset and algorithm parameters are strictly controlled for skew (Prevalence and Bias). The use of techniques originally designed for other purposes, in particular Receiver Operating Characteristics Area Under Curve, plus variants of Kappa, have been proposed to fill the void. This paper aims to clear up some of the confusion relating to evaluation, by demonstrating that the usefulness of each evaluation method is highly dependent on the assumptions made about the distributions of the dataset and the underlying populations. The behaviour of a number of evaluation measures is compared under common assumptions. Deploying a system in a context which has the opposite skew from its validation set can be expected to approximately negate Fleiss Kappa and halve Cohen Kappa but leave Powers Kappa unchanged. For most performance evaluation purposes, the latter is thus most appropriate, whilst for comparison of behaviour, Matthews Correlation is recommended.
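For reference (standard definitions, not taken from the paper), the chance-corrected agreement measures discussed here share one form; Cohen's kappa computes the expected agreement from the two annotators' marginal label distributions, which is exactly what makes it sensitive to skew:

```latex
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
p_e = \sum_{k} p_1(k)\, p_2(k)
```

Here $p_o$ is the observed agreement and $p_i(k)$ is annotator $i$'s marginal probability of label $k$; skewed marginals inflate $p_e$ and depress kappa, and the variants (Fleiss, Powers) differ chiefly in how $p_e$ is estimated.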
Conference Paper
Full-text available
Micro-task markets such as Amazon's Mechanical Turk represent a new paradigm for accomplishing work, in which employers can tap into a large population of workers around the globe to accomplish tasks in a fraction of the time and money of more traditional methods. However, such markets have been primarily used for simple, independent tasks, such as labeling an image or judging the relevance of a search result. Here we present a general purpose framework for accomplishing complex and interdependent tasks using micro-task markets. We describe our framework, a web-based prototype, and case studies on article writing, decision making, and science journalism that demonstrate the benefits and limitations of the approach.
Conference Paper
Full-text available
This paper introduces architectural and interaction patterns for integrating crowdsourced human contributions directly into user interfaces. We focus on writing and editing, complex endeavors that span many levels of conceptual and pragmatic activity. Authoring tools offer help with pragmatics, but for higher-level help, writers commonly turn to other people. We thus present Soylent, a word processing interface that enables writers to call on Mechanical Turk workers to shorten, proofread, and otherwise edit parts of their documents on demand. To improve worker quality, we introduce the Find-Fix-Verify crowd programming pattern, which splits tasks into a series of generation and review stages. Evaluation studies demonstrate the feasibility of crowdsourced editing and investigate questions of reliability, cost, wait time, and work time for edits.
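The Find-Fix-Verify pattern is easy to state in code; the sketch below is illustrative only, with the crowd calls stubbed out, but it shows the three stages and the independent voting that guards against low-quality work:

```python
import random
from collections import Counter

# Stubbed crowd calls; in Soylent these would be Mechanical Turk tasks.
def crowd_find(paragraph, n=5):
    """Each worker marks a patch (substring) needing an edit."""
    return [paragraph[:10] for _ in range(n)]  # toy: all flag the start

def crowd_fix(patch, n=3):
    """Each worker proposes a candidate rewrite of the patch."""
    return [f"{patch} (rewrite {i})" for i in range(n)]

def find_fix_verify(paragraph):
    # Find: independent workers flag patches; require >= 2 votes so a
    # single noisy worker cannot trigger an edit on their own.
    flags = Counter(crowd_find(paragraph))
    patches = [p for p, votes in flags.items() if votes >= 2]

    edits = []
    for patch in patches:
        # Fix: a fresh set of workers generates candidate rewrites.
        candidates = crowd_fix(patch)
        # Verify: another set of workers votes on the candidates;
        # generation and review are deliberately separate stages.
        votes = Counter(random.choice(candidates) for _ in range(5))
        edits.append((patch, votes.most_common(1)[0][0]))
    return edits

print(find_fix_verify("The quick brown fox jumps over the lazy dog."))
```

Separating generation from review is the key design choice: workers who produce candidates never judge their own output.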
Conference Paper
Multi-version data is among the most closely watched information on the Web, since this type of data is usually updated frequently. Even though there exist some Web information integration systems that try to maintain the latest version, the maintained multi-version data usually includes inaccurate and invalid information due to data integration or update-delay errors. In this demo, we present CrowdCleaner, a smart data cleaning system for cleaning multi-version data on the Web, which utilizes crowdsourcing-based approaches for detecting and repairing errors that usually cannot be solved by traditional data integration and cleaning techniques. In particular, CrowdCleaner blends active and passive crowdsourcing methods together for rectifying errors in multi-version data. We demonstrate the following four facilities provided by CrowdCleaner: (1) an error-monitor to find out which items (e.g., submission date, price of real estate, etc.) have wrong versions according to reports from the crowd, which belongs to a passive crowdsourcing strategy; (2) a task-manager to allocate tasks to human workers intelligently; (3) a smart-decision-maker to identify which answer from the crowd is correct using active crowdsourcing methods; and (4) a whom-to-ask-finder to discover which users (or human workers) are the most credible according to their answer records.
Article
With the rise of the Web 2.0, collaboratively constructed language resources are rivalling expert-built lexicons. The collaborative construction process of these resources is driven by what is called the "Wisdom of Crowds" phenomenon, which offers very promising research opportunities in the context of electronic lexicography. The vast number and broad diversity of authors yield, for instance, quickly growing and constantly updated resources. While expert-built lexicons have been extensively studied in the past, there is yet a gap in researching collaboratively constructed lexicons. We therefore provide a comprehensive description of Wiktionary – a freely available, collaborative online lexicon. We study the variety of encoded lexical, semantic, and cross-lingual knowledge of three different language editions of Wiktionary and compare the coverage of terms, lexemes, word senses, domains, and registers to multiple expert-built lexicons. We conclude our work by discussing several findings and pointing out Wiktionary's future directions and impact on lexicography.
Article
This chapter presents analytic methods for matched studies with multiple risk factors of interest. We consider matched sample designs of two types, prospective (cohort or randomized) and retrospective (case-control) studies. We discuss direct and indirect parametric modeling of matched sample data and then focus on conditional logistic regression in matched case-control studies. Next, we describe the general case for matched samples including polytomous outcomes. An illustration of matched sample case-control analysis is presented. A problem solving section appears at the end of the chapter.
Article
Entity resolution is central to data integration and data cleaning. Algorithmic approaches have been improving in quality, but remain far from perfect. Crowdsourcing platforms offer a more accurate but expensive (and slow) way to bring human insight into the process. Previous work has proposed batching verification tasks for presentation to human workers but even with batching, a human-only approach is infeasible for data sets of even moderate size, due to the large numbers of matches to be tested. Instead, we propose a hybrid human-machine approach in which machines are used to do an initial, coarse pass over all the data, and people are used to verify only the most likely matching pairs. We show that for such a hybrid system, generating the minimum number of verification tasks of a given size is NP-Hard, but we develop a novel two-tiered heuristic approach for creating batched tasks. We describe this method, and present the results of extensive experiments on real data sets using a popular crowdsourcing platform. The experiments show that our hybrid approach achieves both good efficiency and high accuracy compared to machine-only or human-only alternatives.
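The hybrid strategy can be illustrated in a few lines (a sketch under assumed similarity scores and thresholds, not the paper's system): a cheap machine pass scores all pairs, and only pairs above a likelihood threshold are batched into human verification tasks. The paper's two-tiered batching is replaced here by naive fixed-size chunking.

```python
from difflib import SequenceMatcher
from itertools import combinations

records = ["iPhone 4 16GB", "Apple iPhone 4, 16 GB", "Galaxy S II"]

def similarity(a, b):
    """Cheap machine-side score; any learned matcher could stand in."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Machine pass: score every pair and keep only the likely matches.
THRESHOLD = 0.5
likely = [(a, b) for a, b in combinations(records, 2)
          if similarity(a, b) >= THRESHOLD]

# Human pass: batch the surviving pairs into fixed-size verification
# tasks, each shown to crowd workers for a yes/no match verdict.
BATCH = 10
tasks = [likely[i:i + BATCH] for i in range(0, len(likely), BATCH)]
print(tasks)
```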
Conference Paper
We introduce PlateMate, a system that allows users to take photos of their meals and receive estimates of food intake and composition. Accurate awareness of this information can help people monitor their progress towards dieting goals, but current methods for food logging via self-reporting, expert observation, or algorithmic analysis are time-consuming, expensive, or inaccurate. PlateMate crowdsources nutritional analysis from photographs using Amazon Mechanical Turk, automatically coordinating untrained workers to estimate a meal's calories, fat, carbohydrates, and protein. We present the Management framework for crowdsourcing complex tasks, which supports PlateMate's nutrition analysis workflow. Results of our evaluations show that PlateMate is nearly as accurate as a trained dietitian and easier to use for most users than traditional self-reporting.
Russian Lexicographic Landscape: a Tale of 12 Dictionaries
  • Y Kiselev
  • A Krizhanovsky
  • P Braslavski
Current Status of Russian Electronic Thesauri: Quality, Completeness and Availability
  • Y Kiselev
  • S Porshnev
  • M Mukhin