Conference Paper

State of the Art in Knowledge Extraction from Online Polls: A Survey of Current Technologies


Abstract

The ongoing research and development in the field of Natural Language Processing has led to a great number of technologies in its context. There have been major benefits in bringing together the worlds of natural language and semantic technologies, and more and more potential areas of application emerge. One of these is the subject of this paper, namely the possible ways of extracting knowledge from single-question online polls. With the concepts of the Social Web, internet users want to contribute and express their opinion. As a consequence, the popularity of online polls is rapidly increasing; they can be found in news articles on media sites, on blogs, etc. It would be desirable to bring intelligence to the application of polls by using technologies of the Semantic Web and Natural Language Processing, as this would allow building a comprehensive knowledge base and drawing conclusions from it. This paper surveys the current landscape of tools and state-of-the-art technologies and analyses them against a set of pre-defined requirements that must be met for them to be useful for extracting knowledge from the results generated by online polls.
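As a rough illustration of the idea sketched in the abstract (not taken from the paper itself), the following Python snippet shows how a single-question poll and its aggregated answers could be stored as RDF triples with rdflib; the namespace and property names (ex:Poll, ex:question, ex:option, ex:votes) are illustrative assumptions, not a published vocabulary.

```python
# Minimal sketch: representing a single-question online poll as RDF triples.
# The vocabulary used here is an assumption for illustration only.
from rdflib import Graph, Namespace, Literal, RDF, XSD

EX = Namespace("http://example.org/poll/")

g = Graph()
g.bind("ex", EX)

poll = EX["poll-42"]
g.add((poll, RDF.type, EX.Poll))
g.add((poll, EX.question, Literal("Which party will you vote for?", lang="en")))

# One resource per answer option, with the aggregated vote count attached.
for i, (label, votes) in enumerate([("Party A", 1200), ("Party B", 850)]):
    option = EX[f"poll-42/option-{i}"]
    g.add((poll, EX.option, option))
    g.add((option, EX.label, Literal(label, lang="en")))
    g.add((option, EX.votes, Literal(votes, datatype=XSD.integer)))

print(g.serialize(format="turtle"))
```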


... In many cases, these systems - like other analytics tools - make use only of polling metadata. However, previous studies have shown that user opinions in the form of poll responses are an untapped source of knowledge that can help publishers to monetize their content [23]. Combining poll responses with polling metadata allows even better profiling. ...
Chapter
To increase user engagement is an important goal and major business model for many web applications and online publishers. An established tool for this purpose is online polling, where user opinions, preferences, attitudes and possibly personal information are collected to help publishers gain a better understanding of their target audiences. These polls are often provided as supplements to online newspaper articles, the topics of which are typically also reflected in the content of the polls. We analyzed and categorized this content, and related it to the user engagement rate, given as the proportion of people who voluntarily disclose personal information. Recently, public privacy awareness has increased, especially since the introduction of the European Union’s General Data Protection Regulation (GDPR). Extensive media coverage has led to public discussions about data protection and privacy. This study additionally investigated the effect of increased public privacy awareness on individual privacy awareness and subsequently on user engagement. The results are based on live data from more than 60,000 polls and more than 22 million user votes, mainly collected in German-speaking countries, and give insights into user behavior when confronted with requests for personal information in various settings and over time.
... Their main focus is on the basic functionality of showing polls to users and letting them pick an answer. However, previous studies have shown that user opinions are an untapped source of knowledge and have discussed possibilities and requirements for further use [17]. Some of the potential benefits are: ...
Chapter
Online polls are considered a valuable method of collecting users’ opinions, attitudes and preferences. They also help to increase user engagement, which is a goal of many online publishers, as they seek to understand their target audiences better and therefore want to collect and analyze user data. Gaining access to information from their users’ social network accounts is seen as a significant advance and consequently social login functionality is becoming an increasingly common feature of various web applications. Users appreciate the convenience and benefits of this, but are often unaware of the privacy issues that arise. This study investigated the influence of different types of privacy alert on users’ decisions whether to connect an online polling application to a social network, thereby granting access to their social media data in exchange for seeing their friends’ votes. The method used live data from real polls in German-speaking countries and gives insights into user behavior when confronted with requests for Facebook data. Differences in privacy awareness and user decisions between our research and previous studies in laboratory settings are addressed as well.
... A comprehensive set of requirements for a system coping with knowledge extraction from online polls can be found in the work of Stabauer, Grossmann and Stumptner [20]. For a comparison of knowledge extraction tools, see [7]. ...
Conference Paper
Full-text available
A vast majority of internet users have adopted new ways and possibilities of interaction and information exchange on the social web. Individuals are becoming accustomed to contributing and expressing their opinions on various platforms and websites. Commercial online polls allow operators of online newspapers, blogs and other forms of media sites to provide such services to their users. Consequently, their popularity is rapidly increasing and more and more potential areas of application emerge. However, in most cases the expressed opinions are stored and displayed without any further action, and the knowledge that lies in the answers is discarded. This research paper explores the possibilities, advantages and limits of applying semantic technologies to these online polls. For this purpose, a list of requirements was assembled and possible system architectures for semantic knowledge bases were investigated, with the focus on providing consistent and extensive data for further processing. In a next step, the current state of the art of relevant visualization technologies was analyzed and further research challenges were identified. Our results discuss possible applications within the scope of a challenging case study. A comprehensive data pool provided by our industry partner allows for testing various improvements to user experience and traction of the polling system.
Conference Paper
Online polling is a popular tool to increase user involvement on all kinds of websites. Consumers are interested in sharing their opinion and so contribute to the website's content. Aggregated opinions, attitudes, and preferences convey a great deal of knowledge which often goes unutilised, as no efficient method exists to explore it. An ongoing research project suggests methods, technologies, and processes to extract the knowledge that lies within the questions posed by website publishers and the answers given by users. This knowledge is saved in a triple store and enhanced by reasoning and other methods of the semantic web. One important technique is the visualisation of both the structure (entities and relations) and aggregated information about consumers in the knowledge base. Existing techniques often focus on only one of these, although an integration of both is required to explore the nature of the content and information about user groups, such as their size and the intersections among them. The research described in this paper surveys current visualisation tools and libraries for the support of identified requirements in a case study. Based on the findings, an implementation of Agile Visualization is proposed for this polling system in particular and for ontologies in general, which allows for a more customisable and flexible visualisation. A reusable transformation process for the ontology's data is discussed, which makes it possible to use the aforementioned knowledge base as input for the agile visualization approach.
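To make the role of the triple store concrete, the following self-contained sketch shows the kind of SPARQL aggregation query that could feed such a visualisation; the vocabulary used (ex:question, ex:option, ex:label, ex:votes) is again an illustrative assumption rather than the project's actual schema.

```python
# Minimal sketch: querying aggregated poll data in an in-memory RDF graph with SPARQL.
from rdflib import Graph

data = """
@prefix ex: <http://example.org/poll/> .
ex:poll-42 ex:question "Which party will you vote for?" ;
           ex:option ex:optA, ex:optB .
ex:optA ex:label "Party A" ; ex:votes 1200 .
ex:optB ex:label "Party B" ; ex:votes 850 .
"""

g = Graph()
g.parse(data=data, format="turtle")

query = """
PREFIX ex: <http://example.org/poll/>
SELECT ?label ?votes WHERE {
  ex:poll-42 ex:option ?opt .
  ?opt ex:label ?label ; ex:votes ?votes .
}
ORDER BY DESC(?votes)
"""

# Each row could become one bar or slice in a visualisation of the poll results.
for row in g.query(query):
    print(row.label, row.votes)
```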
Article
Full-text available
Most tasks in natural language processing can be cast into question answering (QA) problems over language input. We introduce the dynamic memory network (DMN), a unified neural network framework which processes input sequences and questions, forms semantic and episodic memories, and generates relevant answers. Questions trigger an iterative attention process which allows the model to condition its attention on the result of previous iterations. These results are then reasoned over in a hierarchical recurrent sequence model to generate answers. The DMN can be trained end-to-end and obtains state of the art results on several types of tasks and datasets: question answering (Facebook's bAbI dataset), sequence modeling for part of speech tagging (WSJ-PTB), and text classification for sentiment analysis (Stanford Sentiment Treebank). The model relies exclusively on trained word vector representations and requires no string matching or manually engineered features.
Conference Paper
Full-text available
We describe the design and use of the Stanford CoreNLP toolkit, an extensible pipeline that provides core natural language analysis. This toolkit is quite widely used, both in the research NLP community and also among commercial and government users of open source NLP technology. We suggest that this follows from a simple, approachable design, straightforward interfaces, the inclusion of robust and good quality analysis components, and not requiring use of a large amount of associated baggage.
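As an illustration of how such a pipeline is typically consumed, the following sketch calls a locally running Stanford CoreNLP server from Python; the host, port and annotator list are assumptions for the example, and the server has to be started separately.

```python
# Minimal sketch: annotating text via a locally running CoreNLP server,
# e.g. started with: java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
import json
import requests

text = "Stanford CoreNLP provides a pipeline of core NLP annotators."
props = {"annotators": "tokenize,ssplit,pos,lemma,ner", "outputFormat": "json"}

resp = requests.post(
    "http://localhost:9000/",                     # assumed host and port
    params={"properties": json.dumps(props)},
    data=text.encode("utf-8"),
    timeout=30,
)
resp.raise_for_status()

# Print token, part-of-speech tag and named-entity label per token.
for sentence in resp.json()["sentences"]:
    for token in sentence["tokens"]:
        print(token["word"], token["pos"], token["ner"])
```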
Conference Paper
Full-text available
We present a new version of CEDAR, a taxonomic reasoner for large-scale ontologies. This extended version provides fuller support for TBox reasoning, checking consistency, and retrieving instances. CEDAR is built on top of the OSF formalism and based on an entirely new architecture which includes several optimization techniques. Using OSF graph structures, we define a bidirectional mapping between OSF structure and the Resource Description Framework (RDF) allowing a translation from OSF queries into SPARQL for retrieving instances. Experiments were carried out using very large ontologies. The results achieved by CEDAR were compared to those obtained by well-known Semantic Web reasoners such as FaCT++, Pellet, HermiT, TrOWL, and RacerPro. CEDAR performs on a par with the best systems for concept classification and several orders of magnitude more efficiently in terms of response time for Boolean query-answering.
Article
Full-text available
We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including: part-of-speech tagging, chunking, named entity recognition, and semantic role labeling. This versatility is achieved by trying to avoid task-specific engineering and therefore disregarding a lot of prior knowledge. Instead of exploiting man-made input features carefully optimized for each task, our system learns internal representations on the basis of vast amounts of mostly unlabeled training data. This work is then used as a basis for building a freely available tagging system with good performance and minimal computational requirements.
Conference Paper
Full-text available
We are currently observing a plethora of Natural Language Processing tools and services being made available. Each of the tools and services has its particular strengths and weaknesses, but exploiting the strengths and synergistically combining different tools is currently an extremely cumbersome and time-consuming task. Also, once a particular set of tools is integrated, this integration is not reusable by others. We argue that simplifying the interoperability of different NLP tools performing similar but also complementary tasks will facilitate the comparability of results and the creation of sophisticated NLP applications. In this paper, we present the NLP Interchange Format (NIF). NIF is based on a Linked Data enabled URI scheme for identifying elements in (hyper-)texts and an ontology for describing common NLP terms and concepts. In contrast to more centralized solutions such as UIMA and GATE, NIF enables the creation of heterogeneous, distributed and loosely coupled NLP applications, which use the Web as an integration platform. We present several use cases of the second version of the NIF specification (NIF 2.0) and the result of a developer study.
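A rough sketch of what NIF-style annotation triples look like when built with rdflib is given below; the document URI, offsets and the linked entity are made-up examples, and the NIF 2.0 specification should be consulted for the normative vocabulary.

```python
# Rough sketch of NIF 2.0-style annotation triples; URIs and offsets are made up.
from rdflib import Graph, Namespace, Literal, URIRef, RDF, XSD

NIF = Namespace("http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#")
ITSRDF = Namespace("http://www.w3.org/2005/11/its/rdf#")

text = "Berlin is the capital of Germany."
doc = "http://example.org/doc1"

g = Graph()
g.bind("nif", NIF)
g.bind("itsrdf", ITSRDF)

# The whole document as a nif:Context carrying the raw string.
context = URIRef(f"{doc}#char=0,{len(text)}")
g.add((context, RDF.type, NIF.Context))
g.add((context, NIF.isString, Literal(text)))

# One annotated substring ("Berlin") linked to a DBpedia resource.
begin, end = 0, 6
phrase = URIRef(f"{doc}#char={begin},{end}")
g.add((phrase, RDF.type, NIF.Phrase))
g.add((phrase, NIF.referenceContext, context))
g.add((phrase, NIF.anchorOf, Literal(text[begin:end])))
g.add((phrase, NIF.beginIndex, Literal(begin, datatype=XSD.nonNegativeInteger)))
g.add((phrase, NIF.endIndex, Literal(end, datatype=XSD.nonNegativeInteger)))
g.add((phrase, ITSRDF.taIdentRef, URIRef("http://dbpedia.org/resource/Berlin")))

print(g.serialize(format="turtle"))
```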
Article
Full-text available
Previous work on German parsing has provided confusing and conflicting results concerning the difficulty of the task and whether techniques that are useful for English, such as lexicalization, are effective for German. This paper aims to provide some understanding and solid baseline numbers for the task. We examine the performance of three techniques on three treebanks (Negra, Tiger, and TüBa-D/Z): (i) Markovization, (ii) lexicalization, and (iii) state splitting. We additionally explore parsing with the inclusion of grammatical function information. Explicit grammatical functions are important to German language understanding, but they are numerous, and naïvely incorporating them into a parser which assumes a small phrasal category inventory causes large performance reductions due to increasing sparsity.
Article
Full-text available
We describe a software environment to support research and development in natural language (NL) engineering. This environment - GATE (General Architecture for Text Engineering) - aims to advance research in the area of machine processing of natural languages by providing a software infrastructure on top of which heterogeneous NL component modules may be evaluated and refined individually or may be combined into larger application systems. Thus, GATE aims to support both researchers and developers working on component technologies (e.g. parsing, tagging, morphological analysis) and those working on developing end-user applications (e.g. information extraction, text summarization, document generation, machine translation, and second language learning). GATE will promote reuse of component technology, permit specialization and collaboration in large-scale projects, and allow for the comparison and evaluation of alternative technologies. The first release of GATE is now available.
Conference Paper
Full-text available
The KIM platform provides a novel Knowledge and Information Management infrastructure and services for automatic semantic annotation, indexing, and retrieval of documents. It provides a mature infrastructure for scalable and customizable information extraction (IE) as well as annotation and document management, based on GATE. In order to provide a basic level of performance and allow easy bootstrapping of applications, KIM is equipped with an upper-level ontology and a knowledge base providing extensive coverage of entities of general importance. The ontologies and knowledge bases involved are handled using cutting-edge Semantic Web technology and standards, including RDF(S) repositories, ontology middleware and reasoning. From a technical point of view, the platform allows KIM-based applications to use it for automatic semantic annotation, content retrieval based on semantic restrictions, and querying and modifying the underlying ontologies and knowledge bases. This paper presents the KIM platform, with emphasis on its architecture, interfaces, tools, and other technical issues.
Conference Paper
Full-text available
Recursive structure is commonly found in the inputs of different modalities such as natural scene images or natural language sentences. Discovering this recursive structure helps us to not only identify the units that an image or sentence contains but also how they interact to form a whole. We introduce a max-margin structure prediction architecture based on recursive neural networks that can successfully recover such structure both in complex scene images as well as sentences. The same algorithm can be used both to provide a competitive syntactic parser for natural language sentences from the Penn Treebank and to outperform alternative approaches for semantic scene segmentation, annotation and classification. For segmentation and annotation our algorithm obtains a new level of state-of-the-art performance on the Stanford background dataset (78.1%). The features from the image parse tree outperform Gist descriptors for scene classification by 4%.
Conference Paper
Full-text available
We outline work to be carried out within the framework of an impending EC project. The goal is to construct a language-independent information system for a specific domain (environment/ecology) anchored in a language-independent ontology that is linked to wordnets in several languages. For each language, information extraction and identification of lexicalized concepts with ontological entries will be done by text miners ("Kybots"). The mapping of language-specific lexemes to the ontology allows for crosslinguistic identification and translation of equivalent terms. The infrastructure developed within this project will enable long-range knowledge sharing and transfer to many languages and cultures, addressing the need for global and uniform transition of knowledge beyond the domain of ecology and environment addressed here.
Conference Paper
Full-text available
Interlinking text documents with Linked Open Data enables the Web of Data to be used as background knowledge within document-oriented applications such as search and faceted browsing. As a step towards interconnecting the Web of Documents with the Web of Data, we developed DBpedia Spotlight, a system for automatically annotating text documents with DBpedia URIs. DBpedia Spotlight allows users to configure the annotations to their specific needs through the DBpedia Ontology and quality measures such as prominence, topical pertinence, contextual ambiguity and disambiguation confidence. We compare our approach with the state of the art in disambiguation, and evaluate our results in light of three baselines and six publicly available annotation systems, demonstrating the competitiveness of our system. DBpedia Spotlight is shared as open source and deployed as a Web Service freely available for public use.
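A minimal sketch of annotating text with the publicly deployed DBpedia Spotlight web service is shown below; the endpoint URL and the confidence parameter reflect the public demo service, whose availability and exact response fields may vary.

```python
# Minimal sketch: entity annotation via the public DBpedia Spotlight web service.
import requests

resp = requests.get(
    "https://api.dbpedia-spotlight.org/en/annotate",   # public demo endpoint (availability may vary)
    params={"text": "Berlin is the capital of Germany.", "confidence": 0.5},
    headers={"Accept": "application/json"},
    timeout=30,
)
resp.raise_for_status()

# Print each recognised surface form and the DBpedia URI it was linked to.
for res in resp.json().get("Resources", []):
    print(res["@surfaceForm"], "->", res["@URI"], "similarity:", res["@similarityScore"])
```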
Conference Paper
Full-text available
Most current statistical natural language processing models use only local features so as to permit dynamic programming in inference, but this makes them unable to fully account for the long distance structure that is prevalent in language use. We show how to solve this dilemma with Gibbs sampling, a simple Monte Carlo method used to perform approximate inference in factored probabilistic models. By using simulated annealing in place of Viterbi decoding in sequence models such as HMMs, CMMs, and CRFs, it is possible to incorporate non-local structure while preserving tractable inference. We use this technique to augment an existing CRF-based information extraction system with long-distance dependency models, enforcing label consistency and extraction template consistency constraints. This technique results in an error reduction of up to 9% over state-of-the-art systems on two established information extraction tasks.
Article
Full-text available
Machine learning approaches to multi-label document classification have to date largely relied on discriminative modeling techniques such as support vector machines. A drawback of these approaches is that performance rapidly drops off as the total number of labels and the number of labels per document increase. This problem is amplified when the label frequencies exhibit the type of highly skewed distributions that are often observed in real-world datasets. In this paper we investigate a class of generative statistical topic models for multi-label documents that associate individual word tokens with different labels. We investigate the advantages of this approach relative to discriminative models, particularly with respect to classification problems involving large numbers of relatively rare labels. We compare the performance of generative and discriminative approaches on document labeling tasks ranging from datasets with several thousand labels to datasets with tens of labels. The experimental results indicate that probabilistic generative models can achieve competitive multi-label classification performance compared to discriminative methods, and have advantages for datasets with many labels and skewed label frequencies.
Conference Paper
Full-text available
We describe a software environment to support research and development in natural language (NL) engineering. This environment-GATE (General Architecture for Text Engineering)-aims to advance research in the area of machine processing of natural languages by providing a software infrastructure on top of which heterogeneous NL component modules may be evaluated and refined individually or may be combined into larger application systems. Thus, GATE aims to support both researchers and developers working on component technologies (e.g. parsing, tagging, morphological analysis) and those working on developing end-user applications (e.g. information extraction, text summarisation, document generation, machine translation, and second language learning). GATE will promote reuse of component technology, permit specialisation and collaboration in large-scale projects, and allow for the comparison and evaluation of alternative technologies. The first release of GATE is now available.
Article
Previous work on Recursive Neural Networks (RNNs) shows that these models can produce compositional feature vectors for accurately representing and classifying sentences or images. However, the sentence vectors of previous models cannot accurately represent visually grounded meaning. We introduce the DT-RNN model which uses dependency trees to embed sentences into a vector space in order to retrieve images that are described by those sentences. Unlike previous RNN-based models which use constituency trees, DT-RNNs naturally focus on the action and agents in a sentence. They are better able to abstract from the details of word order and syntactic expression. DT-RNNs outperform other recursive and recurrent neural networks, kernelized CCA and a bag-of-words baseline on the tasks of finding an image that fits a sentence description and vice versa. They also give more similar representations to sentences that describe the same image.
Article
With the rapid growth of social media, sentiment analysis, also called opinion mining, has become one of the most active research areas in natural language processing. Its application is also widespread, from business services to political campaigns. This article gives an introduction to this important area and presents some recent developments.
Chapter
This chapter discusses some points of similarity between the various paradigms, namely production rules, structured representations as exemplified by frame systems incorporating an inheritance mechanism, and first-order predicate calculus, and reviews a number of areas where the expressive power of these representations seems to be inadequate, and extensions that have been proposed to address the resultant problems. Although the representations have very different surface forms, the entity-attribute-value triples found in production systems, the instance-slot-filler notation of the frame system, and relations with two parameters found in predicate logic, all express precisely the same information, namely that a binary relation holds between two objects in the domain. Predications are, as has always been recognized by formal logic, the typical form of an assertion of a fact about the world, and much knowledge consists of such predications. Therefore, taken as means of describing facts by making predications, the three paradigms are essentially equivalent in expressive power.
Conference Paper
In recent years, basic NLP tasks (NER, WSD, relation extraction, etc.) have been configured for Semantic Web tasks including ontology learning, linked data population, entity resolution, NL querying of linked data, etc. Some assessment of the state of the art of existing Knowledge Extraction (KE) tools when applied to the Semantic Web is therefore desirable. In this paper we describe a landscape analysis of several tools, either conceived specifically for KE on the Semantic Web, or adaptable to it, or even acting as aggregators of extracted data from other tools. Our aim is to assess the currently available capabilities against a rich palette of ontology design constructs, focusing specifically on the actual semantic reusability of KE output.
Article
This paper addresses the problem of transforming business specifications written in natural language into formal models suitable for use in information systems development. It proposes a method for transforming controlled natural language specifications based on the Semantics of Business Vocabulary and Business Rules standard. This approach is unique in combining techniques from Model-Driven Engineering (MDE), Cognitive Linguistics, and Knowledge-based Configuration, which allows the reliable semantic processing of specifications and integration with existing MDE tools to improve productivity, quality, and time-to-market in software development. The method first learns the vocabulary of the specification from glossary-like definitions then parses the rules of the specification and outputs the resulting formal SBVR model. Both aspects of the method are tested separately, with the system correctly learning 98% of the vocabulary and correctly interpreting 98% of the rules of an SBVR SE based example. Finally, the proposed method is compared to state-of-the-art approaches for creating formal models from natural language specifications, arguing that it meets the criteria necessary to fulfil the three goals of (1) shifting control of specification to non-technical business experts, (2) reducing the manual effort involved in formalising specifications, and (3) supporting business experts in creating well-formed sets of business vocabularies and rules.
Article
This paper presents the first efficient implementation of a weighted deductive CYK parser for Probabilistic Linear Context-Free Rewriting Systems (PLCFRSs). LCFRS, an extension of CFG, can describe discontinuities in a straightforward way and is therefore a natural candidate to be used for data-driven parsing. To speed up parsing, we use different context-summary estimates of parse items, some of them allowing for A∗ parsing. We evaluate our parser with grammars extracted from the German NeGra treebank. Our experiments show that data-driven LCFRS parsing is feasible and yields output of competitive quality.
Conference Paper
A central challenge in semantic parsing is handling the myriad ways in which knowledge base predicates can be expressed. Traditionally, semantic parsers are trained primarily from text paired with knowledge base information. Our goal is to exploit the much larger amounts of raw text not tied to any knowledge base. In this paper, we turn semantic parsing on its head. Given an input utterance, we first use a simple method to deterministically generate a set of candidate logical forms with a canonical realization in natural language for each. Then, we use a paraphrase model to choose the realization that best paraphrases the input, and output the corresponding logical form. We present two simple paraphrase models, an association model and a vector space model, and train them jointly from question-answer pairs. Our system PARASEMPRE improves state-of-the-art accuracies on two recently released question-answering datasets.
Conference Paper
In Chap. 9, we studied the extraction of structured data from Web pages. The Web also contains a huge amount of information in unstructured texts. Analyzing these texts is of great importance as well and perhaps even more important than extracting structured data because of the sheer volume of valuable information of almost any imaginable type contained in text. In this chapter, we only focus on mining opinions which indicate positive or negative sentiments. The task is technically challenging and practically very useful. For example, businesses always want to find public or consumer opinions on their products and services. Potential customers also want to know the opinions of existing users before they use a service or purchase a product.
Conference Paper
L-LDA is a new supervised topic model for assigning "topics" to a collection of documents (e.g., Twitter profiles). User studies have shown that L-LDA effectively performs a variety of tasks in Twitter that include not only assigning topics to profiles, but also re-ranking feeds, and suggesting new users to follow. Building upon these promising qualitative results, we here run an extensive quantitative evaluation of L-LDA. We test the extent to which, compared to the competitive baseline of Support Vector Machines (SVM), L-LDA is effective at two tasks: 1) assigning the correct topics to profiles; and 2) measuring the similarity of a profile pair. We find that L-LDA generally performs as well as SVM, and it clearly outperforms SVM when training data is limited, making it an ideal classification technique for infrequent topics and for (short) profiles of moderately active users. We have also built a web application that uses L-LDA to classify any given profile and graphically map predominant topics in specific geographic regions.
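As a point of reference for the SVM baseline mentioned above, the following scikit-learn sketch trains a linear SVM on TF-IDF features over short, profile-like texts; the tiny inline dataset and topic labels are purely illustrative.

```python
# Minimal sketch of an SVM text-classification baseline (TF-IDF + linear SVM).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = [
    "new goal scored in the champions league final",
    "stock markets rally after interest rate decision",
    "midfielder transfers to rival football club",
    "central bank warns about rising inflation",
]
labels = ["sports", "finance", "sports", "finance"]

clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(texts, labels)

# Assign a topic to a new, unseen short text.
print(clf.predict(["quarterly earnings beat analyst expectations"]))
```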
Conference Paper
There has recently been an increased interest in named entity recognition and disambiguation systems at major conferences such as WWW, SIGIR, ACL, KDD, etc. However, most work has focused on algorithms and evaluations, leaving little space for implementation details. In this paper, we discuss some implementation and data processing challenges we encountered while developing a new multilingual version of DBpedia Spotlight that is faster, more accurate and easier to configure. We compare our solution to the previous system, considering time performance, space requirements and accuracy in the context of the Dutch and English languages. Additionally, we report results for 9 additional languages among the largest Wikipedias. Finally, we present challenges and experiences to foment the discussion with other developers interested in recognition and disambiguation of entities in natural language text.
Article
The main applications and challenges of one of the hottest research areas in computer science.
Chapter
Configuration models specify the set of possible configurations (solutions). A configuration model together with a defined set of (customer) requirements are the major elements of a configuration task (problem). In this chapter, we discuss different knowledge representations that can be used for the definition of a configuration model. We provide examples that help to further develop the understanding of the underlying concepts and include a UML-based personal computer (PC) configuration model that is used as a reference example throughout this book.
Conference Paper
We present a document classification system that employs lazy learning from labeled phrases, and argue that the system can be highly effective whenever the following property holds: most of the information on document labels is captured in phrases. We call this property near sufficiency. Our research contribution is twofold: (a) we quantify the near sufficiency property using the Information Bottleneck principle and show that it is easy to check on a given dataset; (b) we reveal that in all practical cases---from small-scale to very large-scale---manual labeling of phrases is feasible: the natural language constrains the number of common phrases composed of a vocabulary to grow linearly with the size of the vocabulary. Both these contributions provide a firm foundation for the applicability of the phrase-based classification (PBC) framework to a variety of large-scale tasks. We deployed the PBC system on the task of job title classification, as a part of LinkedIn's data standardization effort. The system significantly outperforms its predecessor both in terms of precision and coverage. It is currently being used in LinkedIn's ad targeting product, and more applications are being developed. We argue that PBC excels in high explainability of the classification results, as well as in low development and low maintenance costs. We benchmark PBC against existing high-precision document classification algorithms and conclude that it is most useful in multilabel classification.
Conference Paper
Freebase is a practical, scalable, graph-shaped database of structured general human knowledge, inspired by Semantic Web research and collaborative data communities such as the Wikipedia. Freebase allows public read and write access through an HTTP-based graph-query API for research, the creation and maintenance of structured data, and application building. Access is free and all data in Freebase has a very open (e.g. Creative Commons, GFDL) license.
Conference Paper
This paper presents a first efficient implementation of a weighted deductive CYK parser for Probabilistic Linear Context-Free Rewriting Systems (PLCFRS), together with context-summary estimates for parse items used to speed up parsing. LCFRS, an extension of CFG, can describe discontinuities both in constituency and dependency structures in a straightforward way and is therefore a natural candidate to be used for data-driven parsing. We evaluate our parser with a grammar extracted from the German NeGra treebank. Our experiments show that data-driven LCFRS parsing is feasible with a reasonable speed and yields output of competitive quality.
Conference Paper
Open Information Extraction (IE) is the task of extracting assertions from massive corpora without requiring a pre-specified vocabulary. This paper shows that the output of state-of-the-art Open IE systems is rife with uninformative and incoherent extractions. To overcome these problems, we introduce two simple syntactic and lexical constraints on binary relations expressed by verbs. We implemented the constraints in the ReVerb Open IE system, which more than doubles the area under the precision-recall curve relative to previous extractors such as TextRunner and WOE^pos. More than 30% of ReVerb's extractions are at precision 0.8 or higher---compared to virtually none for earlier systems. The paper concludes with a detailed analysis of ReVerb's errors, suggesting directions for future work.
Conference Paper
In this paper we present BabelNet - a very large, wide-coverage multilingual semantic network. The resource is automatically constructed by means of a methodology that integrates lexicographic and encyclopedic knowledge from WordNet and Wikipedia. In addition Machine Translation is also applied to enrich the resource with lexical information for all languages. We conduct experiments on new and existing gold-standard datasets to show the high quality and coverage of the resource.
Article
We propose a new fast purely discriminative algorithm for natural language parsing, based on a "deep" recurrent convolutional graph transformer network (GTN). Assuming a decomposition of a parse tree into a stack of "levels", the network predicts a level of the tree taking into account predictions of previous levels. Using only a few basic text features which leverage word representations from Collobert and Weston (2008), we show similar performance (in F1 score) to existing pure discriminative parsers and existing "benchmark" parsers (like the Collins parser or parsers based on probabilistic context-free grammars), with a huge speed advantage.
Article
An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people now can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object. This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. Our focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. We include material on summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided.
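As a toy illustration of the kind of polarity scoring such opinion-mining systems build on, the following self-contained snippet counts positive and negative lexicon hits; real systems rely on far richer lexicons, negation handling and machine-learned models.

```python
# Toy lexicon-based polarity scorer; word lists are illustrative only.
POSITIVE = {"good", "great", "excellent", "love", "useful"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "useless"}

def polarity(text: str) -> int:
    """Return a crude sentiment score: positive minus negative word counts."""
    tokens = text.lower().split()
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

print(polarity("The new polling widget is great and really useful"))             # 2
print(polarity("The survey interface is terrible and the results are useless"))  # -2
```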
Ask me anything: Dynamic memory networks for natural language processing
  • A. Kumar
  • O. Irsoy
  • J. Su
  • J. Bradbury
  • R. English
  • B. Pierce
  • P. Ondruska
  • M. Iyyer
  • I. Gulrajani
  • R. Socher