Conference Paper

A Generic Approach to Generate Opinion Lists of Phrases for Opinion Mining Applications


Abstract

In this paper we present an approach to generate lists of opinion-bearing phrases with their opinion values in a continuous range between −1 and 1. The opinion phrases considered include single adjectives as well as adjective-based phrases with an arbitrary number of words. The opinion values are derived from user review titles and star ratings, as both can be regarded as summaries of the user's opinion about the product under review. Phrases are organized in trees with the opinion-bearing adjective as the tree root. For trees with missing branches, opinion values can then be calculated using trees with similar branches but different roots. An example list is produced and compared to existing opinion lists.
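
As a rough illustration of how opinion values might be derived from review titles and star ratings, the sketch below averages the star ratings of all titles containing a phrase and rescales that mean linearly onto the range from −1 to 1. The linear rescaling, the 1-to-5 star scale, the simple substring matching, and the names `opinion_value` and `reviews` are assumptions made for illustration; the abstract does not state the exact formula.

```python
from statistics import mean

def opinion_value(phrase, reviews, min_stars=1, max_stars=5):
    """Return a value in [-1, 1] for `phrase`, or None if it never occurs.

    `reviews` is an iterable of (title, stars) pairs. Assumption: the mean
    star rating is mapped linearly so that min_stars -> -1 and max_stars -> +1.
    """
    ratings = [stars for title, stars in reviews if phrase in title.lower()]
    if not ratings:
        return None
    midpoint = (min_stars + max_stars) / 2
    half_range = (max_stars - min_stars) / 2
    return (mean(ratings) - midpoint) / half_range

# Toy data, invented purely for illustration.
reviews = [
    ("Absolutely fantastic camera", 5),
    ("Fantastic value", 4),
    ("Just awful", 1),
]
print(opinion_value("fantastic", reviews))  # 0.75
print(opinion_value("awful", reviews))      # -1.0
```

A phrase seen only in five-star titles ends up at 1, one seen only in one-star titles at −1, and a phrase with a mean of three stars at 0.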


... An alternative way is the inclusion of intensifiers, reducers and negation words directly in the opinion list. We follow this approach and use an algorithm presented in (Rill et al., 2012) to generate a list containing opinion-bearing phrases together with their opinion values for the German language. ...
... PL and GPC will be used as benchmarks for our list (see Section 4.3). In (Rill et al., 2012), the authors proposed a generic algorithm to derive opinion values from online reviews, taking advantage of the fact that both star ratings and review titles can be regarded as a short summary of the writer's opinion and are therefore strongly correlated. The authors use this algorithm to derive a list of opinion-bearing adjectives and adjective-based phrases for the English language. ...
... In some cases star rating and textual polarity are not correlated. Therefore, we perform the same filtering steps as proposed in (Rill et al., 2012): ...
... The resulting lexicon was used to estimate user happiness over the course of an average 24-hour day as well as a seven-day week. Rill et al. (2012) independently came up with a very similar approach for identifying the evaluative meaning of adjectives and adjective phrases (absolutely fantastic vs. just awful) based on a corpus of online product reviews. Since the individual reviews come with a one-to-five star rating, the evaluative meaning of an adjective or phrase was computed as the average rating of all reviews it occurs in (Mean Star Rating, see Section 3.). ...
... MEAN STAR RATING. Following Rill et al. (2012), we predict $y_{w_i}$ by averaging the gold labels of documents $d_j$ in which the word $w_i$ occurs. We denote the set of documents containing $w_i$ as $D(w_i)$. ...
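
Written out, the averaging described in this excerpt might look as follows. The symbol $y_{d_j}$ for the gold label of document $d_j$ is a notational assumption; the excerpt does not reproduce the original equation.

```latex
y_{w_i} = \frac{1}{|D(w_i)|} \sum_{d_j \in D(w_i)} y_{d_j}
```

For instance, a word that occurs in three reviews rated 5, 4 and 3 stars would receive the value (5 + 4 + 3) / 3 = 4.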
... To apply this method to numerical document labels (as present in the empathic reactions dataset; see Section 5.) we first apply a median split (document labels below (above) the median are recorded as 0 (1)). Subsequently, we calculate Mean Binary Rating using the same equation as for Mean Star Rating (Equation 1), thus showing the resemblances between Mihalcea and Liu (2006) and Rill et al. (2012). ...
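
The median split followed by per-word averaging can be sketched as below. This is a minimal sketch, not the cited authors' implementation: the function name `mean_binary_ratings`, the toy documents, and the choice to record labels exactly equal to the median as 1 are all assumptions, since the excerpt only specifies what happens below and above the median.

```python
from statistics import mean, median

def mean_binary_ratings(documents):
    """Map each word to the mean binarized label of the documents containing it.

    `documents` is a list of (tokens, label) pairs with numerical labels.
    Assumption: labels equal to the median are recorded as 1; the excerpt
    only specifies "below" -> 0 and "above" -> 1.
    """
    cutoff = median(label for _, label in documents)
    labels_by_word = {}
    for tokens, label in documents:
        binary = 0 if label < cutoff else 1
        for word in set(tokens):  # count each document once per word
            labels_by_word.setdefault(word, []).append(binary)
    return {word: mean(labels) for word, labels in labels_by_word.items()}

# Toy documents, invented for illustration.
docs = [
    (["so", "sorry", "to", "hear"], 6.0),
    (["sorry", "for", "you"], 5.0),
    (["not", "my", "problem"], 1.0),
    (["whatever", "you", "say"], 2.0),
]
ratings = mean_binary_ratings(docs)
print(ratings["sorry"], ratings["you"], ratings["problem"])
```

Here the median label is 3.5, so the first two documents are recorded as 1 and the last two as 0 before the word-level averages are taken.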
Preprint
Despite the excellent performance of black box approaches to modeling sentiment and emotion, lexica (sets of informative words and associated weights) that characterize different emotions are indispensable to the NLP community because they allow for interpretable and robust predictions. Emotion analysis of text is increasing in popularity in NLP; however, manually creating lexica for psychological constructs such as empathy has proven difficult. This paper automatically creates empathy word ratings from document-level ratings. The underlying problem of learning word ratings from higher-level supervision has to date only been addressed in an ad hoc fashion and is missing deep learning methods. We systematically compare a number of approaches to learning word ratings from higher-level supervision against a Mixed-Level Feed Forward Network (MLFFN), which we find performs best, and use the MLFFN to create the first-ever empathy lexicon. We then use Signed Spectral Clustering to gain insights into the resulting words.
... The new approach adopts the ideas of previous work from [23], namely (1) the calculation of sentiment values based on the correlation of a numerical rating and the wording in review titles and (2) the consideration of phrases instead of single words. The previous approach was based on part-of-speech (POS) tagging and simple phrase patterns, such as "[ADV+ADJ]". ...
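
A simple phrase pattern of the "[ADV+ADJ]" kind can be matched over POS-tagged tokens roughly as follows. This is a hedged sketch: it assumes Penn Treebank-style tags (adverbs starting with "RB", adjectives with "JJ") and input that has already been tagged; the function name `adv_adj_phrases` and the example title are hypothetical and not taken from the cited work.

```python
def adv_adj_phrases(tagged_tokens):
    """Return "adverb adjective" phrases from a list of (token, tag) pairs.

    Assumption: Penn Treebank-style tags, where adverb tags start with "RB"
    and adjective tags start with "JJ" (JJ, JJR, JJS).
    """
    phrases = []
    pairs = zip(tagged_tokens, tagged_tokens[1:])
    for (word, tag), (next_word, next_tag) in pairs:
        if tag.startswith("RB") and next_tag.startswith("JJ"):
            phrases.append(f"{word.lower()} {next_word.lower()}")
    return phrases

# A hand-tagged review title, invented for illustration.
title = [("Absolutely", "RB"), ("fantastic", "JJ"), ("but", "CC"),
         ("very", "RB"), ("expensive", "JJ")]
print(adv_adj_phrases(title))  # ['absolutely fantastic', 'very expensive']
```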
... Thus, the treatment of intensifiers, reducers, and negation words has to be performed in the application algorithm implementing a sophisticated sentiment value composition when using entries from the sentiment lexicons in use [5,13,17,18]. One possibility to solve the problem of valence shifters has been described in the work of [23]. There, the authors propose using sentiment lexicons which not only contain single opinion-bearing words but sentiment phrases including the valence shifters. ...
... There, the authors propose using sentiment lexicons which not only contain single opinion-bearing words but sentiment phrases including the valence shifters. Using this approach, sentiment lexicons containing adjective phrases for the English language [23] (cba) and noun and adjective phrases for the German language [22] (cba) were derived using Amazon user reviews as a source of annotated text. A similar approach, namely generating a sentiment lexicon which contains opinion-bearing phrases, but with the use of a seed of positive and negative words, was used in [26] (cba). ...
Conference Paper
In this paper, we describe a new algorithm designed to generate lexical resources in the field of sentiment analysis. For this approach, based on corpora of customer reviews, we determine words and phrases as candidates for our sentiment lexicon solely by calculating a word co-occurrence measure and by considering word frequencies. The sentiment values of every single word or phrase are derived automatically from the review titles and the associated given ratings. We consciously renounce the use of natural language processing methods in order to ensure language independency of our algorithm. Furthermore, by using exclusively statistical methods, we are able to identify rather unusual word combinations, such as idiomatic expressions. This differentiates our work from most prior approaches which concentrate on single words or word-modifier combinations. An example lexicon is generated by the use of a corpus of 1.5 million German Amazon customer reviews.
... In addition to this, Rill et al. state that adjectives (JJ), along with their two forms, superlative adjectives (JJS) and comparative adjectives (JJR), are helpful in obtaining the opinions from reviews [29]. The result obtained by this approach is 0.78 using the SentiWordNet library. ...
... 2. Combining other features with adjectives or adverbs does not enhance the value of F-measure. From this discussion, it is obvious that adjectives and adverbs are the most important polarity-bearing features. Moreover, different types of adjectives have already been studied [29,31]. Two types of adverbs, general adverbs [9] and degree adverbs [38], have also been utilized. ...
Article
Online shopping websites like Amazon provide a platform where users can share their opinions about different products. Recently, it has been found that, prior to purchasing, 81% of users explore different online platforms in order to assess the reliability of the product they intend to buy. User reviews are expressed in natural language, which helps other users to make an informed decision. In the past few years, the scientific community has paid attention to automatically determining the meaning of a review through sentiment analysis. Sentiment analysis is a gradually evolving research area that helps users to uncover the sentiment hidden in a review. To date, different sentiment analysis-based studies have been conducted in the literature. For sentiment classification, the core ingredient is the exploitation of polarity-bearing words present in the reviews, e.g. adjectives, verbs, and adverbs. Different studies suggest the importance of different forms of adverbs in the sentiment classification task. In the literature, it has been reported that general adverbs strongly help to classify sentiments with better accuracy, whereas others suggest that degree adverbs are important for sentiment classification. There are ten distinct forms of adverbs: general adverbs, general superlative adverbs, general comparative adverbs, general-wh adverbs, degree adverbs, degree superlative adverbs, degree comparative adverbs, degree-wh adverbs, time adverbs and locative adverbs. In this paper, we address the question of what impact the different forms of adverbs have on the classification of sentiments. To this end, the impact of all these forms has been evaluated on 51,005 reviews of two product categories, office products and musical DVDs, acquired from Amazon. The outcomes of the study reveal that two forms, general superlative adverbs and degree-wh adverbs, have more impact than the other forms of adverbs. The general superlative adverbs attained an F-measure of 0.86 and the degree-wh adverbs attained an F-measure of 0.80.
... Recently, some excellent methods have been proposed to overcome the existing limitations. For example, Rill et al. (2012) proposed an approach to generate lists of opinion-bearing phrases based on phrase extraction strategies. This work uses only the review titles and the star ratings to calculate the opinion values. ...
Article
Opinion mining mainly involves three elements: feature and feature-of relations, opinion expressions and the related opinion attributes (e.g. polarity), and feature–opinion relations. Although many works have emerged to achieve its aim of gaining information, previous research has typically handled each of the three elements in isolation, which cannot give sufficient information extraction results; hence, the complexity and the running time of information extraction are increased. In this paper, we propose an opinion mining extraction algorithm to jointly discover the main opinion mining elements. Specifically, the algorithm automatically builds kernels to combine closely related words into new terms from word level to phrase level based on dependency relations, and we ensure the accuracy of opinion expressions and polarity based on fuzzy measurements, opinion degree intensifiers, and opinion patterns. The 3458 analyzed reviews show that the proposed algorithm can effectively identify the main elements simultaneously and outperform the baseline methods. The proposed algorithm is used to analyze the features among heterogeneous products in the same category. The feature-by-feature comparison can help to select the weaker features and recommend the correct specifications from the beginning of a product's life. From this comparison, some interesting observations are revealed. For example, the negative polarity of the video dimension is higher than that of the product usability dimension for a product. Yet, enhancing the dimension of product usability can more effectively improve the product.
... For languages other than Polish, other approaches have been proposed. For example, Rill et al. (2012) introduced a method to derive sentiment phrases from reviews. The authors generated lists of opinion-bearing phrases with their opinion values in a continuous range between −1 and 1. ...
Article
This article is a comprehensive review of freely available tools and software for sentiment analysis of texts written in Polish. It covers solutions which deal with all levels of linguistic analysis: starting from word-level, through phrase-level and up to sentence-level sentiment analysis. Technically, the tools include dictionaries, rule-based systems as well as deep neural networks. The text also describes a solution for finding opinion targets. The article also contains remarks that compare the landscape of available tools in Polish with that for the English language. It is useful from the standpoint of multiple disciplines, not only information technology and computer science, but also applied linguistics and social sciences.
... Considering this assumption as a base, we fix the value of θ as 10. Polarity of Words in the Unlabeled Target Domain: Generally, in a polar corpus, a positive word occurs more frequently in context of other positive words, while a negative word occurs in context of other negative words. Based on this hypothesis, we explore the contextual information of a word that is captured well by its context vector to assign polarity to words in the target domain (Rill et al., 2012; Rong, 2014). Mikolov et al. (2013) showed that similarity between context vector of words in vicinity such as 'go' and 'to' is higher compared to distant words or words that are not in the neighborhood of each other. ...
... lexicon that maps adjectives to real-valued scores encoding both sentiment polarity and intensity. The lexicon might be compiled automatically, for example from analyzing adjectives' appearance in star-valued product or movie reviews (de Marneffe et al., 2010; Rill et al., 2012; Sharma et al., 2015; Ruppenhofer et al., 2014), or manually. In our experiments we utilize the manually-compiled SO-CAL lexicon (Taboada et al., 2011). ...
... We assume negative polar expressions with a very high polar intensity to occur significantly more often in reviews assigned few stars (i.e. 1 or 2). Ruppenhofer et al. (2014) established that the most effective method to derive such polar intensity is by ranking words by their weighted mean of star ratings (Rill et al., 2012). All words of our base lexicon are ranked according to that score. ...
... Sentiment Analysis. Affect is one of the properties of most idioms, so it is clear that dealing with idiomatic expressions is part of sentiment analysis, e.g. [11,15]. ...
Conference Paper
Idiomatic expressions are part of everyday language, therefore NLP applications that can "understand" idioms are desirable. The nature of idioms is somewhat heterogeneous: idioms form classes differing in many aspects (e.g. syntactic structure, lexical and syntactic fixedness). Although dictionaries of idioms exist, they usually do not contain information about fixedness or frequency since they are intended to be used by humans, not computer programs. In this work, we propose how to deal with idioms in the Czech verb valency lexicon VerbaLex using automatically extracted information from the largest dictionary of Czech idioms and a web corpus. We propose a three-stage process and discuss possible issues.
... Tell About The Intensity of Words? Rill et al. (2012) showed that an intensity annotated polar corpus can be used to derive the intensity of the adjectives. A high intensity word will occur more frequently in high intensity reviews. ...
... Another corpus-based method we consider employs Mean star ratings (MeanStar) from product reviews as described by Rill et al. (2012). Unlike Collex, this method uses no linguistic properties of the adjectives themselves. ...
Chapter
We address the question which word n-gram feature induction approach yields the most accurate discriminative model for machine learning-based sentiment analysis within a specific domain: a purely data-driven word n-gram feature induction or a word n-gram feature induction based on a domain-specific or domain-non-specific polarity dictionary. We evaluate both approaches in document-level polarity classification experiments in 2 languages, English and German, for 4 analog domains each: user-written product reviews on books, DVDs, electronics and music. We conclude that while dictionary-based feature induction leads to large dimensionality reductions, purely data-driven feature induction yields more accurate discriminative models.
Conference Paper
In this paper, we present a study of aspect-based opinion mining using a lexicon-based approach. We use a phrase-based opinion lexicon for the German language to investigate how well strongly positive and strongly negative expressions of opinions concerning products and services in the insurance domain can be detected. We perform experiments on hand-tagged statements expressing opinions retrieved from the Ciao platform. The initial corpus contained about 14,000 sentences from 1,600 reviews. For both positive and negative statements, more than 100 sentences were tagged. We show that the algorithm can reach an accuracy of 62.2% for positive, but only 14.8% for negative utterances of opinions. We examine the cases in which the opinion could not be detected correctly or in which the linking between the opinion statement and the aspect fails. In particular, the large gap in accuracy between positive and negative utterances is analysed.
Conference Paper
In this paper, we propose a novel approach to identifying the polarity of opinion words (so-called polarity induction) for opinion word lexicon extraction. The method employs summaries of many reviews of a single product (or service) as a prediction of the polarity of opinion words describing different aspects of that product. The article presents a preliminary experiment, on the basis of which we can expect that the proposed approach can be used in polarity identification.
Thesis
This master's thesis presents a multi-part project that explores the use of sentiment analysis (SA) in quantitative drama analysis. A corpus of 11 plays by the writer Gotthold Ephraim Lessing (1729–1782) serves as the exemplary object of study. The work extends an existing tool for quantitative drama analysis (Katharsis) with an SA component. Python programs were developed to carry out the SA. Because annotated training corpora are lacking, a lexicon-based approach is chosen as the central SA approach. To identify an optimized SA procedure, several options and approaches to SA are implemented and their performance is examined for the specific use case. Five of the best-known German-language SA lexicons are implemented, and a combined overall version of them is created. As further options, the influence of a lexicon extension with historical linguistic variants, of lemmatization using two lemmatizers and three types of lemmatization, of three different stop word lists, and of case sensitivity is implemented and examined. Various sentiment metrics are computed at different levels for all combinations of lexicons and options. As levels of the drama, sentiment metrics are calculated for the structural level (drama, act, scene, speech), the speaker level (per drama, act, scene, speech), and for speaker relations (per drama, act, scene, speech). Different metrics for polarity (positive, negative) and 8 emotion categories are computed at these levels. Several evaluation procedures are carried out. In a first informal evaluation, the share of the corpus vocabulary covered by the words of the lexicons, in connection with the options mentioned, is examined and discussed. To carry out a systematic evaluation, a gold standard of annotated speeches is created. In an annotation study, 5 participants rate a representative corpus of 200 speeches with respect to polarity and emotions. A subsequent questionnaire provided insights into problems and difficulties during annotation. The annotation results are evaluated statistically and examined with regard to annotation behaviour. A main finding is a generally lower level of agreement than for other objects of study in SA. Also striking is a strongly unequal distribution of polarities in the corpus: considerably more speeches are perceived as negative than as positive. The final evaluation corpus (gold standard, GS) consists of 139 negative and 61 positive speeches, based on the majority decision of the annotators. Using an evaluation framework developed in Python, the SA performance of all lexicons and methods in predicting the polarity of a speech was examined systematically. Various evaluation metrics were calculated for a differentiated analysis and discussion of all approaches. Recognition rates of up to 70% are observed. By analysing all evaluation results, the best-performing procedure is determined. It consists of the combination of the SentiWS lexicon, extended with historical linguistic variants, with lemmatization at text and lexicon level using the pattern lemmatizer, without a stop word list, and with case sensitivity in the final matching step. For the procedure identified as best, a front end for visualizing the SA metrics is implemented as a web application. Interactive visualizations for polarities and emotion categories are available. Distributions and progressions can be explored at the drama, act, scene, speech, speaker and speaker-relation levels (each per drama, act, scene, speech). The possible use in drama analysis is described by means of individual case studies. Finally, the results of the overall project are discussed in the context of research, and possible starting points for future work are outlined.
Article
Online reviews of a product enable web users to make appropriate decisions. These reviews may be positive, negative or neutral in nature. Analyzing and classifying such product reviews has attracted reasonable interest. It has become quite hard to make decisions since we are not able to reach them quickly. Hence, reviews from balanced data sets need to be classified for analysis and opinion mining in any application. Balanced data sets are considered so that the decision is not biased by the category of reviews considered. We have carried out investigations using similarity measures to categorize the reviews correctly. Experiments reveal that reviews that were mixed in nature could be grouped correctly.
Conference Paper
Opinion mining is the field of study that analyses people's thoughts, sentiments, emotions and attitudes towards entities, products, services, issues, topics, events and their attributes. There are many different tasks, such as opinion extraction, sentiment mining, emotional analysis and review mining. An important aspect of opinion mining is to gather information from reviews, blogs, etc. and then find out the behavior of that information, i.e. whether the information relates to a positive or a negative context. The positive and negative reviews or blogs are associated with a numerical value, which is calculated using SentiWordNet 3.0. The opinion words are mainly adjectives such as "good," "better," "awesome." But several problems arise because identifiers, negation words and extensions of the opinion words such as "very very good" are not considered. In this paper, details about opinion mining, how the polarity value deals with positive and negative, and how to deal with Roman language reviews and blogs are discussed.
Chapter
As the volume of textual data in Web 2.0 grows, so does the need to analyse this data automatically, for example to detect opinions expressed in texts (opinion mining). This contribution presents aspect-based opinion mining, a method with a very high level of detail, for German-language texts by means of a project for the insurance industry. It is shown that opinions on insurance products and services expressed on review platforms can be detected with a precision of about 90% and a recall of approx. 80% for positive and approx. 60% for negative opinions.
Article
Social networking websites such as Twitter provide a platform where users share their opinions about different news, events, and products. Recent research has found that 81% of users search online before purchasing products. Reviews are written in natural language and need sentiment analysis for opinion extraction. Various approaches have been proposed to perform sentiment classification based on polarity-bearing words in reviews, such as nouns, verbs, adverbs, and adjectives. Prior researchers have also identified the role of the adverb as a feature. However, the impact of adverb forms has not yet been studied and remains an open research area. This study focuses on the following tasks: (1) the impact of different forms of adverbs that have not been studied for sentiment classification; (2) analysis of the 255 possible combinations of the eight forms, namely Adverb (RA), Degree Adverbs (RG), Degree Comparative Adverbs (RGR), General Adverbs (RR), General Comparative Adverbs (RRR), Locative Adverbs (RL), Prep. Adverb (RP), and Adverbs of time (RT); (3) comparison with a benchmark dataset. A dataset of 5513 tweets is used to evaluate the idea. The findings of this work show that RRR and RR are important polarity-bearing words for neutral opinions, RL for positive, and RP for negative opinions.
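
The figure of 255 combinations for eight adverb forms matches the number of non-empty subsets of an eight-element set, 2^8 − 1. The short sketch below enumerates them to make that count concrete; reading "possible combinations" as non-empty subsets is an assumption, and the helper name `non_empty_subsets` is hypothetical.

```python
from itertools import combinations

# Adverb form tags as listed in the abstract.
FORMS = ["RA", "RG", "RGR", "RR", "RRR", "RL", "RP", "RT"]

def non_empty_subsets(items):
    """Yield every non-empty combination of `items` as a tuple."""
    for size in range(1, len(items) + 1):
        yield from combinations(items, size)

subsets = list(non_empty_subsets(FORMS))
print(len(subsets))  # 255, i.e. 2**8 - 1
```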
Chapter
In recent years, social media sites have become very popular communication tools among Internet users, where a significant amount of information is exchanged via computers, smart phones, etc. The Internet is no longer only a source of information for users to search; regular users are now a major source of Internet information, as they post daily life activities, share pictures online, and express their opinions about products, news, political debates, etc. This noticeable growth of opinion-rich resources along with user-generated content makes it worthwhile to use information technologies to collect, analyze, and understand human factors and behaviors. This chapter consists of three main sections. The first section introduces the field of opinion mining in general along with a detailed exploration of its definitions and goals. The second section discusses challenges related to opinion mining. The last section explores available opinion mining approaches along with possible future directions.
Article
We investigate the accuracy of a set of surface patterns in identifying ironic sentences in comments submitted by users to an on-line newspaper. The initial focus is on identifying irony in sentences containing positive predicates since these sentences are more exposed to irony, making their true polarity harder to recognize. We show that it is possible to find ironic sentences with relatively high precision (from 45% to 85%) by exploring certain oral or gestural clues in user comments, such as emoticons, onomatopoeic expressions for laughter, heavy punctuation marks, quotation marks and positive interjections. We also demonstrate that clues based on deeper linguistic information are relatively inefficient in capturing irony in user-generated content, which points to the need for exploring additional types of oral clues.
Article
This paper presents a simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down). The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs. A phrase has a positive semantic orientation when it has good associations (e.g., "subtle nuances") and a negative semantic orientation when it has bad associations (e.g., "very cavalier"). In this paper, the semantic orientation of a phrase is calculated as the mutual information between the given phrase and the word "excellent" minus the mutual information between the given phrase and the word "poor". A review is classified as recommended if the average semantic orientation of its phrases is positive. The algorithm achieves an average accuracy of 74% when evaluated on 410 reviews from Epinions, sampled from four different domains (reviews of automobiles, banks, movies, and travel destinations). The accuracy ranges from 84% for automobile reviews to 66% for movie reviews.
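
The semantic orientation described in this abstract, mutual information with "excellent" minus mutual information with "poor", can be sketched from co-occurrence counts as below. This is an illustrative sketch under stated assumptions: the function and parameter names are hypothetical, the counts are invented, and the small `smoothing` constant added to avoid division by zero is an assumption of this example rather than something the abstract specifies.

```python
import math

def semantic_orientation(near_excellent, near_poor, hits_excellent, hits_poor,
                         smoothing=0.01):
    """SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor").

    With co-occurrence counts, the phrase's own frequency cancels, leaving
    log2((near_excellent * hits_poor) / (near_poor * hits_excellent)).
    Assumption: `smoothing` is added to the co-occurrence counts to avoid
    division by zero.
    """
    numerator = (near_excellent + smoothing) * hits_poor
    denominator = (near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)

def is_recommended(phrase_orientations):
    """Classify a review as recommended if the average orientation is positive."""
    return sum(phrase_orientations) / len(phrase_orientations) > 0

# Counts invented for illustration.
so_good = semantic_orientation(near_excellent=80, near_poor=10,
                               hits_excellent=1000, hits_poor=1000)
so_bad = semantic_orientation(near_excellent=5, near_poor=20,
                              hits_excellent=1000, hits_poor=1000)
print(so_good > 0, so_bad < 0)  # True True
print(is_recommended([so_good, so_bad]))
```

A phrase that co-occurs more often with "excellent" than with "poor" gets a positive orientation, and a review is labelled recommended when the average over its phrases is above zero.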
Article
We explore the adaptation of English resources and techniques for text sentiment analysis to a new language, Spanish. Our main focus is the modification of an existing English semantic orientation calculator and the building of dictionaries; however we also compare alternate approaches, including machine translation and Support Vector Machine classification. The results indicate that, although language-independent methods provide a decent baseline performance, there is also a significant cost to automation, and thus the best path to long-term improvement is through the inclusion of language-specific knowledge and resources.
Article
We describe a pattern-based system for polarity classification from texts. Our system is currently restricted to the positive, negative or neutral polarity of phrases and sentences. It analyses the input texts with the aid of a polarity lexicon that specifies the prior polarity of words. A chunker is used to determine phrases that are the basis for a compositional treatment of phrase-level polarity assignment. In our current experiments we focus on sentences that are targeted towards persons, be it the writer (I, my, me, ..), the social group including the writer (we, our, ..) or the reader (you, your, ..). We evaluate our system on a manually annotated set of sentences taken from texts from a panel group called 'I battle depression'. We present the results of comparing our system's performance over this gold standard against a baseline system.
Article
Opinionated social media such as product reviews are now widely used by individuals and organizations for their decision making. However, due to the reason of profit or fame, people try to game the system by opinion spamming (e.g., writing fake reviews) to promote or demote some target products. For reviews to reflect genuine user experiences and opinions, such spam reviews should be detected. Prior works on opinion spam focused on detecting fake reviews and individual fake reviewers. However, a fake reviewer group (a group of reviewers who work collaboratively to write fake reviews) is even more damaging as they can take total control of the sentiment on the target product due to its size. This paper studies spam detection in the collaborative setting, i.e., to discover fake reviewer groups. The proposed method first uses a frequent itemset mining method to find a set of candidate groups. It then uses several behavioral models derived from the collusion phenomenon among fake reviewers and relation models based on the relationships among groups, individual reviewers, and products they reviewed to detect fake reviewer groups. Additionally, we also built a labeled dataset of fake reviewer groups. Although labeling individual fake reviews and reviewers is very hard, to our surprise labeling fake reviewer groups is much easier. We also note that the proposed technique departs from the traditional supervised learning approach for spam detection because of the inherent nature of our problem which makes the classic supervised learning approach less effective. Experimental results show that the proposed method outperforms multiple strong baselines including the state-of-the-art supervised classification, regression, and learning to rank algorithms.
Conference Paper
Full-text available
Today, online stores collect a lot of customer feedback in the form of surveys, reviews, and comments. This feedback is categorized and in some cases responded to, but in general it is underutilized, even though customer satisfaction is essential to the success of their business. In this paper, we introduce several new techniques to interactively analyze customer comments and ratings to determine the positive and negative opinions expressed by the customers. First, we introduce a new discrimination-based technique to automatically extract the terms that are the subject of the positive or negative opinion (such as price or customer service) and that are frequently commented on. Second, we derive a Reverse-Distance-Weighting method to map the attributes to the related positive and negative opinions in the text. Third, the resulting high-dimensional feature vectors are visualized in a new summary representation that provides a quick overview. We also cluster the reviews according to the similarity of the comments. Special thumbnails are used to provide insight into the composition of the clusters and their relationship. In addition, an interactive circular correlation map is provided to allow analysts to detect the relationships of the comments to other important attributes and the scores. We have applied these techniques to customer comments from real-world online stores and product reviews from web sites to identify the strengths and problems of different products and services, and show the potential of our technique.
Conference Paper
Full-text available
In recent years, opinion mining has attracted a great deal of research attention. However, limited work has been done on detecting opinion spam (or fake reviews). The problem is analogous to spam in Web search [1, 9 11]. However, review spam is harder to detect because it is very hard, if not impossible, to recognize fake reviews by manually reading them [2]. This paper deals with a restricted problem, i.e., identifying unusual review patterns which can represent suspicious behaviors of reviewers. We formulate the problem as finding unexpected rules. The technique is domain independent. Using the technique, we analyzed an Amazon.com review dataset and found many unexpected rules and rule groups which indicate spam activities.
Conference Paper
Full-text available
This paper aims to detect users generating spam reviews, or review spammers. We identify several characteristic behaviors of review spammers and model these behaviors so as to detect the spammers. In particular, we seek to model the following behaviors. First, spammers may target specific products or product groups in order to maximize their impact. Second, they tend to deviate from the other reviewers in their ratings of products. We propose scoring methods to measure the degree of spam for each reviewer and apply them to an Amazon review dataset. We then select a subset of highly suspicious reviewers for further scrutiny by our user evaluators with the help of web-based spammer evaluation software specially developed for user evaluation experiments. Our results show that our proposed ranking and supervised methods are effective in discovering spammers and outperform other baseline methods based on helpfulness votes alone. We finally show that the detected spammers have a more significant impact on ratings compared with the unhelpful reviewers.
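The second behavior named in this abstract, deviation from other reviewers' ratings, can be made concrete with a small sketch. This is not the scoring method of the cited paper, whose formulas the abstract does not give. It assumes a simple stand-in: for each reviewer, the mean absolute difference between their rating and the average rating of the other reviewers of the same product. The function name `rating_deviation` and the toy data are hypothetical.

```python
from collections import defaultdict


def rating_deviation(ratings):
    """Score each reviewer by how far their ratings sit from everyone else's.

    ratings is a list of (reviewer, product, stars) tuples, assumed to hold
    at most one rating per reviewer and product.
    """
    by_product = defaultdict(list)
    for reviewer, product, stars in ratings:
        by_product[product].append((reviewer, stars))

    deviations = defaultdict(list)
    for entries in by_product.values():
        if len(entries) < 2:
            continue  # no other reviewers to compare against
        total = sum(stars for _, stars in entries)
        for reviewer, stars in entries:
            # Average rating of all other reviewers of this product.
            others_mean = (total - stars) / (len(entries) - 1)
            deviations[reviewer].append(abs(stars - others_mean))

    return {r: sum(d) / len(d) for r, d in deviations.items()}


# Hypothetical toy data: u3 consistently rates far below u1 and u2.
ratings = [
    ("u1", "a", 5), ("u2", "a", 4), ("u3", "a", 1),
    ("u1", "b", 4), ("u2", "b", 5), ("u3", "b", 1),
]
scores = rating_deviation(ratings)
most_deviant = max(scores, key=scores.get)
print(scores, most_deviant)
```

A high score alone does not mark a reviewer as a spammer; in the abstract, such scores are used to select suspicious reviewers for further scrutiny by human evaluators.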
Conference Paper
Full-text available
This paper presents a parse-and-paraphrase paradigm to assess the degrees of sentiment for product reviews. Sentiment identification has been well studied; however, most previous work provides binary polarities only (positive and negative), and the polarity of sentiment is simply reversed when a negation is detected. The extraction of lexical features such as unigram/bigram also complicates the sentiment classification task, as linguistic structure such as implicit long-distance dependency is often disregarded. In this paper, we propose an approach to extracting adverb-adjective-noun phrases based on clause structure obtained by parsing sentences into a hierarchical representation. We also propose a robust general solution for modeling the contribution of adverbials and negation to the score for degree of sentiment. In an application involving extracting aspect-based pros and cons from restaurant reviews, we obtained a 45% relative improvement in recall through the use of parsing methods, while also improving precision.
Conference Paper
Full-text available
Identifying domain-dependent opinion words is a key problem in opinion mining and has been studied by several researchers. However, existing work has been focused on adjectives and to some extent verbs. Limited work has been done on nouns and noun phrases. In our work, we used the feature-based opinion mining model, and we found that in some domains nouns and noun phrases that indicate product features may also imply opinions. In many such cases, these nouns are not subjective but objective. Their involved sentences are also objective sentences and imply positive or negative opinions. Identifying such nouns and noun phrases and their polarities is very challenging but critical for effective opinion mining in these domains. To the best of our knowledge, this problem has not been studied in the literature. This paper proposes a method to deal with the problem. Experimental results based on real-life datasets show promising results.
Conference Paper
Full-text available
SentimentWortschatz, or SentiWS for short, is a publicly available German-language resource for sentiment analysis, opinion mining etc. It lists positive and negative sentiment bearing words weighted within the interval of (-1; 1) plus their part of speech tag, and if applicable, their inflections. The current version of SentiWS (v1.8b) contains 1,650 negative and 1,818 positive words, which sum up to 16,406 positive and 16,328 negative word forms, respectively. It not only contains adjectives and adverbs explicitly expressing a sentiment, but also nouns and verbs implicitly containing one. The present work describes the resource's structure, the three sources utilised to assemble it and the semi-supervised method incorporated to weight the strength of its entries. Furthermore the resource's contents are extensively evaluated using a German-language evaluation set we constructed. The evaluation set is verified as reliable and it is shown that SentiWS provides a beneficial lexical resource for German-language sentiment analysis related tasks to build on.
Conference Paper
Full-text available
In this work we present SENTIWORDNET 3.0, a lexical resource explicitly devised for supporting sentiment classification and opinion mining applications. SENTIWORDNET 3.0 is an improved version of SENTIWORDNET 1.0, a lexical resource publicly available for research purposes, now currently licensed to more than 300 research groups and used in a variety of research projects worldwide. Both SENTIWORDNET ...
Conference Paper
Full-text available
In this paper, we propose GermanPolarityClues, a new publicly available lexical resource for sentiment analysis for the German language. While sentiment analysis and polarity classification have been extensively studied at different document levels (e.g. sentences and phrases), only a few approaches have explored the effect of a polarity-based feature selection and subjectivity resources for the German language. This paper evaluates four different English and three different German sentiment resources in a comparative manner by combining a polarity-based feature selection with an SVM-based machine learning classifier. Using a semi-automatic translation approach, we were able to construct three different resources for German sentiment analysis. The manually finalized GermanPolarityClues dictionary offers 10,141 polarity features, associated with three numerical polarity scores, determining the positive, negative and neutral direction of specific term features. While the results show that the size of dictionaries clearly correlates with polarity-based feature coverage, this property does not correlate with classification accuracy. Using a polarity-based feature selection, considering a minimum amount of prior polarity features, in combination with SVM-based machine learning methods exhibits the best performance for both languages (F1: 0.83-0.88).
Article
Full-text available
Customer reviews are increasingly available online for a wide range of products and services. They supplement other information provided by electronic storefronts such as product descriptions, reviews from experts, and personalized advice generated by automated recommendation systems. While researchers have demonstrated the benefits of the presence of customer reviews to an online retailer, a largely uninvestigated issue is what makes customer reviews helpful to a consumer in the process of making a purchase decision. Drawing on the paradigm of search and experience goods from information economics, we develop and test a model of customer review helpfulness. An analysis of 1,587 reviews from Amazon.com across six products indicated that review extremity, review depth, and product type affect the perceived helpfulness of the review. Product type moderates the effect of review extremity on the helpfulness of the review. For experience goods, reviews with extreme ratings are less helpful than reviews with moderate ratings. For both product types, review depth has a positive effect on the helpfulness of the review, but the product type moderates the effect of review depth on the helpfulness of the review. Review depth has a greater positive effect on the helpfulness of the review for search goods than for experience goods. We discuss the implications of our findings for both theory and practice.
Conference Paper
Full-text available
The Web has become an excellent source for gathering consumer opinions. There are now numerous Web sites containing such opinions, e.g., customer reviews of products, forums, discussion groups, and blogs. This paper focuses on online customer reviews of products. It makes two contributions. First, it proposes a novel framework for analyzing and comparing consumer opinions of competing products. A prototype system called Opinion Observer is also implemented. The system is such that with a single glance at its visualization, the user is able to clearly see the strengths and weaknesses of each product in the minds of consumers in terms of various product features. This comparison is useful to both potential customers and product manufacturers. For a potential customer, he/she can see a visual side-by-side and feature-by-feature comparison of consumer opinions on these products, which helps him/her to decide which product to buy. For a product manufacturer, the comparison enables it to easily gather marketing intelligence and product benchmarking information. Second, a new technique based on language pattern mining is proposed to extract product features from Pros and Cons in a particular type of reviews. Such features form the basis for the above comparison. Experimental results show that the technique is highly effective and outperforms existing methods significantly.
Article
Full-text available
The Penn Treebank has recently implemented a new syntactic annotation scheme, designed to highlight aspects of predicate-argument structure. This paper discusses the implementation of crucial aspects of this new annotation scheme. It incorporates a more consistent treatment of a wide range of grammatical phenomena, provides a set of coindexed null elements in what can be thought of as "underlying" position for phenomena such as wh-movement, passive, and the subjects of infinitival constructions, provides some non-context free annotational mechanism to allow the structure of discontinuous constituents to be easily recovered, and allows for a clear, concise tagging system for some semantic roles. 1. INTRODUCTION During the first phase of the Penn Treebank project [?], ending in December 1992, 4.5 million words of text were tagged for part-of-speech, with about two-thirds of this material also annotated with a skeletal syntactic bracketing. All of this material has been hand correc...
Article
Sentiment analysis or opinion mining is the computational study of people's opinions, appraisals, attitudes, and emotions toward entities, individuals, issues, events, topics and their attributes. The task is technically challenging and practically very useful. For example, businesses always want to find public or consumer opinions about their products and services. Potential customers also want to know the opinions of existing users before they use a service or purchase a product. With the explosive growth of social media (i.e., reviews, forum discussions, blogs and social networks) on the Web, individuals and organizations are increasingly using public opinions in these media for their decision making. However, finding and monitoring opinion sites on the Web and distilling the information contained in them remains a formidable task because of the proliferation of diverse sites. Each site typically contains a huge volume of opinionated text that is not always easily deciphered in long forum postings and blogs. The average human reader will have difficulty identifying relevant sites and accurately summarizing the information and opinions contained in them. Moreover, it is also known that human analysis of text information is subject to considerable biases, e.g., people often pay greater attention to opinions that are consistent with their own preferences. People also have difficulty, owing to their mental and physical limitations, producing consistent results when the amount of information to be processed is large. Automated opinion mining and summarization systems are thus needed, as subjective biases and mental limitations can be overcome with an objective sentiment analysis system. In the past decade, a considerable amount of research has been done in academia [58,76]. There are also numerous commercial companies that provide opinion mining services. In this chapter, we first define the opinion mining problem. 
From the definition, we will see the key technical issues that need to be addressed. We then describe various key mining tasks that have been studied in the research literature and their representative techniques. After that, we discuss the issue of detecting opinion spam or fake reviews. Finally, we also introduce the research topic of assessing the utility or quality of online reviews. © 2012 Springer Science+Business Media, LLC. All rights reserved.
Article
We have manually curated a polarity lexicon for German, comprising word polarities and polarity strength values of about 8,000 words: nouns, verbs and adjectives. The decisions were primarily carried out using the synsets from GermaNet, a WordNet-like lexical database. In an evaluation on German novels, it turned out that the stock of adjectives was too small. We carried out experiments to automatically learn new subjective adjectives together with their polarity orientation and polarity strength. For this purpose, we applied a corpus-based approach that works with pairs of coordinated adjectives extracted from a large German newspaper corpus. In the context of this work, we evaluated two subtasks in detail. First, how good are we at reproducing the polarity classification (including our three-level strength measure) contained in our initial lexicon by machine learning methods? Second, because adding training material did not improve the results at the expected rate, we evaluated the human intercoder agreement on polarity classifications in an experiment. The results show that judgements about the strength of polarity do vary considerably between different persons. Given these problems related to the design and automatic augmentation of polarity lexicons, we have successfully experimented with a semi-automatic approach where a list of reliable candidate words (here: adjectives) is generated to ease the manual annotation process.
Article
Introduction While product review systems that collect and disseminate opinions about products from recent buyers (Table 1) are valuable forms of word-of-mouth communication, evidence suggests that they are overwhelmingly positive. Kadet notes that most products receive almost five stars. Chevalier and Mayzlin also show that book reviews on Amazon and Barnes & Noble are overwhelmingly positive. Is this because all products are simply outstanding? However, a graphical representation of product reviews reveals a J-shaped distribution (Figure 1) with mostly 5-star ratings, some 1-star ratings, and hardly any ratings in between. What explains this J-shaped distribution? If products are indeed outstanding, why do we also see many 1-star ratings? Why aren't there any product ratings in between? Is it because there are no "average" products? Or, is it because there are biases in product review systems? If so, how can we overcome them? The J-shaped distribution also creates some fundamental statistical problems. Conventional wisdom assumes that the average of the product ratings is a sufficient proxy of product quality and product sales. Many studies used the average of product ratings to predict sales. However, these studies showed inconsistent results: some found product reviews to influence product sales, while others did not. The average is statistically meaningful only when it is based on a unimodal distribution, or when it is based on a symmetric bimodal distribution. However, since product review systems have an asymmetric bimodal (J-shaped) distribution, the average is a poor proxy of product quality. This report aims to first demonstrate the existence of a J-shaped distribution, second to identify the sources of bias that cause the J-shaped distribution, third to propose ways to overcome these biases, and finally to show that overcoming these biases helps product review systems better predict future product sales. 
We tested the distribution of product ratings for three product categories (books, DVDs, videos) with data from Amazon collected between February and July 2005: 78%, 73%, and 72% of the product ratings for books, DVDs, and videos are greater than or equal to four stars (Figure 1), confirming our proposition that product reviews are overwhelmingly positive. Figure 1 (left graph) shows a J-shaped distribution of all products. This contradicts the law of "large numbers" that would imply a normal distribution. Figure 1 (middle graph) shows the distribution of three randomly-selected products in each category with over 2,000 reviews. The results show that these reviews still have a J-shaped distribution, implying that the J-shaped distribution is not due to a "small number" problem. Figure 1 (right graph) shows that even products with a median average review (around 3-stars) follow the same pattern.
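The statistical point made in this report, that the average is a poor summary of a J-shaped rating distribution, can be seen in a short numeric sketch. The rating counts below are hypothetical and chosen only to have a J shape (many 5-star ratings, some 1-star ratings, few in between); they are not data from the report.

```python
from statistics import mean

# Hypothetical rating counts per star level with a J shape (100 ratings).
counts = {1: 18, 2: 4, 3: 6, 4: 12, 5: 60}

# Expand the counts into a flat list of individual ratings.
ratings = [stars for stars, n in counts.items() for _ in range(n)]

avg = mean(ratings)
# Share of ratings that are four stars or higher.
share_high = sum(n for stars, n in counts.items() if stars >= 4) / len(ratings)
# Share of ratings that actually sit at the star level nearest the mean.
share_near_mean = counts[round(avg)] / len(ratings)
# The two most frequent star levels (the two modes of the J shape).
top_two = sorted(counts, key=counts.get, reverse=True)[:2]

print(f"mean = {avg:.2f}")
print(f"share of ratings >= 4 stars = {share_high:.0%}")
print(f"share of ratings at {round(avg)} stars = {share_near_mean:.0%}")
print(f"two most frequent star levels = {top_two}")
```

With these counts the mean falls close to four stars, a level that few reviewers actually chose, while the two most frequent ratings sit at opposite ends of the scale.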
Article
Online word-of-mouth communication in the form of product reviews is a major information source for consumers and marketers about product quality. The literature has used the mean of online reviews to predict product sales, assuming that the mean reflects product quality. However, using a combination of econometric, experimental, and analytical results, we show that the mean is a biased estimator of product quality due to two self-selection biases (purchasing and under-reporting bias). First, econometric results with secondary data from Amazon.com show that almost all products have an asymmetric bimodal (J-shaped) distribution with more positive than negative reviews. Second, experimental results where all respondents wrote reviews show that their reviews have an approximately normal distribution with roughly equal number of positive and negative reviews. This implies two biases: (1) purchasing bias - only consumers with favorable disposition towards a product purchase the product and have the opportunity to write a product review, and (2) under-reporting bias - consumers with polarized (either positive or negative) reviews are more likely to report their reviews than consumers with moderate reviews. This results in a J-shaped distribution of online product reviews that renders the mean a biased estimator of product quality. Third, we develop an analytical model to derive the conditions for the mean to become an unbiased estimator of product quality. Based on these conditions, we build a new model that integrates three distributional parameters - mean, standard deviation, and the two modes of the online product reviews (to overcome under-reporting bias) and product price (to overcome purchasing bias). This model is shown to be a superior predictive model of future product sales compared to competing models.
Article
The sentiment detection of texts has witnessed booming interest in recent years, due to the increased availability of online reviews in digital form and the ensuing need to organize them. Until now, four different problems have predominated in this research community, namely subjectivity classification, word sentiment classification, document sentiment classification and opinion extraction. In fact, there are inherent relations between them. Subjectivity classification can prevent the sentiment classifier from considering irrelevant or even potentially misleading text. Document sentiment classification and opinion extraction have often involved word sentiment classification techniques. This survey discusses related issues and main approaches to these problems.
Conference Paper
Current opinion lexicons contain most of the common opinion words, but they miss slang and so-called urban opinion words and phrases (e.g. delish, cozy, yummy, nerdy, and yuck). These subjectivity clues are frequently used in community questions and are useful for opinion question analysis. This paper introduces a principled approach to constructing an opinion lexicon for community-based question answering (cQA) services. We formulate the opinion lexicon induction as a semi-supervised learning task in the graph context. Our method makes use of existing opinion words to extract new opinion entities (slang and urban words/phrases) from community questions. It then models the opinion entities in a graph context to learn the polarity of the new opinion entities based on the graph connectivity information. In contrast to previous approaches, our method not only learns such polarities from the labeled data but also from the unlabeled data and is more feasible in the web context where the dictionary-based relations (such as synonym, antonym, or hyponym) between most words are not available for constructing a high quality graph. The experiments show that our approach is effective both in terms of the quality of the discovered new opinion entities as well as its ability in inferring their polarity. Furthermore, since the value of opinion lexicons lies in their usefulness in applications, we show the utility of the constructed lexicon in the sentiment classification task.
Conference Paper
Determining the polarity of a sentiment-bearing expression requires more than a simple bag-of-words approach. In particular, words or constituents within the expression can interact with each other to yield a particular overall polarity. In this paper, we view such subsentential interactions in light of compositional semantics, and present a novel learning-based approach that incorporates structural inference motivated by compositional semantics into the learning procedure. Our experiments show that (1) simple heuristics based on compositional semantics can perform better than learning-based methods that do not incorporate compositional semantics (accuracy of 89.7% vs. 89.1%), but (2) a method that integrates compositional semantics into learning performs better than all other alternatives (90.7%). We also find that "content-word negators", not widely employed in previous work, play an important role in determining expression-level polarity. Finally, in contrast to conventional wisdom, we find that expression-level classification accuracy uniformly decreases as additional, potentially disambiguating, context is considered.
Conference Paper
We propose a method for extracting semantic orientations of words: desirable or undesirable. Regarding semantic orientations as spins of electrons, we use the mean field approximation to compute the approximate probability function of the system instead of the intractable actual probability function. We also propose a criterion for parameter selection on the basis of magnetization. Given only a small number of seed words, the proposed method extracts semantic orientations with high accuracy in the experiments on English lexicon. The result is comparable to the best value ever reported.
Conference Paper
We examine the viability of building large polarity lexicons semi-automatically from the web. We begin by describing a graph propagation framework inspired by previous work on constructing polarity lexicons from lexical graphs (Kim and Hovy, 2004; Hu and Liu, 2004; Esuli and Sebastiani, 2009; Blair-Goldensohn et al., 2008; Rao and Ravichandran, 2009). We then apply this technique to build an English lexicon that is significantly larger than those previously studied. Crucially, this web-derived lexicon does not require WordNet, part-of-speech taggers, or other language-dependent resources typical of sentiment analysis systems. As a result, the lexicon is not limited to specific word classes (e.g., adjectives that occur in WordNet) and in fact contains slang, misspellings, multi-word expressions, etc. We evaluate a lexicon derived from English documents, both qualitatively and quantitatively, and show that it provides superior performance to previously studied lexicons, including one derived from WordNet.
Conference Paper
Social media is becoming a major and popular technological platform that allows users to discuss and share information. Information is generated and managed through either computer or mobile devices by one person and consumed by many other persons. Most of this user-generated content is textual information, found on social networks (Facebook, LinkedIn), microblogging services (Twitter), and blogs (Blogspot, WordPress). Looking for valuable nuggets of knowledge, such as capturing and summarizing sentiments from this huge amount of data, could help users make informed decisions. In this paper, we develop a sentiment identification system called SES which implements three different sentiment identification algorithms. We augment basic compositional semantic rules in the first algorithm. In the second algorithm, we argue that sentiment should not simply be classified as positive, negative, or objective but expressed as a continuous score that reflects sentiment degree. All word scores are calculated based on a large volume of customer reviews. Due to the special characteristics of social media texts, we propose a third algorithm which takes emoticons, negation word position, and domain-specific words into account. Furthermore, a machine learning model is employed on features derived from the outputs of the three algorithms. We conduct our experiments on user comments from Facebook and tweets from Twitter. The results show that Random Forest achieves better accuracy than decision tree, neural network, and logistic regression. We also propose a flexible way to represent document sentiment based on the sentiments of the sentences it contains. SES is available online.
Conference Paper
A prototype digital library of social media content was developed to present a summarized view of public opinion in a visual interface. The domain of the study was movie reviews of multiple genres harvested from weblogs, discussion boards, user and critic review Web sites, and Twitter. The system performs fine-grained analysis to determine both the sentiment orientation and sentiment strength of the reviewer towards various aspects of a movie, such as overall opinion, director, cast, story, scene, and music. Various visual interface components were developed to present an overview of public opinion on multiple aspects of each movie, and a usability evaluation was conducted to observe their effectiveness. Aspect-based sentiment summarization interface has the highest score for usefulness while a sentiment link analysis graph visualizing how positive and negative sentiment terms are associated with review aspects has the highest score for overall rating.
Conference Paper
It is now a common practice for e-commerce Web sites to enable their customers to write reviews of products that they have purchased. Such reviews provide valuable sources of information on these products. They are used by potential customers to find opinions of existing users before deciding to purchase a product. They are also used by product manufacturers to identify problems of their products and to find competitive intelligence information about their competitors. Unfortunately, this importance of reviews also gives good incentive for spam, which contains false positive or malicious negative opinions. In this paper, we make an attempt to study review spam and spam detection. To the best of our knowledge, there is still no reported study on this problem.
Article
Because meaningful sentences are composed of meaningful words, any system that hopes to process natural languages as people do must have information about words and their meanings. This information is traditionally provided through dictionaries, and machine-readable dictionaries are now widely available. But dictionary entries evolved for the convenience of human readers, not for machines. WordNet provides a more effective combination of traditional lexicographic information and modern computing. WordNet is an online lexical database designed for use under program control. English nouns, verbs, adjectives, and adverbs are organized into sets of synonyms, each representing a lexicalized concept. Semantic relations link the synonym sets [4].
Conference Paper
Consumers are often forced to wade through many on-line reviews in order to make an informed product choice. This paper introduces OPINE, an unsupervised information-extraction system which mines reviews in order to build a model of important product features, their evaluation by reviewers, and their relative quality across products. Compared to previous work, OPINE achieves 22
Article
This paper presents a bootstrapping process that learns linguistically rich extraction patterns for subjective (opinionated) expressions. High-precision classifiers label unannotated data to automatically create a large training set, which is then given to an extraction pattern learning algorithm. The learned patterns are then used to identify more subjective sentences. The bootstrapping process learns many subjective patterns and increases recall while maintaining high precision.
Article
We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.