Article

Enriched LDA (ELDA): Combination of latent Dirichlet allocation with word co-occurrence analysis for aspect extraction


Abstract

Aspect extraction is one of the fundamental steps in analyzing the characteristics of opinions, feelings and emotions expressed in textual data provided for a certain topic. Current aspect extraction techniques are mostly based on topic models; however, employing only topic models causes incoherent aspects to be generated. Therefore, this paper aims to discover more precise aspects by incorporating co-occurrence relations as prior domain knowledge into the Latent Dirichlet Allocation (LDA) topic model. In the proposed method, first, the preliminary aspects are generated based on LDA. Then, in an iterative manner, the prior knowledge is extracted automatically from co-occurrence relations and similar aspects of relevant topics. Finally, the extracted knowledge is incorporated into the LDA model. The iterations improve the quality of the extracted aspects. The competence of the proposed ELDA for the aspect extraction task is evaluated through experiments on two datasets in the English and Persian languages. The experimental results indicate that ELDA not only outperforms the state-of-the-art alternatives in terms of topic coherence and precision, but also has no particular dependency on the written language and can be applied to all languages with reasonable accuracy. Thus, ELDA can impact natural language processing applications, particularly in languages with limited linguistic resources.
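To make the workflow described above concrete, the following is a minimal, hypothetical sketch in Python (using gensim and numpy) of an ELDA-style loop: train LDA, mine co-occurrence-based knowledge from each topic's top words, and feed that knowledge back as an asymmetric topic-word prior. The eta-prior injection mechanism, the boost/top_n parameters and the fixed iteration count are illustrative assumptions, not the authors' implementation.

# Illustrative ELDA-style loop (an assumption-laden sketch, not the paper's code).
from collections import Counter
from itertools import combinations

import numpy as np
from gensim.corpora import Dictionary
from gensim.models import LdaModel


def cooccurrence_counts(sentences):
    """Count how often two tokens appear together in the same sentence."""
    counts = Counter()
    for tokens in sentences:
        for a, b in combinations(sorted(set(tokens)), 2):
            counts[(a, b)] += 1
    return counts


def elda_sketch(sentences, num_topics=5, iterations=3, top_n=10, boost=5.0):
    dictionary = Dictionary(sentences)
    corpus = [dictionary.doc2bow(s) for s in sentences]
    cooc = cooccurrence_counts(sentences)

    eta = np.full((num_topics, len(dictionary)), 0.01)  # symmetric prior to start
    lda = None
    for _ in range(iterations):
        lda = LdaModel(corpus=corpus, id2word=dictionary,
                       num_topics=num_topics, eta=eta, passes=10)
        # Mine "prior knowledge": pairs of a topic's top words that co-occur in sentences.
        for k in range(num_topics):
            top_words = [dictionary[wid] for wid, _ in lda.get_topic_terms(k, topn=top_n)]
            for a, b in combinations(sorted(top_words), 2):
                if cooc.get((a, b), 0) > 0:
                    # Encourage both words under this topic in the next round.
                    eta[k, dictionary.token2id[a]] += boost
                    eta[k, dictionary.token2id[b]] += boost
    return lda

In this reading, each iteration sharpens the topic-word prior toward word pairs that both rank highly in a topic and co-occur in the corpus, which is one plausible way to approximate the knowledge-injection step the abstract describes.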


... Aspect-based sentiment classification methods [14][15][16][17][18][19][20][21][22][23][24][25][26][27][28][29][30][31] not only assist in extracting the various aspects mentioned in reviews but also help in classifying each aspect into positive or negative polarity. Consider, for example, the opinion 'Food is delicious but service is ordinary'. ...
... In the existing aspect identification methods [14][15][16][17][18][19][20][21][22][23][24], there is not a single method that deals with all three of the aforementioned aspect identification issues effectively. For each issue, separate methods have been proposed in the literature, and there is a dire need for a method that can cater to these issues together with reasonable effectiveness. ...
... Similarly, Shams et al. [24] proposed an enriched latent Dirichlet allocation (LDA) model to discover more precise aspects from reviews by incorporating co-occurrence relations as prior domain knowledge into the LDA topic model. In the proposed method, first, the preliminary aspects are generated based on LDA. ...
Article
Full-text available
With the increase in online tourist reviews, discovering the sentiment expressed about a tourist place through posted reviews is becoming a challenging task. The presence of various aspects discussed in user reviews makes it even harder to accurately extract and classify the sentiments. Aspect-based sentiment analysis aims to extract and classify a user's positive or negative orientation towards each aspect. Although several aspect-based sentiment classification methods have been proposed in the past, limited work has targeted the automatic extraction of implicit, infrequent and co-referential aspects. Moreover, existing methods lack the ability to accurately classify the overall polarity of multi-aspect sentiments. This study aims to develop a predictive framework for aspect-based extraction and classification. The proposed framework utilises the semantic relations among review phrases to extract implicit and infrequent aspects for accurate sentiment predictions. Experiments have been performed using real-world data sets crawled from predominant tourist websites such as TripAdvisor and OpenTable. Experimental results and comparison with previously reported findings prove that the predictive framework not only extracts the aspects effectively but also improves the prediction accuracy of aspects.
... Sentiment analysis, also referred to as opinion mining, is the process of analyzing the characteristics of opinions, feelings and emotions expressed in textual data provided for a certain topic or object [1]. Sentiment analysis is widely used by companies to promote their services and products' sales [2]-[4]. In commercial business, sentiment analysis systematically models the users' requirements and views, which in turn contributes to the organization's perception of customer service. ...
... It is obvious that determining these aspects depends on the domain and may vary in each dataset. (2) The opinion words of each aspect are determined and separated from aspect words. For instance, in the aspect of price, "reasonable" and "free" are positive, whereas "expensive" and "waste" are negative words. ...
... Also, aspect means a collection of informative words belonging to each subtopic/property of a topic or product. To extract aspects, ELDA, a method presented by the authors in [2], is employed. • After extracting aspects and the polarity lexicon related to each language, in the second step, the probability of any word being associated with each aspect/sentiment is calculated in an expectation-maximization fashion. ...
Article
Full-text available
Understanding “what others think” is one of the most eminent pieces of knowledge in the decision-making process required in a wide spectrum of applications. The procedure of obtaining knowledge from each aspect (property) of users' opinions is called aspect-based sentiment analysis, which consists of three core sub-tasks: aspect extraction, aspect and opinion-word separation, and aspect-level polarity classification. Most successful approaches proposed in this area require a set of primary training data or extensive linguistic resources, which makes them relatively costly and time consuming in different languages. To overcome the aforementioned challenges, we propose an unsupervised paradigm for aspect-based sentiment analysis, which is not only simple to use in different languages but also holistically performs the subtasks of aspect-based sentiment analysis. Our methodology relies on three coarse-grained phases which are partitioned into manifold fine-grained operations. The first phase extracts the prior domain knowledge from the dataset by selecting the preliminary polarity lexicon and aspect word sets as representatives of aspects. These two resources, as primitive knowledge, are passed to an expectation-maximization algorithm to identify the probability of any word based on the aspect and sentiment. To determine the polarity of any aspect in the final phase, the document is first broken down into its constituent aspects and the probability of each aspect/polarity based on the document is calculated. To evaluate this method, two datasets in the English and Persian languages are used and the results are compared with various baselines. The experimental results show that the proposed method outperforms the baselines in terms of aspect extraction, opinion-word extraction and aspect-level polarity classification.
... [Table I: Linguistic-related challenges in Persian sentences for the aspect extraction process [3,4,14]] Shams and Baraani (2017) worked on aspect extraction by proposing ELDA, which is based on the LDA topic modeling algorithm and iteratively utilizes prior knowledge of similar aspects that co-occur in related topics. They evaluated their algorithm on the English and Persian languages [8]. Another work suggested a hybrid supervised approach of machine learning and rules to carry out the tasks of aspect term extraction (ATE) and opinion target extraction (OTE). ...
... Their network was constructed based on different information from labelled and unlabelled aspects and linguistic features [11]. Another method was proposed for recommendation systems; it extracted product aspects and associated weights through a deep learning method and then merged them into a collaborative filtering (CF) method. ...
... We evaluate our approach in the extraction of multi-word aspects separately. Figure 16 depicts the comparison of our approach to four methods, LRT-based [3], SAM [4], ELDA [8] and MMI-based [14], in extracting three forms of multi-word aspects, since these approaches aim at extracting multi-word aspects in the Persian language from different perspectives. As can be seen, the results of our method outstrip those of the other four approaches in detecting type C multi-word aspects. ...
Article
Full-text available
Due to the remarkable increase in e-commerce transactions, people try to make an appropriate purchase choice by considering other people's experience reflected in product or service reviews. Automatic analysis of such corpora requires enhanced algorithms based on natural language processing and opinion mining. Moreover, linguistic differences make extending existing algorithms from one language to another challenging and in some cases impossible. Opinion mining focuses on different subjects of review analysis such as spam detection, aspect elicitation and polarity allocation. In this paper, we focus on the detection of explicit aspects and propose a methodology to handle some difficult and problematic aspect compounds in multi-word format in the Persian language. Our approach constructs a directed weighted graph (ADG structure) based on information yielded by the FP-Growth frequent pattern identification algorithm on our corpus of Persian sentences. Traversing some special paths within the ADG graph according to our developed rules leads to the extraction of problematic multi-word aspects. We utilize the Neo4j NoSQL graph database environment and its Cypher query language in order to create the ADG graph and access the desired paths that reflect our developed rules on the ADG structure, which lead us to extract the multi-word aspects. The evaluation of our methodology against existing approaches to aspect derivation in the Persian language, including ELDA, SAM, an MMI-based and an LRT-based algorithm, indicates the robustness of our approach.
... Aspect extraction is commonly done by using a supervised or an unsupervised approach. The unsupervised approach includes methods, such as bootstrapping-based extraction (Wang et al., 2010), syntactic rules-based extraction (Poria, Cambria, Ku, Gui, & Gelbukh, 2014), frequent pattern mining (Jeyapriya & Selvi, 2015), topic modeling (Shams & Baraani-Dastjerdi, 2017), etc. Wang et al. (2010) proposed a probabilistic rating regression model. Given a few seed words describing the aspects, a bootstrapping-based algorithm was employed to identify the major aspects and split comments. ...
... Jeyapriya and Selvi (2015) used frequent itemsets to mine aspects, where frequent itemset mining is used to find all frequent itemsets using minimum support count. Shams and Baraani-Dastjerdi (2017) proposed a method based on topic model. It discovered more precise aspects by incorporating co-occurrence relations as prior domain knowledge into the latent dirichlet allocation (LDA) topic model. ...
Article
Sentiment mining has been a helpful mechanism that targets understanding market feedback on certain commodities by utilizing user comments. In general, the process of producing each comment is essentially associated with the user's criteria for rating (i.e., the degree of harshness), which makes users provide biased comments. For instance, for a tolerant user, although the user is extremely dissatisfied with the product, harshness still makes her yield a neutral comment which cannot indicate the product quality. Existing work straightforwardly removes the comments of harsh users and those of tolerant ones, which is not the best strategy. To this end, we propose a harshness-aware sentiment analysis framework for product reviews. First, we depict the process of providing comments from users as a probabilistic graphical model in which harshness is incorporated. Second, we employ Bayesian-based inference for sentiment mining. Extensive experimental evaluations have shown that the results of the proposed method are more consistent with the expert evaluations than those of the state-of-the-art methods, and even outperform the method which infers the final evaluations from the ground truth of comments without considering users' harshness.
... Deep learning provides an approach to utilizing large volumes of computation and data with little manual engineering. Recently, deep learning approaches to analyzing emotions have achieved considerable success [29,30,47,77]. Optimization methods have also developed significantly in recent years [31][32][33][34][35][36][37]. ...
Article
Full-text available
Latent Dirichlet Allocation (LDA) is an approach to unsupervised learning that aims to investigate the semantics among words in a document as well as the influence of a subject on a word. As an LDA-based model, Joint Sentiment-Topic (JST) examines the impact of topics and emotions on words. The emotion parameter is insufficient, and additional parameters may play valuable roles in achieving better performance. In this study, two new topic models, Weighted Joint Sentiment-Topic (WJST) and Weighted Joint Sentiment-Topic 1 (WJST1), have been presented to extend and improve JST through two new parameters that can generate a sentiment dictionary. In the proposed methods, each word in a document affects its neighbors, and different words in the document may be affected simultaneously by several neighbor words. Therefore, proposed models consider the effect of words on each other, which, from our view, is an important factor and can increase the performance of baseline methods. Regarding evaluation results, the new parameters have an immense effect on model accuracy. While not requiring labeled data, the proposed methods are more accurate than discriminative models such as SVM and logistic regression in accordance with evaluation results. The proposed methods are simple with a low number of parameters. While providing a broad perception of connections between different words in documents of a single collection (single-domain) or multiple collections (multidomain), the proposed methods have prepared solutions for two different situations (single-domain and multidomain). WJST is suitable for multidomain datasets, and WJST1 is a version of WJST which is suitable for single-domain datasets. While being able to detect emotion at the level of the document, the proposed models improve the evaluation outcomes of the baseline approaches. Thirteen datasets with different sizes have been used in implementations. In this study, perplexity, opinion mining at the level of the document, and topic_coherency are employed for assessment. Also, a statistical test called Friedman test is used to check whether the results of the proposed models are statistically different from the results of other algorithms. As can be seen from results, the accuracy of proposed methods is above 80% for most of the datasets. WJST1 achieves the highest accuracy on Movie dataset with 97 percent, and WJST achieves the highest accuracy on Electronic dataset with 86 percent. The proposed models obtain better results compared to Adaptive Lexicon learning using Genetic Algorithm (ALGA), which employs an evolutionary approach to make an emotion dictionary. Results show that the proposed methods perform better with different topic number settings, especially for WJST1 with 97% accuracy at |Z| = 5 on the Movie dataset.
... The fourth method is topic modeling; a plethora of probabilistic topic modeling methods have addressed the task of aspect-based sentiment analysis (ABSA) [15]. The critical factor that makes topic modeling so powerful is that it can be used to extract and categorize similar aspects into their related clusters concurrently [16][17][18][19][20]. ...
... MI metrics are used to exclude irrelevant knowledge. Enriched LDA (ELDA) was proposed by Shams and Baraani-Dastjerdi (2017). Combining the LDA topic model with word co-occurrence analysis, prior knowledge of similar aspects is extracted in each iteration of the algorithm based on word co-occurrence, and the resulting knowledge set is added to the LDA model. ...
Article
Full-text available
The Latent Dirichlet Allocation (LDA) topic model is a popular research topic in the field of text mining. In this paper, a Sentiment Word Co-occurrence and Knowledge Pair Feature Extraction based LDA Short Text Clustering Algorithm (SKP-LDA) is proposed. A definition of a word bag based on sentiment word co-occurrence is proposed, in which the co-occurrence of emotional words takes full account of different short texts. Then, the short texts of a microblog are endowed with emotional polarity. Furthermore, the knowledge pairs of topic special words and topic relation words are extracted and inserted into the LDA model for clustering, so that semantic information can be found more accurately. Then, the hidden n topics and the Top30 special words set of each topic are extracted from the knowledge pair set. Finally, via LDA topic model primary clustering, a Top30 topic special words set is obtained, which is clustered by K-means secondary clustering, and the clustering center is optimized iteratively. Compared with JST, LSM, LTM and ELDA, SKP-LDA performs better in terms of Accuracy, Precision, Recall and F-measure. The experimental results show that SKP-LDA reveals better semantic analysis ability and emotional topic clustering effect. It can be applied to the microblog field to improve the accuracy of network public opinion analysis effectively.
... The graph construction in this paper departs from the existing research by relying on hierarchical extraction of nodes based on a specific threshold, ensuring that the extracted feature terms will be highly dependent on each other. Moreover, unlike the single relationship between feature terms [19,38,50] used in the existing research, the correlation measurement for feature terms in this paper takes into account both content and context, comprehensively representing the correlation between feature terms by synthesizing different background information. ...
Article
Topic mining of scientific literature can accurately capture the contextual structure of a topic, track research hotspots within a field, and improve the availability of information about the literature. This paper introduces a multi-dimensional topic mining method based on a hierarchical semantic graph model. The main innovations include (1) the hierarchical extraction of feature terms and construction of a corresponding semantic graph and (2) multi-dimensional topic mining based on graph segmentation and structure analysis. The process of semantic graph construction is based primarily on hierarchical feature term extraction, which can effectively reveal the hierarchical structural distribution of feature terms within documents. Our graph model also takes into account the complementarity of content- and context-related feature terms in documents while avoiding the loss of textual information. In addition, the multi-dimensional features of the topic can be mined effectively via an in-depth analysis of the constructed graph, resulting in a quantitative visualization of the many-to-many association between the topic and feature terms. A variety of experiments on existing document datasets demonstrate that the proposed approach is able to outperform state-of-the-art methods in terms of accuracy and efficacy.
... In this subsection, the effectiveness of sentiment topic captured by proposed models is evaluated using topic coherency value. This metric (Shams & Baraani-Dastjerdi, 2017) can be calculated by Equation (37). ...
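Equation (37) is not reproduced in this excerpt. For reference, the document co-occurrence (UMass-style) topic coherence commonly used in this line of work scores a topic t by its top-M words V^{(t)} = (v_1^{(t)}, ..., v_M^{(t)}) as

C(t; V^{(t)}) = \sum_{m=2}^{M} \sum_{l=1}^{m-1} \log \frac{D(v_m^{(t)}, v_l^{(t)}) + 1}{D(v_l^{(t)})},

where D(v) is the number of documents containing word v and D(v, v') the number containing both; higher values indicate more coherent topics. Whether Equation (37) matches this formulation exactly is an assumption.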
Article
Full-text available
One of the main benefits of unsupervised learning is that there is no need for labelled data. As a method of this category, latent Dirichlet allocation (LDA) estimates the semantic relations between the words of a text effectively and can play an important role in solving various issues, including emotional analysis, in combination with other parameters. In this study, three novel topic models called date sentiment LDA (DSLDA), author–date sentiment LDA (ADSLDA), and pack–author–date sentiment LDA (PADSLDA) are proposed. The proposed models extend LDA through some extra parameters such as date, author, helpfulness, sentiment, and subtopic. The proposed models use helpfulness in the Gibbs sampling algorithm; helpfulness is the proportion of readers who found the review helpful. The proposed models divide the words into two categories: the words more affected by the distribution of the subtopic and the words more affected by the main topic. In this study, a new concept called pack is introduced, and a new model called PADSLDA is proposed for sentiment analysis at the pack level. The proposed models outperformed the baseline models because, according to the evaluation results, the extra parameters can appropriately affect the generating process of words in a review. Sentiment analysis at the document level, perplexity, and topic coherence are the main parameters used in the evaluations.
... The topics and keywords obtained by LDA algorithms were extracted based solely on the distribution of the words, thus missing the co-occurrence information (Shams and Baraani-Dastjerdi, 2017). Word co-occurrence refers to the co-occurrence of two words in one sentence. ...
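As a hypothetical illustration (not code from the cited study), the snippet below builds a sentence-level word co-occurrence network in Python with networkx, where an edge weight counts how many sentences contain both words:

from itertools import combinations

import networkx as nx


def build_cooccurrence_network(sentences):
    """Sentences are pre-tokenized lists of words; edge weights count shared sentences."""
    graph = nx.Graph()
    for tokens in sentences:
        for a, b in combinations(sorted(set(tokens)), 2):
            if graph.has_edge(a, b):
                graph[a][b]["weight"] += 1
            else:
                graph.add_edge(a, b, weight=1)
    return graph


g = build_cooccurrence_network([["price", "reasonable"],
                                ["price", "expensive", "service"]])
print(g["price"]["reasonable"]["weight"])  # 1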
Article
With the increasing demand for fashion products, the textile and apparel industry is facing huge challenges in resource management and environmental regulations. Fashion renting provides an option to reuse clothing products. Besides that, it also fulfills an individual’s fashion needs while reducing the production of new clothes. Though some studies have been done to examine consumers’ opinions of fashion renting, most of these studies employed survey methods and did not utilize actual data of consumers’ real renting experiences. The purpose of this study is to evaluate consumers’ actual renting experiences and to identify the motivations and barriers for those consumers to rent fashion products. A data-mining approach was applied in this study. Consumers’ comments on renting experiences from three fashion rental companies, Rent the Runway (RTR), Gwynnie Bee (GB), and Bag Borrow Steal (BBS), were collected as a reliable data source to dig into and identify consumers’ motivations and concerns. Based on the theory of customer value, both benefits and costs for fashion renting were discovered. In addition, a comparison of the three fashion rental companies was also discussed. This study is the first attempt to use a data-mining method to thoroughly investigate the benefits and costs of fashion online renting through real consumers’ feedback of three different types of rental companies. It provides an in-depth text analysis of consumers’ online fashion renting experiences through the use of Latent Dirichlet Allocation (LDA) topic modeling and word co-occurrence networks.
... To solve the problem of knowledge acquisition and knowledge judgment in early studies, researchers have explored mining word-word correlation knowledge automatically and measuring incorrect knowledge during the modeling process based on the statistical information of word co-occurrence in the corpus (Chen and Liu 2014a, b). These models solve the problems in early knowledge-based topic models and further improve the coherence of topic modeling results (Shams and Baraani-Dastjerdi 2017). However, the limitations are obvious because these models need a large amount of related domain data to mine valuable prior knowledge, which is not applicable in practical applications. ...
Article
Full-text available
Topic models have been widely used to infer latent topics in text documents. However, unsupervised topic models often result in incoherent topics, which confuse users in applications. Incorporating prior domain knowledge into topic models is an effective strategy to extract coherent and meaningful topics. In this paper, we go one step further to explore how different forms of prior semantic relations of words can be encoded into models to improve the performance of the topic modeling process. We develop a novel topic model—called Mixed Word Correlation Knowledge-based Latent Dirichlet Allocation—to infer latent topics from a text corpus. Specifically, the proposed model mines two forms of lexical semantic knowledge based on recent progress in word embedding, which can represent the semantic information of words in a continuous vector space. To incorporate the generated prior knowledge, a Mixed Markov Random Field is constructed over the latent topic layer to regularize the topic assignment of each word during the topic sampling process. Experimental results on two public benchmark datasets illustrate the superior performance of the proposed approach over several state-of-the-art baseline models.
... To model the aspect relationship between different products, Yang [65] assumes that child categories inherit aspects from parent categories, and the proposed approach incorporates the category hierarchy information into the modeling process to enhance the extraction ability. To enhance the coherence of the generated aspect topics, Shams [66] mines knowledge automatically from pre-extracted topics and proposes ELDA to inject that knowledge into the modeling process. ...
Article
Full-text available
With the prevalence of social media and online forums, opinion mining, which aims at analyzing and discovering the latent opinions in user-generated reviews on the Internet, has become a hot research topic. This survey focuses on two important subtasks in this field, stance detection and product aspect mining, both of which can be formalized as the problem of extracting the triple ⟨target, aspect, opinion⟩. In this paper, we first introduce the general framework of opinion mining and describe the evaluation metrics. Then, the methodologies for stance detection on different sources such as online forums and social media are discussed. After that, approaches for product aspect mining are categorized into three main groups based on the processing units and tasks: corpus-level aspect extraction, corpus-level aspect and opinion mining, and document-level aspect and opinion mining. We then discuss the challenges and possible solutions. Finally, we summarize the evolving trend of the reviewed methodologies and conclude the survey.
... Review Dataset. Shams and Baraani-Dastjerdi [39] used Enriched LDA (ELDA) to extract aspects by combining word co-occurrence as prior knowledge with LDA. The ELDA was evaluated with some English and Persian datasets and showed reasonable accuracy. ...
Article
Latent Dirichlet allocation (LDA) is one of the probabilistic topic models; it discovers the latent topic structure in a document collection. The basic assumption under LDA is that documents are viewed as a probabilistic mixture of latent topics; a topic has a probability distribution over words and each document is modelled on the basis of a bag-of-words model. Topic models such as LDA are sufficient for learning hidden topics, but they do not take into account the deeper semantic knowledge of a document. In this article, we propose a novel method based on topic modelling to determine the latent aspects of online review documents. In the proposed model, which is called Concept-LDA, the feature space of reviews is enriched with concepts and named entities extracted from Babelfy to obtain topics that contain not only co-occurring words but also semantically related words. The performance in terms of topic coherence and topic quality is reported over 10 publicly available datasets, and it is demonstrated that Concept-LDA achieves better topic representations than an LDA model alone, as measured by topic coherence and F-measure. The topic representation learned by Concept-LDA leads to an accurate and easy aspect extraction task in an aspect-based sentiment analysis system.
... Furthermore, Zhang et al. [44] improved the overall system by adding some additional rules. Shams and Baraani-Dastjerdi [45] have combined aspect co-occurrences with latent Dirichlet allocation (LDA) to identify opinion targets. The performance of DP suffers with the size of the data sets. ...
Article
Aspect extraction or opinion target extraction is the key task of sentiment analysis, which aims to identify the targets of people's sentiments. This is the most important task of aspect-based sentiment analysis because, without the aspects, there is little use for the extracted opinions. Recent approaches have shown the significance of dependency-based rules for the given task. These rules are heavily dependent on the dependency parser and are generated with the help of grammatical rules. In this article, we propose to learn from users' behaviour to identify the relation between aspects and opinions. The use of sequential patterns is proposed for the extraction of aspects. The key purpose of this research is to study the impact of sequential pattern mining in the aspect extraction phase. Our experimental results show that the approach proposed in our work produces better results compared with the state-of-the-art approaches.
... The word network topic model (WNTM) is a hybrid approach which uses WCN and a topic model for topic detection. Similar approaches have been proposed for short text by Chen and Kao [7] and, as Enriched-LDA, by Shams and Baraani-Dastjerdi [42]. The authors used the analytical hierarchy process, a multiple-attribute decision-making optimization algorithm, on proposed WCN-based attributes to identify influential segments [4] in social media data. ...
Article
The study of the structure and dynamics of complex networks has been attracting the attention of academic researchers and practitioners in recent years. Although Word Co-occurrence Networks (WCN) have been studied for different languages, there is still a need to study the structure of WCN for microblogs due to the presence of ill-formed and unstructured data. In this research article, existing WCN-based applications are explored and microblog WCN are analysed for multiple key parameters to uncover hidden patterns. The key parameters studied for microblog WCN are the scale-free property, the small-world feature, hierarchical organization, assortativity and spectral analysis. The Twitter FSD dataset has been used for experimental results and evaluation. Different mathematical, statistical and graphical interpretations proved that microblog WCN are different from the WCN of traditional well-formed text. The robustness of the key parameters of microblog WCN has been explored for keyphrase extraction from domain-specific sets of microblogs. The baseline methods used for comparison are TextRank, TopicRank, and NErank. Extensive experiments over a standard public dataset proved that the proposed keyphrase extraction technique outperforms the existing techniques in terms of precision, recall, F-measure, and ROUGE scores.
... Each of these tasks might be either fine-grained or coarse-grained. Fine-grained sentiment analysis includes sentence-level sentiment analysis (Chenlo and Losada, 2014;Orimaye, 2013) and aspect-based sentiment analysis (Pavlopoulos, 2014;Shams and Baraani-Dastjerdi, 2017). Coarse-grained sentiment analysis includes document-level sentiment analysis (Hogenboom et al., 2015;Priyanka and Gupta, 2013;Sarvabhotla et al., 2011). ...
Article
Purpose: This paper aims to propose a statistical and context-aware feature reduction algorithm that improves sentiment classification accuracy. Classification of reviews with different granularities into two classes of negative and positive polarity is among the objectives of sentiment analysis. One of the major issues in sentiment analysis is feature engineering, as it severely affects the time complexity and accuracy of sentiment classification. Design/methodology/approach: In this paper, a feature reduction method is proposed that uses context-based knowledge as well as synset statistical knowledge. To do so, a one-dimensional representation proposed for SentiWordNet provides statistical knowledge that involves the polarity concentration and variation tendency of each synset. Feature reduction involves two phases. In the first phase, features that satisfy both semantic and statistical similarity conditions are put in the same cluster. In the second phase, features are ranked and the features with lower ranks are eliminated. The experiments are conducted with support vector machine (SVM), naive Bayes (NB), decision tree (DT) and k-nearest neighbors (KNN) algorithms to classify the vectors of unigram and bigram features into two classes of positive or negative sentiment. Findings: The results showed that the applied clustering algorithm reduces the SentiWordNet synsets to less than half, which reduces the size of the feature vector by less than half. In addition, the accuracy of sentiment classification is improved by at least 1.5 per cent. Originality/value: The presented feature reduction method is the first use of synset clustering for feature reduction. The feature reduction algorithm first aggregates similar features into clusters and then eliminates unsatisfactory clusters.
... In recent literature, much work has been written on sentiment analysis of user-generated content [10], including reviews and social discussions ( [11], [12]). Sentiment analysis can be performed at different levels: at document level [13], at sentence level [14] and at aspect level [15]. In our work we only focus on the analysis of sentiment at sentence level, since social contents usually consist of a single sentence, especially for microblogging platforms like Twitter. ...
Article
Full-text available
In recent years, the massive diffusion of social networks has made available a large amount of user-generated content, for the most part in the form of textual data that contains people's thoughts and emotions about a great variety of topics. In order to exploit this publicly available information, in this work we introduce a social information discovery system which operates simultaneously over more than one social network in an integrated scenario. The system is designed to ensure flexibility and scalability, thus enabling (near-)real-time analysis even in the case of high rates of content creation and large amounts of heterogeneous data. Furthermore, a noise detection technique ensures a high relevance of the analyzed posts/tweets to the domain of interest. We also propose a lexicon-based sentiment analysis algorithm to extract and measure users' opinions, in order to support collaboration and open innovation. Polysemous words and negations are typically challenging for lexicon-based approaches: for this reason, we introduce both a word sense disambiguation algorithm and a negation handling technique. Experiments on several datasets have proven that the combined use of both techniques improves the classification accuracy of 3-class sentiment analysis.
... Topic modeling methods such as Latent Dirichlet Allocation (LDA) and its variants are also widely used in aspect extraction and applied to construct aspect ontology for ATE and OTE tasks [30][31][32][33][34][35] . Typical LDA methods are based on constructing word frequency vectors of the context. ...
Article
Aspect term extraction (ATE) and opinion target extraction (OTE) are two important tasks in fine-grained sentiment analysis field. Existing approaches to ATE and OTE are mainly based on rules or machine learning methods. Rule-based methods are usually unsupervised, but they can't make use of high level features. Although supervised learning approaches usually outperform the rule-based ones, they need a large number of labeled samples to train their models, which are expensive and time-consuming to annotate. In this paper, we propose a hybrid unsupervised method which can combine rules and machine learning methods to address ATE and OTE tasks. First, we use chunk-level linguistic rules to extract nominal phrase chunks and regard them as candidate opinion targets and aspects. Then we propose to filter irrelevant candidates based on domain correlation. Finally, we use these texts with extracted chunks as pseudo labeled data to train a deep gated recurrent unit (GRU) network for aspect term extraction and opinion target extraction. The experiments on benchmark datasets validate the effectiveness of our approach in extracting opinion targets and aspects with minimal manual annotation.
... In the LDA model, common-sense reasoning is used to enhance the distribution of words, moving from grammatical analysis to semantic analysis, which transcends simple statistics and uses semantic information to identify aspects [21]. Shams utilized the statistical co-occurrence relationship between words as a prior parameter: the LDA model is first used to generate the preliminary topics, aspects are then extracted automatically according to co-occurrence and similarity relationships, and this knowledge is put into the LDA model again to extract aspects [22]. The topic model mainly relies on statistics or the statistical co-occurrence of words and topics. ...
Article
Full-text available
Existing methods for Chinese sentiment labeling mainly rely on a manually built sentiment corpus, but a sentiment word in the corpus may not act as a sentiment word in different sentences. This paper proposes a new method to label the words in sentences by combining a deep convolutional neural network with a sequential algorithm. We first extract the aspects represented by word vectors, part-of-speech vectors and dependency syntax vectors to train the deep convolutional neural network, and then employ the sequential algorithm to obtain the sentiment annotation of the sentence. Experimental results verify that our method is effective for sentiment labeling. Considering that the identification of implicit aspects can improve the completeness of sentiment analysis, we suggest constructing tuples including the aspect, sentiment shifter, sentiment intensity and sentiment words after obtaining the sentiment labels for each word in the sentence. We develop a new algorithm for implicit aspect identification by taking into account two key factors: the aspect treated as a topic together with the match degree between aspects and sentiment words, and human language habits. The experiments demonstrate that the algorithm can effectively identify implicit aspects. In this paper, we solve the problems of sentiment labeling and implicit aspect recognition in sentiment analysis. As a new tool for sentiment analysis, our method can be applied to enterprise management information analysis, such as product online reviews, product online reputation, brand image and consumer preference management, and can also be used for the sentiment analysis of large-scale text data.
... Topic modeling (Y. Hu et al., 2014; Shams & Baraani-Dastjerdi, 2017) performs better than rule-based models in extracting opinion targets. In Latent Dirichlet Allocation (LDA), the document is searched for any potentially relevant topic, while Probabilistic Latent Semantic Analysis (pLSA) incorporates latent rating information of reviews (Blei et al., 2003; Hofmann, 2001). ...
Article
Social networking sites have a wealth of user-generated unstructured text for fine-grained sentiment analysis regarding the changing dynamics in the marketplace. In aspect-level sentiment analysis, the aspect term extraction (ATE) task identifies the targets of user opinions in a sentence. In the last few years, deep learning approaches have significantly improved the performance of aspect extraction. However, the performance of recent models relies on the accuracy of the dependency parser and part-of-speech (POS) tagger, which degrades the performance of the system if the sentence does not follow the language constraints and the text contains a variety of multi-word aspect terms. Furthermore, the lack of domain and contextual information is again an issue in extracting the most relevant domain-specific aspect terms. The existing approaches are not capable of capturing long-term dependencies for noun phrases, which in turn fails to extract some valid aspect terms. Therefore, this paper proposes a two-step mixed unsupervised model that combines linguistic patterns with deep learning techniques to improve the ATE task. The first step uses rule-based methods to extract single-word and multi-word aspects, which are further pruned to domain-specific relevant aspects using fine-tuned word embeddings. In the second step, the aspects extracted in the first step are used as labelled data to train an attention-based deep learning model for aspect-term extraction. The experimental evaluation on the SemEval-16 dataset validates our approach as compared with the most recent and baseline techniques.
... Rule-based methods (Hu and Liu, 2004;Liu et al., 2005;Zhuang et al., 2006;Scaffidi et al., 2007;Zhang et al., 2010;Qiu et al., 2011) are the pioneers along this direction. A number of unsupervised learning methods based on the LDA topic model and its variants (Titov and McDonald, 2008;Zhao et al., 2010;Brody and Elhadad, 2010;Mukherjee and Liu, 2012;Shams and Baraani-Dastjerdi, 2017) treat extracted topics as aspects. More recently, a neural model ExtRA (Luo et al., 2018) is proposed to further improve the aspect extraction at the document level. ...
Preprint
Aspect classification, identifying aspects of text segments, facilitates numerous applications, such as sentiment analysis and review summarization. To alleviate the human effort of annotating massive texts, in this paper we study the problem of classifying aspects based on only a few user-provided seed words for pre-defined aspects. The major challenge lies in how to handle the noisy misc aspect, which is designed for texts without any pre-defined aspects. Even domain experts have difficulty nominating seed words for the misc aspect, making existing seed-driven text classification methods not applicable. We propose a novel framework, ARYA, which enables mutual enhancement between pre-defined aspects and the misc aspect via iterative classifier training and seed updating. Specifically, it trains a classifier for pre-defined aspects and then leverages it to induce the supervision for the misc aspect. The prediction results of the misc aspect are later utilized to filter out noisy seed words for pre-defined aspects. Experiments in two domains demonstrate the superior performance of our proposed framework, as well as the necessity and importance of properly modeling the misc aspect.
... To solve problems deriving from syntactic and morpho-syntactic language variations, the proposed approach obtained more precise index terms by extracting syntactic dependencies as complex index terms. Shams and Baraani-Dastjerdi (2017) presented a method that combines LDA topic modelling with semantic relationships for extracting different aspects in textual data. ...
Article
Natural Language Processing (NLP) is now widely integrated into web and mobile applications, enabling natural interactions between humans and computers. Although there is a large body of NLP studies published in Information Systems (IS), a comprehensive review of how NLP research is conceptualised and realised in the context of IS has not been conducted. To assess the current state of NLP research in IS, we use a variety of techniques to analyse a literature corpus comprising 356 NLP research articles published in IS journals between 2004 and 2018. Our analysis indicates the need to move from semantics to pragmatics. More importantly, our findings unpack the challenges and assumptions underlying current research trends in NLP. We argue that overcoming these challenges will require a renewed disciplinary IS focus. By proposing a roadmap of NLP research in IS, we draw attention to three NLP research perspectives and present future directions that IS researchers are uniquely positioned to address.
... Machine learning techniques are the most common approaches used to extract, classify, identify and detect explicit and implicit aspects, and are mainly categorized as supervised, semi-supervised or unsupervised techniques [10,11]. To enable aspect-based sentiment analysis for under-resourced languages, since most works target the English language with a few on Chinese, Arabic, Hindi, Malay, etc., [12][13][14][15][16][17][18] introduced aspect-based cross-lingual sentiment classification approaches with impressive performance. ...
... The main idea is that documents are represented as random mixtures over latent topics, where each topic is characterized by a distribution over words [17]. LDA can be regarded as a Bayesian version of the probabilistic latent semantic analysis method [18]. ...
Article
The main aim of the present study is to compare the keywords extracted from abstracts and the full-length text of scientific research papers. In addition, we compare Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) to identify the better performer for keyword extraction. This comparative study is divided into three levels. In the first level, scientific research articles on topics such as Indian economic growth, GDP, economic slowdown, etc. were collected; the abstracts and full-length text were extracted from the sources and pre-processed to remove words and characters that were not useful for obtaining the semantic structures or patterns needed to build a meaningful corpus. In the second level, the pre-processed data were converted into a bag of words, and the numerical statistic TF-IDF (Term Frequency – Inverse Document Frequency) was used to assess how relevant a word is to a document in a corpus. In the third level, in order to study the feasibility of the Natural Language Processing (NLP) techniques, the Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) methods were applied to the resultant corpus.
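A generic illustration (my own sketch, not the cited study's code) of such a pipeline with scikit-learn: TF-IDF features feeding LSA via TruncatedSVD, and raw term counts feeding LDA, which is conventionally fit on counts rather than TF-IDF weights. The toy documents are assumptions.

from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import TruncatedSVD, LatentDirichletAllocation

docs = [
    "indian economic growth slowed this quarter",
    "gdp figures point to an economic slowdown",
    "new fiscal policy aims to boost gdp growth",
    "analysts debate the causes of the slowdown",
]

# LSA on TF-IDF weighted features
tfidf = TfidfVectorizer(stop_words="english")
X_tfidf = tfidf.fit_transform(docs)
lsa = TruncatedSVD(n_components=2).fit(X_tfidf)

# LDA on raw term counts
counts = CountVectorizer(stop_words="english")
X_counts = counts.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X_counts)

# Top terms per LDA topic, usable as candidate keywords
terms = counts.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[::-1][:3]]
    print(f"topic {k}: {top}")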
... The latent Dirichlet allocation (LDA) model [14] is a topic model widely adapted to text classification; Blei et al. introduced the Bayesian framework into the probabilistic latent semantic analysis (PLSA) model [15,16]. Shams [17] innovatively added the co-occurrence relationship on the basis of the LDA model and regarded it as prior domain knowledge. Furthermore, Lin [18] and Moghaddam et al. [19] studied the dependence between potential aspects and scores and performed modeling analysis. ...
Article
Full-text available
Personal descriptions of a company related to job satisfaction, company culture, and opinions of senior leadership are available on workplace community websites. However, it is almost impossible to read all of the different and possibly even contradictory reviews and arrive at an accurate overall rating. Therefore, extracting aspects or sentiments from online reviews and the corresponding ratings is an important challenge. We collect anonymous online employee reviews from Glassdoor.com, which allows people to evaluate and review the companies they have worked for or are working for. Here, we propose a joint rule-based model which combines the numerical evaluation, reflected in the form of 1–5 stars, with the review content to extract aspects. The model first takes as input the five aspects with manually screened initial word sets, expands the aspect keyword sets through bootstrapping semi-supervised learning, and then uses latent rating regression to obtain the aspect score and aspect weight to update the corresponding score. Our experimental evaluation has shown better results compared with unsupervised learning using latent Dirichlet allocation. The results could not only help companies understand their strengths and weaknesses, but also help job seekers when applying to companies.
... The proposed ELDA model is said to be language independent and incorporates the advantages of both methods used in combination. The model also involves relevant topics to enhance knowledge extraction in a small corpus [7]. LDA has been used predominantly for topic modeling. Y. Kalepalli et al. have proposed a comparison of LSA and LDA in which both algorithms are applied to the BBC news dataset. ...
... Accordingly, topic modelling techniques can be used to identify the different aspect terms and then to categorise them into topics or aspect categories. The first proposals were based on variants or extensions of the LDA algorithm [38], like [39][40][41][42]. The development of topic models based on deep learning, neural topic models [43,44], has pushed the development of neural models in which continuous representations of aspect categories are learnt. ...
Article
Opinion summarisation is concerned with generating structured summaries of multiple opinions in order to provide insightful knowledge to end users. We present the Aspect Discovery for OPinion Summarisation (ADOPS) methodology, which is aimed at generating explainable and structured opinion summaries. ADOPS is built upon aspect-based sentiment analysis methods based on deep learning and Subgroup Discovery techniques. The resultant opinion summaries are presented as interesting rules, which summarise in explainable terms for humans the state of the opinion about the aspects of a specific entity. We annotate and release a new dataset of opinions about a single entity on the restaurant review domain for assessing the ADOPS methodology, and we call it ORCo. The results show that ADOPS is able to generate interesting rules with high values of support and confidence, which provide explainable and insightful knowledge about the state of the opinion of a certain entity.
... The proposed ELDA model is said to be language independent and incorporates the advantages of both methods used in combination. The model also involves relevant topics to enhance knowledge extraction in a small corpus [7]. LDA has been used predominantly for topic modeling. ...
Conference Paper
Full-text available
The importance of online news articles has grown with the advancement of information technology. The internet has changed how people gather information, shape their views, and engage with topics of relevance. Thus, news articles have become important sources and play a significant role in shaping personal and public opinion. Predicting polarity in news articles becomes crucial to have a well-balanced understanding of any event. Using aspect-based sentiment analysis, the application predicts the sentiments attached to various aspects of a particular Hindi news article. Our approach consists of sentence identification using POS tagging techniques, followed by aspect extraction using unsupervised learning algorithms, and finally prediction of the sentiments of the aspects using sentiment analysis. The predicted sentiments are displayed in a user-friendly format so that users can easily understand them. Existing systems work on the English language, unlike our approach to sentiment analysis. Keywords: News Articles, Polarity, Sentiment Analysis, Topic Modeling, RNN
... Bagheri, Saraee, and De Jong (2014) proposed an aspect extraction model that considers each word in the sentence as a state of a Markov chain (Gruber, Weiss, & Rosen-Zvi, 2007). Shams and Baraani-Dastjerdi (2017) proposed the Enriched LDA model, which incorporates aspect co-occurrence relations as prior domain knowledge into LDA for aspect extraction. Poria, Chaturvedi, Cambria, and Bisio (2016) presented Sentic LDA, which integrates common sense into LDA to calculate the word distributions. ...
Article
Opinion target extraction, or aspect extraction, is the most important subtask of aspect-based sentiment analysis. This task focuses on identifying the targets of users' opinions or sentiments in online reviews. In recent years, syntactic pattern-based approaches have performed quite well and produced significant improvements in the aspect extraction task. However, these approaches are heavily dependent on dependency parsers, which produce syntactic relations following grammatical rules and language constraints. In practice, users do not give much importance to these rules and constraints while expressing their opinions about a particular product, and review websites do not restrict them to do so. This makes syntactic pattern-based approaches vulnerable. Therefore, in this paper, we propose a two-fold rules-based model (TF-RBM) which uses rules defined on the basis of sequential patterns mined from customer reviews. The first fold extracts aspects associated with domain-independent opinions and the second fold extracts aspects associated with domain-dependent opinions. We have also applied frequency- and similarity-based approaches to improve the aspect extraction accuracy of the proposed model. Our experimental evaluation has shown better results compared with the state-of-the-art and most recent approaches.
... But after one year of usage its speakers goes down." Now, if the most frequently occurring aspects in this comment are price, design and performance then, according to the current aspect-based sentiment analysis techniques [59], [35], [52], these are the aspects, whereas "speakers" as an aspect will be ignored regardless of its strong negative sentiment. In explicit aspect identification, the NOUN part-of-speech tag is the one most commonly considered to indicate an aspect. ...
Article
Full-text available
Aspect-Based Sentiment Analysis techniques have been widely applied in several application domains. During the last two decades, these techniques have mostly been developed for the domain of product and service reviews. However, very few aspect-based sentiment techniques have been proposed for the domain of movie reviews. In contrast to most studies that focus on movie-specific aspects such as Script, Director, and Actor, this work focuses on NER (Named Entity Recognition) in order to find entity-specific aspects. Consequently, MAIM (Movie Aspects Identification Model) is proposed, which can extract not only movie-specific aspects but can also identify Named Entities (NEs) such as Person Name and Movie Title. The three main contributions in this paper are (i) identification of infrequent aspects, (ii) identification of NEs, and (iii) identification of n-gram opinion words as an entity. MAIM is implemented using the BiLSTM-CRF (Bidirectional Long Short-Term Memory – Conditional Random Field) hybrid technique and tested on a movie reviews dataset. The results showed a precision score of 89.9%, recall of 88.9%, and f1-score of 89.4%. The results of the hybrid model are compared with the baseline models, i.e., CRF (Conditional Random Field) and LSTM-CRF (Long Short-Term Memory – Conditional Random Field), and show that the hybrid model outperforms both models in terms of precision, recall and f1-score.
... Another approach used is to take advantage of the relationships of aspects with expressions that indicate some sentiment within the text [13][14] [15]. On the other hand, there are more advanced approaches such as those based on supervised learning [16] [17][18] [19] and those that use models based on probabilistic inference [20][21] [22]. ...
Article
Full-text available
In this article, we present a semantic model for aspect extraction from Spanish text as part of a complete aspect-based sentiment analysis system. The model uses ontology, semantic similarity, and double propagation techniques to detect explicit and implicit aspects. The proposed approach allows the implementation of a scalable system for any language or domain. The experimental tests were carried out using the SemEval-2016 dataset for task 5, corresponding to sentence-level aspect-based sentiment analysis. The implemented system obtained an F1 score of 73.07 for aspect extraction, achieving the best results among the systems participating in the comparison, and an F1 score of 89.18 for the hotel domain using ten-iteration cross-validation. Index Terms: aspect-based sentiment analysis, ontology, opinion mining, natural language processing, semantic similarity.
... Sentiment analysis, or opinion mining, is the technique of investigating the feelings, opinions, and emotions expressed in textual data about a particular object or topic [1]. Companies use sentiment analysis to increase their products' sales and services [2,3]. In the case of private business, sentiment analysis shapes the needs and views of the user, which leads to the institution's recognition of customer service. ...
Article
In this paper, it is proposed to understand public opinion regarding the demonetization policy recently implemented in India through Aspect-based Sentiment Analysis (ABSA), which predicts the sentiment of specific aspects present in the text. The major aim is to identify the relevant contexts for various aspects. Most of the conventional techniques have adopted attention mechanisms and deep learning concepts that decrease the prediction accuracy and generate huge noise. Another major disadvantage of attention mechanisms is that the sentiment related to a few context words varies with different aspects and hence cannot be concluded from those words alone. This paper adopts an optimized deep learning concept for performing ABSA on demonetization tweets. The proposed model involves various phases such as pre-processing, aspect extraction, polarity feature extraction, and sentiment classification. Initially, demonetization tweets collected from the Kaggle dataset are taken. Pre-processing is done in four phases, namely stop-word removal, punctuation removal, lower-case conversion, and stemming, to reduce the data to its minimal format. Aspect extraction is then performed on the pre-processed data to extract the opinion words. These extracted aspect words are converted into features with the help of polarity score computation and Word2vec. The weights of the polarity scores are optimized using a hybridization of two meta-heuristic algorithms, the FireFly Algorithm (FF) and Multi-Verse Optimization (MVO), and the new algorithm is termed the FireFly-oriented Multi-Verse Optimizer (FF-MVO). Further, the combined features are fed to a deep learning algorithm called the Recurrent Neural Network (RNN). As a modification to the existing RNN, the hidden neurons are optimized by the hybrid FF-MVO, and the resulting FF-MVO-RNN classifies the positive and negative sentiments. Finally, a comparative analysis with different machine learning algorithms proves the competent performance of the proposed model.
Article
Full-text available
With the proliferation of the Internet and the rapid growth of electronic articles, text categorization has become one of the key tools for data organization and management. In text categorization, a set of basic knowledge is provided to the system, and by learning from this set the system assigns new input documents to one of the subject groups. In the health literature, due to the wide variety of topics, preparing such an initial training set is a very time-consuming and costly task. The purpose of this article is to present a hybrid (supervised and unsupervised) learning model for the subject classification of health scientific products that performs the classification without the need for an initially labelled set. To extract the thematic model of health science texts from 2009 to 2019 in the PubMed database, data mining and text mining were performed using machine learning. The data were analyzed based on the Latent Dirichlet Allocation model, and the Support Vector Machine was then used to classify the texts. The findings of this study introduce the model in three main steps. In the first step, the necessary preprocessing was done on the dataset to eliminate uninformative words and increase the accuracy of the proposed model. In the second step, the themes in the texts were extracted using the Latent Dirichlet Allocation method and used as a basic training set; in the third step, the Support Vector Machine algorithm was trained on the data with the help of these topics. Finally, with the help of the classifier, the subject of each document was identified. The results showed that the proposed model can build a better classification by combining unsupervised clustering properties with prior knowledge of the samples. Clustering labelled samples with a specific similarity criterion merges related texts with prior knowledge, and the learning algorithm then performs classification in a supervised manner. Combining categorization and clustering can increase the accuracy of categorizing health texts. https://jipm.irandoc.ac.ir/article-1-4307-fa.pdf
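A rough sketch of the hybrid pipeline described above, where LDA topics act as pseudo-labels that train an SVM classifier, is shown below with scikit-learn; the toy corpus, topic count and parameters are placeholders, not the study's actual PubMed setup.

```python
# Sketch: LDA topics become pseudo-labels for a supervised SVM (toy data, assumed parameters).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import LinearSVC

docs = [
    "antibiotic resistance in hospital patients",
    "hospital infection control and antibiotic use",
    "gene expression profiles of tumour cells",
    "tumour gene mutation and cell growth",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)

# Step 1: unsupervised topic extraction with LDA
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)

# Step 2: the dominant topic of each document becomes its pseudo-label
pseudo_labels = doc_topics.argmax(axis=1)

# Step 3: a supervised classifier is trained on those pseudo-labels
if len(set(pseudo_labels)) > 1:
    clf = LinearSVC().fit(X, pseudo_labels)
    print(clf.predict(vec.transform(["antibiotic treatment in a hospital ward"])))
else:
    print("toy corpus collapsed to one topic; a larger collection is needed")
```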
Article
User-generated content on various Internet platforms is growing explosively and contains valuable information that helps decision-making. However, extracting this information accurately is still a challenge since there are massive amounts of data. To this end, sentiment analysis addresses the problem by identifying people's sentiments towards the opinion target. This article aims to provide an overview of deep learning for aspect-based sentiment analysis. Firstly, we give a brief introduction to the aspect-based sentiment analysis (ABSA) task. Then, we present the overall framework of the ABSA task from two different perspectives: significant subtasks and the task modeling process. Finally, challenges are proposed and summarized in the field of sentiment analysis, especially in the domain of aspect-based sentiment analysis. In addition, the ABSA task also takes the relations between various objects into consideration, which is rarely discussed in previous work.
Chapter
Sentiment analysis or opinion mining has become one of the fastest growing areas. Although the journey started in the 1990s, a huge outbreak of sentiment analysis occurred after 2004. With the increasing pressure of the new era, new technology, and a more complex and busy lifestyle, mental health issues are also becoming more serious concerns. In this survey paper, a brief history of the evolution of sentiment analysis is discussed. Emotion detection through facial expression is also briefly addressed. Commonly used approaches and technologies for sentiment analysis and emotion detection are discussed, together with a comparison of the available technologies. Building on the methodology used for sentiment analysis, an insight is given into the mental health concerns where sentiment analysis can play a vital role. Nowadays, the younger generation uses social websites on a large scale to express their mental state, as a form of entertainment, and to express general opinions regarding any topic or issue. Hence, large-scale web data has gained the ability to reflect the overall mental condition of a large community. After the corona outbreak, mental health issues increased significantly, and the use of social media also increased enormously due to lockdowns and work-from-home lifestyles. So, by using the vast data of web platforms, we can analyze the current situation of mental health issues and also predict near-future concerns. Although the accuracy of sentiment analysis still faces many challenges with existing algorithms, there is a lot of future scope in this field.
Article
In traditional bibliometric analysis, author keywords (AKs) play a critical role in areas such as information query, co-word analysis, and capturing topic terms. In past decades, the most relevant studies have focused on weighting methods for AKs to find specialist or discriminative terms for a topic; however, very few explorations have touched on the issue of role differentiation for AKs within a specific topic or in the context of topic query. Furthermore, both traditional co-word analysis and the latest semantic modeling methods still face challenges in accurately classifying and ranking the keywords/terms for a specific research topic. As a complement to prior research, a novel analytical framework based on role differentiation of AKs and the Technique for Order of Preference by Similarity to Ideal Solution is proposed in this article. In addition, a case study on additive manufacturing is conducted to verify the proposed framework.
Article
As a variant problem of aspect-based sentiment analysis (ABSA), aspect category sentiment analysis (ACSA) aims to identify the aspect categories discussed in sentences and predict their sentiment polarities. However, most aspect-based sentiment analysis (ABSA) research focuses on predicting the sentiment polarities of given aspect categories or aspect terms explicitly discussed in sentences. In contrast, aspect categories are often discussed implicitly. Additionally, most of the research does not consider the relations between contextual words and aspect categories. This paper proposes a novel Semantic Relatedness-enhanced Graph Network (SRGN) model which integrates the semantic relatedness information through an Edge-gated Graph Convolutional Network (EGCN). We introduce an ontology-based approach and a distributional approach to calculate the semantic relatedness values between contextual words and aspect categories. EGCN with the capability to aggregate multi-channel edge features, is then applied to model the semantic relatedness values in a graphical structure. We also employ an aspect-context attention module to generate aspect-specific representations. The proposed SRGN is evaluated on five datasets constructed based on SemEval 2015, SemEval 2016 and MAMC-ACSA datasets. Experimental results indicate that our proposed model outperforms the baseline models in both accuracy and F1 score.
Article
Aspect term extraction (ATE) aims at identifying the aspect terms that are expressed in a sentence. Recently, Seq2Seq learning has been employed in ATE and has significantly improved performance. However, it suffers from some weaknesses, such as lacking the ability to encode more informative information and to integrate information from surrounding words in the encoder. The static word embeddings employed in ATE fall short of modeling the dynamic meaning of words. To alleviate the problems mentioned above, this paper proposes the information-augmented neural network (IANN), which is a novel Seq2Seq learning framework. In IANN, a specialized neural network, named multiple convolution with recurrence network (MCRN), is developed as the key module of the encoder to encode more informative information and integrate information from surrounding words. The contextualized embedding layer is designed to capture dynamic word senses. Besides, the novel AO ({Aspect, Outside}) tags are proposed as a less challenging tagging scheme. Extensive experiments have been performed on three widely used datasets. These experiments demonstrate that the proposed IANN achieves state-of-the-art results and validate that IANN is a powerful method for the ATE task.
Article
Purpose: Online advertisement brings huge revenue to many websites. There are many types of online advertisement; this paper focuses on online banner ads, which are usually placed on a particular news website. The investigated news website adopts a pay-per-ad payment model, where advertisers are charged when they rent a banner from the website during a particular period. In this payment model, the website needs to ensure that the push frequency of each ad on the banner is similar. Under such advertisement push rules, an ad-recommendation mechanism considering ad-push fairness is required. Design/methodology/approach: The authors propose a novel ad recommendation method that considers both ad-push fairness and personal interests. They take every ad's exposure time into consideration and investigate users' three different usage experiences on the website to identify the main factors affecting users' interests. Online ad recommendation is conducted on the investigated news website. Findings: The results of the experiments show that the proposed approach performs better than the traditional approach. This method can not only enhance the average click rate of all ads on the website but also ensure reasonable fairness in the exposure frequency of each ad. The online experiment results demonstrate the effectiveness of this approach. Originality/value: Existing research has not considered advertisement recommendation and ad-push fairness together. With the proposed novel ad recommendation model, the authors can improve the click-through rate of ads with reasonable push fairness. The website provider can thereby increase the commercial value of advertising and user satisfaction.
Article
Full-text available
Topic models, such as Latent Dirichlet Allocation (LDA), allow us to categorize each document based on the topics. They build a document as a mixture of topics, and a topic is modeled as a probability distribution over words. However, the key drawback of the traditional topic model is that it cannot handle the semantic knowledge hidden in the documents; therefore, semantically related, coherent and meaningful topics cannot be obtained. Semantic inference plays a significant role in topic modeling as well as in other text mining tasks. In this paper, in order to tackle this problem, a novel NET-LDA model is proposed. In NET-LDA, semantically similar documents are merged to bring all semantically related words together, and the obtained semantic similarity knowledge is incorporated into the model with a new adaptive semantic parameter. The motivation of the study is to reveal the impact of semantic knowledge in topic model research. In a given corpus, different documents may contain different words but may speak about the same topic. For such documents to be correctly identified, the feature space of the documents must be elaborated with more powerful features. In order to accomplish this goal, the semantic space of documents is constructed with concepts and named entities. Two datasets in the English and Turkish languages and twelve different domains have been evaluated to show the independence of the model from both language and domain. The proposed NET-LDA, compared to the baselines, outperforms in terms of topic coherence, F-measure, and qualitative evaluation.
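To make the merging step concrete, here is a rough sketch of grouping semantically similar reviews into pseudo-documents before topic modeling. TF-IDF cosine similarity stands in for the concept/named-entity semantics (Babelfy) used by NET-LDA, and the 0.3 threshold and toy reviews are arbitrary assumptions.

```python
# Sketch: merge similar reviews into pseudo-documents before running LDA (toy data, assumed threshold).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "battery life of this phone is short",
    "the phone battery drains very fast",
    "the camera takes sharp pictures",
]

sim = cosine_similarity(TfidfVectorizer(stop_words="english").fit_transform(docs))

merged, used = [], set()
for i in range(len(docs)):
    if i in used:
        continue
    group = [docs[i]]
    for j in range(i + 1, len(docs)):
        if j not in used and sim[i, j] > 0.3:
            group.append(docs[j])
            used.add(j)
    merged.append(" ".join(group))  # merged pseudo-documents would then be fed to LDA

print(len(docs), "documents ->", len(merged), "pseudo-documents")
```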
Article
Many existing systems for analyzing and summarizing customer reviews about products or services are based on a number of prominent review aspects. Conventionally, the prominent review aspects of a product type are determined manually. This costly approach cannot scale to large and cross-domain services such as Amazon.com, Taobao.com or Yelp.com, where there are a large number of product types and new products emerge almost every day. In this paper, we propose a novel method, empowered by knowledge sources such as Probase and WordNet, for extracting the most prominent aspects of a given product type from textual reviews. The proposed method, ExtRA (Extraction of Prominent Review Aspects), (i) extracts the aspect candidates from text reviews based on a data-driven approach, (ii) builds an aspect graph utilizing Probase to narrow the aspect space, (iii) separates the space into reasonable aspect clusters by employing a set of proposed algorithms, and finally (iv) generates the K most prominent aspect terms or phrases, which do not overlap semantically, from those aspect clusters automatically and without supervision. ExtRA extracts high-quality prominent aspects as well as aspect clusters with little semantic overlap by exploring knowledge sources. ExtRA can extract not only words but also phrases as prominent aspects. Furthermore, it is general-purpose and can be applied to almost any type of product and service. Extensive experiments show that ExtRA is effective and achieves state-of-the-art performance on a dataset consisting of different product types.
Article
Recently, deep learning methods have been applied to the opinion target extraction (OTE) task with fruitful achievements. Since the features captured by the embedding layer enable a multi-perspective analysis of a sentence, an embedding layer that can grasp the high-level semantics of the sentences is essential for the OTE task and can improve the performance of the model in a more efficient manner. However, most of the existing studies focused on the network structure rather than the significant embedding layer, which may be the fundamental reason for the relatively poor performance in this field, not to mention Chinese extraction models. To compensate for these shortcomings, this paper proposes a model using multiple effective features and Bidirectional Encoder Representations from Transformers (BERT) on the architecture of Bidirectional Long Short-Term Memory (BiLSTM) and Conditional Random Field (CRF) for the Chinese opinion target extraction task, namely MF-COTE, which can construct features from different perspectives to capture the context and local features of the sentences. Besides, to handle the difficult case of multiple nouns in one sentence, we propose a novel noting-words feature that makes the model emphasize the noun near a transition or contrast word, thus leading to better opinion target location. Moreover, to demonstrate the superiority of the proposed model, extensive comparison experiments are conducted against other existing state-of-the-art methods, achieving F1-scores of 90.76%, 92.10%, and 89.63% on the Baidu, Dianping, and Mafengwo datasets, respectively.
Thesis
Full-text available
The Web, which offers its users billions of structured and unstructured content items, has become one of today's most important data sources. The available content grows every day, and newer and more effective methods need to be developed to automatically extract the desired information from this content and to organize, analyze and understand the extracted information. Topic models stand out as a powerful and successful method for performing these tasks. Among topic models, which first emerged in 1990, the newest and most successful is Latent Dirichlet Allocation (LDA). LDA, a generative graphical method used to model discrete data such as documents and to reveal the topics that make up a document, only takes into account the co-occurrence of words in the document collection; it does not consider the semantic information they contain, which constitutes an important disadvantage. In this thesis, two topic models are proposed that incorporate semantic information in the form of concepts and named entities into LDA in order to obtain semantically related, coherent, detail-capturing and more meaningful topics. In the first method, called Concept-LDA, the bag-of-words assumption underlying LDA is extended into a bag of {words + concepts + named entities}, aiming at a semantic enrichment approach. The developed Concept-LDA is a domain-independent method. In the second method, called NET-LDA, semantically similar documents are merged, and the semantic similarity knowledge obtained in the merging step is incorporated into the model as a new adaptive parameter. NET-LDA is both domain- and language-independent, and both methods succeed in extracting successful topics. The graph-based approach Babelfy is used to obtain the semantic knowledge. The performance of the developed methods is evaluated both quantitatively and qualitatively. Concept-LDA is evaluated on English user reviews of twelve different products, while NET-LDA is evaluated on user reviews of thirteen different products, one in Turkish and twelve in English. In addition, the developed methods are compared, both quantitatively and qualitatively, with the results obtained from three baseline methods. The experiments show that incorporating semantic knowledge into the model yields semantically related, coherent, detail-capturing and more meaningful topics, and that the developed methods are considerably more successful than the baselines.
Article
Full-text available
Refurbishing is an industrial process whereby used products are returned to good working condition to extend their lifespan. As smartphones are short-life-cycle products replaced at an increasing rate, refurbishing is one of the end-of-life strategies to recover value from used smartphones. Chief among the factors behind refurbishing success is consumers' perception of various aspects of purchasing these products. This research investigates the significant factors in consumers' perceived value of purchasing refurbished smartphones. Online product reviews are recognized as a promising data source to evaluate consumers' post-purchase behaviour in the actual market. Accordingly, a customer satisfaction model of online refurbished smartphone reviews from e-commerce websites is presented to explore customer satisfaction dimensions (CSDs) toward refurbished smartphones. The results indicate that product characteristics, including function (satisfactory working), appearance (no scratches on the body and screen), and battery health, are the features of refurbished smartphones that consumers worry about most. Besides, we find that the similarity of these products to brand-new ones and their lower prices are the main reasons and motivations for purchasing. The results also show that the perceived value of refurbished smartphones has a two-dimensional structure based on perceived incentive and quality and perceived benefit and risk. Finally, some solutions are proposed to improve customer perceptions and reduce misconceptions about the refurbishment concept, which can be used by refurbishers and marketing managers for proper product development and marketing strategies.
Article
Recently, the explosive increase in social media data has enabled manufacturers to collect product defect information promptly. Extant literature gathers defect information such as defective components or defect symptoms without distinguishing defect-related (DR) texts from defect-unrelated (DUR) texts, and thus defects discussed by few texts are buried in enormous amounts of DUR texts. Moreover, existing studies do not consider defect severity, which is valuable and important for manufacturers when making remedial decisions. To bridge these research gaps, we propose a novel approach that integrates a probabilistic graphical model named the Product Defect Identification and Analysis Model (PDIAM) with Failure Mode and Effect Analysis (FMEA) to derive product defect information from social media data. Compared to extant studies, PDIAM identifies DR texts and then extracts defect information from these texts, and it provides more defect information than previous research. Besides, we further analyze defect severity by combining FMEA and PDIAM, which alleviates the inherent subjectivity brought by expert evaluation in traditional FMEA. A case study in the automobile industry proves the predominant performance of our approach and its great potential in defect management.
Article
Unsupervised aspect identification is a challenging task in aspect-based sentiment analysis. Traditional topic models are usually used for this task, but they are not appropriate for short texts such as product reviews. In this work, we propose an aspect identification model based on aspect vector reconstruction. A key idea of our model is that we make connections between sentence vectors and multi-grained aspect vectors using a fuzzy k-means membership function. Furthermore, to make full use of different aspect representations in vector space, we reconstruct sentence vectors based on coarse-grained aspect vectors and fine-grained aspect vectors simultaneously. The resulting model can therefore learn better aspect representations. Experimental results on two datasets from different domains show that our proposed model outperforms several baselines in terms of aspect identification and topic coherence of the extracted aspect terms.
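A toy illustration of the reconstruction idea follows: sentence vectors are re-expressed as weighted combinations of aspect vectors, with weights given by the fuzzy k-means membership formula. The random vectors, dimensions and fuzzifier m=2 are assumptions, and only one aspect granularity is shown.

```python
# Sketch: fuzzy memberships connect sentence vectors to aspect vectors, which then reconstruct them.
import numpy as np

rng = np.random.default_rng(0)
sentences = rng.normal(size=(5, 16))   # 5 sentence embeddings, 16-dimensional
aspects = rng.normal(size=(3, 16))     # 3 aspect vectors (one granularity only)

def fuzzy_memberships(x, centers, m=2.0):
    # u[i, k] = d(x_i, c_k)^(-2/(m-1)) / sum_j d(x_i, c_j)^(-2/(m-1))
    d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=-1) + 1e-12
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

U = fuzzy_memberships(sentences, aspects)           # (5, 3) membership matrix, rows sum to 1
reconstructed = U @ aspects                         # sentences rebuilt from aspect vectors
loss = np.mean((sentences - reconstructed) ** 2)    # reconstruction error a model would minimize
print(U.round(2))
print("reconstruction loss:", round(float(loss), 4))
```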
Article
The Latent Dirichlet Allocation (LDA) model, a document-level probabilistic model, has been widely used in topic modeling. However, an essential limitation of LDA is its weakness in identifying co-occurrence relationships (e.g., aspect-aspect, aspect-opinion, etc.) in sentences. To address this problem, we propose an association-constrained LDA (AC-LDA) for effectively capturing co-occurrence relationships. Specifically, based on the basic features of the syntactic structure of product reviews, we formalize three major types of word association combinations and then carefully design the corresponding identification procedures. To reduce the influence of global aspect words on the local distribution, we apply an important constraint on global aspects. Finally, the constraint and related association combinations are merged into LDA to guide the topic-word allocation in the learning process. Based on experiments on real-world product review data, we demonstrate that our model can effectively capture the relationships hidden in local sentences and further increase the extraction rate of fine-grained aspects and opinion words. Our results confirm the superiority of AC-LDA over state-of-the-art methods in terms of extraction accuracy. We also verify the strength of our method in identifying irregularly appearing terms, such as non-aspect opinions, low-frequency words, and secondary aspects.
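The kind of word association combinations AC-LDA relies on can be illustrated by extracting adjective-noun co-occurrence pairs from review sentences. The single adjacency rule below is a simplification of the paper's three association types, and the toy reviews are invented; it requires the NLTK tokenizer and tagger data.

```python
# Sketch: adjective-noun pairs mined from reviews, usable as association constraints for LDA.
from collections import Counter
import nltk

reviews = [
    "great battery life but terrible camera quality",
    "the battery life is great and the screen is bright",
]

pairs = Counter()
for sentence in reviews:
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if t1.startswith("JJ") and t2.startswith("NN"):   # adjective directly before a noun
            pairs[(w2.lower(), w1.lower())] += 1           # stored as (aspect word, opinion word)

print(pairs.most_common())   # these pairs would then constrain topic-word allocation
```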
Article
Full-text available
In this paper we consider a nonlinear parabolic equation of p(x, t)-Laplacian type in divergence form with measurable data over non-smooth domains. We establish the global Calderón–Zygmund theory for the weak solutions of such a problem in the setting of weighted Lebesgue spaces. The nonlinearity of the coefficients is assumed to be discontinuous with respect to the (x, t)-variables, and the lateral boundary of the domain is sufficiently flat beyond the Lipschitz category. As an application of the main result, regularity in parabolic Morrey scales for the spatial gradient is also obtained.
Article
Full-text available
Sentiment analysis (SA) has become one of the most active and progressively popular areas in information retrieval and text mining due to the expansion of the World Wide Web (WWW). SA deals with the computational treatment or classification of users' sentiments, opinions and emotions hidden within text. Aspect extraction is the most vital and extensively explored phase of SA for carrying out the classification of sentiments in a precise manner. During the last decade, an enormous amount of research has focused on identifying and extracting aspects. Therefore, this survey attempts a comprehensive overview of different aspect extraction techniques and approaches. These techniques have been categorized in accordance with the adopted approach. Beyond a traditional survey, a comprehensive comparative analysis is conducted among the different approaches to aspect extraction, which not only describes the performance of each technique but also guides the reader in comparing its accuracy with other state-of-the-art and recent approaches.
Article
Full-text available
Consumers are increasingly relying on other consumers' online reviews of features and quality of products while making their purchase decisions. However, the rapid growth of online consumer product reviews makes browsing a large number of reviews and identifying information of interest time consuming and cognitively demanding. Although there has been extensive research on text review mining to address this information overload problem in the past decade, the majority of existing research mainly focuses on the quality of reviews and the impact of reviews on sales and marketing. Relatively little emphasis has been placed on mining reviews to meet personal needs of individual consumers. As an essential first step toward achieving this goal, this study proposes a product feature-oriented approach to the analysis of online consumer product reviews in order to support feature-based inquiries and summaries of consumer reviews. The proposed method combines LDA (Latent Dirichlet Allocation) and a synonym lexicon to extract product features from online consumer product reviews. Our empirical evaluation using consumer reviews of four products shows higher effectiveness of the proposed method for feature extraction in comparison to association rule mining.
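A toy sketch of the LDA-plus-synonym-lexicon idea is given below: top LDA words are grouped into one feature when WordNet lists them as synonyms. The tokenized reviews, topic count and grouping rule are invented; the paper's actual lexicon and corpus are not reproduced.

```python
# Sketch: gensim LDA topics post-processed with WordNet synonym grouping (toy corpus).
from gensim import corpora, models
from nltk.corpus import wordnet as wn   # requires the WordNet corpus to be downloaded

reviews = [["battery", "life", "charge"], ["screen", "display", "resolution"],
           ["battery", "charge", "power"], ["display", "screen", "brightness"]]

dictionary = corpora.Dictionary(reviews)
bow = [dictionary.doc2bow(doc) for doc in reviews]
lda = models.LdaModel(bow, num_topics=2, id2word=dictionary, passes=20, random_state=1)

def synonyms(word):
    return {lemma.name().lower() for synset in wn.synsets(word) for lemma in synset.lemmas()}

for t in range(lda.num_topics):
    top_words = [w for w, _ in lda.show_topic(t, topn=5)]
    groups = []
    for w in top_words:                 # merge top words that WordNet marks as synonymous
        for g in groups:
            if w in synonyms(g[0]) or g[0] in synonyms(w):
                g.append(w)
                break
        else:
            groups.append([w])
    print("topic", t, "->", groups)
```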
Conference Paper
Full-text available
The popularity of Web 2.0 has resulted in a large number of publicly available online consumer reviews created by a demographically diverse user base. Information about the authors of these reviews, such as age, gender and location, provided by many on-line consumer review platforms may allow companies to better understand the preferences of different market segments and improve their product design, manufacturing processes and marketing campaigns accordingly. However, previous work in sentiment analysis has largely ignored these additional user meta-data. To address this deficiency, in this paper, we propose parametric and non-parametric User-aware Sentiment Topic Models (USTM) that incorporate demographic information of review authors into topic modeling process in order to discover associations between market segments, topical aspects and sentiments. Qualitative examination of the topics discovered using USTM framework in the two datasets collected from popular online consumer review platforms as well as quantitative evaluation of the methods utilizing those topics for the tasks of review sentiment classification and user attribute prediction both indicate the utility of accounting for demographic information of review authors in opinion mining.
Article
Full-text available
Online consumer product reviews are a main source for consumers to obtain product information and reduce product uncertainty before making a purchase decision. However, the great volume of product reviews makes it tedious and ineffective for consumers to peruse individual reviews one by one and search for comments on specific product features of their interest. This study proposes a novel method called EXPRS that integrates an extended PageRank algorithm, synonym expansion, and implicit feature inference to extract product features automatically. The empirical evaluation using consumer reviews on three different products shows that EXPRS is more effective than two baseline methods.
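The PageRank component of such an approach can be pictured as a plain PageRank over a word co-occurrence graph; EXPRS's extensions (synonym expansion and implicit feature inference) are omitted here, and the tokenized sentences are invented.

```python
# Sketch: PageRank over a sentence-level word co-occurrence graph to rank candidate features.
import itertools
import networkx as nx

sentences = [["battery", "life", "short"], ["screen", "bright", "battery"],
             ["camera", "picture", "sharp"], ["battery", "charge", "slow"]]

G = nx.Graph()
for sent in sentences:
    for u, v in itertools.combinations(set(sent), 2):        # words co-occurring in a sentence
        weight = G.get_edge_data(u, v, {"weight": 0})["weight"]
        G.add_edge(u, v, weight=weight + 1)

scores = nx.pagerank(G, weight="weight")
print(sorted(scores, key=scores.get, reverse=True)[:3])      # top-ranked candidate features
```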
Conference Paper
Full-text available
Topic modeling has been widely used to mine topics from documents. However, a key weakness of topic modeling is that it needs a large amount of data (e.g., thousands of documents) to provide reliable statistics for generating coherent topics. In practice, many document collections do not have that many documents, and given a small number of documents, the classic topic model LDA generates very poor topics. Even with a large volume of data, unsupervised learning of topic models can still produce unsatisfactory results. In recent years, knowledge-based topic models have been proposed, which ask human users to provide some prior domain knowledge to guide the model to produce better topics. Our research takes a radically different approach. We propose to learn as humans do, i.e., retaining the results learned in the past and using them to help future learning. When faced with a new task, we first mine some reliable (prior) knowledge from past learning/modeling results and then use it to guide the model inference to generate more coherent topics. This approach is possible because of the big data readily available on the Web. The proposed algorithm mines two forms of knowledge: must-links (meaning that two words should be in the same topic) and cannot-links (meaning that two words should not be in the same topic). It also deals with two problems of the automatically mined knowledge, i.e., wrong knowledge and knowledge transitivity. Experimental results using review documents from 100 product domains show that the proposed approach makes dramatic improvements over state-of-the-art baselines.
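A toy version of mining must-link knowledge from past results is sketched below: word pairs appearing together in the top-word lists of several prior domains become must-links. The prior topics and the support threshold of 2 are made-up values, and cannot-link mining and the knowledge-cleaning steps are omitted.

```python
# Sketch: frequent word pairs across past topic results become must-link knowledge.
from collections import Counter
from itertools import combinations

prior_topics = [
    ["price", "cost", "cheap", "expensive"],      # top words learned in past domain A
    ["price", "cost", "value", "worth"],          # past domain B
    ["battery", "life", "charge", "power"],       # past domain C
    ["price", "cheap", "cost", "deal"],           # past domain D
]

pair_support = Counter()
for topic in prior_topics:
    for a, b in combinations(sorted(set(topic)), 2):
        pair_support[(a, b)] += 1

must_links = [pair for pair, count in pair_support.items() if count >= 2]
print(must_links)   # e.g. ('cost', 'price') survives as a reliable must-link
```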
Conference Paper
Full-text available
Aspect extraction is one of the key tasks in sentiment analysis. In recent years, statistical models have been used for the task. However, such models without any domain knowledge often produce aspects that are not interpretable in applications. To tackle the issue, some knowledge-based topic models have been proposed, which allow the user to input some prior domain knowledge to generate coherent aspects. However, existing knowledge-based topic models have several major shortcomings, e.g., little work has been done to incorporate the cannot-link type of knowledge or to automatically adjust the number of topics based on domain knowledge. This paper proposes a more advanced topic model, called MC-LDA (LDA with m-set and c-set), to address these problems, which is based on an Extended generalized Pólya urn (E-GPU) model (which is also proposed in this paper). Experiments on real-life product reviews from a variety of domains show that MC-LDA outperforms the existing state-of-the-art models markedly.
Article
Full-text available
This work proposes an extension of Bing Liu's aspect-based opinion mining approach in order to apply it to the tourism domain. The extension addresses the fact that users refer differently to different kinds of products when writing reviews on the Web. Since Liu's approach is focused on physical product reviews, it could not be directly applied to the tourism domain, which presents features that are not considered by the model. Through a detailed study of online tourism product reviews, we identified these features and then modeled them in our extension, proposing the use of new and more complex NLP-based rules for the tasks of subjectivity and sentiment classification at the aspect level. We also address the task of opinion visualization and summarization and propose new methods to help users digest the vast availability of opinions in an easy manner. Our work also included the development of a generic architecture for an aspect-based opinion mining tool, which we then used to create a prototype and analyze opinions from TripAdvisor in the context of the tourism industry in Los Lagos, a Chilean administrative region also known as the Lake District. Results show that our extension is able to perform better than Liu's model in the tourism domain, improving both accuracy and recall for the tasks of subjectivity and sentiment classification. In particular, the approach is very effective in determining the sentiment orientation of opinions, achieving an F-measure of 92% for the task. However, on average, the algorithms were only capable of extracting 35% of the explicit aspect expressions using a non-extended approach for this task. Finally, results also showed the effectiveness of our design when applied to solving the industry's specific issues in the Lake District, since almost 80% of the users who used our tool considered that it adds valuable information to their business.
Article
Full-text available
Topic detection with large and noisy data collections such as social media must address both scalability and accuracy challenges. KeyGraph is an efficient method that improves on current solutions by considering keyword co-occurrence. We show that KeyGraph has similar accuracy when compared to state-of-the-art approaches on small, well-annotated collections, and that it can successfully filter irrelevant documents and identify events in large and noisy social media collections. An extensive evaluation using Amazon's Mechanical Turk demonstrated the increased accuracy and high precision of KeyGraph, as well as superior runtime performance compared to other solutions.
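A simplified KeyGraph-flavoured pipeline is sketched below: build a keyword co-occurrence graph and treat graph communities as event/topic candidates. Modularity-based communities are used here as an approximation; this is not the original KeyGraph algorithm, and the toy documents are invented.

```python
# Sketch: co-occurrence graph communities as rough event/topic candidates.
import itertools
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

docs = [["earthquake", "magnitude", "coast"], ["coast", "tsunami", "warning"],
        ["election", "vote", "ballot"], ["vote", "ballot", "turnout"]]

G = nx.Graph()
for doc in docs:
    for u, v in itertools.combinations(set(doc), 2):
        weight = G.get_edge_data(u, v, {"weight": 0})["weight"]
        G.add_edge(u, v, weight=weight + 1)

for community in greedy_modularity_communities(G, weight="weight"):
    print(sorted(community))   # each community approximates one event/topic
```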
Conference Paper
Full-text available
Micro-blogging services, such as Twitter, and location-based social network applications have generated short text messages associated with geographic information, posting time, and user ids. The availability of such data received from users offers a good opportunity to study the user's spatial-temporal behavior and preference. In this paper, we propose a probabilistic model W4 (short for Who+Where+When+What) to exploit such data to discover individual users' mobility behaviors from spatial, temporal and activity aspects. To the best of our knowledge, our work offers the first solution to jointly model individual user's mobility behavior from the three aspects. Our model has a variety of applications, such as user profiling and location prediction; it can be employed to answer questions such as ``Can we infer the location of a user given a tweet posted by the user and the posting time?" Experimental results on two real-world datasets show that the proposed model is effective in discovering users' spatial-temporal topics, and outperforms state-of-the-art baselines significantly for the task of location prediction for tweets.
Conference Paper
Full-text available
Latent variable models have the potential to add value to large document collections by discovering interpretable, low-dimensional subspaces. In order for people to use such models, however, they must trust them. Unfortunately, typical dimensionality reduction methods for text, such as latent Dirichlet allocation, often produce low-dimensional subspaces (topics) that are obviously flawed to human domain experts. The contributions of this paper are threefold: (1) An analysis of the ways in which topics can be flawed; (2) an automated evaluation metric for identifying such topics that does not rely on human annotators or reference collections outside the training data; (3) a novel statistical topic model based on this metric that significantly improves topic quality in a large-scale document collection from the National Institutes of Health (NIH).
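Automated metrics of the kind described above typically score a topic's top words by how often they co-occur in the training documents. A small UMass-style coherence function illustrates the idea; the toy documents and word lists are assumptions, not the paper's metric verbatim.

```python
# Sketch: UMass-style coherence from co-document frequencies of a topic's top words.
import math
from itertools import combinations

docs = [{"battery", "life", "charge"}, {"battery", "charge", "power"},
        {"screen", "display", "bright"}, {"screen", "resolution", "display"}]

def umass_coherence(top_words, docs, eps=1.0):
    score = 0.0
    for w1, w2 in combinations(top_words, 2):      # w1 is the higher-ranked word
        d_w2 = sum(1 for d in docs if w2 in d)
        d_both = sum(1 for d in docs if w1 in d and w2 in d)
        score += math.log((d_both + eps) / d_w2)   # smoothed co-document frequency
    return score

print(umass_coherence(["battery", "charge", "power"], docs))   # coherent topic, higher score
print(umass_coherence(["battery", "screen", "power"], docs))   # mixed topic, lower score
```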
Article
Full-text available
Users of topic modeling methods often have knowledge about the composition of words that should have high or low probability in various topics. We incorporate such domain knowledge using a novel Dirichlet Forest prior in a Latent Dirichlet Allocation framework. The prior is a mixture of Dirichlet tree distributions with special structures. We present its construction, and inference via collapsed Gibbs sampling. Experiments on synthetic and real datasets demonstrate our model's ability to follow and generalize beyond user-specified domain knowledge.
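The Dirichlet Forest prior itself is not reproduced here; as a much simpler illustration of encoding "these words should have high probability in a topic" as a prior, the sketch below boosts the topic-word prior (eta) for a hypothetical seed set in gensim's LDA. The corpus, seed words and prior values are assumptions.

```python
# Sketch: seed-word knowledge encoded as an asymmetric topic-word prior (eta) in gensim LDA.
import numpy as np
from gensim import corpora, models

docs = [["battery", "charge", "life"], ["screen", "display", "bright"],
        ["battery", "power", "charge"], ["screen", "resolution", "display"]]
dictionary = corpora.Dictionary(docs)
bow = [dictionary.doc2bow(d) for d in docs]

num_topics, vocab_size = 2, len(dictionary)
eta = np.full((num_topics, vocab_size), 0.01)          # weak symmetric prior everywhere
for word in ["battery", "charge"]:                     # seed words pushed toward topic 0
    eta[0, dictionary.token2id[word]] = 5.0

lda = models.LdaModel(bow, num_topics=num_topics, id2word=dictionary,
                      eta=eta, passes=30, random_state=2)
print(lda.show_topic(0, topn=4))
```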
Conference Paper
Aspect-level sentiment analysis or opinion mining consists of several core sub-tasks: aspect extraction, opinion identification, polarity classification, and separation of general and aspect-specific opinions. Various topic models have been proposed by researchers to address some of these sub-tasks. However, there is little work on modeling all of them together. In this paper, we first propose a holistic fine-grained topic model, called the JAST (Joint Aspect-based Sentiment Topic) model, that can simultaneously model all of above problems under a unified framework. To further improve it, we incorporate the idea of lifelong machine learning and propose a more advanced model, called the LAST (Lifelong Aspect-based Sentiment Topic) model. LAST automatically mines the prior knowledge of aspect, opinion, and their correspondence from other products or domains. Such knowledge is automatically extracted and incorporated into the proposed LAST model without any human involvement. Our experiments using reviews of a large number of product domains show major improvements of the proposed models over state-of-the-art baselines.
Article
Product feature (feature, in brief) extraction is one of the important tasks in opinion mining, as it enables an opinion mining system to provide feature-level opinions. Most existing feature extraction methods use only local context information (LCI) within a clause or a sentence (such as co-occurrence or dependency relations) for extraction, but global context information (GCI) is also helpful. In this paper, we propose a combined approach, which integrates LCI and GCI to extract and rank features based on feature score and frequency. Experimental evaluation shows that the combined approach does a good job and outperforms the baseline extraction methods individually.
Article
Probabilistic topic models can be used to extract low-dimensional aspects from document collections and capture how the aspects change over time. However, such models without any human knowledge often produce aspects that are not interpretable. In recent years, a number of knowledge-based topic models and dynamic topic models have been proposed, but they cannot process the concept knowledge and temporal information in Wikipedia. In this paper, we fill this gap by proposing a new probabilistic modeling framework which combines a data-driven topic model with Wikipedia knowledge. With the supervision of Wikipedia knowledge, we can grasp more coherent aspects, namely concepts, and detect the trends of concepts more accurately; the detected concept trends reflect bursty content in text and people's concerns. Our method can detect events and discover event-specific entities in text. Experiments on New York Times and TechCrunch datasets show that our framework outperforms two baselines.
Article
The electronic word-of-mouth (e-WOM) is one of the most important factors affecting consumers' behaviour. Opinions expressed towards a product through online reviews influence the purchase decisions of other online consumers by changing their perceptions of product quality. Furthermore, each product aspect may impact consumers' intentions differently. Thus, sentiment analysis and econometric models are incorporated to examine the relationship between purchase intentions and aspect-opinion pairs, which enables the weight estimation for each product aspect. We first identify product aspects and reduce dimensions to extract aspect-opinion pairs. Next, the information gain is calculated for each aspect through entropy theory. Based on sentiment polarity and sentiment strength, we formulate an econometric model that integrates the information gain to measure each aspect's weight. In the experiment, we track 386 digital cameras on Amazon for 39 months, and the results show that the aspect weights for digital cameras are detected more precisely than with the TF-IDF and HAC algorithms. The results bridge product aspects and consumption intention to facilitate e-WOM-based marketing.
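A toy entropy/information-gain calculation in the spirit of the weighting scheme above is shown here: aspects whose sentiment split is most informative about the overall polarity distribution receive more weight. The review counts below are invented, and the paper's econometric model is not reproduced.

```python
# Sketch: information gain of splitting reviews by aspect mention, as a rough aspect weight.
import numpy as np

def entropy(counts):
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# (positive, negative) mention counts per aspect
aspect_counts = {"image quality": (80, 20), "battery": (30, 70), "price": (50, 50)}

totals = np.sum(list(aspect_counts.values()), axis=0)
H_total, n_total = entropy(totals), totals.sum()

for aspect, counts in aspect_counts.items():
    n = sum(counts)
    rest = totals - np.array(counts)
    # information gain of splitting reviews on "mentions this aspect or not"
    H_split = (n / n_total) * entropy(counts) + ((n_total - n) / n_total) * entropy(rest)
    print(aspect, round(H_total - H_split, 4))
```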
Article
Topic detection as a tool to detect topics from online media attracts much attention. Generally, a topic is characterized by a set of informative keywords/terms. Traditional approaches are usually based on various topic models, such as Latent Dirichlet Allocation (LDA). They cluster terms into a topic by mining semantic relations between terms. However, co-occurrence relations across documents are commonly neglected, which leads to the detection of incomplete information. Furthermore, the inability to discover latent co-occurrence relations via the context or other bridge terms prevents important but rare topics from being detected. To tackle this issue, we propose a hybrid relations analysis approach that integrates semantic relations and co-occurrence relations for topic detection. Specifically, the approach fuses multiple relations into a term graph and detects topics from the graph using a graph analytical method. It can not only detect topics more effectively by combining mutually complementary relations, but also mine important rare topics by leveraging latent co-occurrence relations. Extensive experiments demonstrate the advantage of our approach over several benchmarks.
Conference Paper
Knowledge discovery in texts (KDT) has been widely applied for business data analysis, but it only reveals common patterns based on large amounts of data. Since 2000, chance discovery (CD) has been proposed as an extension of KDT to detect rare but significant events or situations regarded as chance candidates for human decision making. KeyGraph is a useful and important algorithm, as well as a tool, in CD for mining and visualizing these chances. However, a scenario graph visualized by KeyGraph is machine-oriented, causing a bottleneck in human cognition. Traditional knowledge discovery runs into a similar problem. In this paper, we propose a human-oriented algorithm called IdeaGraph which can generate a rich scenario graph for human perception, comprehension and even innovation. IdeaGraph not only works on discovering more rare and significant chances, but also focuses on uncovering latent relationships among chances to gain richer and deeper human insights. Our experiment has validated the advantages of IdeaGraph by comparison with KeyGraph.
Conference Paper
Aspect extraction is a central problem in sentiment analysis. Current methods either extract aspects without categorizing them, or extract and categorize them using unsupervised topic modeling. By categorizing, we mean the synonymous aspects should be clustered into the same category. In this paper, we solve the problem in a different setting where the user provides some seed words for a few aspect categories and the model extracts and clusters aspect terms into categories simultaneously. This setting is important because categorizing aspects is a subjective task. For different application purposes, different categorizations may be needed. Some form of user guidance is desired. In this paper, we propose two statistical models to solve this seeded problem, which aim to discover exactly what the user wants. Our experimental results show that the two proposed models are indeed able to perform the task effectively.
Article
We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
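Since many of the models collected here extend this formulation, it may help to restate the per-document joint distribution defined in the LDA paper, where theta is the topic mixture drawn from a Dirichlet with parameter alpha, z_n is the topic of the n-th word, and beta holds the topic-word probabilities:

p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta) = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)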
Conference Paper
In this paper, we define the problem of topic-sentiment analysis on Weblogs and propose a novel probabilistic model to capture the mixture of topics and sentiments simultaneously. The proposed Topic-Sentiment Mixture (TSM) model can reveal the latent topical facets in a Weblog collection, the subtopics in the results of an ad hoc query, and their associated sentiments. It could also provide general sentiment models that are applicable to any ad hoc topics. With a specifically designed HMM structure, the sentiment models and topic models estimated with TSM can be utilized to extract topic life cycles and sentiment dynamics. Empirical experiments on different Weblog datasets show that this approach is effective for modeling the topic facets and sentiments and extracting their dynamics from Weblog collections. The TSM model is quite general; it can be applied to any text collections with a mixture of topics and sentiments, thus has many potential applications, such as search result summarization, opinion tracking, and user behavior prediction.
Article
Sentiment analysis or opinion mining aims to use automated tools to detect subjective information such as opinions, attitudes, and feelings expressed in text. This paper proposes a novel probabilistic modeling framework based on Latent Dirichlet Allocation (LDA), called joint sentiment/topic model (JST), which detects sentiment and topic simultaneously from text. Unlike other machine learning approaches to sentiment classification which often require labeled corpora for classifier training, the proposed JST model is fully unsupervised. The model has been evaluated on the movie review dataset to classify the review sentiment polarity and minimum prior information have also been explored to further improve the sentiment classification accuracy. Preliminary experiments have shown promising results achieved by JST.