Conference Paper

Finding a needle in a haystack of reviews

To read the full-text of this research, you can request a copy directly from the authors.


Searching for a hotel online is a daunting task because of the wealth of online information. Reviews written by other travelers replace word of mouth, yet they turn the search into a time-consuming task. Users do not rate enough hotels to enable recommendation based on collaborative filtering; thus, a cold-start recommender system is needed. This demo briefly describes our cold-start hotel recommender system, which uses the text of the reviews as its main data. We define context groups based on reviews extracted from hotel review web services, and we introduce a novel weighted algorithm for text mining. We implemented our system, which was used by the public to conduct 150 trip-planning experiments. We compare our solution to the top suggestions of these web services and show that users were, on average, 20% more satisfied with our hotel recommendations. We outperform these web services even more in cities where hotel prices are high.


... Another study, meanwhile, states that a recommender system can be applied to help travelers find good accommodation (hotels) that suits their needs [6]. That recommender system was developed from review data written by other travelers who had stayed at a hotel, applying text-mining techniques with an unsupervised clustering algorithm. ...
The number of education providers in Sragen has increased over the past several years, partly as a result of the growing number of students in the region. This increase, however, has not been matched by good management of school data, for example through information technology. The absence of a well-integrated data center makes it difficult for schools, and even the public, to obtain valid information, so many prospective students struggle to decide which senior high school to apply to. For this reason, a data warehouse is needed to manage the data of schools in Sragen in a well-integrated way, together with a recommender system built by applying data mining techniques. School data can then be mined to produce strategic information that serves as a recommendation for prospective senior high school students choosing a school. A snowflake diagram was designed as the first step in developing the data warehouse. The recommender system, in turn, was built using the naïve Bayes method by computing the probability of each criterion included in the calculation. The criteria used are school fees, distance to the school, accreditation score, graduation rate, and average national final examination score. The results show that a data warehouse was successfully built to manage school data in Sragen in an integrated way and connected to a recommender system that helps prospective students choose a suitable school based on the submitted criteria. The system's recommendations are based on the highest confidence value of each variable for each school, the priority of the variables, and the school ranking order.
Music occupies a very important place in the hearts and lives of ordinary people, and it is both subjective and universal in nature. A music identifier system is concerned with providing meaningful, personalized recommendations of items (songs, music, playlists) according to the mood, emotion, interests, and preferences of users or listeners. With the advancement of technology and the rapid development of the internet, it has become very common to use streaming services to listen to and enjoy music in more convenient ways. In this paper, an attempt has been made to perform a comparative analysis and a thorough empirical review of the various approaches and strategies proposed and applied by different researchers in designing an effective system for music identification or recommendation. The paper covers the music identifier system, its components, and its features, with emphasis on the methods, metrics, general framework, and state-of-the-art strategies proposed during roughly the last two decades, which have been empirically reviewed. The existing studies were found to lack systematic research on the behaviour, requirements, and preferences of users, to extract features poorly, and to be limited in evaluating the performance of music identifier systems. The study reveals that systems based on effective social information, emotional traits, content, context, and knowledge have been widely applied and have improved the quality of music identification or recommendation to a large extent, but this is still not enough.
In the future, more in-depth studies need to be conducted to enlarge the scope of further development of personalized, context-aware music identifier systems and to generate a continuous, automatic top playlist of music and songs, with added tracks matching the profile, mood, emotional traits, and behaviour of the user in a mobile environment.
For generating hotel recommendations, clustering travelers has been demonstrated to be a viable method to elevate traveler satisfaction with the recommendation results. However, most of the existing methods that adopt this approach cluster travelers according to a variety of traveler or hotel attributes, which may not necessarily be appropriate for use in an online application such as ubiquitous hotel recommendation. To overcome this problem, a fuzzy ubiquitous traveler clustering and hotel recommendation (FUTCHR) system was developed in this study. The FUTCHR system clustered travelers according to their decision-making mechanisms that are fitted by comparing travelers’ choices with the recommendation results in the historical data. To generate recommendations, a fuzzy mixed binary-nonlinear programming model was constructed and solved. The novelty of the proposed methodology is to cluster travelers without knowing their characteristics but according to the differences in their decision-making mechanisms. The FUTCHR system was employed in a regional study, and the successful recommendation rate was superior to three existing methods in this field.
With the extensive development of big data and social networks, the user profile field has received much attention. User profiling is essential for understanding the characteristics of various users, contributing to a better understanding of their requirements in specific scenarios. User-generated content, which directly reflects people's thoughts and intentions, is a valuable source for profiling users; within it, user reviews are by nature invaluable sources for acquiring user requirements and have drawn increasing attention from both academia and industry. However, review-based user profiling (RBUP), as an emerging research direction, has not been systematically reviewed, hindering researchers from further investigation. In this work, we carry out a systematic mapping study on review-based user profiling, with an emphasis on investigating the generic analysis process of RBUP and identifying potential research directions. Specifically, 51 out of 2478 papers were carefully selected for investigation under a standardized and systematic procedure. By carrying out in-depth analysis of these papers, we have identified a generic process that should be followed to perform review-based user profiling. In addition, we perform multi-dimensional analysis on each step of the process in order to review current research progress and identify challenges and potential research directions. The results show that although traditional methods have been continuously improved, they are not sufficient to unleash the full potential of large-scale user reviews, especially the use of heterogeneous data for multi-dimensional user profiling.
Personalized recommendation of Points of Interest (POIs) plays a key role in satisfying users on Location-Based Social Networks (LBSNs). In this article, we propose a probabilistic model to find the mapping between user-annotated tags and locations’ taste keywords. Furthermore, we introduce a dataset on locations’ contextual appropriateness and demonstrate its usefulness in predicting the contextual relevance of locations. We investigate four approaches to use our proposed mapping for addressing the data sparsity problem: one model to reduce the dimensionality of location taste keywords and three models to predict user tags for a new location. Moreover, we present different scores calculated from multiple LBSNs and show how we incorporate new information from the mapping into a POI recommendation approach. Then, the computed scores are integrated using learning to rank techniques. The experiments on two TREC datasets show the effectiveness of our approach, beating state-of-the-art methods.
A challenge facing all ubiquitous clinic recommendation systems is that patients often have difficulty articulating their requirements. To overcome this problem, a ubiquitous clinic recommendation mechanism was designed in this study by mining the clinic preferences of patients. Their preferences were defined using the weights in the ubiquitous clinic recommendation mechanism. An integer nonlinear programming problem was solved to tune the values of the weights on a rolling basis. In addition, since it may take a long time to adjust the values of weights to their asymptotic values, the back propagation network (BPN)-response surface method (RSM) method is applied to estimate the asymptotic values of weights. The proposed methodology was tested in a regional study. Experimental results indicated that the ubiquitous clinic recommendation system outperformed several existing methods in improving the successful recommendation rate.
Conference Paper
Personalized context-aware venue suggestion plays a critical role in satisfying users' needs on location-based social networks (LBSNs). In this paper, we present a set of novel scores to measure the similarity between a user and a candidate venue in a new city. The scores are based on the user's history of preferences in other cities as well as the user's context. We address the data sparsity problem in venue recommendation with the aid of a proposed approach to predict contextually appropriate venues. Furthermore, we show how to incorporate different scores to improve the performance of recommendation. The experimental results of our participation in the TREC 2016 Contextual Suggestion track show that our approach beats state-of-the-art strategies.
Conference Paper
This paper extends previous work by the same authors [1], with the aim of improving the predictions of a matrix factorization based on latent factor models through an ensemble with the predictions obtained by an Opinion Mining methodology based on a linguistic approach. The experimental analysis was carried out on the Yelp business dataset, limited to the Restaurant category. A hypothesis that the restaurant's average rating influences the number of stars given by users is tested. An analysis of the meaning of some of the latent factors is shown.
Conference Paper
Popular microblogging sites such as Tumblr have attracted hundreds of millions of users as a content sharing platform, where users can create rich content in the form of posts that are shared with other users who follow them. Due to the sheer amount of posts created on such services, an important task is to make quality recommendations of blogs for users to follow. Apart from traditional recommender system settings where the follower graph is the main data source, additional side-information of users and blogs such as user activity (e.g., like and reblog) and rich content (e.g., text and images) are also available to be exploited for enhanced recommendation performance. In this paper, we propose a novel boosted inductive matrix completion method (BIMC) for blog recommendation. BIMC is an additive low-rank model for user-blog preferences consisting of two components; one component captures the low-rank structure of follow relationships and the other captures the latent structure using side-information. Our model formulation combines the power of the recently proposed inductive matrix completion (IMC) model (for side-information) together with a standard matrix completion (MC) model (for low-rank structure). Furthermore, we utilize recently developed deep learning techniques to obtain semantically rich feature representations of text and images that are incorporated in BIMC. Experiments on a large-scale real-world dataset from Tumblr illustrate the effectiveness of the proposed BIMC method.
E-commerce is developing rapidly. Learning from and taking good advantage of the myriad reviews from online customers has become crucial to success, which calls for increasingly accurate sentiment classification of these reviews. Finer-grained review rating prediction is therefore preferred over rough binary sentiment classification. Current review rating prediction methods fall mainly into two types. One includes methods based on review text content, which focus almost exclusively on textual content and seldom relate to the reviewers and items remarked on in other relevant reviews. The other contains methods based on collaborative filtering, which extract information from previous records in the reviewer-item rating matrix but ignore review textual content. Here we propose a framework for review rating prediction that effectively combines the two. We further propose three specific methods under this framework. Experiments on two movie review datasets demonstrate that our review rating prediction framework performs better than previous methods.
To design a useful recommender system, it is important to understand how products relate to each other. For example, while a user is browsing mobile phones, it might make sense to recommend other phones, but once they buy a phone, we might instead want to recommend batteries, cases, or chargers. In economics, these two types of recommendations are referred to as substitutes and complements: substitutes are products that can be purchased instead of each other, while complements are products that can be purchased in addition to each other. Such relationships are essential as they help us to identify items that are relevant to a user's search. Our goal in this paper is to learn the semantics of substitutes and complements from the text of online reviews. We treat this as a supervised learning problem, trained using networks of products derived from browsing and co-purchasing logs. Methodologically, we build topic models that are trained to automatically discover topics from product reviews that are successful at predicting and explaining such relationships. Experimentally, we evaluate our system on the Amazon product catalog, a large dataset consisting of 9 million products, 237 million links, and 144 million reviews.
Conference Paper
Context has been recognized as an important factor in constructing personalized recommender systems. However, most context-aware recommendation techniques mainly aim at exploiting item-level contextual information for modeling users' preferences, while few works attempt to detect more fine-grained aspect-level contextual preferences. Therefore, in this article, we propose a contextual recommendation algorithm based on user-generated reviews, from which users' context-dependent preferences are inferred through different contextual weighting strategies. The context-dependent preferences are further combined with users' context-independent preferences for performing recommendation. The empirical results on two real-life datasets demonstrate that our method is capable of capturing users' contextual preferences and achieving better recommendation accuracy than the related works.
Conference Paper
Recommending products to new buyers is an important problem for online shopping services, since there are always new buyers joining a deployed system. In some recommender systems, a new buyer is asked to indicate her/his preferences on some attributes of the product (such as a camera) in order to address the so-called cold-start problem. The preferences collected in this way are usually incomplete because of the user's cognitive limitations and/or unfamiliarity with the product domain; they are called partial preferences. The fundamental challenge of recommendation is thus that it may be difficult to accurately and reliably find like-minded users via collaborative filtering techniques or to match inherently preferred products with content-based methods. In this paper, we propose to leverage auxiliary data on online reviewers' aspect-level opinions in order to predict the buyer's missing preferences. The resulting user preferences are likely to be more accurate and complete. Experiments on real user-study data and crawled Amazon review data show that our solution achieves better recommendation performance than several baseline methods.
Conference Paper
Recommender systems are important building blocks in many of today's e-commerce applications including targeted advertising, personalized marketing and information retrieval. In recent years, the importance of contextual information has motivated many researchers to focus on designing systems that produce personalized recommendations in accordance with the available contextual information of users. Compared to the traditional systems that mainly utilize users' preference history, context-aware recommender systems provide more relevant results to users. We introduce a context-aware recommender system that obtains contextual information by mining user reviews and combining them with user rating history to compute a utility function over a set of items. An item utility is a measure that shows how much it is preferred according to the user's current context. In our system, the context inference is modeled as a supervised topic-modeling problem in which a set of categories for a contextual attribute constitute the topic set. As an example application, we used our method to mine hidden contextual data from customers' reviews of hotels and use it to produce context-aware recommendations. Our evaluations suggest that our system can help produce better recommendations in comparison to a standard kNN recommender system.
Textual information in the world can be broadly categorized into two main types: facts and opinions. Facts are objective expressions about entities, events and their properties. Opinions are usually subjective expressions that describe people's sentiments, appraisals or feelings toward entities, events and their properties. The concept of opinion is very broad. In this chapter, we only focus on opinion expressions that convey people's positive or negative sentiments. Much of the existing research on textual information processing has been focused on mining and retrieval of factual information, e.g., information retrieval, Web search, text classification, text clustering and many other text mining and natural language processing tasks. Little work had been done on the processing of opinions until only recently. Yet, opinions are so important that whenever we need to make a decision we want to hear others' opinions. This is not only true for individuals but also true for organizations. One of the main reasons for the lack of study on opinions is the fact that there was little opinionated text available before the World Wide Web. Before the Web, when an individual needed to make a decision, he/she typically asked for opinions from friends and families. When an organization wanted to find the opinions or sentiments of the general public about its products and services, it conducted opinion polls, surveys, and focus groups. However, with the Web, especially with the explosive growth of the user-generated content on the Web in the past few years, the world has been transformed. The Web has dramatically changed the way that people express their views and opinions. They can now post reviews of products at merchant sites and express their views on almost anything in Internet forums, discussion groups, and blogs, which are collectively called the user-generated content. 
This online word-of-mouth behavior represents new and measurable sources of information with many practical applications. Now if one wants to buy a product, he/she is no longer limited to asking his/her friends and families because there are many product reviews on the Web which give opinions of existing users of the product. For a company, it may no longer be necessary to conduct surveys, organize focus groups or employ external consultants in order to find consumer opinions about its products and those of its competitors because the user-generated content on the Web can already give them such information.
In this paper we show that the extraction of opinions from free-text reviews can improve the accuracy of movie recommendations. We present three approaches to extract movie aspects as opinion targets and use them as features for the collaborative filtering. Each of these approaches requires different amounts of manual interaction. We collected a data set of reviews with corresponding ordinal (star) ratings of several thousand movies to evaluate the different features for the collaborative filtering. We employ a state-of-the-art collaborative filtering engine for the recommendations during our evaluation and compare the performance with and without using the features representing user preferences mined from the free-text reviews provided by the users. The opinion mining based features perform significantly better than the baseline, which is based on star ratings and genre information only.
Conference Paper
Merchants selling products on the Web often ask their customers to review the products that they have purchased and the associated services. As e-commerce is becoming more and more popular, the number of customer reviews that a product receives grows rapidly. For a popular product, the number of reviews can be in hundreds or even thousands. This makes it difficult for a potential customer to read them to make an informed decision on whether to purchase the product. It also makes it difficult for the manufacturer of the product to keep track and to manage customer opinions. For the manufacturer, there are additional difficulties because many merchant sites may sell the same product and the manufacturer normally produces many kinds of products. In this research, we aim to mine and to summarize all the customer reviews of a product. This summarization task is different from traditional text summarization because we only mine the features of the product on which the customers have expressed their opinions and whether the opinions are positive or negative. We do not summarize the reviews by selecting a subset or rewrite some of the original sentences from the reviews to capture the main points as in the classic text summarization. Our task is performed in three steps: (1) mining product features that have been commented on by customers; (2) identifying opinion sentences in each review and deciding whether each opinion sentence is positive or negative; (3) summarizing the results. This paper proposes several novel techniques to perform these tasks. Our experimental results using reviews of a number of products sold online demonstrate the effectiveness of the techniques.
Conference Paper
One of the important types of information on the Web is the opinions expressed in the user generated content, e.g., customer reviews of products, forum posts, and blogs. In this paper, we focus on customer reviews of products. In particular, we study the problem of determining the semantic orientations (positive, negative or neutral) of opinions expressed on product features in reviews. This problem has many applications, e.g., opinion mining, summarization and search. Most existing techniques utilize a list of opinion (bearing) words (also called opinion lexicon) for the purpose. Opinion words are words that express desirable (e.g., great, amazing, etc.) or undesirable (e.g., bad, poor, etc) states. These approaches, however, all have some major shortcomings. In this paper, we propose a holistic lexicon-based approach to solving the problem by exploiting external evidences and linguistic conventions of natural language expressions. This approach allows the system to handle opinion words that are context dependent, which cause major difficulties for existing algorithms. It also deals with many special words, phrases and language constructs which have impacts on opinions based on their linguistic patterns. It also has an effective function for aggregating multiple conflicting opinion words in a sentence. A system, called Opinion Observer, based on the proposed technique has been implemented. Experimental results using a benchmark product review data set and some additional reviews show that the proposed technique is highly effective. It outperforms existing methods significantly.
Conference Paper
We have developed a method for recommending items that combines content and collaborative data under a single probabilistic framework. We benchmark our algorithm against a naïve Bayes classifier on the cold-start problem, where we wish to recommend items that no one in the community has yet rated. We systematically explore three testing methodologies using a publicly available data set, and explain how these methods apply to specific real-world applications. We advocate heuristic recommenders when benchmarking to give competent baseline performance. We introduce a new performance metric, the CROC curve, and demonstrate empirically that the various components of our testing strategy combine to obtain deeper understanding of the performance characteristics of recommender systems. Though the emphasis of our testing is on cold-start recommending, our methods for recommending and evaluation are general.
Recommender systems are an effective tool to help find items of interest from an overwhelming number of available items. Collaborative Filtering (CF), the best known technology for recommender systems, is based on the idea that a set of like-minded users can help each other find useful information. A new user poses a challenge to CF recommenders, since the system has no knowledge about the preferences of the new user, and therefore cannot provide personalized recommendations. A new user preference elicitation strategy needs to ensure that the user does not a) abandon a lengthy signup process, and b) lose interest in returning to the site due to the low quality of initial recommendations. We extend the work of [23] in this paper by incrementally developing a set of information theoretic strategies for the new user problem. We propose an offline simulation framework, and evaluate the strategies through extensive offline simulations and an online experiment with real users of a live recommender system.
Recommender systems attempt to predict items in which a user might be interested, given some information about the user's and items' profiles. Most existing recommender systems use content-based or collaborative filtering methods or hybrid methods that combine both techniques (see the sidebar for more details). We created Informed Recommender to address the problem of using consumer opinion about products, expressed online in free-form text, to generate product recommendations. Informed Recommender uses prioritized consumer product reviews to make recommendations. Using text-mining techniques, it maps each piece of each review comment automatically into an ontology
The modern science of networks has brought significant advances to our understanding of complex systems. One of the most relevant features of graphs representing real systems is community structure, or clustering, i.e., the organization of vertices in clusters, with many edges joining vertices of the same cluster and comparatively few edges joining vertices of different clusters. Such clusters, or communities, can be considered as fairly independent compartments of a graph, playing a role similar to that of, e.g., the tissues or the organs in the human body. Detecting communities is of great importance in sociology, biology and computer science, disciplines where systems are often represented as graphs. This problem is very hard and not yet satisfactorily solved, despite the huge effort of a large interdisciplinary community of scientists working on it over the past few years. We will attempt a thorough exposition of the topic, from the definition of the main elements of the problem, to the presentation of most methods developed, with a special focus on techniques designed by statistical physicists, from the discussion of crucial issues like the significance of clustering and how methods should be tested and compared against each other, to the description of applications to real networks. Comment: Review article. 103 pages, 42 figures, 2 tables. Two sections expanded + minor modifications. Three figures + one table + references added. Final version published in Physics Reports
WordNet is an on-line lexical reference system whose design is inspired by current psycholinguistic theories of human lexical memory. English nouns, verbs, and adjectives are organized into synonym sets, each representing one underlying lexical concept. Different relations link the synonym sets.
A fast community detection algorithm based on a q-state Potts model is presented. Communities (groups of densely interconnected nodes that are only loosely connected to the rest of the network) are found to coincide with the domains of equal spin value in the minima of a modified Potts spin glass Hamiltonian. Comparing global and local minima of the Hamiltonian allows for the detection of overlapping ("fuzzy") communities and quantifying the association of nodes with multiple communities as well as the robustness of a community. No prior knowledge of the number of communities has to be assumed.
Starting from a general ansatz, we show how community detection can be interpreted as finding the ground state of an infinite range spin glass. Our approach applies to weighted and directed networks alike. It contains the ad hoc introduced quality function from [J. Reichardt and S. Bornholdt, Phys. Rev. Lett. 93, 218701 (2004)] and the modularity Q as defined by Newman and Girvan [Phys. Rev. E 69, 026113 (2004)] as special cases. The community structure of the network is interpreted as the spin configuration that minimizes the energy of the spin glass with the spin states being the community indices. We elucidate the properties of the ground state configuration to give a concise definition of communities as cohesive subgroups in networks that is adaptive to the specific class of network under study. Further, we show how hierarchies and overlap in the community structure can be detected. Computationally efficient local update rules for optimization procedures to find the ground state are given. We show how the ansatz may be used to discover the community around a given node without detecting all communities in the full network and we give benchmarks for the performance of this extension. Finally, we give expectation values for the modularity of random graphs, which can be used in the assessment of statistical significance of community structure.
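The modularity Q mentioned above can be illustrated with a minimal sketch. It assumes the standard Newman-Girvan form for an undirected, unweighted graph, Q = Σ_c [e_c/m − (d_c/2m)²], where m is the number of edges, e_c the number of edges inside community c, and d_c the total degree of its vertices. The function name `modularity` and the edge-list input format are hypothetical choices for illustration; this is not the spin-glass optimization procedure the abstract describes, only the quality function it evaluates.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a partition.

    edges: list of (u, v) pairs of an undirected, unweighted graph.
    community: dict mapping each vertex to its community label.
    """
    m = len(edges)
    intra = defaultdict(int)       # e_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: total degree of vertices in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    # Sum over communities of (fraction of internal edges) minus
    # (expected fraction under random rewiring with the same degrees).
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, two_groups))
```

Putting every vertex in a single community gives Q = 0 under this definition, which is why a positive value is read as evidence of community structure.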
Conference Paper
Modern information retrieval systems match the terms included in a user's query with available documents, through the use of an index. A fuzzy thesaurus is used to enrich the query with associated terms. In this work, we use semantic entities, rather than terms; this allows us to use knowledge stored in a semantic encyclopedia, specifically the ordering relations, in order to perform a semantic expansion of the query. The process of query expansion takes into account the query context, which is defined as a fuzzy set of semantic entities. Furthermore, we integrate our approach with the user's profile.
Recommender systems have become valuable resources for users seeking intelligent ways to search through the enormous volume of information available to them. One crucial unsolved problem for recommender systems is how best to learn about a new user. In this paper we study six techniques that collaborative filtering recommender systems can use to learn about new users. These techniques select a sequence of items for the collaborative filtering system to present to each new user for rating. The techniques include the use of information theory to select the items that will give the most value to the recommender system, aggregate statistics to select the items the user is most likely to have an opinion about, balanced techniques that seek to maximize the expected number of bits learned per presented item, and personalized techniques that predict which items a user will have an opinion about. We study the techniques through offline experiments with a large preexisting user data set, and through a live experiment with over 300 users. We show that the choice of learning technique significantly affects the user experience, in both the user effort and the accuracy of the resulting predictions.
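One way to read the information-theoretic selection mentioned in this abstract is to rank items by the Shannon entropy of their existing ratings, so that items on which users disagree most are shown to a new user first. The sketch below illustrates that reading only; the abstract does not give the exact measures used, and the function names and sample ratings are assumptions.

```python
import math
from collections import Counter


def rating_entropy(ratings):
    """Shannon entropy (in bits) of a list of discrete rating values."""
    counts = Counter(ratings)
    total = len(ratings)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def rank_items_by_entropy(item_ratings):
    """Return item ids ordered from most to least informative (highest entropy first)."""
    return sorted(
        item_ratings,
        key=lambda item: rating_entropy(item_ratings[item]),
        reverse=True,
    )


# Hypothetical ratings on a 1-5 scale.
item_ratings = {
    "A": [5, 5, 5, 5],  # everyone agrees: nothing to learn
    "B": [1, 5, 1, 5],  # two equally likely opinions
    "C": [1, 2, 3, 4],  # four equally likely opinions
}

print(rank_items_by_entropy(item_ratings))
```

Pure entropy ignores how likely a new user is to have an opinion on the item at all, which is presumably why the abstract also lists popularity-based and balanced techniques.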
Several techniques are currently used to evaluate recommender systems. These techniques involve off-line analysis using evaluation methods from machine learning and information retrieval. We argue that while off-line analysis is useful, user satisfaction with a recommendation strategy can only be measured in an on-line context. We propose a new evaluation framework which involves a paired test of two recommender systems which simultaneously compete to give the best recommendations to the same user at the same time. The user interface and the interaction model for each system are the same. The framework enables you to specify an API so that different recommendation strategies may take part in such a competition. The API defines issues such as access to data, the interaction model and the means of gathering positive feedback from the user. In this way it is possible to obtain a relative measure of user satisfaction with the two systems.
This paper presents an overview of the field of recommender systems and describes the current generation of recommendation methods that are usually classified into the following three main categories: content-based, collaborative, and hybrid recommendation approaches. This paper also describes various limitations of current recommendation methods and discusses possible extensions that can improve recommendation capabilities and make recommender systems applicable to an even broader range of applications. These extensions include, among others, an improvement of understanding of users and items, incorporation of the contextual information into the recommendation process, support for multicriteria ratings, and a provision of more flexible and less intrusive types of recommendations.
In this paper, we give an overview of our work investigating the integration of context into different kinds of recommender systems. Context adds another dimension to the user-item data model of a recommender system and can be utilized in different ways during content-based or collaborative recommendation processes. We give several application examples that we are working on to apply contextual recommenders in real-world scenarios.
The importance of contextual information has been recognized by researchers and practitioners in many disciplines, including e-commerce personalization, information retrieval, ubiquitous and mobile computing, data mining, marketing, and management. While a substantial amount of research has already been performed in the area of recommender systems, most existing approaches focus on recommending the most relevant items to users without taking into account any additional contextual information, such as time, location, or the company of other people (e.g., for watching movies or dining out). In this chapter we argue that relevant contextual information does matter in recommender systems and that it is important to take this information into account when providing recommendations. We discuss the general notion of context and how it can be modeled in recommender systems. Furthermore, we introduce three different algorithmic paradigms – contextual pre-filtering, post-filtering, and modeling – for incorporating contextual information into the recommendation process, discuss the possibilities of combining several context-aware recommendation techniques into a single unifying approach, and provide a case study of one such combined approach. Finally, we present additional capabilities for context-aware recommenders and discuss important and promising directions for future research.
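Of the three paradigms named in this abstract, contextual pre-filtering is the simplest to illustrate: ratings are first restricted to those given in the target context, and an ordinary recommender is then run on what remains. The sketch below uses a plain per-item mean as that recommender. The function name, the tuple layout, and the hotel data are all assumptions made for the example, not details from the chapter.

```python
from collections import defaultdict


def prefilter_recommend(ratings, context, top_n=3):
    """Contextual pre-filtering followed by a mean-rating ranking.

    ratings: iterable of (user, item, rating, context_dict) tuples.
    context: dict of context attributes the rating must match, e.g. {"company": "family"}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _user, item, rating, ctx in ratings:
        # Pre-filtering step: keep only ratings given in the requested context.
        if all(ctx.get(key) == value for key, value in context.items()):
            sums[item] += rating
            counts[item] += 1

    means = {item: sums[item] / counts[item] for item in sums}
    # Highest mean first; ties broken alphabetically for a stable order.
    return sorted(means, key=lambda item: (-means[item], item))[:top_n]


# Hypothetical ratings with a single context attribute.
ratings = [
    ("u1", "Hotel A", 5, {"company": "family"}),
    ("u2", "Hotel A", 2, {"company": "business"}),
    ("u3", "Hotel B", 4, {"company": "family"}),
    ("u4", "Hotel B", 5, {"company": "business"}),
    ("u5", "Hotel C", 3, {"company": "family"}),
]

print(prefilter_recommend(ratings, {"company": "family"}))
print(prefilter_recommend(ratings, {"company": "business"}))
```

The same item can rank differently per context, here Hotel A for family versus business trips. The cost of pre-filtering is that each context sees fewer ratings, which makes sparsity worse.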
On the Web, where the search costs are low and the competition is just a mouse click away, it is crucial to segment the customers intelligently in order to offer more targeted and personalized products and services to them. Traditionally, customer segmentation is achieved using statistics-based methods that compute a set of statistics from the customer data and group customers into segments by applying distance-based clustering algorithms in the space of these statistics. In this paper, we present a direct grouping-based approach to computing customer segments that groups customers not based on computed statistics, but in terms of optimally combining transactional data of several customers to build a data mining model of customer behavior for each group. Then, building customer segments becomes a combinatorial optimization problem of finding the best partitioning of the customer base into disjoint groups. This paper shows that finding an optimal customer partition is NP-hard, proposes several suboptimal direct grouping segmentation methods, and empirically compares them among themselves, traditional statistics-based hierarchical and affinity propagation-based segmentation, and one-to-one methods across multiple experimental conditions. It is shown that the best direct grouping method significantly dominates the statistics-based and one-to-one approaches across most of the experimental conditions, while still being computationally tractable. It is also shown that the distribution of the sizes of customer segments generated by the best direct grouping method follows a power law distribution and that microsegmentation provides the best approach to personalization.
Conference Paper
Collaborative filtering (CF) methods deliver well-personalized predictions when adequate user data is available. However, ratings from new users or for new items are not always available, and CF cannot make precise recommendations in such cases. In this paper, we consider alleviating the cold-start problem by using users’ implicit feedback data, in contrast to traditional methods that focus entirely on the sparse data. To exploit the significance of users’ implicit feedback for alleviating the cold-start problem, we present two independent strategies: the neural network-based M1 method and the collaboration-based M2 method. With these, the significance of users’ implicit feedback for cold-start recommendation is preliminarily demonstrated.
Conference Paper
Standard Sentiment Analysis applies Natural Language Processing methods to assess an "approval" value of a given text, categorizing it into "negative", "neutral", or "positive" or on a linear scale. Sentiment Analysis can be used to infer ratings values for users based on textual reviews of items such as books, films, or products. We propose an approach to generalizing the concept to multiple dimensions to estimate user ratings along multiple axes such as "service", "price" and "value". We use Canonical Correlation Analysis (CCA) and derive a mathematical model that can be used as a multivariate regression tool. This model has a number of valuable properties: it can be trained offline and used efficiently on live streams of text such as blogs and tweets, can be used for visualization and data clustering and labeling, and finally it can potentially be incorporated into natural language product search algorithms. At the end we propose an evaluation procedure that can be used on live data when a ground truth is not available. Based on this model we present our preliminary results from empirical data that we have collected from our system Opinion Space. We show that for this dataset the CCA model outperforms the PCA that was originally used in Opinion Space.
Conference Paper
Recommender systems are widely used in online e-commerce applications to improve user engagement and then to increase revenue. A key challenge for recommender systems is providing high quality recommendation to users in "cold-start" situations. We consider three types of cold-start problems: 1) recommendation on existing items for new users; 2) recommendation on new items for existing users; 3) recommendation on new items for new users. We propose predictive feature-based regression models that leverage all available information of users and items, such as user demographic information and item content features, to tackle cold-start problems. The resulting algorithms scale efficiently as a linear function of the number of observations. We verify the usefulness of our approach in three cold-start settings on the MovieLens and EachMovie datasets, by comparing with five alternatives including random, most popular, segmented most popular, and two variations of Vibes affinity algorithm widely used at Yahoo! for recommendation.
Conference Paper
The paper studies the Long Tail problem of recommender systems when many items in the Long Tail have only few ratings, thus making it hard to use them in recommender systems. The approach presented in the paper splits the whole itemset into the head and the tail parts and clusters only the tail items. Then recommendations for the tail items are based on the ratings in these clusters and for the head items on the ratings of individual items. If such partition and clustering are done properly, we show that this reduces the recommendation error rates for the tail items, while maintaining reasonable computational performance.
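The head/tail idea in this abstract can be sketched in a few lines: items with enough ratings are predicted from their own ratings, while sparsely rated items borrow the ratings of the other tail items in their cluster. The code below is only an illustration under stated assumptions. The threshold `head_threshold`, the precomputed mapping `cluster_of`, and the use of a plain mean as the predictor are all hypothetical, and the abstract does not say how the clusters are built.

```python
from statistics import mean


def predict_item_rating(item, item_ratings, cluster_of, head_threshold=3):
    """Predict a rating for `item` using the head/tail split.

    item_ratings: dict item -> list of observed ratings.
    cluster_of: dict tail item -> cluster id (assumed to be computed elsewhere).
    head_threshold: minimum number of ratings for an item to count as "head".
    """
    ratings = item_ratings[item]
    if len(ratings) >= head_threshold:
        # Head item: enough data to rely on its own ratings.
        return mean(ratings)

    # Tail item: pool the ratings of all tail items in the same cluster.
    pooled = [
        rating
        for other, other_ratings in item_ratings.items()
        if len(other_ratings) < head_threshold and cluster_of[other] == cluster_of[item]
        for rating in other_ratings
    ]
    return mean(pooled) if pooled else None


# Hypothetical data: one popular item and three sparsely rated ones.
item_ratings = {
    "popular": [4, 5, 4, 5],
    "niche1": [2],
    "niche2": [4, 3],
    "niche3": [5],
}
cluster_of = {"niche1": 0, "niche2": 0, "niche3": 1}

for name in item_ratings:
    print(name, predict_item_rating(name, item_ratings, cluster_of))
```

Pooling trades specificity for stability: the estimate for "niche1" is based on three ratings instead of one, at the price of assuming that items in a cluster are rated alike.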
Conference Paper
The importance of contextual information has been recognized by researchers and practitioners in many disciplines, including e-commerce personalization, information retrieval, ubiquitous and mobile computing, data mining, marketing, and management. While a substantial amount of research has already been performed in the area of recommender systems, many existing approaches focus on recommending the most relevant items to users without taking into account any additional contextual information, such as time, location, or the company of other people (e.g., for watching movies or dining out). There is growing understanding that relevant contextual information does matter in recommender systems and that it is important to take this information into account when providing recommendations. We discuss the general notion of context and how it can be modeled in recommender systems. We also discuss three popular algorithmic paradigms (contextual pre-filtering, post-filtering, and modeling) for incorporating contextual information into the recommendation process, and survey recent work on context-aware recommender systems. We also discuss important directions for future research.
Conference Paper
Recommender systems that automatically suggest items of interest to users have become increasingly essential in fields where mass personalization is highly valued. The popular core techniques of such systems are collaborative filtering, content-based filtering and combinations of these. In this paper, we discuss hybrid approaches that use both collaborative and content data to address the cold-start problem, that is, giving recommendations to novel users who have expressed no preference on any items, or recommending items that no user of the community has seen yet. While there have been many studies on solving the item-side problem, solutions to the user-side problem have not been publicly reported. We therefore develop a hybrid model, based on the analysis of two probabilistic aspect models, that combines pure collaborative filtering with users' information. Experiments with MovieLens data indicate substantial and consistent improvements of this model in overcoming the user-side cold-start problem.
Conference Paper
Online reviews are an important asset for users deciding to buy a product, see a movie, or go to a restaurant, as well as for businesses tracking user feedback. However, most reviews are written in a free-text format, and are therefore difficult for computer systems to understand, analyze, and aggregate. One consequence of this lack of structure is that searching text reviews is often frustrating for users. User experience would be greatly improved if the structure and sentiment conveyed in the content of the reviews were taken into account. Our work focuses on identifying this information from free-form text reviews, and using the knowledge to improve user experience in accessing reviews. Specifically, we focused on improving recommendation accuracy in a restaurant review scenario. In this paper, we report on our classification effort, and on the insight on user-reviewing behavior that we gained in the process. We propose new ad-hoc and regression-based recommendation measures that both take into account the textual component of user reviews. Our results show that using textual information results in better general or personalized review score predictions than those derived from the numerical star ratings given by the users.
Conference Paper
With the increase in popularity of online review sites comes a corresponding need for tools capable of extracting the information most important to the user from the plain text data. Due to the diversity in products and services being reviewed, supervised methods are often not practical. We present an unsupervised system for extracting aspects and determining sentiment in review text. The method is simple and flexible with regard to domain and language, and takes into account the influence of aspect on sentiment polarity, an issue largely ignored in previous literature. We demonstrate its effectiveness on both component tasks, where it achieves similar results to more complex semi-supervised methods that are restricted by their reliance on manual annotation and extensive knowledge sources.
On the Web, where the search costs are low and the competition is just a mouse click away, it is crucial to segment the customers intelligently in order to offer more targeted and personalized products and services to them. Traditionally, customer segmentation is achieved using statistics-based methods that compute a set of statistics from the customer data and group customers into segments by applying distance-based clustering algorithms in the space of these statistics. In this paper, we present a direct grouping based approach to computing customer segments that groups customers not based on computed statistics, but in terms of optimally combining transactional data of several customers to build a data mining model of customer behavior for each group. Then building customer segments becomes a combinatorial optimization problem of finding the best partitioning of the customer base into disjoint groups. The paper shows that finding an optimal customer partition is NP-hard, proposes a suboptimal direct grouping segmentation method and empirically compares it against traditional statistics-based segmentation and 1-to-1 methods across multiple experimental conditions. We show that the direct grouping method significantly dominates the statistics-based and 1-to-1 approaches across all the experimental conditions, while still being computationally tractable. We also show that there are very few size-one customer segments generated by the best direct grouping method and that micro-segmentation provides the best approach to personalization.
An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people now can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object. This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. Our focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. We include material on summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided.
N. M. T. Watch, "Online Travel Market," April 2011. [Online]. Available:
K. Church, W. Gale, P. Hanks, and D. Hindle, "Using statistics in lexical analysis," in Lexical Acquisition: Exploiting On-Line Resources to Build a Lexicon, ch. 6, 1991.
S. Brody and N. Elhadad, "An unsupervised aspect-sentiment model for online reviews," in Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 2010.
J. Reichardt and S. Bornholdt, "Statistical mechanics of community detection," Physical Review E, 2006.