Conference Paper

A Method for Evaluating the Navigability of Recommendation Algorithms

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

Recommendations are increasingly used to support and enable discovery, browsing and exploration of large item collections, especially when no clear classification of items exists. Yet, the suitability of a recommendation algorithm to support these use cases cannot be comprehensively evaluated by any evaluation measures proposed so far. In this paper, we expand the repertoire of existing recommendation evaluation techniques with a method to evaluate the navigability of recommendation algorithms. The proposed method combines approaches from network science and information retrieval and evaluates navigability by simulating three different models of information seeking scenarios and measuring the success rates. We show the feasibility of our method by applying it to four non-personalized recommendation algorithms on three datasets and also illustrate its applicability to personalized algorithms. Our work expands the arsenal of evaluation techniques for recommendation algorithms, extends from a one-click-based evaluation towards multi-click analysis and presents a general, comprehensive method for evaluating the navigability of arbitrary recommendation algorithms.
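
As a rough illustration of the idea of simulating navigation on a recommendation network and measuring success rates, the following Python sketch runs a greedy walker on a toy graph. The graph, the item positions, the `distance` heuristic, the step limit and all function names are assumptions made for this example; the three information seeking models used in the paper are not reproduced here.

```python
from typing import Dict, List

# Hypothetical toy recommendation network: item -> items recommended on its page.
GRAPH: Dict[str, List[str]] = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": [],
    "e": ["a"],
}
# Assumed 1-D item positions, standing in for the walker's background knowledge.
POSITION = {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4}


def distance(item: str, target: str) -> float:
    """Assumed heuristic: a smaller value means the item looks closer to the target."""
    return abs(POSITION[item] - POSITION[target])


def greedy_navigate(graph, start, target, dist, max_steps=10) -> bool:
    """Click, at each step, the unvisited recommendation that looks closest to the target."""
    current = start
    visited = {start}
    for _ in range(max_steps):
        if current == target:
            return True
        candidates = [n for n in graph.get(current, []) if n not in visited]
        if not candidates:
            return False  # dead end: nothing new left to click
        current = min(candidates, key=lambda n: dist(n, target))
        visited.add(current)
    return current == target


def success_rate(graph, pairs, dist, max_steps=10) -> float:
    """Fraction of (start, target) pairs for which the walker reaches the target."""
    hits = sum(greedy_navigate(graph, s, t, dist, max_steps) for s, t in pairs)
    return hits / len(pairs)


PAIRS = [("a", "d"), ("e", "d"), ("a", "e"), ("d", "a")]
print(success_rate(GRAPH, PAIRS, distance))  # prints 0.5
```

In this toy setting two of the four pairs succeed: "e" has no incoming recommendation and "d" has no outgoing one, so any search towards "e" or starting from "d" fails regardless of the heuristic.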

... In the context of graphs, this also corresponds to the ability to easily access all of the available information. This notion has been partially addressed in work such as that on recommendation graphs (Lamprecht et al., 2016), notably in the music domain, where Seyerlehner et al. (2009) attempt to improve the explorability (called browsability in their work) of music recommendations. In their work, this explorability is limited to the possibility of consulting the entire collection by moving from recommendation to recommendation, without considering the actual accessibility of the documents, some of which may lie at the end of a long chain of successive recommendations. ...
... The term "navigability" has been used to describe graphs that we call "explorable" in this thesis, for example in the field of recommendation (Lamprecht et al., 2016). However, navigability also refers to a mathematical property of graphs, stemming from the analysis of the "small-world phenomenon" (Milgram, 1967). ...
Thesis
This computer science thesis addresses the structuring and exploration of journalistic collections. It draws on several research fields: social sciences, through the study of journalistic production; ergonomics; language processing and information retrieval; and multimedia, in particular multimedia information retrieval. A branch of multimedia information retrieval called hyperlinking forms the basis on which this thesis is built. Hyperlinking consists of automatically constructing links between multimedia documents. We extend this concept by applying it to an entire collection in order to obtain a hypergraph, and we focus in particular on its topological characteristics. In this thesis we propose improvements over the state of the art along three main axes: structuring news collections using multi-source and multimodal graphs based on the creation of inter-document links; combining this structure with a high diversity of links, which makes it possible to represent the wide variety of interests that different users may have; and finally adding a typing of the created links, which makes explicit the relationship between two documents. These contributions are supported by user studies demonstrating their respective merits.
... The task of searching for and navigating in large networks has been extensively studied in the past for a broad range of domains such as the Web, social networks and peer-to-peer networks [2], [10], [11], [12], [13], [21], [25], [26]. ...
Preprint
The groundbreaking experiment of Travers and Milgram demonstrated the so-called "six degrees of separation" phenomenon, by which any individual in the world is able to contact an arbitrary, hitherto-unknown, individual by means of a short chain of social ties. Despite the large number of empirical and theoretical studies to explain the Travers-Milgram experiment, some fundamental questions are still open: why are some individuals more likely than others to discover short friend-of-a-friend communication chains? Can we rank individuals on the basis of their ability to discover short chains? To answer these questions, we extend the concept of potential gain, originally defined in the context of Web analysis, to social networks and we define a novel index, called "the navigability score," that ranks nodes in a network on the basis of how their position facilitates the discovery of short chains that connect to arbitrary target nodes in the network. We define two variants of potential gain, called the geometric and the exponential potential gain, and present fast algorithms to compute them. Our theoretical and experimental analysis shows that the computation of the geometric and exponential gain is affordable even on large real-life graphs.
... A recommender system (RS) is defined, from the perspective of e-commerce, as a tool that helps users search through records of knowledge related to their interests and preferences [1,2]. In [3], RSs are defined as a means of assisting and augmenting the social process of using the recommendations of others to make choices when there is insufficient personal knowledge or experience of the alternatives. To implement its core function of identifying useful items for the user, an RS must predict that an item is worth recommending [4,5]. ...
Article
Full-text available
The era of big data has witnessed the explosion of tensor datasets, and large-scale Probabilistic Tensor Factorization (PTF) analysis is important to accommodate this increasing trend of data. Sparsity and cold-start are some of the inherent problems of recommender systems in the era of big data. This paper proposes a novel sentiment-based probabilistic tensor analysis technique, senti-PTF, to address these problems. The proposed framework first applies a natural language processing technique to perform sentiment analysis, taking advantage of the huge volumes of textual data available from social media, which are predominantly left untouched. Although some current studies do employ review texts, many of them do not consider how sentiments in reviews influence the recommendation algorithm's predictions. There is therefore a gap in big data text analytics, whose modeling is computationally expensive. From our experiments, our novel machine learning sentiment-based tensor analysis is computationally less expensive and addresses the cold-start problem for optimal recommendation prediction.
Article
Centrality metrics are a popular tool in Network Science to identify important nodes within a graph. We introduce the Potential Gain as a centrality measure that unifies many walk-based centrality metrics in graphs and captures the notion of node navigability, interpreted as the property of being reachable from anywhere else (in the graph) through short walks. Two instances of the Potential Gain (called the Geometric and the Exponential Potential Gain) are presented and we describe scalable algorithms for computing them on large graphs. We also give a proof of the relationship between the new measures and established centralities. The geometric potential gain of a node can thus be characterized as the product of its Degree centrality by its Katz centrality scores. At the same time, the exponential potential gain of a node is proved to be the product of Degree centrality by its Communicability index. These formal results connect potential gain to both the “popularity” and “similarity” properties that are captured by the above centralities.
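
The stated characterization (degree multiplied by Katz centrality) can be tried out on a tiny graph. The sketch below is only an illustration under assumptions: it takes Katz centrality to be the walk sum over k >= 1 of alpha^k (A^k 1), which is one common convention, and it uses the raw degree without normalization. The abstract does not give the exact definitions or constants, so the function names, the value of `alpha` and the toy graph are all hypothetical.

```python
from typing import Dict, List

# Hypothetical undirected toy graph (a path 0 - 1 - 2) as adjacency lists.
ADJ: Dict[int, List[int]] = {0: [1], 1: [0, 2], 2: [1]}


def katz_scores(adj, alpha, iterations=200):
    """Katz centrality under the assumed convention x = sum_{k>=1} alpha^k A^k 1.

    Computed with the fixed-point iteration x <- alpha * A (x + 1), which
    converges when alpha is smaller than 1 / (largest eigenvalue of A).
    """
    x = {n: 0.0 for n in adj}
    for _ in range(iterations):
        x = {n: alpha * sum(x[m] + 1.0 for m in adj[n]) for n in adj}
    return x


def degree_times_katz(adj, alpha):
    """Product of (unnormalized) degree and the Katz score of each node."""
    katz = katz_scores(adj, alpha)
    return {n: len(adj[n]) * katz[n] for n in adj}


SCORES = degree_times_katz(ADJ, alpha=0.1)
print(max(SCORES, key=SCORES.get))  # prints 1, the middle node of the path
```

The middle node scores highest because it has both the largest degree and the most short walks ending at it, which matches the intuition of being reachable from elsewhere through short walks.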
Conference Paper
The success of Google’s PageRank algorithm popularized graphs as a tool to model the web’s navigability. At that time, the web topology resulted from human editing of hyperlinks. Nowadays, that topology mostly results from algorithms. In this paper, we propose to study the topology realized by a class of such algorithms: recommenders. By modeling the output of recommenders as graphs, we show that a vast array of topological observations become easily accessible, using a simple web-crawler. We give models and illustrations for those graph representations. We then propose a graph-based methodology for addressing an algorithmic transparency problem: recommendation bias detection. We illustrate this approach on YouTube crawls, targeting the prediction of “Recommended for you” links.
Conference Paper
Full-text available
The Internet Movie Database (IMDb) is the world's largest collection of facts about movies and features large-scale recommendation systems connecting hundreds of thousands of items. In the past, the principal evaluation criterion for such recommender systems has been the rating accuracy prediction for recommendations within the immediate one-hop-neighborhood. Apart from a few isolated studies, the evaluation methodology for recommender systems has so far lacked approaches that quantify and measure the exposure to novel content while navigating a recommender system. As such, little is known about the support for navigation and browsing as methods to explore, browse and discover novel items within these systems. In this article, we study the navigability of IMDb's recommender systems over multiple hops. To this end, we analyze the recommendation networks of IMDb with a two-level approach: First, we study reachability in terms of components, path lengths and a bow-tie analysis. Second, we simulate practical browsing scenarios based on greedy decentralized search. Our results show that the IMDb recommendation networks are not very well-suited for navigation scenarios. To mitigate this, we apply a method for diversifying recommendations by specifically selecting recommendations which improve connectivity but do not compromise relevance. We demonstrate that this leads to improved reachability and navigability in both recommender systems. Our work underlines the importance of navigability and reachability as evaluation dimensions of a large movie recommender system and shows ways to increase navigational diversity.
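
A bow-tie analysis of the kind mentioned above splits a directed network into a strongly connected core, an IN part that can reach the core, an OUT part reachable from the core, and the rest. The following Python sketch shows one simple way to compute such a split on a toy graph; it is not the authors' implementation, the graph and all names are hypothetical, and the quadratic search for the largest strongly connected component is only suitable for small examples.

```python
from collections import deque

# Hypothetical toy recommendation network (directed): item -> recommended items.
GRAPH = {"a": ["b"], "b": ["c"], "c": ["b", "d"], "d": [], "e": []}


def reachable(graph, start):
    """All nodes reachable from start by following edges (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(graph):
    """Graph with every edge flipped; also registers nodes that only appear as targets."""
    rev = {n: [] for n in graph}
    for src, targets in graph.items():
        for dst in targets:
            rev.setdefault(dst, []).append(src)
    return rev


def bow_tie(graph):
    rev = reverse(graph)
    nodes = set(rev)
    if not nodes:
        return {"core": set(), "in": set(), "out": set(), "other": set()}
    # Largest strongly connected component: nodes that reach each other both ways.
    core, remaining = set(), set(nodes)
    while remaining:
        n = next(iter(remaining))
        scc = reachable(graph, n) & reachable(rev, n)
        if len(scc) > len(core):
            core = scc
        remaining -= scc
    seed = next(iter(core))
    forward, backward = reachable(graph, seed), reachable(rev, seed)
    return {
        "core": core,
        "in": backward - core,    # can reach the core, not reachable from it
        "out": forward - core,    # reachable from the core, cannot return
        "other": nodes - forward - backward,
    }


print(bow_tie(GRAPH))
```

On the toy graph the core is {"b", "c"}, "a" falls into IN, "d" into OUT, and the isolated item "e" into the remainder. Items outside the core and OUT cannot be reached by browsing from the core, which is why such a decomposition is informative for reachability.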
Article
Full-text available
The need to examine the behavior of different user groups is a fundamental requirement when building information systems. In this paper, we present Ontology-based Decentralized Search (OBDS), a novel method to model the navigation behavior of users equipped with different types of background knowledge. Ontology-based Decentralized Search combines ontologies and decentralized search, an established method for navigation in social networks, to model navigation behavior in information networks. The method uses ontologies as an explicit representation of background knowledge to inform the navigation process and guide it towards navigation targets. By using different ontologies, users equipped with different types of background knowledge can be represented. We demonstrate our method using four biomedical ontologies and their associated Wikipedia articles. We compare our simulation results with baseline approaches and with results obtained from a user study and find that our method produces click paths that have properties similar to those originating from human navigators. The results suggest that our method can be used to model human navigation behavior in systems that are based on information networks such as Wikipedia.
Conference Paper
Full-text available
Decentralized search in networks is an activity that is often performed in online tasks. It refers to situations where a user has no global knowledge of a network’s topology, but only local knowledge. On Wikipedia for instance, humans typically have local knowledge of the links emanating from a given Wikipedia article, but no global knowledge of the entire Wikipedia graph. This makes the task of navigation to a target Wikipedia article from a given starting article an interesting problem for both humans and algorithms. As we know from previous studies, people can have very efficient decentralized search procedures that find shortest paths in many cases, using intuitions about a given network. These intuitions can be modeled as hierarchical background knowledge that people access to approximate a network’s topology. In this paper, we explore the differences and similarities between decentralized search that utilizes hierarchical background knowledge and actual human navigation in information networks. For that purpose we perform a large scale study on the Wikipedia information network with over 500,000 users and 1,500,000 click trails. As our results reveal, a decentralized search procedure based on hierarchies created directly from the link structure of the information network simulates human navigational behavior better than simulations based on hierarchies that are created from external knowledge.
Conference Paper
Full-text available
This paper considers a popular class of recommender systems that are based on Collaborative Filtering (CF) and proposes a novel technique for diversifying the recommendations that they give to users. Items are clustered based on a unique notion of priority-medoids that provides a natural balance between the need to present highly ranked items vs. highly diverse ones. Our solution estimates item diversity by comparing the rankings that different users gave to the items, thereby enabling diversification even in common scenarios where no semantic information on the items is available. It also provides a natural zoom-in mechanism to focus on items (clusters) of interest and recommend diversified similar items. We present DiRec, a plug-in that implements the above concepts and allows CF recommender systems to diversify their recommendations. We illustrate the operation of DiRec in the context of a movie recommendation system and present a thorough experimental study that demonstrates the effectiveness of our recommendation diversification technique and its superiority over previous solutions.
Conference Paper
Full-text available
On the Web, users typically forage for information by navigating from page to page along Web links. Their surfing patterns or actions are guided by their information needs. Researchers need tools to explore the complex interactions between user needs, user actions, and the structures and contents of the Web. In this paper, we describe two computational methods for understanding the relationship between user needs and user actions. First, for a particular pattern of surfing, we seek to infer the associated information need. Second, given an information need, and some pages as starting points, we attempt to predict the expected surfing patterns. The algorithms use a concept called “information scent”, which is the subjective sense of value and cost of accessing a page based on perceptual cues. We present an empirical evaluation of these two algorithms, and show their effectiveness.
Conference Paper
Full-text available
As online groups grow in number and type, understanding lurking is becoming increasingly important. Recent reports indicate that lurkers make up over 90% of online groups, yet little is known about them. This paper presents a demographic study of lurking in email-based discussion lists (DLs) with an emphasis on health and software-support DLs. Four primary questions are examined. One, how prevalent is lurking, and do health and software-support DLs differ? Two, how do lurking levels vary as the definition is broadened from zero posts in 12 weeks to 3 or fewer posts in 12 weeks? Three, is there a relationship between lurking and the size of the DL, and four, is there a relationship between lurking and traffic level? When lurking is defined as no posts, the mean lurking level for all DLs is lower than the reported 90%. Health-support DLs have on average significantly fewer lurkers (46%) than software-support DLs (82%). Lurking varies widely ranging from 0 to 99%. The relationships between lurking, group size and traffic are also examined.
Conference Paper
Full-text available
Even though people are attracted by large, high quality recommendation sets, psychological research on choice overload shows that choosing an item from recommendation sets containing many attractive items can be a very difficult task. A web-based user experiment using a matrix factorization algorithm applied to the MovieLens dataset was used to investigate the effect of recommendation set size (5 or 20 items) and set quality (low or high) on perceived variety, recommendation set attractiveness, choice difficulty and satisfaction with the chosen item. The results show that larger sets containing only good items do not necessarily result in higher choice satisfaction compared to smaller sets, as the increased recommendation set attractiveness is counteracted by the increased difficulty of choosing from these sets. These findings were supported by behavioral measurements revealing intensified information search and increased acquisition times for these large attractive sets. Important implications of these findings for the design of recommender system user interfaces will be discussed.
Conference Paper
Full-text available
We discuss the video recommendation system in use at YouTube, the world's most popular online video community. The system recommends personalized sets of videos to users based on their activity on the site. We discuss some of the unique challenges that the system faces and how we address them. In addition, we provide details on the experimentation and evaluation framework used to test and tune new algorithms. We also present some of the findings from these experiments.
Conference Paper
Full-text available
This paper presents two methods, named Item- and User-centric, to evaluate the quality of novel recommendations. The former method focuses on analyzing the item-based recommendation network. The aim is to detect whether the network topology has any pathology that hinders novel recommendations. The latter, user-centric evaluation, aims at measuring users' perceived quality of novel, previously unknown, recommendations. The results of the experiments, done in the music recommendation context, show that last.fm social recommender, based on collaborative filtering, is prone to popularity bias. This has direct consequences on the topology of the item-based recommendation network. Pure audio content-based methods (CB) are not affected by popularity. However, a user-centric experiment done with 288 subjects shows that even though a social-based approach recommends less novel items than our CB, users' perceived quality is better than those recommended by a pure CB method.
Conference Paper
Full-text available
Recently, a number of algorithms have been proposed to obtain hierarchical structures, so-called folksonomies, from social tagging data. Work on these algorithms is in part driven by a belief that folksonomies are useful for tasks such as: (a) Navigating social tagging systems and (b) Acquiring semantic relationships between tags. While the promises and pitfalls of the latter have been studied to some extent, we know very little about the extent to which folksonomies are pragmatically useful for navigating social tagging systems. This paper sets out to address this gap by presenting and applying a pragmatic framework for evaluating folksonomies. We model exploratory navigation of a tagging system as decentralized search on a network of tags. Evaluation is based on the fact that the performance of a decentralized search algorithm depends on the quality of the background knowledge used. The key idea of our approach is to use hierarchical structures learned by folksonomy algorithms as background knowledge for decentralized search. Utilizing decentralized search on tag networks in combination with different folksonomies as hierarchical background knowledge allows us to evaluate navigational tasks in social tagging systems. Our experiments with four state-of-the-art folksonomy algorithms on five different social tagging datasets reveal that existing folksonomy algorithms exhibit significant, previously undiscovered, differences with regard to their utility for navigation. Our results are relevant for engineers aiming to improve navigability of social tagging systems and for scientists aiming to evaluate different folksonomy algorithms from a pragmatic perspective.
Conference Paper
Full-text available
Many music portals offer the possibility to explore music collections via browsing automatically generated music recommendations. In this paper we argue that such music recommender systems can be transformed into an equivalent recommendation graph. We then analyze the recommendation graph of a real-world content-based music recommender system to find out if users can really explore the underlying song database by following those recommendations. We find that some songs are not recommended at all and are consequently not reachable via browsing. We then take a first attempt to modify a recommendation network in such a way that the resulting network is better suited to explore the respective music space.
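
The observation that some songs are never recommended can be illustrated by turning top-k recommendation lists into a directed graph and looking for nodes without incoming edges. The sketch below is a minimal example under stated assumptions: the one-dimensional `FEATURE` values, the nearest-neighbour recommender and the choice k = 1 are invented for illustration and do not come from the paper.

```python
# Hypothetical 1-D audio feature per song; a real content-based system would
# use high-dimensional similarity instead.
FEATURE = {"s1": 0.0, "s2": 1.0, "s3": 1.5, "s4": 5.0}


def top_k_recommendations(features, k):
    """Recommendation graph: each song points to its k most similar other songs."""
    graph = {}
    for song, value in features.items():
        others = [o for o in features if o != song]
        others.sort(key=lambda o: abs(features[o] - value))
        graph[song] = others[:k]
    return graph


def never_recommended(graph):
    """Songs with no incoming edge, i.e. unreachable by following recommendations."""
    recommended = {target for targets in graph.values() for target in targets}
    return set(graph) - recommended


REC_GRAPH = top_k_recommendations(FEATURE, k=1)
print(sorted(never_recommended(REC_GRAPH)))  # prints ['s1', 's4']
```

Here the two songs at the edges of the feature range recommend towards the middle but are never recommended themselves, so a user who starts elsewhere cannot browse to them.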
Article
Full-text available
Recommender systems have been evaluated in many, often incomparable, ways. In this article, we review the key decisions in evaluating collaborative filtering recommender systems: the user tasks being evaluated, the types of analysis and datasets being used, the ways in which prediction quality is measured, the evaluation of prediction attributes other than quality, and the user-based evaluation of the system as a whole. In addition to reviewing the evaluation strategies used by prior researchers, we present empirical results from the analysis of various accuracy metrics on one content domain where all the tested metrics collapsed roughly into three equivalence classes. Metrics within each equivalency class were strongly correlated, while metrics from different equivalency classes were uncorrelated.
Article
Full-text available
Social networks have the surprising property of being “searchable”: Ordinary people are capable of directing messages through their network of acquaintances to reach a specific but distant target person in only a few steps. We present a model that offers an explanation of social network searchability in terms of recognizable personal identities: sets of characteristics measured along a number of social dimensions. Our model defines a class of searchable networks and a method for searching them that may be applicable to many network search problems, including the location of data files in peer-to-peer networks, pages on the World Wide Web, and information in distributed databases.
Conference Paper
Full-text available
This paper presents a modified diary study that investigated how people performed personally motivated searches in their email, in their files, and on the Web. Although earlier studies of directed search focused on keyword search, most of the search behavior we observed did not involve keyword search. Instead of jumping directly to their information target using keywords, our participants navigated to their target with small, local steps using their contextual knowledge as a guide, even when they knew exactly what they were looking for in advance. This stepping behavior was especially common for participants with unstructured information organization. The observed advantages of searching by taking small steps include that it allowed users to specify less of their information need and provided a context in which to understand their results. We discuss the implications of such advantages for the design of personal information management tools.
Conference Paper
Full-text available
In this work we present topic diversification, a novel method designed to balance and diversify personalized recommendation lists in order to reflect the user's complete spectrum of interests. Though being detrimental to average accuracy, we show that our method improves user satisfaction with recommendation lists, in particular for lists generated using the common item-based collaborative filtering algorithm. Our work builds upon prior research on recommender systems, looking at properties of recommendation lists as entities in their own right rather than specifically focusing on the accuracy of individual recommendations. We introduce the intra-list similarity metric to assess the topical diversity of recommendation lists and the topic diversification approach for decreasing the intra-list similarity. We evaluate our method using book recommendation data, including offline analysis on 361,349 ratings and an online study involving more than 2,100 subjects.
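
To give a feel for an intra-list similarity measure, the sketch below averages a pairwise similarity over all item pairs in a recommendation list. The abstract does not state the formula, so both the averaging (rather than, say, summing) and the Jaccard similarity on topic sets are assumptions made for this example, as are the book identifiers and topics.

```python
from itertools import combinations

# Hypothetical topic annotations for a handful of books.
TOPICS = {
    "b1": {"scifi"},
    "b2": {"scifi"},
    "b3": {"scifi", "action"},
    "b4": {"romance"},
    "b5": {"history"},
}


def jaccard(a, b):
    """Assumed item similarity: overlap of two topic sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def intra_list_similarity(items, topics):
    """Mean pairwise similarity of a list; higher means less topically diverse."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(topics[x], topics[y]) for x, y in pairs) / len(pairs)


narrow = ["b1", "b2", "b3"]
diverse = ["b1", "b4", "b5"]
print(intra_list_similarity(narrow, TOPICS) > intra_list_similarity(diverse, TOPICS))  # prints True
```

A diversification step in this spirit would swap items in the list so that this value goes down while relevance stays acceptable.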
Article
Full-text available
We study the topology of several music recommendation networks, which arise from relationships between artists, co-occurrence of songs in playlists or experts' recommendations. The analysis uncovers the emergence of complex network phenomena in these kinds of recommendation networks, built considering artists as nodes and their resemblance as links. We observe structural properties that provide some hints on navigation and possible optimizations on the design of music recommendation systems. Finally, the analysis derived from existing music knowledge sources provides a deeper understanding of the human music similarity perception.
Article
Full-text available
The new social media sites - blogs, wikis, del.icio.us and Flickr, among others - underscore the transformation of the Web to a participatory medium in which users are actively creating, evaluating and distributing information. The photo-sharing site Flickr, for example, allows users to upload photographs, view photos created by others, comment on those photos, etc. As is common to other social media sites, Flickr allows users to designate others as "contacts" and to track their activities in real time. The contacts (or friends) lists form the social network backbone of social media sites. We claim that these social networks facilitate new ways of interacting with information, e.g., through what we call social browsing. The contacts interface on Flickr enables users to see latest images submitted by their friends. Through an extensive analysis of Flickr data, we show that social browsing through the contacts' photo streams is one of the primary methods by which users find new images on Flickr. This finding has implications for creating personalized recommendation systems based on the user's declared contacts lists.
Conference Paper
Wikipedia supports its users to reach a wide variety of goals: looking up facts, researching a topic, making an edit or simply browsing to pass time. Some of these goals, such as the lookup of facts, can be effectively supported by search functions. However, for other use cases such as researching an unfamiliar topic, users need to rely on the links to connect articles. In this paper, we investigate the state of navigability in the article networks of eight language versions of Wikipedia. We find that, when taking all links of articles into account, all language versions enable mutual reachability for almost all articles. However, previous research has shown that visitors of Wikipedia focus most of their attention on the areas located close to the top. We therefore investigate different restricted navigational views that users could have when looking at articles. We find that restricting the view of articles strongly limits the navigability of the resulting networks and impedes navigation. Based on this analysis we then propose a link recommendation method to augment the link network to improve navigability in the network. Our approach selects links from a less restricted view of the article and proposes to move these links into more visible sections. The recommended links are therefore relevant for the article. Our results are relevant for researchers interested in the navigability of Wikipedia and open up new avenues for link recommendations in Wikipedia editing.
Chapter
Recommender systems are now popular both commercially and in the research community, where many approaches have been suggested for providing recommendations. In many cases a system designer that wishes to employ a recommender system must choose between a set of candidate approaches. A first step towards selecting an appropriate algorithm is to decide which properties of the application to focus upon when making this choice. Indeed, recommender systems have a variety of properties that may affect user experience, such as accuracy, robustness, scalability, and so forth. In this paper we discuss how to compare recommenders based on a set of properties that are relevant for the application. We focus on comparative studies, where a few algorithms are compared using some evaluation metric, rather than absolute benchmarking of algorithms. We describe experimental settings appropriate for making choices between algorithms. We review three types of experiments, starting with an offline setting, where recommendation approaches are compared without user interaction, then reviewing user studies, where a small group of subjects experiment with the system and report on the experience, and finally describe large scale online experiments, where real user populations interact with the system. In each of these cases we describe types of questions that can be answered, and suggest protocols for experimentation. We also discuss how to draw trustworthy conclusions from the conducted experiments. We then review a large set of properties, and explain how to evaluate systems given relevant properties. We also survey a large set of evaluation metrics in the context of the property that they evaluate.
Chapter
Novelty and diversity have been identified, along with accuracy, as foremost properties of useful recommendations. Considerable progress has been made in the field in terms of the definition of methods to enhance such properties, as well as methodologies and metrics to assess how well such methods work. In this chapter we give an overview of the main contributions to this area in the field of recommender systems, and seek to relate them together in a unified view, analyzing the common elements underneath the different forms under which novelty and diversity have been addressed, and identifying connections to closely related work on diversity in other fields.
Conference Paper
Models of human navigation play an important role for understanding and facilitating user behavior in hypertext systems. In this paper, we conduct a series of principled experiments with decentralized search - an established model of human navigation in social networks - and study its applicability to information networks. We apply several variations of decentralized search to model human navigation in information networks and we evaluate the outcome in a series of experiments. In these experiments, we study the validity of decentralized search by comparing it with human navigational paths from an actual information network - Wikipedia. We find that (i) navigation in social networks appears to differ from human navigation in information networks in interesting ways and (ii) in order to apply decentralized search to information networks, stochastic adaptations are required. Our work illuminates a way towards using decentralized search as a valid model for human navigation in information networks in future work. Our results are relevant for scientists who are interested in modeling human behavior in information networks and for engineers who are interested in using models and simulations of human behavior to improve on structural or user interface aspects of hypertextual systems.
Conference Paper
Eli Pariser coined the term 'filter bubble' to describe the potential for online personalization to effectively isolate people from a diversity of viewpoints or content. Online recommender systems - built on algorithms that attempt to predict which items users will most enjoy consuming - are one family of technologies that potentially suffers from this effect. Because recommender systems have become so prevalent, it is important to investigate their impact on users in these terms. This paper examines the longitudinal impacts of a collaborative filtering-based recommender system on users. To the best of our knowledge, it is the first paper to measure the filter bubble effect in terms of content diversity at the individual level. We contribute a novel metric to measure content diversity based on information encoded in user-generated tags, and we present a new set of methods to examine the temporal effect of recommender systems on the user experience. We do find that recommender systems expose users to a slightly narrowing set of items over time. However, we also see evidence that users who actually consume the items recommended to them experience lessened narrowing effects and rate items more positively.
Article
First, a new model of searching in online and other information systems, called 'berrypicking', is discussed. This model, it is argued, is much closer to the real behavior of information searchers than the traditional model of information retrieval is, and, consequently, will guide our thinking better in the design of effective interfaces. Second, the research literature of manual information seeking behavior is drawn on for suggestions of capabilities that users might like to have in online systems. Third, based on the new model and the research on information seeking, suggestions are made for how new search capabilities could be incorporated into the design of search interfaces. Particular attention is given to the nature and types of browsing that can be facilitated.
Article
Recommender systems based on collaborative filtering predict user preferences for products or services by learning past user-item relationships. A predominant approach to collaborative filtering is neighborhood based ("k-nearest neighbors"), where a user-item preference rating is interpolated from ratings of similar items and/or users. In this work, we enhance the neighborhood-based approach leading to a substantial improvement of prediction accuracy, without a meaningful increase in running time. First, we remove certain so-called "global effects" from the data to make the different ratings more comparable, thereby improving interpolation accuracy. Second, we show how to simultaneously derive interpolation weights for all nearest neighbors. Unlike previous approaches where each interpolation weight is computed separately, simultaneous interpolation accounts for the many interactions between neighbors by globally solving a suitable optimization problem, also leading to improved accuracy. Our method is very fast in practice, generating a prediction in about 0.2 milliseconds. Importantly, it does not require training many parameters or a lengthy preprocessing, making it very practical for large scale applications. The method was evaluated on the Netflix dataset. We could process the 2.8 million queries of the Qualifying set in 10 minutes yielding a RMSE of 0.9086. Moreover, when an extensive training is allowed, such as SVD-factorization at the preprocessing stage, our method can produce results with a RMSE of 0.8982.
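The two ingredients named in this abstract, removing global effects and interpolating from nearest neighbors, can be illustrated with a minimal sketch. It is not the authors' method: the "global effects" are reduced here to a global mean plus per-item and per-user offsets, and the interpolation weights are plain cosine similarities rather than the jointly optimized weights the abstract describes. The function names `fit_baselines` and `predict`, and the parameter `k`, are hypothetical.

```python
import numpy as np

def fit_baselines(R, mask):
    """Estimate a global mean plus per-item and per-user offsets.

    Assumption: this stands in for the paper's 'global effects'.
    R is a users x items rating matrix; mask marks observed entries.
    """
    mu = R[mask].mean()
    item_counts = np.maximum(mask.sum(axis=0), 1)
    b_item = np.where(mask, R - mu, 0.0).sum(axis=0) / item_counts
    user_counts = np.maximum(mask.sum(axis=1), 1)
    b_user = np.where(mask, R - mu - b_item[None, :], 0.0).sum(axis=1) / user_counts
    return mu, b_user, b_item

def predict(R, mask, user, item, k=2):
    """Baseline estimate plus a similarity-weighted average of residuals
    of the k most similar items the user has rated (illustrative only)."""
    mu, b_user, b_item = fit_baselines(R, mask)
    baseline = mu + b_user[:, None] + b_item[None, :]
    resid = np.where(mask, R - baseline, 0.0)

    # Cosine similarity between the target item's residual column and all others.
    norms = np.linalg.norm(resid, axis=0)
    norms[norms == 0] = 1.0
    sims = (resid.T @ resid[:, item]) / (norms * norms[item])

    # Candidate neighbors: items this user rated, excluding the target.
    rated = np.flatnonzero(mask[user])
    rated = rated[rated != item]
    if rated.size == 0:
        return float(baseline[user, item])

    top = rated[np.argsort(-np.abs(sims[rated]))[:k]]
    weights = sims[top]
    denom = np.abs(weights).sum()
    if denom == 0:
        return float(baseline[user, item])
    return float(baseline[user, item] + weights @ resid[user, top] / denom)
```

When all residuals are zero (for example, every observed rating is identical), the sketch falls back to the baseline estimate alone.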
Article
As the Netflix Prize competition has demonstrated, matrix factorization models are superior to classic nearest neighbor techniques for producing product recommendations, allowing the incorporation of additional information such as implicit feedback, temporal effects, and confidence levels.
Article
The small-world phenomenon - the principle that most of us are linked by short chains of acquaintances - was first investigated as a question in sociology and is a feature of a range of networks arising in nature and technology. Experimental study of the phenomenon revealed that it has two fundamental components: first, such short chains are ubiquitous, and second, individuals operating with purely local information are very adept at finding these chains. The first issue has been analysed, and here I investigate the second by modelling how individuals can find short chains in a large social network.
Article
Recommendation algorithms are best known for their use on e-commerce Web sites, where they use input about a customer's interests to generate a list of recommended items. Many applications use only the items that customers purchase and explicitly rate to represent their interests, but they can also use other attributes, including items viewed, demographic data, subject interests, and favorite artists. At Amazon.com, we use recommendation algorithms to personalize the online store for each customer. The store radically changes based on customer interests, showing programming titles to a software engineer and baby toys to a new mother. There are three common approaches to solving the recommendation problem: traditional collaborative filtering, cluster models, and search-based methods. Here, we compare these methods with our algorithm, which we call item-to-item collaborative filtering. Unlike traditional collaborative filtering, our algorithm's online computation scales independently of the number of customers and number of items in the product catalog. Our algorithm produces recommendations in real-time, scales to massive data sets, and generates high quality recommendations.
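The item-to-item idea mentioned in this abstract can be sketched as follows: similarities between items are computed once from the customer-item matrix, and recommendations for a customer are then read off from the rows of the items they already own. This is an illustrative sketch only, assuming binary purchase data and cosine similarity; the names `item_similarity` and `recommend` are hypothetical and do not describe the production system.

```python
import numpy as np

def item_similarity(purchases):
    """Cosine similarity between the item columns of a binary
    customers x items matrix (computed ahead of time in this sketch)."""
    P = np.asarray(purchases, dtype=float)
    norms = np.linalg.norm(P, axis=0)
    norms[norms == 0] = 1.0            # avoid division by zero for unsold items
    S = (P.T @ P) / np.outer(norms, norms)
    np.fill_diagonal(S, 0.0)           # an item should not recommend itself
    return S

def recommend(S, owned, n=2):
    """Score every item by its summed similarity to the owned items
    and return the indices of the top n items not already owned."""
    scores = S[owned].sum(axis=0)
    scores[owned] = -np.inf            # exclude items the customer already has
    order = np.argsort(-scores)
    return [int(i) for i in order[:n] if np.isfinite(scores[i])]
```

In this arrangement the per-request work depends only on how many items the customer owns and on the width of the similarity table, which is one way to read the abstract's remark about online computation.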
Article
Long a matter of folklore, the "small-world phenomenon" - the principle that we are all linked by short chains of acquaintances - was inaugurated as an area of experimental study in the social sciences through the pioneering work of Stanley Milgram in the 1960's. This work was among the first to make the phenomenon quantitative, allowing people to speak of the "six degrees of separation" between any two people in the United States. Since then, a number of network models have been proposed as frameworks in which to study the problem analytically. One of the most refined of these models was formulated in recent work of Watts and Strogatz; their framework provided compelling evidence that the small-world phenomenon is pervasive in a range of networks arising in nature and technology, and a fundamental ingredient in the evolution of the World Wide Web. But existing models are insufficient to explain the striking algorithmic component of Milgram's original findings: that individuals using local information are collectively very effective at actually constructing short paths between two points in a social network. Although recently proposed network models are rich in short paths, we prove that no decentralized algorithm, operating with local information only, can construct short paths in these networks with non-negligible probability. We then define an infinite family of network models that naturally generalizes the Watts-Strogatz model, and show that for one of these models, there is a decentralized algorithm capable of finding short paths with high probability. More generally, we provide a strong characterization of this family of network models, showing that there is in fact a unique model within the family for which decentralized algorithms are effect...
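A decentralized algorithm of the kind this abstract refers to can be sketched as greedy routing on a grid: each node knows only its own contacts and forwards the message to the contact closest to the target. The sketch below is a simplified illustration under stated assumptions, not the paper's construction in full: it uses a small n x n grid, gives every node one long-range contact drawn with probability proportional to distance raised to the power -r, and the names `build_grid` and `greedy_route` are hypothetical.

```python
import random

def manhattan(a, b):
    """Lattice (Manhattan) distance between two grid points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def build_grid(n, r=2.0, seed=0):
    """Return a dict mapping each node of an n x n grid (n >= 2) to its contacts:
    its lattice neighbors plus one long-range contact chosen with
    probability proportional to distance ** -r (assumed parameterization)."""
    rng = random.Random(seed)
    nodes = [(x, y) for x in range(n) for y in range(n)]
    contacts = {}
    for u in nodes:
        x, y = u
        local = [(x + dx, y + dy)
                 for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                 if 0 <= x + dx < n and 0 <= y + dy < n]
        others = [v for v in nodes if v != u]
        weights = [manhattan(u, v) ** -r for v in others]
        long_range = rng.choices(others, weights=weights, k=1)[0]
        contacts[u] = local + [long_range]
    return contacts

def greedy_route(contacts, source, target):
    """Forward to whichever contact is closest to the target, using only
    the current node's own contact list (local information)."""
    path = [source]
    current = source
    while current != target:
        current = min(contacts[current], key=lambda v: manhattan(v, target))
        path.append(current)
    return path
```

Because every node keeps its lattice neighbors, some contact is always strictly closer to the target, so the greedy walk terminates and never takes more steps than the lattice distance between source and target; the long-range contacts can only shorten it.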
Article
The acquisition of information is generally thought to be deliberately sought using a search or query mechanism or by browsing or scanning an information space. People, however, find information without seeking it through accidental, incidental or serendipitous discoveries, often in combination with other information acquisition episodes. The value of this phenomenon to an individual or an organization can be equated with the impact of serendipitous breakthroughs in science and medicine. Although largely ignored in information systems development and research, serendipitous retrieval complements querying and browsing, and together they provide a holistic, ecological approach to information acquisition and define the key approaches to a digital library. In this paper, the concept of serendipitous information retrieval is introduced and validated with data from a study of news readers, along with some approaches for how to facilitate it.
The 90-9-1 rule for participation inequality in social media and online communities
  • J Nielsen