In many visually-oriented applications, users can select and group images that they find interesting into coherent clusters. For instance, we encounter these in the form of hashtags on Instagram, galleries on Flickr, or boards on Pinterest. The selection and coherence of such user-curated visual clusters arise from a user's preference for a certain type of content as well as her own perception of which images are similar and thus belong to a cluster. We seek to model such curation behaviors to support users in future activities such as expanding existing clusters or discovering new clusters altogether. This paper proposes a framework, namely Collaborative Curating, that jointly models the interrelated modalities of preference expression and similarity perception. Extensive experiments on real-world datasets of various categories from a visual curating platform show that the proposed framework significantly outperforms baselines focusing on either clustering behaviors or preferences alone.
All content in this area was uploaded by Dung D. Le on Oct 24, 2021
... Han et al. [17] formulated this as a concept discovery problem, introducing spatially-aware methods to automatically learn attribute relationships from visual data. Le et al. [33] proposed a joint modeling approach that combines preference signals with similarity metrics to enable dynamic collection generation. Our work builds on these directions by introducing a content-first paradigm to create collections that share the same attributes, including visual characteristics, semantic descriptions, and contextual information like occasions. ...
Online platforms like Pinterest that host vast content collections traditionally rely on manual curation or user-generated search logs to create keyword landing pages (KLPs) -- topic-centered collection pages that serve as entry points for content discovery. While manual curation ensures quality, it doesn't scale to millions of collections, and search log approaches result in limited topic coverage and imprecise content matching. In this paper, we present PinLanding, a novel content-first architecture that transforms the way platforms create topical collections. Instead of deriving topics from user behavior, our system employs a multi-stage pipeline combining a vision-language model (VLM) for attribute extraction, a large language model (LLM) for topic generation, and a CLIP-based dual-encoder architecture for precise content matching. Our model achieves 99.7% Recall@10 on the Fashion200K benchmark, demonstrating strong attribute understanding capabilities. In production deployment for search engine optimization with 4.2 million shopping landing pages, the system achieves a 4X increase in topic coverage and a 14.29% improvement in collection attribute precision over the traditional search log-based approach, as measured by human evaluation. The architecture can be generalized beyond search traffic to power various user experiences, including content discovery and recommendations, providing a scalable solution to transform unstructured content into curated topical collections across any content domain.
... In recent years, list recommendation (or bundle recommendation) has received growing attention in online services [1][2][3]. Specifically, compared with conventional recommendation [4][5][6], which is designed for recommending a single item, list or bundle recommendation aims to present users with a collection of items that coheres around a specific semantic concept (Fig. 1a). By presenting a variety of carefully designed lists, the search scope is largely narrowed down, and users can find their desired items more efficiently. ...
... Table 1 summarizes the features of the datasets. In order to quantify the outcome of our experiments, we apply the typical "leave-one-out cross-validation" (or "hide one" technique for short), where each time the CF system predicts the value of one hidden user rating of the dataset [48][49][50]. In particular, we executed two sets of experiments: in the first, one randomly chosen rating of each user is predicted, whereas in the second, each user's last rating (according to the timestamp of each rating) is predicted. ...
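The two hold-out variants described in this excerpt can be sketched as follows. This is a minimal illustration, not the cited authors' code: the function name `leave_one_out`, the `(user, item, rating, timestamp)` tuple layout, and the choice to keep single-rating users entirely in the training set are all assumptions made for the example.

```python
import random
from collections import defaultdict


def leave_one_out(ratings, mode="random", seed=0):
    """Split (user, item, rating, timestamp) tuples into (train, hidden).

    mode="random": hide one randomly chosen rating per user.
    mode="last":   hide each user's most recent rating (largest timestamp).
    """
    if mode not in ("random", "last"):
        raise ValueError("mode must be 'random' or 'last'")

    rng = random.Random(seed)
    by_user = defaultdict(list)
    for row in ratings:
        by_user[row[0]].append(row)

    train, hidden = [], []
    for rows in by_user.values():
        if len(rows) < 2:
            # Assumption: a user with a single rating stays fully in train,
            # since hiding it would leave nothing to predict from.
            train.extend(rows)
            continue
        if mode == "last":
            idx = max(range(len(rows)), key=lambda i: rows[i][3])
        else:
            idx = rng.randrange(len(rows))
        hidden.append(rows[idx])
        train.extend(rows[:idx] + rows[idx + 1:])
    return train, hidden
```

Each hidden rating is then predicted from the training set only, and the prediction error is aggregated over all users.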
Collaborative filtering has proved to be one of the most popular and successful rating prediction techniques over the last few years. In collaborative filtering, each rating prediction, concerning a product or a service, is based on the rating values that users that are considered “close” to the user for whom the prediction is being generated have given to the same product or service. In general, “close” users for some user u correspond to users that have rated items similarly to u and these users are termed as “near neighbors”. As a result, the more reliable these near neighbors are, the more successful predictions the collaborative filtering system will compute and ultimately, the more successful recommendations the recommender system will generate. However, when the dataset’s density is relatively low, it is hard to find reliable near neighbors and hence many predictions fail, resulting in low recommender system reliability. In this work, we present a method that enhances rating prediction quality in low-density collaborative filtering datasets, by considering predictions whose features are associated with high prediction accuracy as additional ratings. The presented method’s efficacy and applicability are substantiated through an extensive multi-parameter evaluation process, using widely acceptable low-density collaborative filtering datasets.
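As a rough illustration of the near-neighbor idea in this abstract, the sketch below predicts a rating from users who rated items similarly to the target user. It is a generic user-based collaborative filtering example, not the method proposed in the paper: the use of Pearson similarity, the mean-centered weighted average, the positive-similarity cutoff, and the names `pearson` and `predict` are assumptions chosen for the example.

```python
from math import sqrt


def pearson(ra, rb):
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = set(ra) & set(rb)
    if len(common) < 2:
        return 0.0
    ma = sum(ra[i] for i in common) / len(common)
    mb = sum(rb[i] for i in common) / len(common)
    num = sum((ra[i] - ma) * (rb[i] - mb) for i in common)
    den = sqrt(sum((ra[i] - ma) ** 2 for i in common)) * sqrt(
        sum((rb[i] - mb) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict(ratings, user, item):
    """Predict ratings[user][item]; return None if no near neighbor rated it.

    ratings maps user -> {item: rating}. The result is not clipped to the
    rating scale.
    """
    ru = ratings[user]
    mean_u = sum(ru.values()) / len(ru)
    num = den = 0.0
    for other, ro in ratings.items():
        if other == user or item not in ro:
            continue
        sim = pearson(ru, ro)
        if sim <= 0:
            # Assumption: only positively correlated users count as "close".
            continue
        mean_o = sum(ro.values()) / len(ro)
        num += sim * (ro[item] - mean_o)
        den += sim
    if den == 0:
        return None
    return mean_u + num / den
```

A `None` result corresponds to the failure case the abstract describes: in a sparse dataset, no reliable near neighbor has rated the item, so no prediction can be made.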
Product bundling, offering a combination of items to customers, is one of the marketing strategies commonly used in online e-commerce and offline retailers. A high-quality bundle generalizes frequent items of interest, and diversity across bundles boosts the user experience and eventually increases transaction volume. In this paper, we formalize the personalized bundle list recommendation as a structured prediction problem and propose a bundle generation network (BGN), which decomposes the problem into quality/diversity parts by the determinantal point processes (DPPs). BGN uses a typical encoder-decoder framework with a proposed feature-aware softmax to alleviate the inadequate representation of traditional softmax, and integrates the masked beam search and DPP selection to produce a high-quality and diversified bundle list with an appropriate bundle size. We conduct extensive experiments on three public datasets and one industrial dataset, including two generated from co-purchase records and the other two extracted from real-world online bundle services. BGN significantly outperforms the state-of-the-art methods in terms of quality, diversity and response time over all datasets. In particular, BGN improves the precision of the best competitors by 16% on average while maintaining the highest diversity on four datasets, and yields a 3.85x improvement of response time over the best competitors in the bundle list recommendation problem.
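For orientation, the quality/diversity split mentioned in this abstract is usually written in the standard L-ensemble form of a DPP, shown below. The symbols $q_i$, $\phi_i$, and $S$ are the conventional ones from the DPP literature and are assumptions here; the abstract does not say how BGN parameterizes them.

```latex
% Standard quality/diversity decomposition of a DPP kernel (L-ensemble form).
% q_i >= 0 : quality score of candidate bundle i (assumed notation)
% \phi_i   : unit-norm diversity feature vector of bundle i (assumed notation)
L_{ij} = q_i \, \phi_i^{\top} \phi_j \, q_j ,
\qquad
S_{ij} = \phi_i^{\top} \phi_j ,
\qquad
P(Y) \propto \det(L_Y) = \Bigl( \prod_{i \in Y} q_i^{2} \Bigr) \det(S_Y) .
```

The product term rewards lists made of individually high-quality bundles, while $\det(S_Y)$ shrinks toward zero when the selected bundles are similar to one another, which is what favors a diverse list.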
Over the past three years Pinterest has experimented with several visual search and recommendation systems, from enhancing existing products such as Related Pins (2014), to powering new products such as Similar Looks (2015), Flashlight (2016), and Lens (2017). This paper presents an overview of our visual discovery engine powering these services, and shares the rationales behind our technical and product decisions such as the use of object detection and interactive user interfaces. We conclude that this visual discovery engine significantly improves engagement in both search and recommendation tasks.
Playlists have become a significant part of our listening experience because of digital cloud-based services such as Spotify, Pandora, Apple Music, making playlist recommendation crucial to music services today. With an aim towards playlist discovery and recommendation, we leverage sequence-to-sequence modeling to learn a fixed-length representation of playlists in an unsupervised manner. We evaluate our work using a recommendation task, along with embedding-evaluation tasks, to study the extent to which semantic characteristics such as genre, song-order, etc. are captured by the playlist embeddings and how they can be leveraged for music recommendation.
Although widely used, the majority of current music recommender systems still focus on recommendation accuracy, user preferences and isolated item characteristics, without evaluating other important factors, like the joint item selections and the recommendation moment. However, when it comes to playlist recommendations, additional dimensions, as well as the notion of user experience and perception, should be taken into account to improve recommendation quality. In this work, we propose HybA, a hybrid recommender system for automatic playlist continuation that combines Latent Dirichlet Allocation and Case-Based Reasoning. This system aims to address "similar concepts" rather than similar users. Rather than generating a playlist based on user requirements, as automatic playlist generation methods do, HybA identifies the semantic characteristics of a started playlist and reuses the most similar past ones to recommend relevant playlist continuations. In addition, it supports beyond-accuracy dimensions, like increased coherence or the discovery of diverse items. To overcome the semantic gap between music descriptions and user preferences, identify playlist structures and capture song similarity, a graph model is used. Experiments on real datasets have shown that the proposed algorithm is able to outperform other state-of-the-art techniques in terms of accuracy, while balancing diversity and coherence.
Most recommendation research has been concentrated on recommending single items to users, such as the considerable work on collaborative filtering that models the interaction between a user and an item. However, in many real-world scenarios, the platform needs to show users a set of items, e.g., the marketing strategy that offers multiple items for sale as one bundle. In this work, we consider recommending a set of items to a user, i.e., the Bundle Recommendation task, which concerns the interaction modeling between a user and a set of items. We contribute a neural network solution named DAM, short for Deep Attentive Multi-Task model, which features two special designs: 1) We design a factorized attention network to aggregate the item embeddings in a bundle to obtain the bundle's representation; 2) We jointly model user-bundle interactions and user-item interactions in a multi-task manner to alleviate the scarcity of user-bundle interactions. Extensive experiments on a real-world dataset show that DAM outperforms the state-of-the-art solution, verifying the effectiveness of our attention design and multi-task learning in DAM.
At Pinterest, we utilize image embeddings throughout our search and recommendation systems to help our users navigate through visual content by powering experiences like browsing of related content and searching for exact products for shopping. In this work we describe a multi-task deep metric learning system to learn a single unified image embedding which can be used to power our multiple visual search products. The solution we present not only allows us to train for multiple application objectives in a single deep neural network architecture, but takes advantage of correlated information in the combination of all training data from each application to generate a unified embedding that outperforms all specialized embeddings previously deployed for each product.
We discuss the challenges of handling images from different domains such as camera photos, high quality web images, and clean product catalog images. We also detail how to jointly train for multiple product objectives and how to leverage both engagement data and human labeled data. In addition, our trained embeddings can also be binarized for efficient storage and retrieval without compromising precision and recall. Through comprehensive evaluations on offline metrics, user studies, and online A/B experiments, we demonstrate that our proposed unified embedding improves both relevance and engagement of our visual search products for both browsing and searching purposes when compared to existing specialized embeddings. Finally, the deployment of the unified embedding at Pinterest has drastically reduced the operation and engineering cost of maintaining multiple embeddings while improving quality.
Recommender systems are aimed at generating a personalized ranked list of items that an end user might be interested in. With the unprecedented success of deep learning in computer vision and speech recognition, bridging the gap between recommender systems and deep neural networks has recently become a hot topic, and deep learning methods have been shown to achieve state-of-the-art results on many recommendation tasks. For example, a recent model, NeuMF, first projects users and items into a shared low-dimensional latent feature space, and then employs neural nets to model the interaction between the user and item latent features to obtain state-of-the-art performance on recommendation tasks. NeuMF assumes that the non-interacted items are inherently negative and uses negative sampling to relax this assumption. In this paper, we examine an alternative approach which does not assume that the non-interacted items are necessarily negative, just that they are less preferred than interacted items. Specifically, we develop a new classification strategy based on the widely used pairwise ranking assumption. We combine our classification strategy with the recently proposed neural collaborative filtering framework, and propose a general collaborative ranking framework called Neural Network based Collaborative Ranking (NCR). We resort to a neural network architecture to model a user's pairwise preference between items, with the belief that a neural network will effectively capture the structure of the latent factors. The experimental results on two real-world datasets show the superior performance of our models in comparison with several state-of-the-art approaches.
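The pairwise ranking assumption in this abstract (a non-interacted item is only less preferred than an interacted one, not necessarily negative) can be illustrated with a small logistic pairwise loss. This is a generic BPR-style sketch with dot-product scores, not the NCR network itself; the function names and the choice of scoring function are assumptions made for the example.

```python
import math


def score(user_vec, item_vec):
    """Assumed scoring function: dot product of latent vectors."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


def pairwise_loss(user_vec, pos_vec, neg_vec):
    """Return -log(sigmoid(score(pos) - score(neg))) for one (user, pos, neg) triple.

    The loss is small when the interacted item outscores the non-interacted
    one, and grows as the order is violated. No absolute label is attached
    to the non-interacted item.
    """
    diff = score(user_vec, pos_vec) - score(user_vec, neg_vec)
    # Numerically stable form of log(1 + exp(-diff)).
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))
```

Minimizing this loss over sampled triples only pushes the interacted item above the non-interacted one for the same user, which is the relaxation the abstract contrasts with treating non-interacted items as negatives.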
When people make decisions involving a number of ideas, designs, or other kinds of objects, a common approach is to organize them into several groups and to prioritize them according to some preference. The grouping task is referred to as clustering and the prioritizing task is called ranking. These tasks are often outsourced with the help of human judgments in the form of pairwise comparisons. In the clustering problem, two objects are compared on whether they are similar, while in the ranking problem, the object of higher priority is determined. Our research question in this paper is whether the pairwise comparisons for clustering also help ranking (and vice versa). Instead of solving the two tasks separately, we propose a unified formulation to bridge the two types of pairwise comparisons. Our formulation simultaneously estimates the object embeddings and the preference criterion vector. The experiments using real datasets support our hypothesis; our approach can generate better neighbor and preference estimation results than the approaches that only focus on a single type of pairwise comparisons.