Figure - available from: Data Mining and Knowledge Discovery
Performance of ColdRoute-T, different kinds of regressors (using the same feature set as ColdRoute-T), CQARank, and LDA for cold questions asked by new askers on 8 Stack Exchange sites. (a) MRR. (b) Accuracy. (c) Precision@1. (d) Precision@3.
Source publication
Routing questions in Community Question Answer services (CQAs) such as Stack Exchange sites is a well-studied problem. Yet cold-start, a phenomenon observed when a new question is posted, is not well addressed by existing approaches. Additionally, cold questions posted by new askers present significant challenges to state-of-the-art approaches. We...
Citations
... Expert recommendation based on link analysis [8][9][10][11][12][13][14] constructs a question-answer relationship directed graph based on the historical interaction behavior between users in the community and then performs link analysis on the directed graph and calculates the authority of each user. Neural-network-based expert recommendation [15][16][17][18][19][20] encodes higher-level questions and feature representations of expert texts with the help of word2vec and graphs and then extracts features by convolutional neural networks and recurrent neural networks. ...
... Chen et al. [18] applied a random walk learning method based on a neural network with RNN to match similarities between new and historical questions. Sun et al. [19] modeled the interaction between different objects (questions, questioners, and answerers) by FMs, which takes advantage of users' historical question answering behavior. Jian et al. [20] proposed an undirected heterogeneous graph that encodes users' past question answering activities and textual information of the questions. ...
Community question answering (CQA), with its flexible user interaction characteristics, is gradually becoming a new knowledge-sharing platform that allows people to acquire knowledge and share experiences. The number of questions is rapidly increasing with the open registration of communities and the massive influx of users, which makes it impossible to match many questions to suitable question answering experts (noted as experts) in a timely manner. Therefore, it is of great importance to perform expert recommendation in CQA. Existing expert recommendation algorithms only use data from a single platform, which is not ideal for new CQA platforms with sparse historical interaction and a small number of questions and users. Considering that many mature CQA platforms (source platforms) have rich historical interaction data and a large number of questions and experts, this paper mines this information and transfers it to new platforms with sparse data (target platforms), which can effectively alleviate the data sparsity problem. However, the feature composition of questions and experts differs across platforms, so the data from the source platform cannot be directly transferred for training in the target platform. Therefore, this paper proposes feature-alignment-based cross-platform question answering expert recommendation (FA-CPQAER), which can align expert and question features while transferring data. First, we use a rating predictor composed of a BP network for expert recommendation within each domain, and then match the features of questions and experts between the two domains by similarity calculation, so that information in the source platform can assist expert recommendation in the target platform. Meanwhile, we train a stacked denoising autoencoder (SDAE) in both domains, which can map user and question features to the same dimension and align the data distributions. Extensive experiments are conducted on two real CQA datasets, Toutiao and Zhihu, and the results show that, compared with other advanced expert recommendation algorithms, the proposed method achieves better results on the evaluation metrics MAE, RMSE, Accuracy, and Recall, which demonstrates its effectiveness in addressing the data sparsity problem in expert recommendation.
... In online communities and peer-production systems, there is a thread of works that propose using personalized recommendations for the crowds (AlGhamdi et al, 2021; Cosley et al, 2007; Moskalenko et al, 2020; Dror et al, 2011; Liu et al, 2017; Sun et al, 2018, 2019; Kurup and Sajeev, 2017; Safran and Che, 2018). Each of these works attempts to design an appropriate recommender ...
... In community question-answering (CQA), several papers develop recommender systems to route users to questions they might be interested in answering, hence improving their engagement on the platform and reducing question answering time (Dror et al, 2011; Liu et al, 2017; Sun et al, 2018, 2019). Their approaches follow specific design criteria to deal with the distinct features in the CQA. ...
... Their approaches follow specific design criteria to deal with the distinct features in the CQA. (Dror et al, 2011; Sun et al, 2018) model recommendation as a classification problem using different machine learning techniques. They address the sparsity issue by applying various mechanisms: 1) implementing a hybrid approach with content and collaborative knowledge to exploit the different families of item descriptors; 2) capturing the diverse types of user-item interactions and differentiating between the types that are more indicative than others. ...
Wikidata is an open knowledge graph created, managed, and maintained collaboratively by a global community of volunteers. As it continues to grow, it faces substantial editor engagement challenges, including acquiring new editors to tackle an increasing workload and retaining existing editors. Experiences from other online communities and peer-production systems, including Wikipedia, suggest that recommending tasks to editors could help with both. Our aim with this paper is to elicit the user requirements for a Wikidata recommendations system. We conduct a mixed-methods study with a thematic analysis of in-depth interviews with 31 Wikidata editors and three Wikimedia managers, complemented by a quantitative analysis of edit records of 3,740 Wikidata editors. The insights gained from the study help us outline design requirements for the Wikidata recommender system. We conclude with a discussion of the implications of this work and directions for future work.
... In community question-answering (CQA), several papers develop recommender systems to route users to questions they might be interested in answering, hence improving their engagement on the platform and reducing question answering time [7,21,37,38]. [7,37] model recommendation as a classification problem, using different machine learning techniques. They implement a hybrid approach with content and collaborative knowledge to address the sparsity problem. ...
Wikidata is an open knowledge graph built by a global community of volunteers. As it advances in scale, it faces substantial challenges around editor engagement. These challenges are in terms of both attracting new editors to keep up with the sheer amount of work and retaining existing editors. Experience from other online communities and peer-production systems, including Wikipedia, suggests that personalised recommendations could help, especially newcomers, who are sometimes unsure about how to contribute best to an ongoing effort. For this reason, we propose a recommender system WikidataRec for Wikidata items. The system uses a hybrid of content-based and collaborative filtering techniques to rank items for editors relying on both item features and item-editor previous interaction. A neural network, named a neural mixture of representations, is designed to learn fine weights for the combination of item-based representations and optimize them with editor-based representation by item-editor interaction. To facilitate further research in this space, we also create two benchmark datasets, a general-purpose one with 220,000 editors responsible for 14 million interactions with 4 million items and a second one focusing on the contributions of more than 8,000 more active editors. We perform an offline evaluation of the system on both datasets with promising results. Our code and datasets are available at https://github.com/WikidataRec-developer/Wikidata_Recommender.
... Algorithms based on factorization machines also make use of item side-information to predict the relevance scores for cold items. In specific application settings with abundant item metadata, such as question answering, many cold-start recommendation algorithms based on factorization machines have been proposed (Sun et al. 2018; Piazza et al. 2017). In factorization machines, each user-item interaction is represented as a unique feature-label pair, where the feature is constructed from the user and item side-information data. ...
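The feature-label representation described in the snippet above can be illustrated with a minimal sketch. It assumes a standard second-order factorization machine; the helper names `build_feature` and `fm_score`, the feature keys, and all weights are hypothetical and are not taken from any of the cited papers.

```python
from typing import Dict, List


def build_feature(user_side: Dict[str, float],
                  item_side: Dict[str, float]) -> Dict[str, float]:
    """Merge user and item side-information into one sparse feature vector."""
    feature = {f"user:{name}": value for name, value in user_side.items()}
    feature.update({f"item:{name}": value for name, value in item_side.items()})
    return feature


def fm_score(x: Dict[str, float], w0: float, w: Dict[str, float],
             v: Dict[str, List[float]], rank: int) -> float:
    """Second-order FM: w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j."""
    linear = sum(w.get(name, 0.0) * value for name, value in x.items())
    pairwise = 0.0
    for f in range(rank):
        # Uses the identity sum_{i<j} a_i a_j = ((sum a)^2 - sum a^2) / 2.
        terms = [v[name][f] * value for name, value in x.items() if name in v]
        total = sum(terms)
        pairwise += 0.5 * (total * total - sum(t * t for t in terms))
    return w0 + linear + pairwise


# Hypothetical cold question: no interaction history, but its tag is known.
x = build_feature({"id=u1": 1.0}, {"tag=python": 1.0})
w = {"user:id=u1": 0.2, "item:tag=python": 0.3}               # made-up weights
v = {"user:id=u1": [1.0, 2.0], "item:tag=python": [0.5, -1.0]}  # made-up factors
score = fm_score(x, w0=0.1, w=w, v=v, rank=2)
```

Because the pairwise term is built from side-information features such as tags, a score can still be computed for an item that has no recorded interactions, which is the property the snippet relies on for cold items.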
... Our proposed method differs from the above methods as our embeddings are soft-cluster embeddings which are not constrained to work only with the textual description of the items. It is generally believed that the subpar performance of the deep learning models for information retrieval tasks as opposed to the language processing and vision tasks is due to the sparsity in the input features (Sun et al. 2018; McMahan et al. 2013). Moreover, in light of recent studies on the efficacy of deep learning methods for recommendation tasks, one should be sceptical about the use of these methods (Dacrema et al. 2019; Ludewig et al. 2019). ...
Recommender systems are widely used in online platforms for easy exploration of personalized content. The best available recommendation algorithms are based on using the observed preference information among collaborating entities. A significant challenge in recommender systems continues to be item cold-start recommendation: how to effectively recommend items with no observed or past preference information. Here we propose a two-stage algorithm based on soft clustering to provide an efficient solution to this problem. The crux of our approach lies in representing the items as soft-cluster embeddings in the space spanned by the side-information associated with the items. Though many item embedding approaches have been proposed for item cold-start recommendations in the past (and simple as they might appear), to the best of our knowledge, the approach based on soft-cluster embeddings has not been proposed in the research literature. Our experimental results on four benchmark datasets demonstrate that the proposed algorithm makes accurate recommendations in item cold-start settings compared to the state-of-the-art algorithms according to commonly used ranking metrics like Normalized Discounted Cumulative Gain (NDCG) and Mean Average Precision (MAP). The performance of our proposed algorithm on the MovieLens 20M dataset demonstrates the scalability of our algorithm compared to other popular algorithms. We also propose the metric Cold Items Precision (CIP) to quantify the ability of a system to recommend cold-start items. CIP can be used in conjunction with relevance ranking metrics like NDCG and MAP to measure the effectiveness of the cold-start recommendation algorithm.
... However, these models are vulnerable to noisy information contained in feature fields such as question tags and question text, which contain multiple non-zero values when converted to a sparse vector representation. Meanwhile, previous research [13] has demonstrated that semantic matching that leverages textual information alone cannot achieve satisfactory results, and that question tags play a more important role than other variables for expert recommendation. Therefore, how to filter out uninformative content from textual information and select the most representative tags for prediction is key to field modelling in FM. ...
... Debatableness is related to the number of answers to a question, while utility is negatively correlated with the ranking of an answer among all the answers to a given question. Sun et al. [13] handle the cold-start problem of question routing in CQA by introducing factorization machines. Their results indicate that critical features such as question tags play a more important role than other content. ...
... [Eq. (13): y expressed in terms of number_of_answer, tag, and upvotes] where y consists of two parts: topical interest and topical expertise. The topical interest of user u in question q is represented by the total number of questions answered by u that contain q's tags, i.e. number_of_answer(tag_i, u), while the topical expertise of user u for question q is represented by the received upvotes. ...
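The two components described in the snippet above can be sketched as simple per-tag counts. This is only an illustration: the function name `per_tag_stats`, the shape of the history data, and the choice to attribute upvotes per tag are assumptions, and the way the original equation combines the two quantities into y is not recoverable from the snippet, so the sketch leaves them separate.

```python
from typing import Dict, Iterable, List, Set, Tuple


def per_tag_stats(question_tags: List[str],
                  history: Iterable[Tuple[Set[str], int]]) -> Dict[str, Dict[str, int]]:
    """For each tag of question q, count the user's answers and received upvotes.

    history holds one (tags of an answered question, upvotes received) pair
    per answer the user has given. Both field layouts are hypothetical.
    """
    stats = {tag: {"number_of_answer": 0, "upvotes": 0} for tag in question_tags}
    for answered_tags, upvotes in history:
        for tag in stats:
            if tag in answered_tags:
                stats[tag]["number_of_answer"] += 1  # topical interest signal
                stats[tag]["upvotes"] += upvotes     # topical expertise signal
    return stats


# Made-up answering history for one user.
history = [({"python", "pandas"}, 5), ({"java"}, 2), ({"python"}, 1)]
stats = per_tag_stats(["python", "numpy"], history)
```

A tag the user has never answered under, such as `numpy` here, simply yields zero for both signals.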
The most challenging task of Community Question Answering (CQA) is to provide high-quality answers to users’ questions. Currently, a variety of expert recommendation methods have been proposed and greatly improved the effective matching between questions and potential good answerers. However, the performance of existing methods can be adversely affected by many common factors such as data sparsity and noise problem, which cause less precise user modeling. Moreover, existing methods often model user-question interactions through simple ways, failing to capture the multiple scale interactions of question and answerers, which make it difficult to find answerers who are able to provide the best answers. In this paper, we propose an attention-based variant of Factorization Machines (FM) called Hierarchical Attentional Factorization Machines (HaFMRank) for answerer recommendation in CQA, which not only models the interactions between pairs of individual features but emphasizes the roles of crucial features and pairwise interactions. Specifically, we introduce the within-field attention layer to capture the inner structure of features belonging to the same field, while a feature-interaction attention layer is adopted to examine the importance of each pairwise interaction. A pre-training procedure is designed to generate latent FM feature embedding that encode question context and user history into the training process of HaFMRank. The performance of the proposed HaFMRank is evaluated by using real-world datasets of Stack Exchange and experimental results demonstrate that it outperforms several state-of-the-art methods in best answerer recommendation.
Community question answering forums allow users to find knowledge on a topic of interest by asking questions and getting answers from experts. However, it can be challenging to find experts who are knowledgeable in a particular subject, especially when there are millions of questions and thousands of new queries every day. This paper proposes a novel expert recommendation system called Semantic Similarity and Clustering-based Collaborative Filtering (SSC-CF). SSC-CF addresses two key drawbacks of collaborative filtering: scalability and sparsity. Sparsity is addressed by using matrix factorization, in which latent features are identified to detect similarity and generate a prediction based on both the question and the user entities. Scalability is addressed by a clustering method that groups users and questions with shared interests. The recommendation system’s accuracy is further improved by incorporating semantic similarity. SSC-CF is evaluated on three Stack Exchange sites: gaming, physics, and scifi. The results show that the proposed technique, SSC-CF, is effective in addressing both scalability and sparsity.
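The matrix factorization step mentioned in the abstract above can be illustrated with a generic sketch: latent vectors are learned for users and questions from the few observed interactions, and a prediction is their dot product. This is plain stochastic gradient descent on squared error, not the SSC-CF implementation; the function names, hyperparameters, and toy data are all assumptions.

```python
import random
from typing import List, Sequence, Tuple

Rating = Tuple[int, int, float]  # (user index, question index, observed value)


def predict(p_u: Sequence[float], q_i: Sequence[float]) -> float:
    """Predicted value is the dot product of the two latent vectors."""
    return sum(a * b for a, b in zip(p_u, q_i))


def mse(observed: List[Rating], P: List[List[float]], Q: List[List[float]]) -> float:
    """Mean squared error over the observed entries only."""
    return sum((r - predict(P[u], Q[i])) ** 2 for u, i, r in observed) / len(observed)


def train_mf(observed: List[Rating], n_users: int, n_questions: int,
             rank: int = 2, epochs: int = 200, lr: float = 0.05,
             reg: float = 0.01, seed: int = 0):
    """Learn user factors P and question factors Q with regularized SGD."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_questions)]
    for _ in range(epochs):
        for u, i, r in observed:
            err = r - predict(P[u], Q[i])
            for f in range(rank):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


# Toy sparse interaction data: 5 observed cells out of a 3 x 3 matrix.
observed = [(0, 0, 1.0), (0, 1, 0.0), (1, 0, 1.0), (1, 2, 1.0), (2, 1, 1.0)]
P0, Q0 = train_mf(observed, 3, 3, epochs=0)  # untrained factors, same seed
P, Q = train_mf(observed, 3, 3)
unseen = predict(P[2], Q[0])  # estimate for a user-question pair never observed
```

The learned factors give an estimate for every user-question pair, including unobserved ones, which is how factorization mitigates sparsity.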