Shaoping Ma's research while affiliated with Tsinghua University and other places

Publications (308)

Article
In legal case retrieval, existing work has shown that human-mediated conversational search can improve users’ search experience. In practice, a suitable workflow can provide guidelines for constructing a machine-mediated agent to replace human agents. Therefore, we conduct a comparative analysis and summarize two challenges when directly applying...
Preprint
Recent advances in Dense Retrieval (DR) techniques have significantly improved the effectiveness of first-stage retrieval. Trained with large-scale supervised data, DR models can encode queries and documents into a low-dimensional dense space and conduct effective semantic matching. However, previous studies have shown that the effectiveness of DR mo...
Article
Full-text available
Sentiment analysis is an essential task in natural language processing research. Although existing works have gained much success with both statistical and neural-based solutions, little is known about the human decision process while performing this kind of complex cognitive task. Considering recent advances in human-inspired model design for NL...
Conference Paper
Full-text available
Recently, pre-training methods tailored for IR tasks have achieved great success. However, as the mechanisms behind the performance improvement remain under-investigated, the interpretability and robustness of these pre-trained models still need to be improved. Axiomatic IR aims to identify a set of desirable properties expressed mathematically as...
Article
Recommender systems are an essential tool to relieve the information overload challenge and play an important role in people’s daily lives. Since recommendations involve allocations of social resources (e.g., job recommendation), an important issue is whether recommendations are fair. Unfair recommendations are not only unethical but also harm the...
Conference Paper
Dense Retrieval (DR) has achieved state-of-the-art first-stage ranking effectiveness. However, the efficiency of most existing DR models is limited by the large memory cost of storing dense vectors and the time-consuming nearest neighbor search (NNS) in vector space. Therefore, we present RepCONC, a novel retrieval model that learns discrete Repres...
Preprint
Full-text available
Collaborative filtering (CF) plays a critical role in the development of recommender systems. Most CF methods utilize an encoder to embed users and items into the same representation space, and the Bayesian personalized ranking (BPR) loss is usually adopted as the objective function to learn informative encoders. Existing studies mainly focus on de...
Preprint
Recommender systems are an essential tool to relieve the information overload challenge and play an important role in people's daily lives. Since recommendations involve allocations of social resources (e.g., job recommendation), an important issue is whether recommendations are fair. Unfair recommendations are not only unethical but also harm the...
Article
Recommendation in legal scenarios (Legal-Rec) is a specialized recommendation task that aims to provide potentially helpful legal documents for users. There are three main differences compared with traditional recommendation: (1) Both the structural connections and textual contents of legal information are important in the Legal-Rec scenario,...
Preprint
A retrieval model should not only interpolate the training data but also extrapolate well to the queries that are rather different from the training data. While dense retrieval (DR) models have been demonstrated to achieve better retrieval performance than the traditional term-based retrieval models, we still know little about whether they can extr...
Preprint
Conversational search has received much attention recently with the increasing popularity of intelligent user interfaces. However, compared with the effort devoted to designing effective conversational search algorithms, far fewer researchers have focused on the construction of benchmark datasets. For most existing datasets, the information...
Preprint
Overfitting is a common problem in machine learning, in which the model fits the training data too closely while performing poorly on the test data. Among various methods of coping with overfitting, dropout is one of the representative ways. From randomly dropping neurons to dropping neural structures, dropout has achieved great success in impro...
Article
Recommendation systems play a vital role in alleviating information overload. Generally, a recommendation model is trained to discern between positive (liked) and negative (disliked) instances for each user. However, under the open-world assumption, there are only positive instances but no negative instances from users’ implicit feedback, which pos...
Article
Result ranking is one of the major concerns for Web search technologies. Most existing methodologies rank search results in descending order of relevance. To model the interactions among search results, Reinforcement Learning (RL) algorithms have been widely adopted for ranking tasks. However, the online training of RL methods is time and resource...
Article
Modern search engine result pages (SERPs) have become increasingly complex, with heterogeneous information aggregated from various sources. In many cases, these SERPs also display results in the right rail besides the traditional left-rail result lists, which change the linear result list to a non-linear panel and might influence user search behavior pat...
Article
Previous studies have demonstrated the potential bias and fairness issues in real recommender systems. Fairness is generally defined as equality of services between groups. However, fairness does not imply equality for all users/items in real scenarios, as premium users/items, who have paid for their services, are supposed to have better expe...
Preprint
Dense Retrieval (DR) reaches state-of-the-art results in first-stage retrieval, but little is known about the mechanisms that contribute to its success. Therefore, in this work, we conduct an interpretation study of recently proposed DR models. Specifically, we first discretize the embeddings output by the document and query encoders. Based on the...
Conference Paper
Full-text available
While batch evaluation plays a central part in Information Retrieval (IR) research, most evaluation metrics are based on user models which mainly focus on browsing and clicking behaviors. As users' perceived satisfaction may also be impacted by their search intent, constructing different user models across various search intents may help design bett...
Preprint
While search technologies have evolved to be robust and ubiquitous, the fundamental interaction paradigm has remained relatively stable for decades. With the maturity of the Brain-Machine Interface, we build an efficient and effective communication system between human beings and search engines based on electroencephalogram (EEG) signals, called Br...
Preprint
Dense Retrieval (DR) has achieved state-of-the-art first-stage ranking effectiveness. However, the efficiency of most existing DR models is limited by the large memory cost of storing dense vectors and the time-consuming nearest neighbor search (NNS) in vector space. Therefore, we present RepCONC, a novel retrieval model that learns discrete Repres...
Chapter
Relationships among items, especially complementarity, have shown great potential to empower the performance and explainability of recommender systems. However, there are two key limitations: 1) Most previous methods use co-occurrence to quantify item complementary relationship, which lacks theoretical support and overlooks the fact that co-occurre...
Preprint
Web search heavily relies on click-through behavior as an essential feedback signal for performance improvement and evaluation. Traditionally, click is usually treated as a positive implicit feedback signal of relevance or usefulness, while non-click (especially non-click after examination) is regarded as a signal of irrelevance or uselessness. How...
Preprint
Reading comprehension is a complex cognitive process involving many human brain activities. Many works have studied the reading patterns and attention allocation mechanisms in the reading process. However, little is known about what happens in the human brain during reading comprehension and how we can utilize this information as implicit feedback...
Preprint
Recently, the Information Retrieval community has witnessed fast-paced advances in Dense Retrieval (DR), which performs first-stage retrieval by encoding documents in a low-dimensional embedding space and querying them with embedding-based search. Despite the impressive ranking performance, previous studies usually adopt brute-force search to acquire c...
Article
Full-text available
Self-awareness is an essential concept in physiology and psychology. Accurate overall self-awareness benefits the development and well-being of an individual. Previous research studies on self-awareness mainly collect and analyze data in the laboratory environment through questionnaires, user studies, or field research studies. However, these metho...
Preprint
Data plays a vital role in machine learning studies. In recommendation research, both user behaviors and side information are helpful for modeling users, so large-scale real-scenario datasets with abundant user behaviors will contribute a lot. However, it is not easy to get such datasets as most of them are only held and protected by companies....
Chapter
Diversity is believed to be an essential factor in improving user satisfaction in recommender systems, while how to take advantage of it has long been a problem worth exploring. Existing work either ignores the influence of diversity or overlooks users’ different diversity demands in recommendations. In this study, we analyze users’ behaviors on a...
Preprint
Full-text available
As queries submitted by users directly affect search experiences, how to organize queries has always been a research focus in Web search studies. As search requests become complex and exploratory, many search sessions contain more than a single query, and thus reformulation becomes a necessity. To help users better formulate their queries in these co...
Preprint
Ranking has always been one of the top concerns in information retrieval research. For decades, the lexical matching signal has dominated the ad-hoc retrieval process, but solely using this signal in retrieval may cause the vocabulary mismatch problem. In recent years, with the development of representation learning techniques, many researchers t...
Article
User intention, which always changes dynamically in different contexts, is an important factor to be considered for recommender systems. Recent studies (represented by sequential recommendation) begin to focus on predicting what users want beyond what users like, which are better at capturing user intention and have attracted a surge of interest. Ho...
Preprint
In this paper, we present our methodologies for tackling the challenges of legal case retrieval and entailment in the Competition on Legal Information Extraction / Entailment 2020 (COLIEE-2020). We participated in the two case law tasks, i.e., the legal case retrieval task and the legal case entailment task. Task 1 (the retrieval task) aims to auto...
Conference Paper
Full-text available
Historically, research on better recommendations revolved mainly around feeding the user with accurate recommendations. However, now the spectrum has shifted from the conventional idea of focusing only on accuracy, to focusing on other dimensions like serendipity, novelty, and diversity, and creating an optimal balance to increase the users’ satisf...
Article
While search engines have reshaped how human beings learn and think, the interaction paradigm of search has remained relatively stable for decades. With the development of neural science and biomedical engineering, it is possible to build a direct communication pathway between a computing device and the human brain via Brain-machine Interfaces (BMI...
Preprint
Ranking has always been one of the top concerns in information retrieval research. For decades, the lexical matching signal has dominated the ad-hoc retrieval process, but it also has inherent defects, such as the vocabulary mismatch problem. Recently, the Dense Retrieval (DR) technique has been proposed to alleviate these limitations by capturing the deep...
Chapter
News recommendation, which aims to help users find the news they are interested in, is essential for online news platforms to alleviate the information overload problem. News is full of textual information with some knowledge entities, so recent studies try to leverage knowledge graphs (KGs) as side information to better model user preferences over...
Chapter
Content-based (CB) and collaborative filtering (CF) are two classical types of recommendation methods that are widely applied in various online services. Recently, sequential-based recommender systems have achieved good performance. However, how to integrate the advantages of these recommendation systems has not been well studied yet. Besides, most previous...
Article
Query logs include valuable information for understanding user intent and behavior in Web search. In this article, we investigate COVID-19-related query logs by dividing search sessions into different intents and analyzing the user behavior of groups and individuals. We believe it is important to learn about the epidemic's influence on users' search be...
Chapter
With the development and popularization of smartphones, search on mobile devices has become more and more popular in recent years. Existing research found that users’ search interaction patterns in the mobile environment are different from those in the desktop environment. As we know, there are a number of vertical results and richly informative sn...
Chapter
Full-text available
As an essential part in web search, search snippets usually provide result previews for users to either gather useful information or make click-through decisions. In complex search scenarios, users may need to submit multiple queries to search systems until their information needs are satisfied. As user intents tend to be ambiguous, incorporating c...
Conference Paper
Legal case retrieval is a specialized IR task that involves retrieving supporting cases given a query case. Compared with traditional ad-hoc text retrieval, the legal case retrieval task is more challenging since the query case is much longer and more complex than common keyword queries. Besides that, the definition of relevance between a query cas...
Preprint
Although exact term match between queries and documents is the dominant method to perform first-stage retrieval, we propose a different approach, called RepBERT, to represent documents and queries with fixed-length contextualized embeddings. The inner products of query and document embeddings are regarded as relevance scores. On MS MARCO Passage Ra...
Article
Recent studies on recommendation have largely focused on exploring state-of-the-art neural networks to improve the expressiveness of models, while typically applying the Negative Sampling (NS) strategy for efficient learning. Despite its effectiveness, two important issues have not been well-considered in existing methods: 1) NS suffers from dramatic fluc...