Shuaiqiang Wang

Baidu Inc. · Search Strategy Department

PhD

About

71
Publications
0
Reads
1,205
Citations
Citations since 2016
40 Research Items
1127 Citations
Additional affiliations
April 2020 - September 2020
Baidu Inc.
Position
  • Engineer
Description
  • Shuaiqiang Wang is now a Senior Algorithm Engineer at Baidu Inc., leading the ranking science group of the Baidu Search Engine.
September 2017 - April 2020
JD.COM
Position
  • Researcher
February 2017 - August 2017
The University of Manchester
Position
  • Lecturer
Education
July 2009 - September 2009
Hong Kong Baptist University
Field of study
  • Computer Science
September 2004 - December 2009
Shandong University
Field of study
  • Computer Science
September 2000 - July 2004
Shandong University
Field of study
  • Computer Science

Publications

Article
Pre-trained language representation models (PLMs) such as BERT and ERNIE have been integral to achieving recent improvements on various downstream tasks, including information retrieval. However, it is nontrivial to directly utilize these models for large-scale web search due to the following challenging issues: (1) the prohibitively expensive...
Preprint
Full-text available
Extracting query-document relevance from the sparse, biased clickthrough log is among the most fundamental tasks in a web search system. Prior art mainly learns a relevance judgment model from semantic features of the query and document and ignores direct counterfactual relevance evaluation from the click log. Though the learned semantic mat...
Preprint
Full-text available
The unbiased learning to rank (ULTR) problem has been greatly advanced by recent deep learning techniques and well-designed debiasing algorithms. However, promising results on the existing benchmark datasets may not extend to practical scenarios due to the following disadvantages observed in those popular benchmark datasets: (1) outdated sem...
Preprint
Full-text available
While China has become the biggest online market in the world with around 1 billion internet users, Baidu runs the world's largest Chinese search engine, serving hundreds of millions of daily active users and responding to billions of queries per day. To handle the diverse query requests from users at web scale, Baidu has made tremendous efforts i...
Preprint
Full-text available
Neural retrievers based on pre-trained language models (PLMs), such as dual-encoders, have achieved promising performance on the task of open-domain question answering (QA). Their effectiveness can further reach a new state of the art by incorporating cross-architecture knowledge distillation. However, most of the existing studies just directly appl...
Preprint
Full-text available
Passage re-ranking aims to obtain a permutation over the candidate passage set from the retrieval stage. Re-rankers have been boosted by pre-trained language models (PLMs) due to their overwhelming advantages in natural language understanding. However, existing PLM-based re-rankers may easily suffer from vocabulary mismatch and lack of domain-specific kno...
Preprint
Full-text available
Query understanding plays a key role in exploring users' search intents and facilitating users to locate their most desired information. However, it is inherently challenging since it needs to capture semantic information from short and ambiguous queries and often requires massive task-specific labeled data. In recent years, pre-trained language mo...
Preprint
Full-text available
Retrieval is a crucial stage in web search that identifies a small set of query-relevant candidates from a billion-scale corpus. Discovering more semantically related candidates in the retrieval stage is very promising for exposing more high-quality results to end users. However, there still remain non-trivial challenges in building and deploying ef...
Preprint
Post-click conversion, as a strong signal indicating the user preference, is salutary for building recommender systems. However, accurately estimating the post-click conversion rate (CVR) is challenging due to the selection bias, i.e., the observed clicked events usually happen on users' preferred items. Currently, most existing methods utilize cou...
Preprint
Full-text available
As the heart of a search engine, the ranking system plays a crucial role in satisfying users' information demands. More recently, neural rankers fine-tuned from pre-trained language models (PLMs) establish state-of-the-art ranking effectiveness. However, it is nontrivial to directly apply these PLM-based rankers to the large-scale web search system...
Conference Paper
Full-text available
Recommender systems have been playing essential roles in e-commerce portals. Existing recommendation algorithms usually learn the ranking scores of items by optimizing a single task (e.g., click-through rate prediction) based on users' historical click sequences, but they generally pay little attention to simultaneously modeling users' multiple types...
Article
Full-text available
Most recommender systems suggest items that are popular among all users and similar to items a user usually consumes. As a result, the user receives recommendations that she/he is already familiar with or would find anyway, leading to low satisfaction. To overcome this problem, a recommender system should suggest novel, relevant and unexpected i.e....
Chapter
Neural embedding has been widely applied as an effective category of vectorization methods in real-world recommender systems. However, its exploration of users’ explicit feedback on items to create good-quality user and item vectors is still limited. Existing neural embedding methods only consider the items that are accessed by the users, but negl...
Chapter
Matrix factorization (MF) is one of the most effective categories of recommendation algorithms, which makes predictions based on the user-item rating matrix. Nowadays many studies reveal that the ultimate goal of recommendations is to predict correct rankings of these unrated items. However, most of the pioneering efforts on ranking-oriented MF pre...
Article
Big Data is defined as an emerging paradigm that includes complex and large-scale information beyond the processing capability of conventional tools. Traditional data analytics methods have been commonly used for many applications, such as text classification, image recognition, and video tracking. For analysis purposes, these data often need to be...
Article
Full-text available
We address the feature extraction problem for document ranking in information retrieval. We then propose LifeRank, a Linear feature extraction algorithm for Ranking. In LifeRank, we regard each document collection for ranking as a matrix, referred to as the original matrix. We try to optimize a transformation matrix, so that a new matrix (dataset)...
Conference Paper
Full-text available
This paper addresses a novel tour discovery problem in the domain of travel search. We create a ranking of tours for a set of travel interests, where a tour is a group of city documents and a travel interest is a query. While generating and ranking tours, it is aimed that each interest (from the interest set) is satisfied by at least one city in a...
Conference Paper
We present Etymo (https://etymo.io), a discovery engine to facilitate artificial intelligence (AI) research and development. It aims to help readers navigate a large number of AI-related papers published every week by using a novel form of search that finds relevant papers and displays related papers in a graphical interface. Etymo constructs and m...
Article
In this paper, we introduce polygene-based evolution, a novel framework for evolutionary algorithms (EAs) that features distinctive operations in the evolutionary process. In traditional EAs, the primitive evolution unit is a gene, wherein genes are independent components during evolution. In polygene-based evolutionary algorithms (PGEAs), the evol...
Article
Full-text available
We present Etymo (https://etymo.io), a discovery engine to facilitate artificial intelligence (AI) research and development. It aims to help readers navigate a large number of AI-related papers published every week by using a novel form of search that finds relevant papers and displays related papers in a graphical interface. Etymo constructs and m...
Conference Paper
Full-text available
Cross-domain recommender systems use information from source domains to improve recommendations in a target domain, where the term domain refers to a set of items that share attributes and/or user ratings. Most works on this topic focus on accuracy but disregard other properties of recommender systems. In this paper, we attempt to improve serendipi...
Conference Paper
Full-text available
We introduce CitySearcher, a vertical search engine that searches for cities when queried for an interest. Generally in search engines, utilization of semantics between words is favorable for performance improvement. Even though ambiguous query words have multiple semantic meanings, search engines can return diversified results to satisfy different...
Conference Paper
Full-text available
In this study, we investigate the diversified recommendation problem by supervised learning, seeking significant improvement in diversity while maintaining accuracy. In particular, we regard each user as a training instance, and heuristically choose a subset of accurate and diverse items as ground-truth for each user. We then represent each user or ite...
Article
Concept-based image search is an emerging search paradigm that utilizes a set of concepts as intermediate semantic descriptors of images to bridge the semantic gap. Typically, a user query is rather complex and cannot be well described using a single concept. However, it is less effective to tackle such complex queries by simply aggregating the ind...
Conference Paper
A recommendation is called explainable if it not only predicts a numerical rating for an item, but also generates explanations for users' preferences. Most existing methods for explainable recommendation apply topic models to analyze user reviews to provide descriptions along with the recommendations they produce. So far, such methods have neglecte...
Article
Collaborative filtering (CF) is one of the most effective techniques in recommender systems, which can be either rating oriented or ranking oriented. Ranking-oriented CF algorithms demonstrated significant performance gains in terms of ranking accuracy, being able to estimate a precise preference ranking of items for each user rather than the absol...
Article
Personalized search approaches tailor search results to users' current interests, so as to help improve the likelihood of a user finding relevant documents for their query. Previous work on personalized search focuses on using the content of the user's query and of the documents clicked to model the user's preference. In this paper we focus on a di...
Article
Full-text available
Available in open access at http://redfame.com/journal/index.php/smc/article/view/1746/1858
Article
Full-text available
Recommender systems use past behaviors of users to suggest items. Most tend to offer items similar to the items that a target user has indicated as interesting. As a result, users become bored with obvious suggestions that they might have already discovered. To improve user satisfaction, recommender systems should offer serendipitous suggestions: i...
Article
Most Web search diversity approaches can be categorized as Document Level Diversification (DocLD), Topic Level Diversification (TopicLD) or Term Level Diversification (TermLD). DocLD selects the relevant documents with minimal content overlap to each other. It does not take the coverage of query subtopics into account. TopicLD solves this by modeli...
Article
Full-text available
Understanding the intents behind user queries is crucial to improving the performance of Web search systems. The NTCIR-11 IMine task focuses on this problem. In this paper, we address the NTCIR-11 IMine task with two phases referred to as Query Intent Mining (QIM) and Query Intent Ranking (QIR). (I) QIM is intended to mine users' potential intents...
Article
Query classification is an important part of exploring the characteristics of web queries. Existing studies are mainly based on Broder's classification scheme and classify user queries into navigational, informational, and transactional categories according to users' information needs. In this article, we present a novel classification scheme from...
Conference Paper
Full-text available
Recently, ranking-oriented collaborative filtering (CF) algorithms have achieved great success in recommender systems. They obtained state-of-the-art performances by estimating a preference ranking of items for each user rather than estimating the absolute ratings on unrated items (as conventional rating-oriented CF algorithms do). In this paper, w...
Article
Tourism service composition combines various tourism elements for users, including transportation, catering, accommodation and other tourism service elements, in order to generate a one-stop tourism service that meets the multi-objective needs of users. In doing this, it should simultaneously consider the historical travel data and the current prefere...
Article
Full-text available
Collaborative Filtering (CF) is one of the most successful algorithms in recommender systems. However, it suffers from data sparsity and scalability problems. Although many clustering techniques have been incorporated to alleviate these two problems, most of them fail to achieve further significant improvement in recommendation accuracy. First of a...
Article
Full-text available
We propose CCRank, the first parallel framework for evolutionary algorithm (EA)-based learning to rank, aiming to significantly improve learning efficiency while maintaining accuracy. CCRank is based on cooperative coevolution (CC), a divide-and-conquer framework that has demonstrated high promise in function optimization for problems with large searc...
Article
Full-text available
Collaborative filtering (CF) is an effective technique addressing the information overload problem. CF approaches generally fall into two categories: rating based and ranking based. The former makes recommendations based on historical rating scores of items and the latter based on their rankings. Ranking-based CF has demonstrated advantages in reco...
Article
Automatic image annotation plays a critical role in modern keyword-based image retrieval systems. For this task, the nearest-neighbor–based scheme works in two phases: first, it finds the most similar neighbors of a new image from the set of labeled images; then, it propagates the keywords associated with the neighbors to the new image. In this art...
Article
Full-text available
The authors investigated the use of microblogs - or weibos - and related censorship practices using 111 million microblogs collected between 1 January and 30 June 2012. Using a matched case-control study design helped researchers determine a list of Chinese terms that discriminate censored and uncensored posts written by the same microbloggers. Thi...
Conference Paper
Full-text available
Most existing recommender systems can be classified into two categories: collaborative filtering and content-based filtering. Hybrid recommender systems combine the advantages of the two for improved recommendation performance. Traditional recommender systems are rating-based. However, predicting ratings is an intermediate step towards their ultima...
Conference Paper
Full-text available
In this paper, we introduce polygene-based evolution, a novel framework for evolutionary algorithms (EAs) that features distinctive operations in the evolution process. In traditional EAs, the primitive evolution unit is a gene, where genes are independent components during evolution. In polygene-based evolutionary algorithms (PGEAs), the evolution u...
Conference Paper
Full-text available
Importance weighted active learning (IWAL) introduces a weighting scheme to measure the importance of each instance for correcting the sampling bias of the probability distributions between training and test datasets. However, the weighting scheme of IWAL involves the distribution of the test data, which can be straightforwardly estimated in active...
Conference Paper
Full-text available
Automatic image annotation plays an important role in modern keyword-based image retrieval systems. Recently, many neighbor-based methods have been proposed and achieved good performance for image annotation. However, existing work mainly focused on exploring a distance metric learning algorithm to determine the neighbors of an image, and neglected...
Article
Full-text available
Collaborative filtering (CF) is an effective technique addressing the information overload problem. Recently ranking-based CF methods have shown advantages in recommendation accuracy, being able to capture the preference similarity between users even if their rating scores differ significantly. In this study, we seek accuracy improvement of ranking...
Conference Paper
Full-text available
With an increasing amount of information in web forums, quick comprehension of threads in web forums has become a challenging research problem. To handle this issue, this paper investigates the task of Web Forum Thread Summarization (WFTS), aiming to give a brief statement of each thread that involves multiple dynamic topics. When applied to the...
Article
We propose CCRank, the first parallel algorithm for learning to rank, targeting simultaneous improvement in learning accuracy and efficiency. CCRank is based on cooperative coevolution (CC), a divide-and-conquer framework that has demonstrated high promise in function optimization for problems with large search space and complex structures. Moreove...
Conference Paper
Full-text available
Learning to rank represents a category of effective ranking methods for information retrieval. While the primary concern of existing research has been accuracy, learning efficiency is becoming an important issue due to the unprecedented availability of large-scale training data and the need for continuous update of ranking functions. In this paper,...
Conference Paper
Full-text available
We propose CCRank, the first parallel algorithm for learning to rank, targeting simultaneous improvement in learning accuracy and efficiency. CCRank is based on cooperative coevolution (CC), a divide-and-conquer framework that has demonstrated high promise in function optimization for problems with large search space and complex structures. Moreove...
Conference Paper
Full-text available
One fundamental issue of learning to rank is the choice of loss function to be optimized. Although the evaluation measures used in Information Retrieval (IR) are ideal ones, in many cases they cannot be used directly because they do not satisfy the smoothness property needed in conventional machine learning algorithms. In this paper a new method named R...
Article
Modeling and refining behaviors of software systems are two crucial issues in the methodology of Model-Driven Development (MDD). Traditional methods include Unified Modeling Language (UML) based methods and formal methods. Recently, integrated methods that take full advantage of these two methods have received increasing attention. Unfortunately, t...
Article
It is quite difficult but essential for Genetic Programming (GP) to evolve choice structures. Traditional approaches usually ignore this issue. They define some “if-structure” functions according to their problems by combining “if-else” statements, conditional criteria and elemental functions together. Obviously, these if-structure functions...
Article
In this paper, we propose RankIP, the first immune programming (IP) based ranking function discovery approach. IP is a novel evolution based machine learning algorithm with the principles of immune systems, which is verified to be superior to Genetic Programming (GP) on the convergence of algorithm according to their experimental results in Musilek...
Conference Paper
Full-text available
Web spam techniques enable some web pages or sites to achieve undeserved relevance and importance. They can seriously deteriorate search engine ranking results. Combating web spam has become one of the top challenges for web search. This paper proposes to learn a discriminating function to detect web spam by genetic programming. The evolution compu...