About
14 Publications · 1,303 Reads · 399 Citations
Introduction
Qingyun Wu currently works at the Department of Computer Science, University of Virginia. Qingyun does research in Machine Learning, Artificial Intelligence, and Information Science. Their most recent publication is 'Learning Contextual Bandits in a Non-stationary Environment'.
Publications (14)
In this paper, we study the problem of online influence maximization in social networks. In this problem, a learner aims to identify the set of "best influencers" in a network by interacting with it, i.e., repeatedly selecting seed nodes and observing the resulting activation feedback. We capitalize on an important property of the influen...
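The interaction protocol described above can be illustrated with a generic UCB-style loop: maintain optimistic estimates of edge activation probabilities, pick the seed with the best optimistic spread, and update the estimates from the observed activations. The sketch below shows that protocol on a toy graph; it is not the paper's algorithm, and the graph, the one-hop spread surrogate, and the confidence-bound constant are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy directed graph: edge -> true (unknown) activation probability.
true_p = {("a", "b"): 0.8, ("a", "c"): 0.3, ("b", "c"): 0.5,
          ("c", "d"): 0.7, ("b", "d"): 0.2}
edges = list(true_p)
seeds = sorted({u for u, _ in edges})  # candidate seed nodes

counts = {e: 0 for e in edges}  # times each edge was observed
hits = {e: 0 for e in edges}    # times each edge activated

def ucb(e, t):
    """Optimistic estimate of an edge's activation probability."""
    if counts[e] == 0:
        return 1.0
    mean = hits[e] / counts[e]
    return min(1.0, mean + np.sqrt(1.5 * np.log(t) / counts[e]))

def one_hop_spread(seed, p):
    """Expected one-hop spread under edge probabilities p -- a toy
    surrogate for a full influence-spread oracle."""
    return 1 + sum(p[e] for e in edges if e[0] == seed)

for t in range(1, 301):
    p_opt = {e: ucb(e, t) for e in edges}
    # Pick the single seed with the best optimistic spread (k = 1).
    seed = max(seeds, key=lambda s: one_hop_spread(s, p_opt))
    # Observe activation feedback on the seed's outgoing edges.
    for e in (e for e in edges if e[0] == seed):
        counts[e] += 1
        hits[e] += rng.random() < true_p[e]
```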
Online Learning to Rank (OL2R) algorithms learn from implicit user feedback on the fly. The key to such algorithms is an unbiased estimate of gradients, which is often (trivially) achieved by uniformly sampling from the entire parameter space. Unfortunately, this leads to high variance in gradient estimation, resulting in high regret during model u...
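The uniform-sampling gradient estimate the abstract critiques appears in Dueling Bandit Gradient Descent-style OL2R: perturb the current ranker along a direction drawn uniformly from the unit sphere, and move toward it if interleaved feedback prefers the perturbed ranker. A minimal sketch follows, assuming a toy preference oracle; the step sizes and the oracle are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def uniform_unit_vector(d):
    """Sample a direction uniformly from the unit sphere in R^d."""
    u = rng.normal(size=d)
    return u / np.linalg.norm(u)

def dbgd_step(w, prefers_candidate, delta=1.0, lr=0.1):
    """One Dueling Bandit Gradient Descent update.

    Proposes a perturbed ranker w + delta * u and, if user feedback
    prefers it, moves w in that direction. Sampling u uniformly over
    the whole parameter space is what the abstract identifies as the
    source of high variance in the gradient estimate.
    """
    u = uniform_unit_vector(len(w))
    candidate = w + delta * u
    if prefers_candidate(w, candidate):
        w = w + lr * u
    return w

# Toy usage: feedback comes from a hidden "true" ranker w_star.
d = 5
w_star = np.ones(d)
prefers = lambda w, c: np.dot(c, w_star) > np.dot(w, w_star)
w = np.zeros(d)
for _ in range(200):
    w = dbgd_step(w, prefers)
```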
Recommender systems have to handle a highly non-stationary environment, due to users' fast-changing interests over time. Traditional solutions periodically rebuild their models at high computational cost, yet this still does not let them adjust automatically to abrupt changes in trends driven by timely information. It is importan...
Multi-armed bandit algorithms have become a reference solution for handling the explore/exploit dilemma in recommender systems and many other important real-world problems, such as display advertising. However, such algorithms usually assume a stationary reward distribution, which hardly holds in practice as users' preferences are dynamic. This...
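One common way to relax the stationarity assumption the abstract points to is to discard old observations, e.g., a sliding-window UCB that estimates each arm's reward from recent feedback only. The sketch below is a generic illustration of this idea under an abrupt change, not the algorithm proposed in the paper; the window size and the change point are arbitrary assumptions.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(2)

class SlidingWindowUCB:
    """UCB computed over only the last `window` observations per arm,
    so the policy can track a drifting reward distribution."""

    def __init__(self, n_arms, window=100):
        self.history = [deque(maxlen=window) for _ in range(n_arms)]
        self.t = 0

    def select(self):
        self.t += 1
        scores = []
        for h in self.history:
            if not h:
                return len(scores)  # play each arm once first
            bonus = np.sqrt(2 * np.log(self.t) / len(h))
            scores.append(np.mean(h) + bonus)
        return int(np.argmax(scores))

    def update(self, arm, reward):
        self.history[arm].append(reward)

# Toy run with an abrupt change: the arm means swap at t = 500.
policy = SlidingWindowUCB(n_arms=2)
for t in range(1000):
    means = (0.3, 0.7) if t < 500 else (0.7, 0.3)
    arm = policy.select()
    policy.update(arm, float(rng.random() < means[arm]))
```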
In this work, we propose to improve long-term user engagement in a recommender system from the perspective of sequential decision optimization, where users' click and return behaviors are directly modeled for online optimization. A bandit-based solution is formulated to balance three competing factors during online learning, including exploitation...
We perform online interactive recommendation via a factorization-based bandit algorithm. Low-rank matrix completion is performed over an incrementally constructed user-item preference matrix, where an upper confidence bound based item selection strategy is developed to balance the exploit/explore trade-off during online learning. Observable context...
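A hedged sketch of the UCB item-selection step in a factorization-based bandit: with the item factors held fixed here (the full approach also completes them online from the incrementally built preference matrix), the user factor is fit by ridge regression and each item receives an uncertainty bonus from the inverse Gram matrix. The latent dimension, noise level, and exploration weight alpha are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_items = 3, 20  # latent dimension, catalog size (assumed)

# Hidden latent factors that generate payoffs (unknown to the learner).
u_star = rng.normal(size=d)
V_star = rng.normal(size=(n_items, d))

# Learner state for one user: noisy item factors stand in for the
# factors the full algorithm would learn online.
V_hat = V_star + 0.1 * rng.normal(size=(n_items, d))
A = np.eye(d)        # regularized Gram matrix
b = np.zeros(d)      # accumulated reward-weighted features
alpha = 0.5          # exploration weight

for t in range(300):
    u_hat = np.linalg.solve(A, b)   # ridge estimate of the user factor
    A_inv = np.linalg.inv(A)
    # UCB item selection: estimated payoff plus uncertainty bonus.
    bonus = np.sqrt(np.einsum("ij,jk,ik->i", V_hat, A_inv, V_hat))
    item = int(np.argmax(V_hat @ u_hat + alpha * bonus))
    reward = V_star[item] @ u_star + 0.1 * rng.normal()
    # Rank-one update of the user-side sufficient statistics.
    A += np.outer(V_hat[item], V_hat[item])
    b += reward * V_hat[item]
```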
Contextual bandit algorithms provide principled online learning solutions to find optimal trade-offs between exploration and exploitation with companion side-information. Most contextual bandit algorithms simply assume the learner has access to the entire set of features that govern the generation of payoffs from a user to an item. However...
Contextual bandit algorithms provide principled online learning solutions to find optimal trade-offs between exploration and exploitation with companion side-information. They have been extensively used in many important practical scenarios, such as display advertising and content recommendation. A common practice estimates the unknown bandit param...
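The "common practice" this abstract refers to, estimating the unknown bandit parameter by ridge regression and exploring with a matching confidence bound, is the LinUCB scheme. Below is a minimal self-contained sketch; the exploration weight alpha, the regularizer, and the toy linear payoff environment are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

class LinUCB:
    """Ridge-regression contextual bandit with a confidence-width
    exploration bonus on each candidate's context vector."""

    def __init__(self, d, alpha=1.0, lam=1.0):
        self.A = lam * np.eye(d)  # regularized Gram matrix
        self.b = np.zeros(d)      # accumulated reward-weighted contexts
        self.alpha = alpha

    def select(self, contexts):
        theta = np.linalg.solve(self.A, self.b)  # ridge estimate
        A_inv = np.linalg.inv(self.A)
        scores = [x @ theta + self.alpha * np.sqrt(x @ A_inv @ x)
                  for x in contexts]
        return int(np.argmax(scores))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

# Toy usage against a hidden linear payoff function theta_star.
d = 5
theta_star = rng.normal(size=d)
policy = LinUCB(d)
for _ in range(500):
    contexts = rng.normal(size=(10, d))  # 10 candidate arms per round
    i = policy.select(contexts)
    reward = contexts[i] @ theta_star + 0.1 * rng.normal()
    policy.update(contexts[i], reward)
```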