Online Learning in Online Auctions

Theoretical Computer Science (Impact Factor: 0.49). 08/2003; DOI: 10.1016/j.tcs.2004.05.012
Source: CiteSeer

ABSTRACT We consider the problem of revenue maximization in online auctions, that is, auctions in which bids are received and dealt with one-by-one. In this note, we demonstrate that results from online learning can be usefully applied in this context, and we derive a new auction for digital goods that achieves a constant competitive ratio with respect to the best possible (o#ine) fixed price revenue. This substantially improves upon the best previously known competitive ratio [3] of O(exp( # log log h)) for this problem. We apply our techniques to the related problem of online posted price mechanisms, where the auctioneer declares a price and a bidder only communicates his acceptance/rejection of the price. For this problem we obtain results that are (somewhat surprisingly) similar to the online auction problem.

  • [Show abstract] [Hide abstract]
    ABSTRACT: We study a multi-round optimization setting in which in each round a player may select one of several actions, and each action produces an outcome vector, not observable to the player until the round ends. The final payoff for the player is computed by applying some known function f to the sum of all outcome vectors (e.g., the minimum of all coordinates of the sum). We show that standard notions of performance measure (such as comparison to the best single action) used in related expert and bandit settings (in which the payoff in each round is scalar) are not useful in our vector setting. Instead, we propose a different performance measure, and design algorithms that have vanishing regret with respect to our new measure.
    Proceedings of the 5th conference on Innovations in theoretical computer science; 01/2014
  • [Show abstract] [Hide abstract]
    ABSTRACT: What price should be offered to a worker for a task in an online labor market? How can one enable workers to express the amount they desire to receive for the task completion? Designing optimal pricing policies and determining the right monetary incentives is central to maximizing requester's utility and workers' profits. Yet, current crowdsourcing platforms only offer a limited capability to the requester in designing the pricing policies and often rules of thumb are used to price tasks. This limitation could result in inefficient use of the requester's budget or workers becoming disinterested in the task. In this paper, we address these questions and present mechanisms using the approach of regret minimization in online learning. We exploit a link between procurement auctions and multi-armed bandits to design mechanisms that are budget feasible, achieve near-optimal utility for the requester, are incentive compatible (truthful) for workers and make minimal assumptions about the distribution of workers' true costs. Our main contribution is a novel, no-regret posted price mechanism, BP-UCB, for budgeted procurement in stochastic online settings. We prove strong theoretical guarantees about our mechanism, and extensively evaluate it in simulations as well as on real data from the Mechanical Turk platform. Compared to the state of the art, our approach leads to a 180% increase in utility.
    Proceedings of the 22nd international conference on World Wide Web; 05/2013
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Many sequential decision making problems require an agent to balance exploration and exploitation to maximise long-term reward. Existing policies that address this tradeoff typically have parameters that are set a priori to control the amount of exploration. In finite-time problems, the optimal values of these parameters are highly dependent on the problem faced. In this paper, we propose adapting the amount of exploration performed on-line, as information is gathered by the agent. To this end we introduce a novel algorithm, e-ADAPT, which has no free parameters. The algorithm adapts as it plays and sequentially chooses whether to explore or exploit, driven by the amount of uncertainty in the system. We provide simulation results for the one armed bandit with covariates problem, which demonstrate the effectiveness of e-ADAPT to correctly control the amount of exploration in finite-time problems and yield rewards that are close to optimally tuned off-line policies. Furthermore, we show that e-ADAPT is robust to a high-dimensional covariate, as well as misspecified models. Finally, we describe how our methods could be extended to other sequential decision making problems, such as dynamic bandit problems with changing reward structures.
    Machine Learning and Applications (ICMLA), 2010 Ninth International Conference on; 01/2010


Available from