Online Learning in Online Auctions

University of California, Berkeley, Berkeley, California, United States
Theoretical Computer Science, 324(2–3), 08/2003. DOI: 10.1016/j.tcs.2004.05.012
Source: CiteSeer


We consider the problem of revenue maximization in online auctions, that is, auctions in which bids are received and dealt with one by one. In this note, we demonstrate that results from online learning can be usefully applied in this context, and we derive a new auction for digital goods that achieves a constant competitive ratio with respect to the best possible (offline) fixed-price revenue. This substantially improves upon the best previously known competitive ratio [3] of O(exp(√(log log h))) for this problem. We apply our techniques to the related problem of online posted-price mechanisms, in which the auctioneer declares a price and a bidder communicates only his acceptance or rejection of that price. For this problem we obtain results that are, somewhat surprisingly, similar to those for the online auction problem.
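The abstract's approach builds on online learning with expert advice. As a rough illustration of the idea (not the paper's exact construction), one can run a Hedge-style multiplicative-weights update over a discretized grid of candidate fixed prices in [1, h]: in the digital-goods setting, each revealed bid determines the revenue every candidate price would have earned, so full-information feedback is available. A minimal sketch, with the geometric grid, learning rate, and acceptance rule all chosen for illustration:

```python
import math
import random

def hedge_auction(bids, h, eps=0.1, seed=0):
    """Hedge-style online auction over a geometric grid of candidate prices.

    Illustrative sketch: candidate prices are powers of (1 + eps) in [1, h].
    After each bid, the revenue of *every* candidate price is observable,
    so the full-information multiplicative-weights update applies.
    """
    rng = random.Random(seed)
    # Candidate fixed prices: 1, (1+eps), (1+eps)^2, ..., up to h.
    prices = []
    p = 1.0
    while p <= h:
        prices.append(p)
        p *= 1.0 + eps
    n = len(prices)
    weights = [1.0] * n
    eta = math.sqrt(math.log(n) / max(1, len(bids)))  # standard Hedge rate

    revenue = 0.0
    for b in bids:
        # Sample a posted price in proportion to the current weights.
        total = sum(weights)
        r, acc, chosen = rng.random() * total, 0.0, prices[-1]
        for w, price in zip(weights, prices):
            acc += w
            if r <= acc:
                chosen = price
                break
        if b >= chosen:  # the bidder buys at any price not exceeding the bid
            revenue += chosen
        # Gain of candidate price q on this bid is q * [b >= q], scaled to [0, 1].
        weights = [w * math.exp(eta * (q / h if b >= q else 0.0))
                   for w, q in zip(weights, prices)]
    return revenue
```

Because every candidate price's counterfactual revenue is observable, the mechanism learns at the full-information rate rather than the slower bandit rate; the posted-price setting discussed next only reveals accept/reject for the single declared price, which is why the similarity of the two results is surprising.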



Available from: Vijay Kumar, Sep 21, 2015
    • "To our knowledge, the evolution of consumers' values with usage has not been studied previously in the mechanism design literature. A technically unrelated direction, falling under the broad umbrella of online mechanisms, studies revenue/welfare maximization when buyers arrive sequentially, with either adversarial value distributions or values drawn from a known/unknown distribution [13] [2] [9] [12] [8] [1]. A special kind of online mechanism is a dynamic pricing mechanism, where the seller posts a price for each buyer (as opposed to more general schemes like auctions). "
    ABSTRACT: We consider pricing in settings where a consumer discovers his value for a good only as he uses it, and the value evolves with each use. We explore simple and natural pricing strategies for a seller in this setting, under the assumption that the seller knows the distribution from which the consumer's initial value is drawn, as well as the stochastic process that governs the evolution of the value with each use. We consider the differences between up-front or "buy-it-now" pricing (BIN), and "pay-per-play" (PPP) pricing, where the consumer is charged per use. Our results show that PPP pricing can be a very effective mechanism for price discrimination, and thereby can increase seller revenue. But it can also be advantageous to the buyers, as a way of mitigating risk. Indeed, this mitigation of risk can yield a larger pool of buyers. We also show that the practice of offering free trials is largely beneficial. We consider two different stochastic processes for how the buyer's value evolves: In the first, the key random variable is how long the consumer remains interested in the product. In the second process, the consumer's value evolves according to a random walk or Brownian motion with reflection at 1, and absorption at 0.
    • "However, if the player spends too much time collecting information ("exploration") then he might fail to play the optimal strategy sufficiently often. Some applications of continuum-armed bandit problems are: (i) online auction mechanism design [1] [2], where the set of feasible prices is representable as an interval, and (ii) online oblivious routing [3], where S is a flow polytope. For a d-dimensional strategy space, if the only assumption made on the reward functions is on their degree of smoothness, then any algorithm will incur worst-case regret that depends exponentially on d. "
    ABSTRACT: We consider a stochastic continuum armed bandit problem where the arms are indexed by the $\ell_2$ ball $B_{d}(1+r)$ of radius $1+r$ in $\mathbb{R}^d$. The reward functions $r :B_{d}(1+r) \rightarrow \mathbb{R}$ are considered to intrinsically depend on $k \ll d$ unknown linear parameters so that $r(\mathbf{x}) = g(\mathbf{A} \mathbf{x})$ where $\mathbf{A}$ is a full rank $k \times d$ matrix. Assuming the mean reward function to be smooth we make use of results from low-rank matrix recovery literature and derive an efficient randomized algorithm which achieves a regret bound of $O(C(k,d) n^{\frac{1+k}{2+k}})$ with high probability. Here $C(k,d)$ is at most polynomial in $d$ and $k$ and $n$ is the number of rounds or the sampling budget which is assumed to be known beforehand.
    • "Kleinberg and Leighton study a posted price repeated auction with goods sold sequentially to T bidders who either all have the same fixed private value, private values drawn from a fixed distribution, or private values that are chosen by an oblivious adversary (an adversary that acts independently of observed seller behavior) [15] (see also [7] [8] [14]). Cesa-Bianchi et al. study a related problem of setting the reserve price in a second price auction with multiple (but not repeated) bidders at each round [9]. "
    ABSTRACT: Inspired by real-time ad exchanges for online display advertising, we consider the problem of inferring a buyer's value distribution for a good when the buyer is repeatedly interacting with a seller through a posted-price mechanism. We model the buyer as a strategic agent, whose goal is to maximize her long-term surplus, and we are interested in mechanisms that maximize the seller's long-term revenue. We define the natural notion of strategic regret --- the lost revenue as measured against a truthful (non-strategic) buyer. We present seller algorithms that are no-(strategic)-regret when the buyer discounts her future surplus --- i.e. the buyer prefers showing advertisements to users sooner rather than later. We also give a lower bound on strategic regret that increases as the buyer's discounting weakens and shows, in particular, that any seller algorithm will suffer linear strategic regret if there is no discounting.
    Advances in Neural Information Processing Systems, 11/2013.
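Several of the excerpts above cast posted pricing as a (continuum-armed) bandit problem over an interval of prices. A minimal sketch of the generic discretize-then-run-UCB1 reduction (a standard textbook technique, not the specific algorithms cited; the grid size, price interval [0, 1], and stochastic demand model are assumptions for illustration):

```python
import math
import random

def ucb_posted_price(T, demand, n_arms=20, seed=0):
    """Discretize the price interval (0, 1] into a grid and run UCB1 over it.

    `demand(p)` is the (unknown to the seller) probability that a buyer
    accepts posted price p; each round posts one price and observes only
    the accept/reject outcome for that price (bandit feedback).
    """
    rng = random.Random(seed)
    prices = [(i + 1) / n_arms for i in range(n_arms)]
    counts = [0] * n_arms   # plays per arm
    means = [0.0] * n_arms  # empirical mean revenue per arm
    revenue = 0.0
    for t in range(T):
        if t < n_arms:
            i = t  # play each arm once before using the index
        else:
            # UCB1 index: empirical mean plus exploration bonus.
            i = max(range(n_arms),
                    key=lambda j: means[j] + math.sqrt(2 * math.log(t) / counts[j]))
        p = prices[i]
        sold = 1 if rng.random() < demand(p) else 0  # stochastic buyer response
        reward = p * sold
        revenue += reward
        counts[i] += 1
        means[i] += (reward - means[i]) / counts[i]  # incremental mean update
    return revenue
```

With a Lipschitz demand curve, refining the grid with the horizon recovers the usual continuum-armed bandit trade-off between discretization error and per-arm exploration cost; the low-dimensional-structure and strategic-buyer papers excerpted above each modify a different part of this basic loop.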