
# Online Learning in Online Auctions

University of California, Berkeley, Berkeley, California, United States
Theoretical Computer Science 324(2-3), 08/2003 (impact factor: 0.66). DOI: 10.1016/j.tcs.2004.05.012
Source: CiteSeer

ABSTRACT

We consider the problem of revenue maximization in online auctions, that is, auctions in which bids are received and dealt with one-by-one. In this note, we demonstrate that results from online learning can be usefully applied in this context, and we derive a new auction for digital goods that achieves a constant competitive ratio with respect to the best possible (offline) fixed-price revenue. This substantially improves upon the best previously known competitive ratio for this problem, O(exp(√(log log h))) [3]. We apply our techniques to the related problem of online posted-price mechanisms, where the auctioneer declares a price and a bidder only communicates his acceptance or rejection of that price. For this problem we obtain results that are (somewhat surprisingly) similar to those for the online auction problem.
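The experts-style idea behind such auctions can be sketched concretely: treat each candidate fixed price (say, the powers of two up to the highest bid h) as an expert and run a multiplicative-weights update. Because each bid is revealed after it is handled, the revenue every candidate price would have earned is observable, so this is the full-information experts setting. The function name, learning rate, and price grid below are illustrative assumptions, not the paper's exact algorithm.

```python
import math
import random

def hedge_auction(bids, h, eta=0.5):
    """Hedge-style experts sketch over candidate fixed prices.

    Each power-of-two price in [1, h] is one 'expert'.  After each bid is
    revealed we know the revenue every price would have earned, so weights
    can be updated with full information.
    """
    prices = [2 ** i for i in range(int(math.log2(h)) + 1)]
    weights = [1.0] * len(prices)
    revenue = 0.0
    for bid in bids:
        total = sum(weights)
        # Sample an offered price from the current weight distribution.
        r, acc, choice = random.random() * total, 0.0, prices[-1]
        for p, w in zip(prices, weights):
            acc += w
            if r <= acc:
                choice = p
                break
        if bid >= choice:
            revenue += choice
        # Full-information update: price p earns p on this bid iff p <= bid.
        for i, p in enumerate(prices):
            gain = p if p <= bid else 0.0
            weights[i] *= math.exp(eta * gain / h)  # gains normalized to [0, 1]
    return revenue
```

A usage example: `hedge_auction([5, 7, 3, 8], h=8)` runs the sketch on four sequential bids with prices drawn from {1, 2, 4, 8}.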

### Full-text

Available from: Vijay Kumar, Sep 21, 2015
• "To our knowledge, the evolution of consumers' values with usage has not been studied previously in the mechanism design literature. A technically unrelated direction, falling under the broad umbrella of online mechanisms, studies revenue/welfare maximization when buyers arrive sequentially, with either adversarial value distributions or values drawn from a known/unknown distribution [13] [2] [9] [12] [8] [1]. A special kind of online mechanism is a dynamic pricing mechanism, where the seller posts a price for each buyer (as opposed to more general schemes like auctions). "
##### Article: How to sell an app: pay-per-play or buy-it-now?
ABSTRACT: We consider pricing in settings where a consumer discovers his value for a good only as he uses it, and the value evolves with each use. We explore simple and natural pricing strategies for a seller in this setting, under the assumption that the seller knows the distribution from which the consumer's initial value is drawn, as well as the stochastic process that governs the evolution of the value with each use. We consider the differences between up-front or "buy-it-now" pricing (BIN), and "pay-per-play" (PPP) pricing, where the consumer is charged per use. Our results show that PPP pricing can be a very effective mechanism for price discrimination, and thereby can increase seller revenue. But it can also be advantageous to the buyers, as a way of mitigating risk. Indeed, this mitigation of risk can yield a larger pool of buyers. We also show that the practice of offering free trials is largely beneficial. We consider two different stochastic processes for how the buyer's value evolves: In the first, the key random variable is how long the consumer remains interested in the product. In the second process, the consumer's value evolves according to a random walk or Brownian motion with reflection at 1, and absorption at 0.
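The BIN/PPP comparison lends itself to simulation. The sketch below is a hypothetical Monte-Carlo experiment under one of the abstract's value processes (a random walk on [0, 1] with reflection at 1); the step size, buyer decision rules, lifetime estimate, and function name are illustrative assumptions, not the paper's model.

```python
import random

def simulate_bin_vs_ppp(price_bin, price_play, trials=2000, seed=1):
    """Monte-Carlo sketch contrasting buy-it-now (BIN) with pay-per-play
    (PPP) revenue when the buyer's per-use value follows a random walk on
    [0, 1] reflected at 1."""
    rng = random.Random(seed)
    bin_rev = ppp_rev = 0.0
    for _ in range(trials):
        v = rng.random()            # initial value ~ Uniform[0, 1] (assumption)
        # BIN: buyer pays up-front when initial value times an assumed
        # expected number of plays (10) covers the price.
        if v * 10 >= price_bin:
            bin_rev += price_bin
        # PPP: buyer keeps playing while the current value exceeds the
        # per-play price; for a positive price this rule also subsumes
        # absorption at value 0.
        w = v
        while w > price_play:
            ppp_rev += price_play
            w = min(w + rng.choice([-0.1, 0.1]), 1.0)   # step, reflect at 1
    return bin_rev, ppp_rev
```

Comparing the two returned totals for various prices gives a rough feel for when per-use pricing extracts more revenue than an up-front charge in this toy model.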
• "However, if the player spends too much time collecting information ("exploration"), then he might fail to play the optimal strategy sufficiently often. Some applications of continuum armed bandit problems are: (i) online auction mechanism design [1] [2], where the set of feasible prices is representable as an interval, and (ii) online oblivious routing [3], where S is a flow polytope. For a d-dimensional strategy space, if the only assumption made on the reward functions is on their degree of smoothness, then any algorithm will incur worst-case regret that depends exponentially on d. "
##### Article: Stochastic continuum armed bandit problem of few linear parameters in high dimensions
ABSTRACT: We consider a stochastic continuum armed bandit problem where the arms are indexed by the $\ell_2$ ball $B_{d}(1+r)$ of radius $1+r$ in $\mathbb{R}^d$. The reward functions $f : B_{d}(1+r) \rightarrow \mathbb{R}$ are considered to intrinsically depend on $k \ll d$ unknown linear parameters, so that $f(\mathbf{x}) = g(\mathbf{A} \mathbf{x})$ where $\mathbf{A}$ is a full rank $k \times d$ matrix. Assuming the mean reward function to be smooth, we make use of results from the low-rank matrix recovery literature and derive an efficient randomized algorithm which achieves a regret bound of $O(C(k,d) n^{\frac{1+k}{2+k}})$ with high probability. Here $C(k,d)$ is at most polynomial in $d$ and $k$, and $n$ is the number of rounds, or the sampling budget, which is assumed to be known beforehand.
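As background to the continuum-armed setting, the naive baseline that such structured algorithms improve upon is to discretize the strategy space and run a standard bandit algorithm on the grid points. The sketch below (UCB1 on a one-dimensional grid, with an assumed noise level and grid size) illustrates only that generic baseline, not the paper's low-rank method.

```python
import math
import random

def ucb_on_grid(reward, n_rounds, n_arms=20, seed=0):
    """Naive continuum-armed bandit baseline: discretize [0, 1] into a
    grid and run UCB1 on the grid points, observing noisy rewards."""
    rng = random.Random(seed)
    arms = [i / (n_arms - 1) for i in range(n_arms)]
    counts = [0] * n_arms
    means = [0.0] * n_arms
    total = 0.0
    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            i = t - 1                      # play each arm once to initialize
        else:
            # Pick the arm maximizing the UCB1 index.
            i = max(range(n_arms),
                    key=lambda j: means[j] + math.sqrt(2 * math.log(t) / counts[j]))
        r = reward(arms[i]) + rng.gauss(0.0, 0.1)   # noisy observation
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]      # running mean update
        total += r
    return total

# Example: smooth reward peaked at x = 0.7.
# total = ucb_on_grid(lambda x: 1 - (x - 0.7) ** 2, 2000)
```

The finer the grid, the smaller the discretization error but the longer the exploration phase; exploiting hidden low-dimensional structure, as in the abstract, is what avoids the exponential dependence on the ambient dimension.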
• "Kleinberg and Leighton study a posted price repeated auction with goods sold sequentially to T bidders who either all have the same fixed private value, private values drawn from a fixed distribution, or private values that are chosen by an oblivious adversary (an adversary that acts independently of observed seller behavior) [15] (see also [7] [8] [14]). Cesa-Bianchi et al. study a related problem of setting the reserve price in a second price auction with multiple (but not repeated) bidders at each round [9]. "
##### Article: Learning Prices for Repeated Auctions with Strategic Buyers