Article
Online Learning in Online Auctions
Theoretical Computer Science (Impact Factor: 0.49). 08/2003; DOI: 10.1016/j.tcs.2004.05.012
Source: CiteSeer

Conference Paper: Sequential decision making with vector outcomes
ABSTRACT: We study a multi-round optimization setting in which, in each round, a player selects one of several actions, and each action produces an outcome vector that is not observable to the player until the round ends. The final payoff is computed by applying a known function f to the sum of all outcome vectors (e.g., the minimum over all coordinates of the sum). We show that the standard performance measures (such as comparison to the best single action) used in related expert and bandit settings (in which the payoff in each round is a scalar) are not useful in our vector setting. Instead, we propose a different performance measure and design algorithms that have vanishing regret with respect to it.
Proceedings of the 5th conference on Innovations in theoretical computer science; 01/2014
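A minimal sketch of the vector-outcome setting described in the abstract, using a hypothetical two-action, two-coordinate instance and f = min. It illustrates the paper's observation that comparing against the best single action is uninformative here: any fixed action leaves one coordinate at zero, while mixing actions does much better.

```python
import numpy as np

# Hypothetical instance: action 0 yields outcome vector (1, 0),
# action 1 yields (0, 1); final payoff is f(v) = min coordinate of the sum.
outcomes = np.array([[1.0, 0.0],
                     [0.0, 1.0]])
f = lambda v: v.min()

T = 100

# Playing any single action for all T rounds leaves one coordinate at 0,
# so f = 0 -- the "best single action" benchmark is worthless here.
best_single = max(f(outcomes[a] * T) for a in range(2))

# Alternating the two actions drives both coordinates up together.
mixed_sum = np.zeros(2)
for t in range(T):
    mixed_sum += outcomes[t % 2]
mixed = f(mixed_sum)

print(best_single, mixed)  # 0.0 vs 50.0
```

The gap (0 vs. 50) is why the paper argues for a different benchmark than "best single action" when the payoff is a function of the summed outcome vector.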
ABSTRACT: What price should be offered to a worker for a task in an online labor market? How can workers be enabled to express the amount they wish to receive for completing a task? Designing optimal pricing policies and determining the right monetary incentives is central to maximizing the requester's utility and the workers' profits. Yet current crowdsourcing platforms give the requester only limited control over pricing policies, and tasks are often priced by rules of thumb. This limitation can result in inefficient use of the requester's budget, or in workers losing interest in the task. In this paper, we address these questions and present mechanisms based on regret minimization in online learning. We exploit a link between procurement auctions and multi-armed bandits to design mechanisms that are budget feasible, achieve near-optimal utility for the requester, are incentive compatible (truthful) for workers, and make minimal assumptions about the distribution of workers' true costs. Our main contribution is a novel no-regret posted-price mechanism, BP-UCB, for budgeted procurement in stochastic online settings. We prove strong theoretical guarantees for our mechanism and evaluate it extensively in simulations as well as on real data from the Mechanical Turk platform. Compared to the state of the art, our approach leads to a 180% increase in utility.
Proceedings of the 22nd international conference on World Wide Web; 05/2013
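The link between posted-price procurement and multi-armed bandits can be sketched as follows. This is not the paper's BP-UCB mechanism, only a generic UCB-style illustration under assumed parameters: a price grid as the arms, worker costs drawn uniformly from [0, 1] (so a worker accepts whenever the posted price covers their cost), unit value per completed task, and a fixed budget.

```python
import math
import random

random.seed(1)

# Assumed model: worker costs ~ U[0, 1]; a worker accepts iff cost <= price.
def worker_accepts(price):
    return random.random() <= price

prices = [0.1 * k for k in range(1, 10)]   # price grid (the "arms")
budget = 50.0
value_per_task = 1.0

counts = [0] * len(prices)
accept_rate = [0.0] * len(prices)
utility, t = 0.0, 0

while budget > 0:
    t += 1

    # UCB1-style index: empirical utility per posted price plus an
    # exploration bonus that shrinks as the arm is sampled more often.
    def index(i):
        if counts[i] == 0:
            return float("inf")
        bonus = math.sqrt(2 * math.log(t) / counts[i])
        return accept_rate[i] * (value_per_task - prices[i]) + bonus

    i = max(range(len(prices)), key=index)
    if prices[i] > budget:          # budget feasibility: never overspend
        break
    accepted = worker_accepts(prices[i])
    counts[i] += 1
    accept_rate[i] += (accepted - accept_rate[i]) / counts[i]
    if accepted:
        budget -= prices[i]
        utility += value_per_task - prices[i]

print(round(utility, 2))
```

Posting a take-it-or-leave-it price keeps the mechanism truthful for workers (a worker's accept/reject decision cannot be improved by misreporting their cost), which is the property the abstract's posted-price design exploits.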
ABSTRACT: Many sequential decision-making problems require an agent to balance exploration and exploitation to maximize long-term reward. Existing policies that address this trade-off typically have parameters set a priori to control the amount of exploration. In finite-time problems, the optimal values of these parameters are highly dependent on the problem faced. In this paper, we propose adapting the amount of exploration online, as information is gathered by the agent. To this end, we introduce a novel algorithm, e-ADAPT, which has no free parameters. The algorithm adapts as it plays, sequentially choosing whether to explore or exploit, driven by the amount of uncertainty in the system. We provide simulation results for the one-armed bandit with covariates problem, which demonstrate the effectiveness of e-ADAPT at correctly controlling the amount of exploration in finite-time problems, yielding rewards close to those of optimally tuned offline policies. Furthermore, we show that e-ADAPT is robust to a high-dimensional covariate as well as to misspecified models. Finally, we describe how our methods could be extended to other sequential decision-making problems, such as dynamic bandit problems with changing reward structures.
Ninth International Conference on Machine Learning and Applications (ICMLA); 01/2010
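The idea of letting system uncertainty, rather than a fixed parameter, drive the explore/exploit choice can be sketched as below. This is not the paper's e-ADAPT algorithm, only a simple illustration on an assumed two-armed Bernoulli bandit (means 0.4 and 0.6): the exploration probability is tied to the widest confidence-interval half-width, so it shrinks automatically as data accrues instead of being set a priori.

```python
import math
import random

random.seed(0)

# Assumed instance: two Bernoulli arms with means 0.4 and 0.6.
means = [0.4, 0.6]
counts = [0, 0]
estimates = [0.0, 0.0]

T, reward = 2000, 0.0
for t in range(1, T + 1):
    # Uncertainty-driven exploration: explore with probability equal to the
    # widest confidence-interval half-width (1.0 for an unsampled arm).
    widths = [1.0 if n == 0 else math.sqrt(math.log(t) / (2 * n))
              for n in counts]
    eps = min(1.0, max(widths))

    if random.random() < eps:
        a = random.randrange(2)                        # explore
    else:
        a = max(range(2), key=lambda i: estimates[i])  # exploit

    r = 1.0 if random.random() < means[a] else 0.0
    counts[a] += 1
    estimates[a] += (r - estimates[a]) / counts[a]     # running mean
    reward += r

print(round(reward / T, 3))
```

Early on the half-widths are large, so the policy explores almost every round; as both arms accumulate samples the widths shrink and play concentrates on the empirically better arm, with no exploration parameter tuned offline.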