
**ABSTRACT:** What price should be offered to a worker for a task in an online labor market? How can one enable workers to express the amount they desire to receive for completing the task? Designing optimal pricing policies and determining the right monetary incentives is central to maximizing the requester's utility and workers' profits. Yet current crowdsourcing platforms offer requesters only limited capability for designing pricing policies, and tasks are often priced by rules of thumb. This limitation could result in inefficient use of the requester's budget or in workers losing interest in the task. In this paper, we address these questions and present mechanisms using the approach of regret minimization in online learning. We exploit a link between procurement auctions and multi-armed bandits to design mechanisms that are budget feasible, achieve near-optimal utility for the requester, are incentive compatible (truthful) for workers, and make minimal assumptions about the distribution of workers' true costs. Our main contribution is a novel, no-regret posted price mechanism, BP-UCB, for budgeted procurement in stochastic online settings. We prove strong theoretical guarantees about our mechanism, and extensively evaluate it in simulations as well as on real data from the Mechanical Turk platform. Compared to the state of the art, our approach leads to a 180% increase in utility.

Proceedings of the 22nd International Conference on World Wide Web; 05/2013

##### Article: Dynamic Pricing under Finite Space Demand Uncertainty: A Multi-Armed Bandit with Dependent Arms


**ABSTRACT:** We consider a dynamic pricing problem under unknown demand models. In this problem, a seller offers prices to a stream of customers and observes either success or failure in each sale attempt. The underlying demand model is unknown to the seller and can take one of N possible forms. In this paper, we show that this problem can be formulated as a multi-armed bandit with dependent arms. We propose a dynamic pricing policy based on the likelihood ratio test. We show that the proposed policy achieves complete learning, i.e., it offers bounded regret, where regret is defined as the revenue loss relative to the case with a known demand model. This is in sharp contrast with the logarithmically growing regret in multi-armed bandits with independent arms.

06/2012

##### Conference Paper: Sequential decision making with vector outcomes


**ABSTRACT:** We study a multi-round optimization setting in which, in each round, a player may select one of several actions, and each action produces an outcome vector that is not observable to the player until the round ends. The final payoff for the player is computed by applying some known function f to the sum of all outcome vectors (e.g., the minimum of all coordinates of the sum). We show that standard notions of performance measure (such as comparison to the best single action) used in related expert and bandit settings (in which the payoff in each round is scalar) are not useful in our vector setting. Instead, we propose a different performance measure, and design algorithms that have vanishing regret with respect to our new measure.

Proceedings of the 5th Conference on Innovations in Theoretical Computer Science; 01/2014
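
A small illustration of why comparison to the best single action can be uninformative when f is the minimum of the coordinates of the sum. This is only a sketch under assumed conditions, not the paper's construction: the two actions, their fixed (deterministic) outcome vectors, and the round count are hypothetical values chosen for the example.

```python
from typing import Callable, Sequence

Vector = tuple[float, ...]


def total_payoff(outcomes: Sequence[Vector], f: Callable[[Vector], float]) -> float:
    """Apply f to the coordinate-wise sum of all outcome vectors."""
    summed = tuple(sum(coords) for coords in zip(*outcomes))
    return f(summed)


# Hypothetical actions (assumption): each one always yields the same outcome vector.
ACTIONS = {"a": (1.0, 0.0), "b": (0.0, 1.0)}
ROUNDS = 10  # assumed number of rounds

# Best payoff achievable by repeating one action in every round, with f = min.
best_single = max(
    total_payoff([ACTIONS[name]] * ROUNDS, min) for name in ACTIONS
)

# Payoff from alternating between the two actions, with the same f.
alternating = total_payoff(
    [ACTIONS["a" if t % 2 == 0 else "b"] for t in range(ROUNDS)], min
)

print(best_single, alternating)
```

In this toy instance, repeating either action leaves one coordinate of the sum at zero, so every single-action benchmark has payoff zero, while alternating accumulates value in both coordinates. A player can therefore match the best single action and still do poorly, which is the sense in which that benchmark says little here.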
