Article

# Dynamic Online Pricing with Incomplete Information Using Multiarmed Bandit Experiments


## Abstract

Pricing managers at online retailers face a unique challenge. They must decide on real-time prices for a large number of products with incomplete demand information. The manager runs price experiments to learn about each product’s demand curve and the profit-maximizing price. In practice, balanced field price experiments can create high opportunity costs, because a large number of customers are presented with suboptimal prices. In this paper, we propose an alternative dynamic price experimentation policy. The proposed approach extends multiarmed bandit (MAB) algorithms from statistical machine learning to include microeconomic choice theory. Our automated pricing policy solves this MAB problem using a scalable distribution-free algorithm. We prove analytically that our method is asymptotically optimal for any weakly downward sloping demand curve. In a series of Monte Carlo simulations, we show that the proposed approach performs favorably compared with balanced field experiments and standard methods in dynamic pricing from computer science. In a calibrated simulation based on an existing pricing field experiment, we find that our algorithm can increase profits by 43% during the month of testing and 4% annually.
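The exploration/exploitation tradeoff the abstract describes can be illustrated with a minimal UCB1-style price experiment over a discrete price grid. This is a generic sketch, not the paper's actual algorithm; the candidate prices, the linear demand curve, and the profit normalization below are illustrative assumptions.

```python
import math
import random

def ucb_price_experiment(prices, demand, horizon, seed=0):
    """UCB1 over a discrete price grid: each candidate price is an arm,
    and the reward is the realized profit, scaled into [0, 1]."""
    rng = random.Random(seed)
    max_p = max(prices)
    counts = [0] * len(prices)       # times each price was offered
    profit = [0.0] * len(prices)     # scaled profit accumulated per price
    for t in range(1, horizon + 1):
        if t <= len(prices):
            arm = t - 1              # offer every price once to initialize
        else:
            arm = max(range(len(prices)),
                      key=lambda i: profit[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        sale = 1 if rng.random() < demand(prices[arm]) else 0
        profit[arm] += prices[arm] * sale / max_p  # keep rewards in [0, 1]
        counts[arm] += 1
    best = max(range(len(prices)), key=lambda i: profit[i] / counts[i])
    return prices[best], counts

# Weakly downward-sloping demand, mirroring the abstract's assumption.
demand = lambda p: max(0.0, 1.0 - 0.1 * p)
best_price, counts = ucb_price_experiment([3.0, 5.0, 9.0], demand, horizon=20000)
```

With these illustrative numbers the expected profits are 2.1, 2.5, and 0.9, so the experiment should concentrate on the middle price; the confidence bonus shrinks as a price accumulates observations, which is how the policy trades learning against earning.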

## No full-text available

... Learning demand curves from pricing experiments under incomplete information is in growing demand in industrial practice, owing to the opportunity costs that accompany sub-optimal prices and inefficient exploration [Misra et al., 2019]. Unfortunately, even when an algorithm successfully finds the optimal price, the offered price degrades to sub-optimal once the market environment shifts. ...
... Solving this fundamental problem based on existing knowledge of MAB pricing [Misra et al., 2019] requires overcoming three major challenges: ...
... [Gupta et al., 2018, Besbes et al., 2014, Slivkins and Upfal, 2008, Gupta et al., 2020]. Our framework is based on [Misra et al., 2019], where an upper-confidence-bound-based dynamic price experimentation policy was proposed under a stationarity assumption, extending MAB to microeconomic choice theory. Our work further investigates the performance of Thompson sampling and Information-Directed Sampling and identifies these methods' inability to transfer when the market environment shifts. ...
Preprint
Full-text available
This paper presents a novel non-stationary dynamic pricing algorithm design, where pricing agents face incomplete demand information and market environment shifts. The agents run price experiments to learn about each product's demand curve and the profit-maximizing price, while remaining aware of market environment shifts to avoid high opportunity costs from offering sub-optimal prices. The proposed ACIDP extends information-directed sampling (IDS) algorithms from statistical machine learning to include microeconomic choice theory, with a novel pricing strategy auditing procedure to escape sub-optimal pricing after a market environment shift. The proposed ACIDP outperforms competing bandit algorithms, including Upper Confidence Bound (UCB) and Thompson sampling (TS), in a series of market environment shifts.
... been widely used for a variety of tasks [Bastani and Bayati, 2020;Segal et al., 2018;Misra et al., 2019] including ranger patrols to prevent poaching [Xu et al., 2021a]. In this poaching prevention setting, the patrol planner is tasked with repeatedly and efficiently allocating a limited number of patrol resources across different locations within the park [Plumptre et al., 2014;Fang et al., 2016;Xu et al., 2021b]. ...
... Multi-armed bandits MABs [Lattimore and Szepesvári, 2020] have been applied to resource allocation for healthcare [Bastani and Bayati, 2020], education [Segal et al., 2018], and dynamic pricing [Misra et al., 2019]. These papers solve various versions of the stochastic MAB problem [Auer et al., 2002]. ...
Preprint
Preventing poaching through ranger patrols protects endangered wildlife, directly contributing to the UN Sustainable Development Goal 15 of life on land. Combinatorial bandits have been used to allocate limited patrol resources, but existing approaches overlook the fact that each location is home to multiple species in varying proportions, so a patrol benefits each species to differing degrees. When some species are more vulnerable, we ought to offer more protection to these animals; unfortunately, existing combinatorial bandit approaches do not offer a way to prioritize important species. To bridge this gap, (1) we propose a novel combinatorial bandit objective that trades off reward maximization against prioritization over species, which we call ranked prioritization. We show this objective can be expressed as a weighted linear sum of Lipschitz-continuous reward functions. (2) We provide RankedCUCB, an algorithm to select combinatorial actions that optimize our prioritization-based objective, and prove that it achieves asymptotic no-regret. (3) We demonstrate empirically that RankedCUCB leads to up to 38% improvement in outcomes for endangered species using real-world wildlife conservation data. Along with adapting to other challenges such as preventing illegal logging and overfishing, our no-regret algorithm addresses the general combinatorial bandit problem with a weighted linear objective.
... Each of the arms is associated with a fixed but unknown probability distribution [5,33]. An enormous literature has accumulated over the past decades on the MAB problem, such as clinical trials and drug testing [6,19], recommendation system and online advertising [7,9,42,51,56], information retrieval [8,38], and finance [23,39,40,48]. From a theoretical perspective, the MAB problem was first studied in the seminal work of [44] and followed by a vast line of work to study in regret minimization [2,4,5,10,14,18,34,36,49,53] and pure exploration [11,21,37,46]. ...
Preprint
Full-text available
This work studies the pure-exploration setting for the convex hull feasibility (CHF) problem where one aims to efficiently and accurately determine if a given point lies in the convex hull of means of a finite set of distributions. We give a complete characterization of the sample complexity of the CHF problem in the one-dimensional setting. We present the first asymptotically optimal algorithm called Thompson-CHF, whose modular design consists of a stopping rule and a sampling rule. In addition, we provide an extension of the algorithm that generalizes several important problems in the multi-armed bandit literature. Finally, we further investigate the Gaussian bandit case with unknown variances and address how the Thompson-CHF algorithm can be adjusted to be asymptotically optimal in this setting.
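Thompson sampling, the idea underlying the Thompson-CHF sampling rule above, is easiest to see in the basic Beta-Bernoulli bandit: sample a plausible mean for each arm from its posterior and pull the argmax. This is a generic sketch only; the arm success probabilities and horizon are illustrative assumptions, and the paper's CHF-specific stopping rule is not implemented here.

```python
import random

def thompson_bernoulli(success_probs, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling: maintain a Beta posterior per arm,
    sample one mean from each posterior, and pull the highest sample."""
    rng = random.Random(seed)
    alpha = [1] * len(success_probs)   # Beta(1, 1) uniform priors
    beta = [1] * len(success_probs)
    pulls = [0] * len(success_probs)
    for _ in range(horizon):
        samples = [rng.betavariate(alpha[i], beta[i])
                   for i in range(len(success_probs))]
        arm = max(range(len(success_probs)), key=samples.__getitem__)
        reward = 1 if rng.random() < success_probs[arm] else 0
        alpha[arm] += reward           # posterior update on success
        beta[arm] += 1 - reward        # posterior update on failure
        pulls[arm] += 1
    return pulls

pulls = thompson_bernoulli([0.2, 0.5, 0.8], horizon=2000)
```

As the posteriors concentrate, the sampled means for suboptimal arms rarely exceed the best arm's, so exploration tapers off automatically.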
... The loop connecting multimedia and artificial intelligence (AI) is illustrated in Fig. 3 ("The 'Loop' of multimedia image processing with artificial intelligence" [42]). This article discusses how multimedia and AI interact on two levels: the content level and the application level. ...
Article
Full-text available
Purpose: The provision of a method for thoughtful decision-making is the core purpose of artificial intelligence research and development. The primary goal of artificial intelligence (AI) is to give computers the ability to perform intellectual tasks such as making decisions, solving problems, perceiving their surroundings, and understanding human communication. Amazon is famous for using robots, roughly 30,000 of them, within its distribution centres; the company acquired the robotics firm Kiva in 2012, which allows its robots to function autonomously. Retailers can improve their demand estimates, make better pricing decisions, and optimise product placement with the aid of AI. The end result is that customers are connected with the right products at the right time, in the right place, and at the right price. Utilising predictive analytics can help determine how much of a product should be ordered so that shops end up with neither an excess nor a shortage of inventory. Design/Methodology/Approach: The efficiency of our workplaces may be substantially improved by the use of artificial intelligence. When AI is used for tasks that are boring or dangerous, human workers gain more time to concentrate on endeavours that require capabilities such as creativity and empathy. AI can assist a corporation in three areas: the automation of corporate processes, the acquisition of insight through data analysis, and interaction with consumers and staff. Findings/Result: As a result of AI, individuals will be freed up to focus on the 20% of non-routine tasks that account for 80% of the value they create. In the future, "intelligent automation of process change" (IAPC) will be used by smart machines to continually examine and improve the whole process of a business's response to artificial intelligence.
Automating and optimising everyday chores saves time and money and improves operational efficiency and productivity. The outputs of cognitive technology can help you make quicker business judgements, one of the benefits of AI in the workplace. Doing the same thing over and over can take a toll on workers' productivity; automating and optimising these procedures saves money, and employees can then concentrate on several areas at once, resulting in increased production. Because AI processes data more quickly than humans, productivity rises. Originality/Value: A deeper comprehension of the differences between AI and human intelligence is essential if we are to be adequately prepared for a society in which AI will play a much more pervasive role in our everyday lives in the foreseeable future. The process of reproduction is intricately connected to both biological and human intelligence. The advancement of artificial intelligence (AI) is significant because it enables software to perform human activities more cost-effectively than was previously possible. Paper Type: Company Analysis.
... The uncertainty comes from the fact that the reward distribution for each action is unknown and must be estimated based on previously observed actions and rewards. Bandit problems are frequently encountered in real-world problems, including clinical trials [28], [14], dynamic pricing [66], [67] and recommendation systems [60], to name just a few. ...
Preprint
PAC-Bayes has recently re-emerged as an effective theory with which one can derive principled learning algorithms with tight performance guarantees. However, applications of PAC-Bayes to bandit problems are relatively rare, which is a great misfortune. Many decision-making problems in healthcare, finance and natural sciences can be modelled as bandit problems. In many of these applications, principled algorithms with strong performance guarantees would be very much appreciated. This survey provides an overview of PAC-Bayes performance bounds for bandit problems and an experimental comparison of these bounds. Our experimental comparison has revealed that available PAC-Bayes upper bounds on the cumulative regret are loose, whereas available PAC-Bayes lower bounds on the expected reward can be surprisingly tight. We found that an offline contextual bandit algorithm that learns a policy by optimising a PAC-Bayes bound was able to learn randomised neural network policies with competitive expected reward and non-vacuous performance guarantees.
... On the other hand, exploiting monotonicity allows for an empirical improvement in performance (Mussi et al. 2022). The same argument also holds for the work proposed by Misra, Schwartz, and Abernethy (2019), where the monotonicity property of the demand function is used to ensure faster convergence. However, monotonicity is not forced as a model-specific feature. ...
Preprint
Full-text available
According to the main international reports, more pervasive industrial and business-process automation, thanks to machine learning and advanced analytic tools, will unlock more than 14 trillion USD worldwide annually by 2030. In the specific case of pricing problems, which constitute the class of problems we investigate in this paper, the estimated unlocked value will be about 0.5 trillion USD per year. In particular, this paper focuses on pricing in e-commerce when the objective function is profit maximization and only transaction data are available. This setting is one of the most common in real-world applications. Our work aims to find a pricing strategy that allows defining optimal prices at different volume thresholds to serve different classes of users. Furthermore, we face the major challenge, common in real-world settings, of dealing with the limited data available. We design a two-phase online learning algorithm, namely PVD-B, capable of exploiting the data incrementally in an online fashion. The algorithm first estimates the demand curve and retrieves the optimal average price, and subsequently it offers discounts to differentiate the prices for each volume threshold. We ran a real-world 4-month-long A/B testing experiment in collaboration with an Italian e-commerce company, in which our algorithm PVD-B, corresponding to the A configuration, was compared with human pricing specialists, corresponding to the B configuration. At the end of the experiment, our algorithm produced a total turnover of about 300 KEuros, outperforming the B configuration's performance by about 55%. The Italian company we collaborated with has adopted our algorithm for more than 1,200 products since January 2022.
... Alternative conditions on the objective function that ensure similar rates of convergence can be found in Broadie et al. (2011). The continuous-armed bandit literature proposes several alternative policies to learn the maximum of an objective function, which can also be applied to maximize the expected revenue or profit as function of price: see, e.g., Kleinberg and Leighton (2003), Auer et al. (2007), Cope (2009), Combes and Proutiere (2014), Trovò et al. (2018), Misra et al. (2019). The algorithms proposed in these papers usually make regularity assumptions on the unknown demand function that implies existence of a unique optimal price vector. ...
Chapter
Determining the right price is a fundamental business problem that can be addressed by data-driven methods. In this chapter, we discuss several pricing policies that learn the optimal price from accumulating sales data, both in parametric and nonparametric models, and both for single-product and multiple product settings. We also discuss possible future directions for research: product differentiation, online marketplaces, and Brownian approximations.
... Multi-armed bandit (MAB) problems are sequential decision making problems where an agent sequentially selects arms to pull and receives a random reward in order to learn the reward distributions of all arms and at the same time to find a strategy that maximizes the total expected reward. Many applications, ranging from treatment design [1] and news article recommendation [2] to online marketing [3], can be formulated as MAB problems. However, risk-neutral formulations that only maximize the total expected reward do not always provide desirable solutions. ...
Preprint
Full-text available
In this paper, we consider a risk-averse multi-armed bandit (MAB) problem where the goal is to learn a policy that minimizes the risk of low expected return, as opposed to maximizing the expected return itself, which is the objective in the usual approach to risk-neutral MAB. Specifically, we formulate this problem as a transfer learning problem between an expert and a learner agent in the presence of contexts that are only observable by the expert but not by the learner. Thus, such contexts are unobserved confounders (UCs) from the learner's perspective. Given a dataset generated by the expert that excludes the UCs, the goal for the learner is to identify the true minimum-risk arm with fewer online learning steps, while avoiding possible biased decisions due to the presence of UCs in the expert's data.
... the use of ML should be the main technique of the article, the ML techniques should have been declared by the authors (papers not showing a learning process were excluded), and the article must provide enough information concerning the technique used. Furthermore, the articles must define the technique as ML and show the application of a case with real data from verified sources (not experimental or simulated examples [27], [28]). Some articles that used semisupervised learning but did not show a real application were rejected [29]- [31]. ...
Article
Full-text available
Even though machine learning (ML) applications are not novel, they have gained popularity partly due to the advance in computing processing. This study explores the adoption of ML methods in marketing applications through a bibliographic review of the period 2008–2022. In this period, the adoption of ML in marketing has grown significantly. This growth has been quite heterogeneous, varying from the use of classical methods such as artificial neural networks to hybrid methods that combine different techniques to improve results. Generally, maturity in the use of ML in marketing and increasing specialization in the type of problems that are solved were observed. Strikingly, the types of ML methods used to solve marketing problems vary wildly, including deep learning, supervised learning, reinforcement learning, unsupervised learning, and hybrid methods. Finally, we found that the main marketing problems solved with machine learning were related to consumer behavior, recommender systems, forecasting, marketing segmentation, and text analysis—content analysis.
... Thirdly, with big data, one observes a rise of machine learning techniques in studies of consumer behaviour in the marketing field (Chintagunta et al., 2016). However, a fundamental critique is the absence of theory in machine learning algorithms (Misra et al., 2019). Using a machine learning model (a decision tree), we bring a new approach that associates consumers' behaviour with machine learning in order to respond to that criticism. ...
Preprint
Full-text available
Introduction. Until now, the impact of learning variables on consumers' choices concerning Chinese product brands in the international online shopping framework remains unknown. Accordingly, this study aims to examine the effect of those learning variables on global consumers' choices of Chinese product brands. Method. A total of 44,704 transactions related to the buying process were collected, using a programming language and the Octopus software, from a Chinese international online shopping platform. Analysis. The 44,704 transactions were analyzed through a decision tree. Results. The study points out that the number of e-retailers' subscribers reinforces international consumers' trust online. At the same time, the pricing levels and quantity of product availability are used by global online consumers to assess the originality of Chinese product brands. Conclusions. First, this study extends the existing literature on consumer learning by going beyond the learning variables previously considered. Second, the study advances the consumer learning literature by elucidating the most significant learning variables guiding international online consumers' choices and purchases. The application of the results will enable brands and e-retailers to understand (1) the stages of the international online consumers' choice; (2) the buying strategies of global consumers.
... Recently, bandit algorithms have found practical application in areas from dynamic pricing [Misra et al., 2019] and healthcare [Durand et al., 2018] to finance [Shen et al., 2015] and recommender systems [McInerney et al., 2018], and many more. For many of these application areas, generalised linear bandit algorithms are among the most commonly used approaches, as they are able to capture the structure of the rewards and actions often seen in practice. ...
Preprint
Full-text available
The stochastic generalised linear bandit is a well-understood model for sequential decision-making problems, with many algorithms achieving near-optimal regret guarantees under immediate feedback. However, in many real world settings, the requirement that the reward is observed immediately is not applicable. In this setting, standard algorithms are no longer theoretically understood. We study the phenomenon of delayed rewards in a theoretical manner by introducing a delay between selecting an action and receiving the reward. Subsequently, we show that an algorithm based on the optimistic principle improves on existing approaches for this setting by eliminating the need for prior knowledge of the delay distribution and relaxing assumptions on the decision set and the delays. This also leads to improving the regret guarantees from $\widetilde O(\sqrt{dT}\sqrt{d + \mathbb{E}[\tau]})$ to $\widetilde O(d\sqrt{T} + d^{3/2}\mathbb{E}[\tau])$, where $\mathbb{E}[\tau]$ denotes the expected delay, $d$ is the dimension and $T$ the time horizon and we have suppressed logarithmic terms. We verify our theoretical results through experiments on simulated data.
... More specifically, we used RL to optimize this marketing decision. Within the domain of RL, researchers have explored using this approach to optimize marketing decisions (Schwartz et al., 2017), such as framing a dynamic pricing policy (Misra et al., 2019), but its applicability to ad optimization has been rather limited. ...
Article
Full-text available
One of the core challenges in digital marketing is that business conditions continuously change, which impacts the reception of campaigns. A winning campaign strategy can become unfavored over time, while an old strategy can gain new traction. In data-driven digital marketing and web analytics, A/B testing is the prevalent method of comparing digital campaigns, choosing the winning ad, and deciding targeting strategy. A/B testing is suitable when testing variations on similar solutions with one or more metrics that are clear indicators of success or failure. However, when faced with a complex problem or working on future topics, A/B testing fails to deliver, and achieving long-term impact from experimentation is demanding and resource intensive. This study proposes a reinforcement learning based model and demonstrates its application to digital marketing campaigns. We argue, and validate with real-world data, that reinforcement learning can help overcome some of the critical challenges that A/B testing and the popular machine learning methods currently used in digital marketing campaigns face. We demonstrate the effectiveness of the proposed technique on real data from a digital marketing campaign collected from a firm.
... In digital marketing, RL is expected to revitalize the industry and modernize various operations. For example, prior research has applied RL to solve digital marketing problems related to search [12,27,29,30], recommendation [5,23,34,35], online advertising [4,14,21,28,32,33], and pricing [18]. However, most prior works in this stream of literature only focus on a single objective. ...
Preprint
Full-text available
We utilize an offline reinforcement learning (RL) model for sequential targeted promotion in the presence of budget constraints in a real-world business environment. In our application, the mobile app aims to boost customer retention by sending cash bonuses to customers and control the costs of such cash bonuses during each time period. To achieve the multi-task goal, we propose the Budget Constrained Reinforcement Learning for Sequential Promotion (BCRLSP) framework to determine the value of cash bonuses to be sent to users. We first find out the target policy and the associated Q-values that maximizes the user retention rate using an RL model. A linear programming (LP) model is then added to satisfy the constraints of promotion costs. We solve the LP problem by maximizing the Q-values of actions learned from the RL model given the budget constraints. During deployment, we combine the offline RL model with the LP model to generate a robust policy under the budget constraints. Using both online and offline experiments, we demonstrate the efficacy of our approach by showing that BCRLSP achieves a higher long-term customer retention rate and a lower cost than various baselines. Taking advantage of the near real-time cost control method, the proposed framework can easily adapt to data with a noisy behavioral policy and/or meet flexible budget constraints.
... Companies can conduct price experiments (changing prices) to understand demand and maximise long-term profits. Misra et al. [28] proposed an experimental policy for dynamic pricing. In their work, they derived an extended bandit algorithm that balances earning immediate profits against learning for future profits, combining bandit methods with economic theory. ...
Article
Full-text available
With the rapid development of the social economy, consumer demand is evolving towards diversification. To satisfy market demand, enterprises tend to improve competitiveness by providing differentiated products. How to price differentiated products becomes a hot topic. Traditionally, customers' preferences are assumed to be independent and identically distributed. With a known distribution, companies can easily make pricing decisions for differentiated products. However, such an assumption may be invalid in practice, especially for rapidly updating products. In this paper, a dynamic pricing policy for differentiated products with incomplete information is developed. An adaptive multi‐armed bandit algorithm based on reinforcement learning is proposed to balance exploration and exploitation. Numerical examples show that the frequency of price adjustment affects the total profit significantly. Specifically, the more chances to adjust the price, the higher the total profit. Furthermore, experiments show that the dynamic pricing policy proposed in this paper outperforms other algorithms, such as Softmax and UCB1.
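The Softmax baseline the abstract compares against chooses each arm with probability proportional to exp(empirical mean / τ), so better-looking prices are favored while every arm retains some probability. A minimal sketch, with illustrative arm means, temperature, and Gaussian reward noise (not the paper's adaptive algorithm):

```python
import math
import random

def softmax_bandit(mean_rewards, horizon, tau=0.1, seed=0):
    """Boltzmann/Softmax exploration: pull arm i with probability
    proportional to exp(empirical_mean_i / tau)."""
    rng = random.Random(seed)
    counts = [0] * len(mean_rewards)
    sums = [0.0] * len(mean_rewards)
    for t in range(horizon):
        if t < len(mean_rewards):
            arm = t                      # try each arm once to start
        else:
            est = [sums[i] / counts[i] for i in range(len(mean_rewards))]
            weights = [math.exp(e / tau) for e in est]
            r = rng.random() * sum(weights)
            arm, acc = 0, 0.0
            for i, w in enumerate(weights):
                acc += w
                if r <= acc:
                    arm = i
                    break
        reward = rng.gauss(mean_rewards[arm], 0.1)  # noisy profit signal
        sums[arm] += reward
        counts[arm] += 1
    return counts

counts = softmax_bandit([0.3, 0.6, 0.9], horizon=3000)
```

The temperature τ controls the exploration/exploitation balance: a large τ approaches uniform random pricing, while τ → 0 approaches pure greedy selection.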
... In this paper, we focus on the rested bandit setting, which has been applied to many real-world problems such as online advertising/recommendations, machine scheduling, and dynamic pricing (see, e.g., Mahajan and Teneketzis 2008, Levine et al. 2017, Warlop et al. 2018, Misra et al. 2019, Roy et al. 2020). An example is a simple machine scheduling problem (Mahajan and Teneketzis 2008) that involves multiple machines (arms) and one operator. ...
Preprint
Full-text available
The rested bandit is a classical multi-armed bandit (MAB) model that assumes that the state of each arm evolves according to a Markov process. It has more applications than the original MAB because it relaxes the assumption of an independent observation sequence. Moreover, the rested bandit admits an optimal index policy, the Gittins index policy, which is computationally tractable. However, when applying the Gittins index policy to real-world problems, computing the indices requires complete information on the embedded Markov chains. This is impractical in many situations in which decisions must be made in an online manner. In this paper, we consider a problem in which decisions must be made sequentially for a rested bandit with unknown transitions of its arms. We develop an online algorithm, the state-dependent successive elimination algorithm (SDSEA), based on the successive elimination idea for the original MAB with independent and identically distributed sequences. We show that the proposed SDSEA is an efficient probably approximately correct (PAC) algorithm with high sample efficiency, and it also offers computational tractability. This approach can also be extended to obtain an (ε, δ)-PAC learning algorithm based on an empirical Gittins index. In addition, we prove, for the first time, a uniform lower bound on the sample complexity of all efficient PAC learning algorithms for the rested bandit. We show that the sample complexity of the SDSEA is near optimal compared with this lower bound. Simulation studies are conducted to demonstrate the performance of the SDSEA and its superiority over some existing algorithms.
... Bradlow et al. (2017) provide a detailed example of a pricing experiment covering 14 categories and 788 SKUs of a large retailer. Misra et al. (2019) developed an automated pricing system for a large number of products in the face of limited demand information. The authors use the multiarmed bandit approach described above, and the pricing algorithm involves varying prices systematically over time to optimize the tradeoff between the cost of demand learning and profitability. ...
Article
The fast-paced growth of e-commerce is rapidly changing consumers’ shopping habits and shaping the future of the retail industry. While online retailing has allowed companies to overcome geographic barriers to selling and helped them achieve operational efficiencies, offline retailers have struggled to compete with online retailers, and many retailers have chosen to operate both online and offline. This paper presents a review of the literature on the interaction between e-commerce and offline retailing, highlighting empirical findings and generalizable insights, and discussing their managerial implications. Our review includes studies published in more than 50 different academic journals spanning various disciplines from the inception of the internet to present. We organize our paper around three main research questions. First, what is the relationship between online and offline retail channels including competition and complementarity between online and offline sellers as well as online and offline channels of an omnichannel retailer? Under this question we also try to understand the impact of e-commerce on market structure and what factors impact the intensity of competition /complementarity. Second, what is the impact of e-commerce on consumer behavior? We specifically investigate how e-commerce has impacted consumer search, its implications for price dispersion, and user generated content. Third, how has e-commerce impacted retailers’ key managerial decisions? The key research questions under this heading include: (i) What is the impact of big data on retailing? (ii) What is the impact of digitization on retailer outcomes? (iii) What is the impact of e-commerce on sales concentration? (iv) What is the impact of e-commerce and platforms on pricing? And (v) How should retailers manage product returns across online and offline channels? 
Under each section, we also develop detailed recommendations for future research which we hope will inspire continued interest in this domain.
... Kephart and Tesauro [2000] use Q-learning in one such setting, and Könönen [2006] uses Q-learning with function approximation as well as a policy gradient method. Misra et al. [2019] proposed a multi-armed bandit based algorithm for a multi-period dynamic pricing problem where firms face ambiguity. Hansen et al. [2020] and Trovo et al. [2015] studied the applicability of MAB algorithms to dynamic pricing problems and proposed variants of the UCB algorithm. ...
Preprint
Full-text available
We investigate the use of a multi-agent multi-armed bandit (MA-MAB) setting for modeling repeated Cournot oligopoly games, where the firms acting as agents choose from a set of arms representing production quantity (a discrete value). Agents interact with separate and independent bandit problems. In this formulation, each agent makes sequential choices among arms to maximize its own reward. Agents do not have any information about the environment; they can only see their own rewards after taking an action. However, the market demand is a stationary function of total industry output, and random entry into or exit from the market is not allowed. Given these assumptions, we found that an $\epsilon$-greedy approach offers a more viable learning mechanism than other traditional MAB approaches, as it does not require any additional knowledge of the system to operate. We also propose two novel approaches that take advantage of the ordered action space: $\epsilon$-greedy+HL and $\epsilon$-greedy+EL. These new approaches help firms to focus on more profitable actions by eliminating less profitable choices and hence are designed to optimize the exploration. We use computer simulations to study the emergence of various equilibria in the outcomes and to empirically analyze joint cumulative regrets.
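The baseline ε-greedy mechanism this abstract builds on is standard and easy to sketch. The following is a minimal illustration, not the authors' simulation: the linear inverse-demand curve, the fixed rival output of 30 units, and the quantity grid are all assumptions made for the example.

```python
import random

def epsilon_greedy_cournot(quantities, reward_fn, rounds=5000, epsilon=0.1, seed=0):
    """Baseline epsilon-greedy learner over a discrete grid of production quantities."""
    rng = random.Random(seed)
    counts = {q: 0 for q in quantities}
    means = {q: 0.0 for q in quantities}
    for _ in range(rounds):
        if rng.random() < epsilon:
            q = rng.choice(quantities)          # explore a random quantity
        else:
            q = max(quantities, key=means.get)  # exploit the current estimate
        r = reward_fn(q)
        counts[q] += 1
        means[q] += (r - means[q]) / counts[q]  # incremental mean update
    return max(quantities, key=means.get)

# Illustrative stationary market: inverse demand P(Q) = 100 - Q, rival output
# fixed at 30, zero cost, so the learner's profit from quantity q is q * (70 - q).
best = epsilon_greedy_cournot(list(range(0, 51, 5)), lambda q: q * (70 - q))
```

With this stationary, noise-free profit curve the learner settles on the grid point closest to the best response, q = 35. The ordered-action refinements (ε-greedy+HL, ε-greedy+EL) would prune the quantity grid on top of this loop.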
... They showed that full collusion can be achieved with their approach; however, they did not study the scalability of their approach. There is a wide variety of work that uses RL and MAB approaches for dynamic pricing problems; the market models considered in these works are diverse [Den Boer, 2015, Kephart and Tesauro, 2000, Könönen, 2006, Misra et al., 2019, Hansen et al., 2020, Trovo et al., 2015]. ...
Preprint
Many past attempts at modeling repeated Cournot games assume that demand is stationary. This does not align with real-world scenarios in which market demands can evolve over a product's lifetime for a myriad of reasons. In this paper, we model repeated Cournot games with non-stationary demand such that firms/agents face separate instances of a non-stationary multi-armed bandit problem. The set of arms/actions that an agent can choose from represents discrete production quantities; here, the action space is ordered. Agents are independent and autonomous, and cannot observe anything from the environment; they can only see their own rewards after taking an action, and only work towards maximizing these rewards. We propose a novel algorithm, 'Adaptive with Weighted Exploration (AWE) $\epsilon$-greedy', which is remotely based on the well-known $\epsilon$-greedy approach. This algorithm detects and quantifies changes in rewards due to varying market demand and varies the learning rate and exploration rate in proportion to the degree of changes in demand, thus enabling agents to better identify new optimal actions. For efficient exploration, it also deploys a mechanism for weighing actions that takes advantage of the ordered action space. We use simulations to study the emergence of various equilibria in the market. In addition, we study the scalability of our approach in terms of the total number of agents in the system and the size of the action space. We consider both symmetric and asymmetric firms in our models. We found that using our proposed method, agents are able to swiftly change their course of action according to the changes in demand, and they also engage in collusive behavior in many simulations.
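The AWE algorithm itself is not reproduced in the abstract, but the underlying idea of letting old observations fade so that estimates can track a drifting demand admits a compact generic sketch. The code below is a plain discounted-mean ε-greedy learner, not the authors' AWE method; the discount factor, reward values, and change point are illustrative assumptions.

```python
import random

def discounted_eps_greedy(arms, reward_fn, rounds, epsilon=0.1, gamma=0.95, seed=1):
    """Epsilon-greedy with exponentially discounted reward estimates, so that
    old observations fade and the learner can track a drifting demand curve."""
    rng = random.Random(seed)
    num = {a: 0.0 for a in arms}   # discounted reward sum per arm
    den = {a: 0.0 for a in arms}   # discounted pull count per arm
    choices = []
    for t in range(rounds):
        for a in arms:             # decay every estimate each period
            num[a] *= gamma
            den[a] *= gamma
        if rng.random() < epsilon:
            a = rng.choice(arms)
        else:
            a = max(arms, key=lambda x: num[x] / den[x] if den[x] > 0 else float("inf"))
        num[a] += reward_fn(a, t)
        den[a] += 1.0
        choices.append(a)
    return choices

# Demand shift halfway through: arm 1 is best at first, arm 2 best afterwards.
choices = discounted_eps_greedy([0, 1, 2],
                                lambda a, t: [1.0, 2.0, 0.5][a] if t < 1000
                                else [1.0, 0.5, 2.0][a],
                                rounds=2000)
```

Because the discounted mean of the formerly best arm decays toward its new, lower reward, the learner switches to the newly optimal arm shortly after the change point; a non-discounted ε-greedy learner would take far longer to overturn its stale estimates.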
... Multi-armed bandits are increasingly replacing A/B tests as tools to optimize the process of "earning while learning." In the recent academic literature, multi-armed bandit modeling has been incorporated into real-time website optimization, online advertising (Baardman et al. 2019, Schwartz et al. 2017), and pricing problems (Misra et al. 2019). Liberali and Ferecatu (2019) propose a real-time optimization method in which a hidden Markov model is first used to infer consumers' position in the purchase funnel. ...
Article
Full-text available
Managers frequently explore new strategies, and exploit familiar ones, when making decisions on new product development, pricing, or advertising. Exploring for too long, or exploiting too soon, will generate inferior financial returns. Our research describes decision makers’ exploration/exploitation trade-offs and their link to psychometric traits. We conduct an incentive-aligned study in which subjects play a multiarmed bandit experiment and evaluate how subjects balance exploration and exploitation, linked to psychometric traits. To formally describe exploration/exploitation trade-offs, we develop a behavioral model that captures latent dynamics in learning behavior. Subjects transition between three unobserved states—exploration, exploitation, and inertia—updating their beliefs about expected payoffs. Our analysis suggests that decision makers overexplore low-performing options, forgoing over 30% of potential revenue. They heavily rely on recent experiences. Risk-averse decision makers spend more time exploring. Maximizers are more sensitive to payoffs than satisficers. Our research builds the groundwork needed to devise remedial actions aimed at helping managers find an optimal balance between exploration and exploitation. One way to achieve this goal is by carefully designing the learning environment. In two additional studies, we analyze the evolution of exploration/exploitation trade-offs across different learning environments. Offering decision makers repeated opportunities to learn and increasing the planning horizon appears beneficial.
... Contextual bandit algorithms aim to strike a balance between exploration and exploitation and have been used in several applications, such as recommender systems [21], dynamic pricing [22], quantitative finance [30], and so on. [4] reviews the existing practical applications of contextual bandit algorithms. ...
Preprint
Full-text available
User interest exploration is an important and challenging topic in recommender systems, which alleviates the closed-loop effects between recommendation models and user-item interactions. Contextual bandit (CB) algorithms strive to make a good trade-off between exploration and exploitation so that users' potential interests have a chance to be exposed. However, classical CB algorithms can only be applied to a small, sampled item set (usually hundreds), which forces the typical applications in recommender systems to be limited to candidate post-ranking, homepage top item ranking, ad creative selection, or online model selection (A/B test). In this paper, we introduce two simple but effective hierarchical CB algorithms to make a classical CB model (such as LinUCB and Thompson Sampling) capable of exploring users' interest in the entire item space without limiting it to a small item set. We first construct a hierarchical item tree via a bottom-up clustering algorithm to organize items in a coarse-to-fine manner. Then we propose a hierarchical CB (HCB) algorithm to explore users' interest in the hierarchy tree. HCB treats the exploration problem as a series of decision-making processes, where the goal is to find a path from the root to a leaf node, and the feedback is back-propagated to all the nodes in the path. We further propose a progressive hierarchical CB (pHCB) algorithm, which progressively extends the visible nodes that reach a confidence level for exploration, to avoid misleading actions on upper-level nodes in the sequential decision-making process. Extensive experiments on two public recommendation datasets demonstrate the effectiveness and flexibility of our methods.
... Adelman and Uçkun (2019), for example, report a cost reduction of between 35% and 41% for smart homes obtaining their electricity based on time-varying prices instead of static prices. Misra et al. (2019) propose a dynamic price experimentation policy that enables price managers to learn about consumers' demand at low opportunity costs. They demonstrate that their approach can increase profits by up to 43%. ...
Article
Purpose The purpose of this paper is to (1) investigate the effect of freshness on consumers' willingness to pay, (2) derive static and dynamic pricing strategies and (3) compare the effect of these pricing strategies on a retailer's revenue and food waste. This investigation helps to reveal the potential of dynamic pricing strategies for building more sustainable business models. Design/methodology/approach The authors conduct an online experiment to measure consumers' willingness to pay for fresh and three-day-old strawberries. The impact of freshness on willingness to pay is analysed using univariate tests and regression analysis. Pricing strategies are compared using a Monte Carlo simulation. Findings The results of this study show that freshness largely determines consumers' willingness to pay and price sensitivity. This renders dynamic pricing a promising strategy from an economic point of view. The results of the simulation study show that food waste can be reduced by up to 53.6% with a dynamic instead of a static pricing strategy in the case that there are as many consumers as strawberry packages in the inventory. Revenue can be increased by up to 10% compared to a static pricing strategy based on fresh strawberries. Practical implications This study suggests that food retailers can improve their revenue when switching from static to dynamic pricing. Furthermore, in most cases, food retailers can reduce food waste with a dynamic instead of a static pricing strategy, which might help to improve their image through a more sustainable business model and attract additional consumers. Originality/value This study is the first to analyse the possibility of using food freshness to design a dynamic pricing strategy and to analyse the impact of such a pricing strategy on both a retailer's revenue and a retailer's food waste.
... Furthermore, Besbes and Zeevi (2011); den Boer (2015b); Keskin and Zeevi (2017) investigated the time-varying unknown demand setting. In addition, the Upper Confidence Bound (UCB) idea (Auer et al., 2002; Abbasi-Yadkori et al., 2011) has been used in different non-contextual instances (Kleinberg and Leighton, 2003; Misra et al., 2019; Wang et al., 2021). However, none of these approaches incorporates the covariates into the pricing policy. ...
Preprint
Full-text available
Contextual dynamic pricing aims to set personalized prices based on sequential interactions with customers. At each time period, a customer who is interested in purchasing a product comes to the platform. The customer's valuation for the product is a linear function of contexts, including product and customer features, plus some random market noise. The seller does not observe the customer's true valuation, but instead needs to learn the valuation by leveraging contextual information and historical binary purchase feedback. Existing models typically assume full or partial knowledge of the random noise distribution. In this paper, we consider contextual dynamic pricing with unknown random noise in the valuation model. Our distribution-free pricing policy learns both the contextual function and the market noise simultaneously. A key ingredient of our method is a novel perturbed linear bandit framework, where a modified linear upper confidence bound algorithm is proposed to balance the exploration of market noise and the exploitation of the current knowledge for better pricing. We establish the regret upper bound and a matching lower bound of our policy in the perturbed linear bandit framework and prove a sub-linear regret bound in the considered pricing problem. Finally, we demonstrate the superior performance of our policy in simulations and on a real-life auto-loan dataset.
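The perturbed linear bandit framework above modifies the standard disjoint LinUCB routine, which for reference looks roughly as follows. This is the textbook algorithm, not the paper's distribution-free variant, and the two-arm linear environment in the usage example is invented purely for illustration.

```python
import numpy as np

def linucb(arms, contexts, reward_fn, alpha=1.0, d=2):
    """Disjoint LinUCB: one ridge-regression model per arm; play the arm with
    the highest upper confidence bound on its predicted reward."""
    A = {a: np.eye(d) for a in arms}     # per-arm Gram matrix (ridge prior)
    b = {a: np.zeros(d) for a in arms}   # per-arm response vector
    total = 0.0
    for x in contexts:
        ucb = {}
        for a in arms:
            theta = np.linalg.solve(A[a], b[a])              # ridge estimate
            width = alpha * np.sqrt(x @ np.linalg.solve(A[a], x))
            ucb[a] = float(theta @ x + width)                # optimism bonus
        a = max(arms, key=ucb.get)
        r = reward_fn(a, x)
        A[a] += np.outer(x, x)                               # rank-one update
        b[a] += r * x
        total += r
    return total

# Toy environment: two arms whose rewards are exactly linear in a 2-d context.
contexts = [np.array([1.0, 0.2]) if t % 2 == 0 else np.array([0.2, 1.0])
            for t in range(200)]
true_theta = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])}
total = linucb([0, 1], contexts, lambda a, x: float(true_theta[a] @ x))
```

In this noise-free toy run the oracle earns 1.0 per round (200 total); LinUCB loses only a handful of early exploratory rounds before matching the oracle's per-round choice.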
... Early applications focused on optimizing webpage content suggestions such as news articles, advertisements, and marketing messages [4,5]. Nowadays, its applications have been extended to dynamic pricing [6], revenue management [7], inventory buying [8], as well as the recommendation of various content, such as skills, through virtual assistants [9]. ...
Preprint
Full-text available
The rich body of Bandit literature not only offers a diverse toolbox of algorithms, but also makes it hard for a practitioner to find the right solution to solve the problem at hand. Typical textbooks on Bandits focus on designing and analyzing algorithms, and surveys on applications often present a list of individual applications. While these are valuable resources, there exists a gap in mapping applications to appropriate Bandit algorithms. In this paper, we aim to reduce this gap with a structured map of Bandits to help practitioners navigate to find relevant and practical Bandit algorithms. Instead of providing a comprehensive overview, we focus on a small number of key decision points related to reward, action, and features, which often affect how Bandit algorithms are chosen in practice.
... Price-based differential treatment, or price inequality, is a form of price discrimination which is intended to take advantage of different consumers' individual price acceptance, with the objective of exploiting the consumer surplus [83]. This pricing tactic, which is also considered a form of dynamic pricing, has been widely practiced in the airline and hotel industries and is being increasingly adopted in most online retailing sectors [22,57]. Current technological developments allow online retailers to tailor a unique price for each individual, which helps them to better manage demand and increase profits while offering some benefits for customers too, such as the possibility of adapting to different preferences, needs and budgets [1,23]. ...
Article
Full-text available
This research examines the personality, cognitive and emotional antecedents of deceptive price perceptions that occur in price inequalities. We draw on appraisal theories to examine the extent to which these relationships are different depending on two situations: consumers who are exposed to an advantaged situation (lower price) and those exposed to a disadvantaged situation (higher price). Data from 994 individuals in the online hotel booking context show that the direction of the price inequality significantly influences the way in which both personality and the attributional–emotional process affect perceptions of deceptive pricing. Our findings provide a better understanding of this subjective, complex, but also increasingly prevalent phenomenon of price inequality and perceived deceptive pricing in online retailing. Implications for theory and management are discussed.
... The complexity of the pricing task is increased by real-time price variation based on fluctuating demand. In a real-time scenario, a multiarmed bandit algorithm based on artificial intelligence can dynamically adjust price (Misra et al., 2019). In a scenario where pricing changes frequently, such as an e-commerce portal, Bayesian inference in a machine learning algorithm can quickly adjust price points to match the price of a competitor (Bauer & Jannach, 2018). ...
Article
Full-text available
Disruptive technologies like the internet of things, big data analytics, blockchain, and artificial intelligence have transformed how businesses operate. Artificial intelligence (AI) is the most recent technological disruptor and has enormous marketing transformation potential. Practitioners all over the world are attempting to determine which AI solutions are best suited to their marketing needs. A systematic literature review, on the other hand, can highlight the importance of artificial intelligence (AI) in marketing and point the way for future research. The goal of this study is to provide a comprehensive review of AI in marketing by analysing extant literature published between 1982 and 2020 using bibliometric, conceptual, and intellectual network analysis. The performance of the scientific actors, such as the most relevant authors and sources, was identified through a comprehensive review of 1,584 papers. Furthermore, the conceptual and intellectual network was revealed through co-citation and co-occurrence analysis. The Louvain algorithm was used to cluster data and identify research sub-themes and future research directions in order to expand AI in marketing.
... The multi-armed bandit (MAB) problem has developed into an extremely fruitful area of both research and practice in the past two decades. Modern applications are now numerous and span diverse areas ranging from personalized online advertising and news article recommendation [24,10], to dynamic pricing and portfolio management [29,13,25], to mobile health and personalized medicine [31,5], and the list is constantly growing. Along with the widespread deployment of bandit algorithms, there has recently been a dramatic increase in theoretical bandit research, which has almost exclusively focused on optimizing finite-horizon regret bounds, either in expectation or with high probability. ...
Preprint
We study the behavior of Thompson sampling from the perspective of weak convergence. In the regime where the gaps between arm means scale as $1/\sqrt{n}$ with the time horizon $n$, we show that the dynamics of Thompson sampling evolve according to discrete versions of SDEs and random ODEs. As $n \to \infty$, we show that the dynamics converge weakly to solutions of the corresponding SDEs and random ODEs. (Recently, Wager and Xu (arXiv:2101.09855) independently proposed this regime and developed similar SDE and random ODE approximations.) Our weak convergence theory covers both the classical multi-armed and linear bandit settings, and can be used, for instance, to obtain insight about the characteristics of the regret distribution when there is information sharing among arms, as well as the effects of variance estimation, model mis-specification and batched updates in bandit learning. Our theory is developed from first-principles and can also be adapted to analyze other sampling-based bandit algorithms.
... Feit and Berman, 2019) will have a negative impact on the company's profitability during the test period, so-called adaptive testing is recommended (cf. Misra et al., 2019). This will continuously optimize the allocation of test subjects to the test conditions in terms of effectiveness, for example, maximizing the profit or the number of conversions. ...
Article
Purpose The purpose of this study is to explain how the application of fuzzy-set qualitative comparative analysis (fsQCA) and experiments can advance theory development in the field of servitization by generating better causal explanations. Design/methodology/approach FsQCA and experiments are established research methods that are suited for developing causal explanations but are rarely utilized by servitization scholars. To support their application, we explain how fsQCA and experiments represent distinct ways of developing causal explanations, provide guidelines for their practical application and highlight potential application areas for a future research agenda in the servitization domain. Findings FsQCA enables specification of cause–effect relationships that result in equifinal paths to an intended outcome. Experiments have the highest explanatory power and enable the drawing of direct causal conclusions through reliance on an interventionist logic. Together, these methods provide complementary ways of developing and testing theory when the research objective is to understand the causal pathways that lead to observed outcomes. Practical implications Applications of fsQCA help to explain to managers why there are numerous causal routes to attaining an intended outcome from servitization. Experiments support managerial decision-making by providing definitive “yes/no” answers to key managerial questions that address clearly specified cause–effect relationships. Originality/value The main contribution of this study is to help advance theory development in servitization by encouraging greater methodological plurality in a field that relies primarily on the qualitative case study methodology.
... Recently, Misra et al. (2019) consider the case where sellers must decide, in real time, on prices for a large number of items with incomplete demand information. Using experiments, the seller learns about the demand curve and the profit-maximizing price. ...
Article
Full-text available
Reinforcement learning algorithms describe how an agent can learn an optimal action policy in a sequential decision process, through repeated experience. In a given environment, the agent's policy provides running and terminal rewards. As in online learning, the agent learns sequentially. As in multi-armed bandit problems, when an agent picks an action, it cannot infer ex post the rewards that other action choices would have induced. In reinforcement learning, its actions have consequences: they influence not only rewards, but also future states of the world. The goal of reinforcement learning is to find an optimal policy – a mapping from the states of the world to the set of actions – in order to maximize cumulative reward, which is a long-term strategy. Exploring might be sub-optimal on a short-term horizon but could lead to optimal long-term outcomes. Many problems of optimal control, popular in economics for more than forty years, can be expressed in the reinforcement learning framework, and recent advances in computational science, provided in particular by deep learning algorithms, can be used by economists in order to solve complex behavioral problems. In this article, we present a state-of-the-art review of reinforcement learning techniques, and present applications in economics, game theory, operations research and finance.
Article
Resource flexibility and dynamic pricing are effective strategies in mitigating uncertainties in production systems but have yet to be explored in relation to the improvement of field operations services. We investigate the value of dynamic pricing and flexible allocation of resources in the field service operations of a regulated monopoly providing two services: installations (paid-for) and maintenance (free). We study the conditions under which the company can improve service quality and the profitability of field services by introducing dynamic pricing for installations and the joint management of the resources allocated to paid-for (with a relatively stationary demand) and free (with seasonal demand) services when there is an interaction between quality constraints (lead time) and the flexibility of resources (overtime workers at extra cost). We formalize this problem as a contextual multi-armed bandit problem to make pricing decisions for the installation services. A bandit algorithm can find the near-optimal policy for joint management of the two services independently of the shape of the unobservable demand function. The results show that (i) dynamic pricing and resource management increase profitability; (ii) regulation of the service window is needed to maintain quality; (iii) under certain conditions, dynamic pricing of installation services can decrease the maintenance lead time; (iv) underestimation of demand is more detrimental to profit contribution than overestimation.
Article
We consider an extension to the restless multi-armed bandit (RMAB) problem with unknown arm dynamics, where an unknown exogenous global Markov process governs the rewards distribution of each arm. Under each global state, the rewards process of each arm evolves according to an unknown Markovian rule, which is non-identical among different arms. At each time, a player chooses an arm out of $N$ arms to play, and receives a random reward from a finite set of reward states. The arms are restless, that is, their local state evolves regardless of the player's actions. Motivated by recent studies on related RMAB settings, the regret is defined as the reward loss with respect to a player that knows the dynamics of the problem, and plays at each time $t$ the arm that maximizes the expected immediate value. The objective is to develop an arm-selection policy that minimizes the regret. To that end, we develop the Learning under Exogenous Markov Process (LEMP) algorithm. We analyze LEMP theoretically and establish a finite-sample bound on the regret. We show that LEMP achieves a logarithmic regret order with time. We further analyze LEMP numerically and present simulation results that support the theoretical findings and demonstrate that LEMP significantly outperforms alternative algorithms.
Article
We present an empirical framework for creating dynamic coupon targeting strategies using deep reinforcement learning.
Article
The multi-armed bandit problem refers to the task of sequentially assigning treatments to experimental units so as to identify the best treatment(s) while controlling the regret, or opportunity cost, of exploration. A standard criterion for multi-armed bandit algorithms is control of expected regret, but this criterion is insufficient for many practical problems. Another criterion that should be considered is control of the algorithm replication variance of regret. However, an accessible framework does not currently exist for constructing multi-armed bandit algorithms that control both criteria. We develop such a framework based on the two elementary concepts of dismemberment of treatments and a designed learning phase prior to dismemberment. These concepts can be incorporated into existing multi-armed bandit algorithms to effectively yield new algorithms that better control the expectation and variance of regret. We demonstrate the utility of our framework by constructing new variants of the Thompson sampler that involve a small number of simple tuning parameters. As we illustrate in empirical studies, these new algorithms are implemented in a straightforward manner and achieve improved control of both regret criteria compared to the traditional Thompson sampler. Ultimately, our consideration of additional criteria besides expected regret illuminates novel insights into multi-armed bandit problems.
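The "traditional Thompson sampler" that the new variants above modify is, for a Bernoulli bandit, the familiar Beta-posterior sampler. A minimal baseline sketch follows; the success probabilities are illustrative, and the dismemberment and designed-learning-phase machinery of the article is not reproduced.

```python
import random

def thompson_bernoulli(probs, rounds=3000, seed=0):
    """Traditional Beta-Bernoulli Thompson sampler: draw a success rate for
    each arm from its Beta posterior and play the arm with the largest draw."""
    rng = random.Random(seed)
    k = len(probs)
    alpha = [1] * k            # Beta(1, 1) uniform priors
    beta = [1] * k
    pulls = [0] * k
    for _ in range(rounds):
        draws = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        i = max(range(k), key=draws.__getitem__)
        reward = 1 if rng.random() < probs[i] else 0   # Bernoulli feedback
        alpha[i] += reward
        beta[i] += 1 - reward
        pulls[i] += 1
    return pulls

pulls = thompson_bernoulli([0.3, 0.5, 0.7])
```

Because the posterior draws themselves drive exploration, the sampler needs no tuning parameters, but its regret varies noticeably across replications; controlling that variance is exactly the gap the article's variants target.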
Article
In this work we consider a seller who sells an item via second-price auctions with a reserve price. By controlling the reserve price, the seller can influence the revenue from the auction, and in this paper, we propose a method for learning optimal reserve prices. We study a limited information setting where the probability distribution of the bids from bidders is unknown and the values of the bids are not revealed to the seller. Furthermore, we do not assume that the seller has access to a historical data set with bids. Our main contribution is a method that incorporates knowledge about the rules of second-price auctions into a multiarmed bandit framework for optimizing reserve prices in our limited information setting. The proposed method can be applied in both stationary and nonstationary environments. Experiments show that the proposed method outperforms state-of-the-art bandit algorithms. In stationary environments, our method outperforms these algorithms when the horizon is short and performs as well as they do for longer horizons. Our method is especially useful if there is a high number of potential reserve prices. In addition, our method adapts quickly to changing environments and outperforms state-of-the-art bandit algorithms designed for nonstationary environments. Summary of Contribution: A key challenge in online advertising is the pricing of advertisements in online auctions. The scope of our study is second-price auctions with a focus on the reserve price optimization problem from a seller’s point of view. This problem is motivated by the real-life practice of small and medium-sized web publishers. However, the proposed solution approach is applicable to any seller who sells an item via second-price auctions and wants to optimize its reserve price during these auctions.
Our solution approach is based on techniques from machine learning and operations research, and it would be beneficial especially for sellers who start the selling process without any historical data and can collect data on the outcomes of the auctions while making reserve price decisions over time. History: Accepted by Ram Ramesh, Area Editor for Data Science & Machine Learning. Supplemental Material: The supplementary material is available at https://doi.org/10.1287/ijoc.2022.1199 .
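For context, a plain bandit baseline such methods are benchmarked against is UCB1 over a discrete grid of reserve prices. The sketch below is not the paper's auction-aware method: it feeds UCB1 the noise-free expected-revenue curve of a single bidder with valuation Uniform(0, 1), an assumption made purely so the illustration is deterministic.

```python
import math

def ucb1(arms, reward_fn, rounds=4000):
    """Plain UCB1: play each arm once, then the arm maximizing
    empirical mean + sqrt(2 ln t / n); report the best empirical arm."""
    counts = [0] * len(arms)
    means = [0.0] * len(arms)
    for t in range(1, rounds + 1):
        if t <= len(arms):
            i = t - 1                               # initial round-robin pass
        else:
            i = max(range(len(arms)),
                    key=lambda j: means[j] + math.sqrt(2 * math.log(t) / counts[j]))
        r = reward_fn(arms[i])
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]      # incremental mean update
    return arms[max(range(len(arms)), key=means.__getitem__)]

# Stylized reserve-price curve: one bidder with valuation ~ Uniform(0, 1)
# gives expected revenue p * P(v >= p) = p * (1 - p), which peaks at p = 0.5.
best = ucb1([0.1, 0.3, 0.5, 0.7, 0.9], lambda p: p * (1 - p))
```

A finer price grid makes the gaps between neighboring arms smaller and plain UCB1 correspondingly slower, which is why the paper's structure-aware method pays off most when there are many candidate reserve prices.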
Article
Even though base-stock policies are per se straightforward, determining them in complex, stochastic multi-echelon supply chains is often cumbersome or even analytically impossible. Therefore, a wide range of heuristics has been proposed for this purpose. This is the first study considering the problem as a multi-armed bandit problem. In this context, we investigate two algorithms: first, we propose an approach that is based on upper confidence bounds and priority queues. This so-called PQ-UCB algorithm allows us to drastically reduce the runtime of upper confidence bound allocation strategies in problems with large action spaces. Subsequently, we apply the parameter-free sequential halving (SH) algorithm. We investigate various scenarios to compare the performance of both algorithms with the performance of a genetic algorithm and a simulated annealing algorithm taken from the literature. PQ-UCB as well as SH outperform both benchmark metaheuristics and require substantially less effort related to parameter tuning (or even no effort in the case of SH). As multi-armed bandits are not common in inventory optimisation so far, we aim to emphasise their strengths and hope to promote their dissemination also in other domains of supply chain management.
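Sequential halving is indeed parameter-free apart from the total sampling budget, and a compact version is easy to state. The single-echelon cost curve in the usage example is a made-up stand-in for the simulated supply-chain objective, chosen so the reward (negative cost) peaks at a base-stock level of 4; it is not taken from the study.

```python
import math

def sequential_halving(arms, reward_fn, budget=4096):
    """Parameter-free Sequential Halving: split the budget evenly across
    ceil(log2 K) elimination rounds; in each round, sample every surviving
    arm equally and discard the worse half."""
    survivors = list(arms)
    num_rounds = max(1, math.ceil(math.log2(len(survivors))))
    for _ in range(num_rounds):
        if len(survivors) == 1:
            break
        pulls = max(1, budget // (num_rounds * len(survivors)))
        scores = {a: sum(reward_fn(a) for _ in range(pulls)) / pulls
                  for a in survivors}
        survivors.sort(key=scores.get, reverse=True)
        survivors = survivors[:math.ceil(len(survivors) / 2)]
    return survivors[0]

# Stylized base-stock problem: holding cost grows linearly in the level s,
# shortage cost shrinks with it; reward is negative total cost, best at s = 4.
best = sequential_halving(range(11), lambda s: -(0.5 * s + 12.0 / (s + 1)))
```

Unlike UCB-style strategies, this loop needs no confidence-width tuning, which matches the article's point that SH requires no parameter-tuning effort at all.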
Article
Dynamic pricing has great potential to increase retailers’ profits, but it also creates a risk of negative customer reactions. This paper examines whether and how price discount displays might mitigate negative consequences of dynamic pricing. The results of five studies indicate that when confronted with dynamic pricing customers react negatively due to norm violation, in terms of their perceptions of pricing transparency, price fairness, and value, as well as their purchase intention. However, if retailers display prices as sufficiently high discounts, they can mitigate these negative reactions. The results also suggest that customers tend to focus on the general price-setting process when evaluating a specific transaction in a dynamic pricing context. Therefore, it is necessary to consider both distributive and procedural fairness to explain and predict customers’ reactions to dynamic prices. Further, the results point to how managers may be able to successfully implement dynamic pricing, using price discount displays.
Article
Problem definition: We consider the problem of demand learning and pricing for retailers who offer assortments of substitutable products that change frequently, for example, due to limited inventory, perishable or time-sensitive products, or the retailer’s desire to frequently offer new styles. Academic/practical relevance: We are one of the first to consider the demand learning and pricing problem for retailers who offer product assortments that change frequently, and we propose and implement a learn-then-earn algorithm for use in this setting. Our algorithm prioritizes a short learning phase, an important practical characteristic that is only considered by few other algorithms. Methodology: We develop a novel demand learning and pricing algorithm that learns quickly in an environment with varying assortments and limited price changes by adapting the commonly used marketing technique of conjoint analysis to our setting. We partner with Zenrez, an e-commerce company that partners with fitness studios to sell excess capacity of fitness classes, to implement our algorithm in a controlled field experiment to evaluate its effectiveness in practice using the synthetic control method. Results: Relative to a control group, our algorithm led to an expected initial dip in revenue during the learning phase, followed by a sustained and significant increase in average daily revenue of 14%–18% throughout the earning phase, illustrating that our algorithmic contributions can make a significant impact in practice. Managerial implications: The theoretical benefit of demand learning and pricing algorithms is well understood—they allow retailers to optimally match supply and demand in the face of uncertain preseason demand. However, most existing demand learning and pricing algorithms require substantial sales volume and the ability to change prices frequently for each product. 
Our work provides retailers who do not have this luxury a powerful demand learning and pricing algorithm that has been proven in practice.
Book
Full-text available
Financial support for the protection of children and families can help families stay afloat. Under the umbrella of social protection, countries around the world provide assistance in many different forms and under many headings, such as child benefits, education allowances, birth grants, nursery/day-care and childcare subsidies, child allowances, child supplements, parental benefits, maintenance advances, family support payments, and single-parent benefits. Far more family-protective measures are found in developed countries and the European Union than elsewhere. Family-protective measures are organized within the framework of International Labour Organization Convention No. 102. In the context of social policy, the family, which also carries great social and sociological importance, must be protected so that social security can spread to the grassroots and function in a way that covers the whole of society. For this reason, although the boundaries between family assistance and family insurance cannot be fully separated in some countries, work on and improvements to family insurance continue. Family benefits are of great importance for preserving social order and, with it, increasing economic welfare. This study first touches on the brief history of social security, then explains the concept of the family and examines its social importance, and finally discusses family benefits in Europe and the types of family assistance provided in that context.
Article
The authors empirically examine how firms learn to set prices in a new market. The 2012 privatization of off-premise liquor sales in Washington State created a unique opportunity to observe retailers learn to set prices from the point at which their learning process began. Tracking this market as it evolved through time, the authors find that firms indeed learn to set more profitable prices, that these prices increasingly reflect demand fundamentals, and that they ultimately converge to levels consistent with (static) profit maximization. The paper further demonstrates that initial pricing mistakes are largest for products whose demand conditions differ the most from those of previously privatized markets, that retailers with previous experience in the category are initially better informed, and that learning is faster for products with more precise sales information. These findings indicate that firm behavior converges to rational models of firm conduct, but they also reveal that such convergence takes time to unfold and plays out differently for different firms. These patterns suggest important roles for both firm learning and heterogeneous firm capabilities.
Article
This research aims at providing a new model of consumers’ personal space to limit the spread of contagious disease while shopping in person. To this end, it adopts an agent-based simulation approach to model consumers’ movements in the store during the COVID-19 pandemic. Findings show the extent to which consumers’ contacts with others increase the risk of contagion, due to the occurrence of social gatherings in certain areas. Specifically, there is a linear correlation between the number of consumers in the store and the number of consumers susceptible to contracting the disease. Thus, personal space, from a psychological perception, becomes an individual and compulsory boundary that protects consumers from contagious disease. Finally, our results extend the concepts of social distance and personal space while shopping, and support retailers in providing safer shopping experiences.
Article
Crowdsensing gradually forms a big data market where workers are willing to trade reusable data with different data collectors. It is challenging for the data collector to choose the transaction party due to the changeable value of the data, while determining the transaction price is also a tough issue. In this paper, we study dynamic data transactions in crowdsensing. The contribution of the new data to the collector is modeled as the Shapley value, with each worker as a player in the cooperative game. The data collector then judges the contribution of the worker and determines the transaction object. To maximize the payoff from the transaction, the collector dynamically adjusts the price offered to workers. The contextual bandit model is utilized in the price decision, with each candidate price as an arm and the time-variant data value as the context. Based on the classic LinUCB learning policy, we learn the mapping from the observed data value to the reward, and estimate the optimal reward in the current transaction. Simulations on the data demonstrate that the actual reward obtained by the collector is close to the maximum reward it can get, which verifies the effectiveness of our scheme.
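LinUCB, the policy this abstract builds on, maintains a per-arm ridge-regression estimate of reward and adds an exploration bonus proportional to that estimate's uncertainty. The sketch below is a minimal one-dimensional variant — the scalar context (the observed data value), the candidate prices, and the per-price payoffs are made-up toy assumptions, not the paper's implementation.

```python
import math
import random

class LinUCB1D:
    """Disjoint LinUCB with a one-dimensional context: one ridge-regression
    model per candidate price (arm)."""
    def __init__(self, prices, alpha=1.0):
        self.alpha = alpha
        self.A = {p: 1.0 for p in prices}   # x'x accumulator + ridge term
        self.b = {p: 0.0 for p in prices}   # reward * x accumulator

    def choose(self, x):
        # pick the arm with the highest upper confidence bound
        def ucb(p):
            theta = self.b[p] / self.A[p]
            return theta * x + self.alpha * math.sqrt(x * x / self.A[p])
        return max(self.A, key=ucb)

    def best(self, x):
        # greedy (no exploration bonus) recommendation after learning
        return max(self.A, key=lambda p: (self.b[p] / self.A[p]) * x)

    def update(self, price, x, reward):
        self.A[price] += x * x
        self.b[price] += reward * x

# Toy run: the collector's payoff rate is highest at price 3.
random.seed(1)
bandit = LinUCB1D(prices=[1, 2, 3, 4])
payoff = {1: 0.2, 2: 0.5, 3: 0.9, 4: 0.4}     # assumed per-unit-context payoff
for _ in range(2000):
    x = random.uniform(0.5, 1.5)              # time-variant data value
    p = bandit.choose(x)
    bandit.update(p, x, payoff[p] * x + random.gauss(0, 0.1))
print(bandit.best(1.0))
```

In the paper's setting the context would be the estimated (Shapley-value) contribution of the worker's data; here a uniform random draw stands in for it.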
Chapter
Full-text available
Financial support for the protection of children and families can help families survive. Especially in developed countries and the European Union, more family protective measures are encountered than in other countries. Family protective measures are collected within the framework of the International Labor Organization's Resolution No. 102. In the context of social policy, it is necessary to protect the family, which is of great social and sociological importance, in order for social security to spread to the base and to function in a way that covers the whole society. For this reason, although the borders between family assistance and family insurance cannot be fully separated in some countries, studies and improvements are continuing on family insurance.
Article
Customer concentration inside a store is of pivotal importance for retail management, and contributions about the best number of consumers on the floor space to ensure an enjoyable and pleasant experience remain controversial. Indeed, an excessive concentration of people (crowding) might discourage shopping in that location, while on the other hand, a certain amount of traffic to the store generates profit for retailers. The aim of this paper is to support retailers’ informed decisions by refining our understanding of the extent to which store layout influences consumer density. To this end, we conduct a large field study, combining a unique dataset covering customers in a real grocery store with agent-based simulations. Results clearly show the extent to which this kind of simulation helps predict the changes in store layout that affect customer density across areas, while keeping the number of individuals the same.
Article
Full-text available
The topic of dynamic pricing and learning has received a considerable amount of attention in recent years, from different scientific communities. We survey these literature streams: we provide a brief introduction to the historical origins of quantitative research on pricing and demand estimation, point to different subfields in the area of dynamic pricing, and provide an in-depth overview of the available literature on dynamic pricing and learning. Our focus is on the operations research and management science literature, but we also discuss relevant contributions from marketing, economics, econometrics, and computer science. We discuss relations with methodologically related research areas, and identify directions for future research.
Article
Full-text available
Conventional wisdom in marketing holds that retailer forward buying (1) is a consequence of manufacturer trade promotions and (2) stockpiling units helps the retailer but hurts the manufacturer. This paper provides a deeper understanding of forward buying by analyzing it within the context of manufacturer trade promotions, competition and demand uncertainty. We find that regardless of whether the manufacturer offers a trade promotion or not, allowing the retailer to forward buy and hold inventory for the future can, under certain conditions, be beneficial for both parties. Disallowing forward buying by the retailer may lead the manufacturer to lower merchandising requirements and change the depth of the promotion. In competitive environments, there are situations in which retailers engage in forward buying due to competitive pressures in a prisoners’ dilemma situation. Finally, when we consider the case of uncertain demand, we find further evidence of strategic forward buying. In particular, we find cases in which the retailer orders a quantity that is higher than what it expects to sell in even the most optimistic demand scenario.
Article
Full-text available
The price for a product may be set too low, causing the seller to leave money on the table, or too high, driving away potential buyers. Contingent pricing can be useful in mitigating these problems. In contingent pricing arrangements, price is contingent on whether the seller succeeds in obtaining a higher price within a specified period. We show that if the probability of obtaining the high price is not too high, sellers profit from using contingent pricing while economic efficiency increases. The optimal contingent pricing structure depends on the buyer's risk attitude—a deep discount is most profitable if buyers are risk prone. A consolation reward is most profitable if buyers are risk averse. To motivate buyers to participate in a contingent pricing arrangement, the seller must provide sufficient incentives. Consequently, buyers also benefit from contingent pricing. In addition, because the buyers with the highest willingness-to-pay get the product, contingent pricing increases the efficiency of resource allocation.
Article
Full-text available
Considerable theoretical justification for consumers' use of psychological reference points exists from the research literature. From a managerial perspective, one of the most important applications of this concept is reference price, an internal standard against which observed prices are compared. In this paper, we propose three empirical generalizations that are well-supported in the marketing literature. First, there is ample evidence that consumers use reference prices in making brand choices. Second, the empirical results on reference pricing also support the generalization that consumers rely on past prices as part of the reference price formation process. Third, consistent with other research on loss aversion, consumers have been found to be more sensitive to “losses,” i.e. observed prices higher than reference prices, than “gains.” We also propose topics for further research on reference prices.
Article
Full-text available
This article studies the implications of experience curves and brand loyalty for optimal dynamic pricing policy. In a continuous time model, we synthesize several results from the literature on open loop equilibria. Specifically, we show that prices should decrease over time for high discount rates and steeper exogenous declines in variable costs. Conversely, the prices should increase over time if experience curves affect fixed costs and if consumers are brand loyal.
Article
Full-text available
The rapid advance in information technology now makes it feasible for sellers to condition their price offers on consumers’ prior purchase behavior. In this paper we examine when it is profitable to engage in this form of price discrimination when consumers can adopt strategies to protect their privacy. Our baseline model involves rational consumers with constant valuations for the goods being sold and a monopoly merchant who can commit to a pricing policy. Applying results from the prior literature, we show that although it is feasible to price so as to distinguish high-value and low-value consumers, the merchant will never find it optimal to do so. We then consider various generalizations of this model, such as allowing the seller to offer enhanced services to previous customers, making the merchant unable to commit to a pricing policy, and allowing competition in the marketplace. In these cases we show that sellers will, in general, find it profitable to condition prices on purchase history.
Article
Full-text available
Firms in durable good product markets face incentives to intertemporally price discriminate, by setting high initial prices to sell to consumers with the highest willingness to pay, and cutting prices thereafter to appeal to those with lower willingness to pay. A critical determinant of the profitability of such pricing policies is the extent to which consumers anticipate future price declines, and delay purchases. I develop a framework to investigate empirically the optimal pricing over time of a firm selling a durable-good product to such strategic consumers. Prices in the model are equilibrium outcomes of a game played between forward-looking consumers who strategically delay purchases to avail of lower prices in the future, and a forward-looking firm that takes this consumer behavior into account in formulating its optimal pricing policy. The model outlines first, a dynamic model of demand incorporating forward-looking consumer behavior, and second, an algorithm to compute the optimal dynamic sequence of prices given these demand estimates. The model is solved using numerical dynamic programming techniques. I present an empirical application to the market for video-games in the US. The results indicate that consumer forward-looking behavior has a significant effect on optimal pricing of games in the industry. Simulations reveal that the profit losses of ignoring forward-looking behavior by consumers are large and economically significant, and suggest that market research that provides information regarding the extent of discounting by consumers is valuable to video-game firms.
Article
Full-text available
We present a framework to measure empirically the size of indirect network effects in high-technology markets with competing incompatible technology standards. These indirect network effects arise due to inter-dependence in demand for hardware and compatible software. By modeling the joint determination of hardware sales and software availability in the market, we are able to describe the nature of demand inter-dependence and to measure the size of the indirect network effects. We apply the model to price and sales data from the industry for personal digital assistants (PDAs) along with the availability of software titles compatible with each PDA hardware standard. Our empirical results indicate significant indirect network effects. By July 2002, the network effect explains roughly 22% of the log-odds ratio of the sales of all Palm O/S compatible PDAs to Microsoft O/S compatible PDAs, where the remaining 78% reflects price and model features. We also use our model estimates to study the growth of the installed bases of Palm and Microsoft PDA hardware, with and without the availability of compatible third-party software. We find that lack of third-party software negatively impacts the evolution of the installed hardware bases of both formats. These results suggest PDA hardware firms would benefit from investing resources in increasing the provision of software for their products. We then compare the benefits of investments in software with investments in the quality of hardware technology. This exercise helps disentangle the potential for incremental hardware sales due to hardware quality improvement from that of positive feedback due to market software provision.
Book
Full-text available
This important new text and reference for researchers and students in machine learning, game theory, statistics and information theory offers the first comprehensive treatment of the problem of predicting individual sequences. Unlike standard statistical approaches to forecasting, prediction of individual sequences does not impose any probabilistic assumption on the data-generating mechanism. Yet, prediction algorithms can be constructed that work well for all possible sequences, in the sense that their performance is always nearly as good as the best forecasting strategy in a given reference class. The central theme is the model of prediction using expert advice, a general framework within which many related problems can be cast and discussed. Repeated game playing, adaptive data compression, sequential investment in the stock market, sequential pattern analysis, and several other problems are viewed as instances of the experts' framework and analyzed from a common nonstochastic standpoint that often reveals new and intriguing connections. Old and new forecasting methods are described in a mathematically precise way in order to characterize their theoretical limitations and possibilities.
Article
Full-text available
The benefits of dynamic pricing methods have long been known in industries, such as airlines, hotels, and electric utilities, where the capacity is fixed in the short-term and perishable. In recent years, there has been an increasing adoption of dynamic pricing policies in retail and other industries, where the sellers have the ability to store inventory. Three factors contributed to this phenomenon: (1) the increased availability of demand data, (2) the ease of changing prices due to new technologies, and (3) the availability of decision-support tools for analyzing demand data and for dynamic pricing. This paper constitutes a review of the literature and current practices in dynamic pricing. Given its applicability in most markets and its increasing adoption in practice, our focus is on dynamic (intertemporal) pricing in the presence of inventory considerations.
Article
Full-text available
Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while taking the empirically best action as often as possible. A popular measure of a policy's success in addressing this dilemma is the regret, that is the loss due to the fact that the globally optimal policy is not followed all the times. One of the simplest examples of the exploration/exploitation dilemma is the multi-armed bandit problem. Lai and Robbins were the first ones to show that the regret for this problem has to grow at least logarithmically in the number of plays. Since then, policies which asymptotically achieve this regret have been devised by Lai and Robbins and many others. In this work we show that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
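The simple, efficient policies with uniformly logarithmic regret discussed in this abstract include the well-known UCB1 index. As a concrete reference point, the sketch below runs UCB1 on made-up Bernoulli arms and tracks the pseudo-regret; the arm means and horizon are illustrative assumptions.

```python
import math
import random

def ucb1(means, horizon, rng):
    """UCB1: pull the arm maximizing empirical mean + sqrt(2 ln t / n)."""
    k = len(means)
    n = [0] * k        # pull counts
    s = [0.0] * k      # reward sums
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:     # initialization: play each arm once
            a = t - 1
        else:
            a = max(range(k),
                    key=lambda i: s[i] / n[i] + math.sqrt(2 * math.log(t) / n[i]))
        n[a] += 1
        s[a] += 1.0 if rng.random() < means[a] else 0.0
        regret += best - means[a]   # pseudo-regret of this pull
    return regret

regret = ucb1([0.3, 0.5, 0.7], horizon=10_000, rng=random.Random(0))
print(f"pseudo-regret after 10,000 rounds: {regret:.1f}")
```

Rerunning with a longer horizon shows the regret growing roughly like log T rather than linearly, which is the Lai–Robbins rate the abstract refers to.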
Article
Full-text available
We consider a single product revenue management problem where, given an initial inventory, the objective is to dynamically adjust prices over a finite sales horizon to maximize expected revenues. Realized demand is observed over time, but the underlying functional relationship between price and mean demand rate that governs these observations (otherwise known as the demand function or demand curve) is not known. We consider two instances of this problem: (i) a setting where the demand function is assumed to belong to a known parametric family with unknown parameter values; and (ii) a setting where the demand function is assumed to belong to a broad class of functions that need not admit any parametric representation. In each case we develop policies that learn the demand function "on the fly" and optimize prices based on those estimates. The performance of these algorithms is measured in terms of the regret: the revenue loss relative to the maximal revenues that can be extracted when the demand function is known prior to the start of the selling season. We derive lower bounds on the regret that hold for any admissible pricing policy, and then show that our proposed algorithms achieve a regret that is "close" to this lower bound. The magnitude of the regret can be interpreted as the economic value of prior knowledge on the demand function, manifested as the revenue loss due to model uncertainty.
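To make the "learn on the fly" idea concrete, here is a crude explore-then-exploit policy on a price grid: estimate the purchase probability at each grid price, then charge the empirically revenue-maximizing price. This is a stand-in illustration under an assumed linear demand curve and exploration budget — it is not the authors' algorithm, and it ignores the paper's inventory constraint.

```python
import random

def price_and_learn(price_grid, demand_prob, horizon, explore_frac=0.2, rng=None):
    """Explore-then-exploit sketch: estimate the unknown demand curve on a
    price grid, then charge the empirically revenue-maximizing price."""
    rng = rng or random.Random(0)
    explore_steps = int(horizon * explore_frac)
    sales = {p: 0 for p in price_grid}
    offers = {p: 0 for p in price_grid}

    def est_revenue(q):           # empirical revenue rate at price q
        return q * sales[q] / max(offers[q], 1)

    revenue = 0.0
    for t in range(horizon):
        if t < explore_steps:
            p = price_grid[t % len(price_grid)]   # round-robin exploration
        else:
            p = max(price_grid, key=est_revenue)  # exploit best estimate
        offers[p] += 1
        sold = rng.random() < demand_prob(p)
        sales[p] += sold
        revenue += p * sold
    return revenue, max(price_grid, key=est_revenue)

# Toy linear demand: purchase probability falls from 0.9 at price 1
# to 0.1 at price 9; the best grid price is 5.
rev, p_star = price_and_learn([1, 3, 5, 7, 9],
                              lambda p: max(0.0, 1.0 - 0.1 * p),
                              horizon=5000)
print(p_star)
```

The paper's point is precisely that smarter policies than this fixed explore/exploit split can push the revenue loss (regret) down toward the information-theoretic lower bound.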
Article
Full-text available
Algorithms based on upper confidence bounds for balancing exploration and exploitation are gaining popularity since they are easy to implement, efficient and effective. This paper considers a variant of the basic algorithm for the stochastic, multi-armed bandit problem that takes into account the empirical variance of the different arms. In earlier experimental works, such algorithms were found to outperform the competing algorithms. We provide the first analysis of the expected regret for such algorithms. As expected, our results show that the algorithm that uses the variance estimates has a major advantage over its alternatives that do not use such estimates provided that the variances of the payoffs of the suboptimal arms are low. We also prove that the regret concentrates only at a polynomial rate. This holds for all the upper confidence bound based algorithms and for all bandit problems except those special ones where with probability one the payoff obtained by pulling the optimal arm is larger than the expected payoff for the second best arm. Hence, although upper confidence bound bandit algorithms achieve logarithmic expected regret rates, they might not be suitable for a risk-averse decision maker. We illustrate some of the results by computer simulations.
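The variance-aware index this abstract analyzes (often called UCB-V) replaces the fixed exploration term of UCB1 with one driven by the empirical variance, plus a second-order correction. A minimal sketch of the index, with made-up reward histories to show the effect:

```python
import math

def ucbv_index(rewards, t, c=1.0):
    """UCB-V index: empirical mean + variance-driven exploration term
    + a second-order correction term."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    return (mean
            + math.sqrt(2 * var * math.log(t) / n)
            + 3 * c * math.log(t) / n)

# Two arms with the same empirical mean (0.5) and pull count (20):
# the low-variance arm gets a much smaller exploration bonus.
low = ucbv_index([0.5] * 20, t=100)          # zero empirical variance
high = ucbv_index([0.0, 1.0] * 10, t=100)    # maximal Bernoulli variance
print(low < high)
```

This is exactly the advantage the abstract describes: when suboptimal arms have low payoff variance, their indices shrink faster and they are abandoned sooner than under variance-blind bounds.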
Article
Full-text available
Virtual advisors often increase sales for those customers who find such online advice to be convenient and helpful. However, other customers take a more active role in their purchase decisions and prefer more detailed data. In general, we expect that websites are more preferred and increase sales if their characteristics (e.g., more detailed data) match customers' cognitive styles (e.g., more analytic). "Morphing" involves automatically matching the basic "look and feel" of a website, not just the content, to cognitive styles. We infer cognitive styles from clickstream data with Bayesian updating. We then balance exploration (learning how morphing affects purchase probabilities) with exploitation (maximizing short-term sales) by solving a dynamic program (partially observable Markov decision process). The solution is made feasible in real time with expected Gittins indices. We apply the Bayesian updating and dynamic programming to an experimental BT Group (formerly British Telecom) website using data from 835 priming respondents. If we had perfect information on cognitive styles, the optimal "morph" assignments would increase purchase intentions by 21%. When cognitive styles are partially observable, dynamic programming does almost as well—purchase intentions can increase by almost 20%. If implemented system-wide, such increases represent approximately $80 million in additional revenue.
Article
Full-text available
We suggest two new, translation-based methods for estimating and correcting for bias when estimating the edge of a distribution. The first uses an empirical translation applied to the argument of the kernel, in order to remove the main effects of the asymmetries that are inherent when constructing estimators at boundaries. Placing the translation inside the kernel is in marked contrast to traditional approaches, such as the use of high-order kernels, which are related to the jackknife and, in effect, apply the translation outside the kernel.
Our approach has the advantage of producing bias estimators that, while enjoying a high order of accuracy, are guaranteed to respect the sign of bias. Our second method is a new bootstrap technique. It involves translating an initial boundary estimate toward the body of the dataset, constructing repeated boundary estimates from data that lie below the respective translations, and employing averages of the resulting empirical bias approximations to estimate the bias of the original estimator. The first of the two methods is most appropriate in univariate cases, and is studied there; the second approach may be used to bias-correct estimates of boundaries of multivariate distributions, and is explored in the bivariate case.
Article
The authors explore how competition affects dynamic pricing of new products. Dynamics of diffusion, saturation, and cost reduction due to experience are all considered, with emphasis on the last. The competition among firms is modeled as a dynamic Nash equilibrium in which each firm chooses its optimal dynamic strategy, correctly anticipating its rivals’ strategies. The dynamics of price and market share are characterized for an n-firm oligopoly. Empirical examination of price paths across eight products in the semiconductor components industry shows them to be consistent with analytical results.
Article
Easily implemented dynamic models for pricing and production or ordering decisions are developed for products or services whose value to the consumer may be changing over time. The models are easily solved by brute force dynamic programming. Implementation examples of models for broadcast pricing are presented.
Article
Firms using online advertising regularly run experiments with multiple versions of their ads since they are uncertain about which ones are most effective. During a campaign, firms try to adapt to intermediate results of their tests, optimizing what they earn while learning about their ads.
Yet how should they decide what percentage of impressions to allocate to each ad? This paper answers that question, resolving the well-known "learn-and-earn" trade-off using multi-armed bandit (MAB) methods. The online advertiser's MAB problem, however, contains particular challenges, such as a hierarchical structure (ads within a website), attributes of actions (creative elements of an ad), and batched decisions (millions of impressions at a time), that are not fully accommodated by existing MAB methods. Our approach captures how the impact of observable ad attributes on ad effectiveness differs by website in unobserved ways, and our policy generates allocations of impressions that can be used in practice. We implemented this policy in a live field experiment delivering over 750 million ad impressions in an online display campaign with a large retail bank. Over the course of two months, our policy achieved an 8% improvement in the customer acquisition rate, relative to a control policy, without any additional costs to the bank. Beyond the actual experiment, we performed counterfactual simulations to evaluate a range of alternative model specifications and allocation rules in MAB policies. Finally, we show that customer acquisition would decrease by about 10% if the firm were to optimize click-through rates instead of conversion directly, a finding that has implications for understanding the marketing funnel. Data is available at https://doi.org/10.1287/mksc.2016.1023.
Article
Pradeep Chintagunta, Dominique M. Hanssens, John R. Hauser (2016) Editorial—Marketing Science and Big Data. Marketing Science 35(3):341-342. http://dx.doi.org/10.1287/mksc.2016.0996
Article
Researchers and practitioners devote substantial effort to targeting banner advertisements to consumers, but they focus less effort on how to communicate with consumers once targeted. Morphing enables a website to learn, automatically and near optimally, which banner advertisements to serve to consumers to maximize click-through rates, brand consideration, and purchase likelihood. Banners are matched to consumers based on posterior probabilities of latent segment membership, which are identified from consumers' clickstreams. This paper describes the first large-sample random-assignment field test of banner morphing—more than 100,000 consumers viewed more than 450,000 banners on CNET.com.
On relevant Web pages, CNET's click-through rates almost doubled relative to control banners. We supplement the CNET field test with an experiment on an automotive information-and-recommendation website. The automotive experiment replaces automated learning with a longitudinal design that implements morph-to-segment matching. Banners matched to cognitive styles, as well as the stage of the consumer's buying process and body-type preference, significantly increase click-through rates, brand consideration, and purchase likelihood relative to a control. The CNET field test and automotive experiment demonstrate that matching banners to cognitive-style segments is feasible and provides significant benefits above and beyond traditional targeting. Improved banner effectiveness has strategic implications for allocations of budgets among media. Article A leading explanation in the economic literature is that monetary policy has real effects on the economy because firms incur a cost when changing prices. Using a unique database of cost and retail price changes, we find that variation in menu costs results in up to 13.3% fewer price increases. We confirm that these effects are allocative and have a persistent impact on both prices and unit sales. We provide evidence that the menu cost channel operates only when cost increases are small in magnitude, which is consistent with theory and provides the first empirical evidence of boundary conditions. © 2015 The President and Fellows of Harvard College and the Massachusetts Institute of Technology 2015. Article This article presents a model of the design and introduction of a product line when the firm is uncertain about consumer valuations for the products. We find that product line introduction strategy depends on this uncertainty. 
Specifically, under low levels of uncertainty the firm introduces both models during the first period; under higher levels of uncertainty, the firm prefers sequential introduction and delays design of the second product until the second period. Under intermediate levels of uncertainty the firm's first product should be of lower quality than one produced by a myopic firm that does not take product line effects into consideration. We find that when the firm introduces a product sequentially, the strategy might depend on realized demand. For example, if realized demand is high, the firm's second product should be a higher-end model; if demand turns out to be low, the firm's second product should be a lower-end model or replace the first product with a lower-end model. Article We consider a robust version of the classic problem of optimal monopoly pricing with incomplete information. In the robust version, the seller faces model uncertainty and only knows that the true demand distribution is in the neighborhood of a given model distribution. We characterize the optimal pricing policy under two distinct, but related, decision criteria with multiple priors: (i) maximin expected utility and (ii) minimax expected regret. The resulting optimal pricing policy under either criterion yields a robust policy to the model uncertainty. While the classic monopoly policy and the maximin criterion yield a single deterministic price, minimax regret always prescribes a random pricing policy, or equivalently, a multi-item menu policy. Distinct implications of how a monopolist responds to an increase in uncertainty emerge under the two criteria. Article We consider a non-Bayesian infinite horizon version of the multi-armed bandit problem with the objective of designing simple policies whose regret increases slowly with time. In their seminal work on this problem, Lai and Robbins had obtained an O(log n) lower bound on the regret with a constant that depends on the Kullback-Leibler number.
They also constructed policies for some specific families of probability distributions (including exponential families) that achieved the lower bound. In this paper, we construct index policies that depend on the rewards from each arm only through their sample mean. These policies are computationally much simpler and are also applicable much more generally. They achieve an O(log n) regret with a constant that is also based on the Kullback-Leibler number. This constant turns out to be optimal for one-parameter exponential families; however, in general it is derived from the optimal one via a ‘contraction’ principle. Our results rely entirely on a few key lemmas from the theory of large deviations. Book An overview of statistical decision theory, which emphasizes the use and application of the philosophical ideas and mathematical structure of decision theory. The text assumes a knowledge of basic probability theory and some advanced calculus is also required. Article This paper unifies and extends the recent axiomatic literature on minimax regret. It compares several models of minimax regret, shows how to characterize the corresponding choice correspondences in a unified setting, extends one of them to choice from convex sets, and connects them by defining a behavioral notion of perceived ambiguity. Substantively, a main idea is to behaviorally identify ambiguity with failures of independence of irrelevant alternatives. Regarding proof technique, the core contribution is to uncover a dualism between choice correspondences and preferences in an environment where this dualism is not obvious. This insight can be used to generate results by importing findings from the existing literature on preference orderings.
Article This paper considers the relationship between pricing and ordering decisions for a monopolistic retailer facing a known demand function where, over the inventory cycle, the product may exhibit: (i) physical decay or deterioration of inventory called wastage; and (ii) decrease in market value called value drop associated with each unit of inventory on hand. The retailer is allowed to continuously vary the selling price of the product over the cycle. We introduce a notion of instantaneous margin, and use it to derive profit maximizing conditions for the retailer. The model explains the markdown of retail goods subject to decay. It also provides guidance in determining when price changes during the cycle are worthwhile due to product aging, how often such changes should be made, and how such changes affect ordering intervals and quantities. Article This paper considers the problem of pricing a new product in a market having competing products of different qualities and market penetration levels, as measured by the cumulative number of units sold. Each customer type selects his optimal product based on maximizing consumer surplus. Pricing policies for a new product are determined for the seller based on cumulative profit maximization without discounting. An example is solved in detail for two demand function forms. Article This paper considers the decision problem of a firm that is uncertain about the demand, and hence profitability, of a new product. We develop a model of a decision maker who sequentially learns about the true product profitability from observed product sales. Based on the current information, the decision maker decides whether to scrap the product. Central to this decision problem are sequential information gathering, and the option value of scrapping the product at any point in time. 
The model predicts the optimal demand for information (e.g., in the form of test marketing), and it predicts how the launch or exit policy depends on the firm's demand uncertainty. Furthermore, it predicts what fraction of newly developed products should be launched on average, and what fraction of these products will “fail,” i.e., exit. The model is solved using numerical dynamic programming techniques. We present an application of the model to the case of the U.S. ready-to-eat breakfast cereal industry. Simulations show that the value of reducing uncertainty can be large, and that under higher uncertainty firms should strongly increase the fraction of all new product opportunities launched, even if their point estimate of profits is negative. Alternative, simpler decision rules are shown to lead to large profit losses compared to our method. Finally, we find that the high observed exit rate in the U.S. ready-to-eat cereal industry is optimal and to be expected based on our model. Article The advent of optical scanning devices and decreases in the cost of computing power have made it possible to assemble databases with sales and marketing mix information in an accurate and timely manner. These databases enable the estimation of demand functions and pricing/promotion decisions in “real” time. Commercial suppliers of marketing research like A. C. Nielsen and IRI are embedding estimated demand functions in promotion planning and pricing tools for brand managers and retailers. This explosion in the estimation and use of demand functions makes it timely and appropriate to re-examine several fundamental issues. In particular, demand functions are latent theoretical constructs whose exact parametric form is unknown. Estimates of price elasticities, profit maximizing prices, inter-brand competition and other policy implications are conditional on the parametric form assumed in estimation. 
In practice, many forms may be found that are not only theoretically plausible but also consistent with the data. The different forms could suggest different profit maximizing prices leaving it unclear as to what is the appropriate pricing action. Specification tests may lack the power to resolve this uncertainty, particularly for non-nested comparisons. Also, the structure of these tests does not permit seamless integration of estimation, specification analysis and optimal pricing into a unified framework. As an alternative to the existing approaches, I propose a Bayesian mixture model (BMM) that draws on Bayesian estimation, inference, and decision theory, thereby providing a unified framework. The BMM approach consists of input, estimation, diagnostic and optimal pricing modules. In the input module, alternate parametric models of demand are specified along with priors. Utility structures representing the decision maker's attitude towards risk can be explicitly specified. In the estimation module, the inputs are combined with data to compute parameter estimates and posterior probabilities for the models. The diagnostic module involves testing the statistical assumptions underlying the models. In the optimal pricing module the estimates and posterior probabilities are combined with the utility structure to arrive at optimal pricing decisions. Formalizing demand uncertainty in this manner has many important payoffs. While the classical approaches emphasize choosing a demand specification, the BMM approach emphasizes constructing an objective function that represents a mixture of the specifications. Hence, pricing decisions can be arrived at even when there is no consensus among the different parametric specifications. The pricing decisions will reflect parametric demand uncertainty, and hence be more robust than those based on a single demand model. The BMM approach was empirically evaluated using store level scanner data. 
The decision context was the determination of equilibrium wholesale prices in a noncooperative game between several leading national brands. Retail demand was parametrized as semi-log and double-log with diffuse priors for the models and the parameters. Wholesale demand functions were derived by incorporating the retailers' pricing behavior in the retail demand function. Utility functions reflecting risk averse and risk neutral decision makers were specified. The diagnostic module confirms that face validity measures, residual analysis, classical tests or holdout predictions were unable to resolve the uncertainty about the parametric form and by implication the uncertainty with regard to pricing decisions. In contrast, the posterior probabilities were more conclusive and favored the specification that predicted better in a holdout analysis. However, across the brands, they lacked a systematic pattern of updating towards any one specification. Also, none of the priors updated to zero or one, and there was considerable residual uncertainty about the parametric specification. Despite the residual uncertainty, the BMM approach was able to determine the equilibrium wholesale prices. As expected, specifications influence the BMM pricing solutions in accordance with their posterior probabilities which act as weights. In addition, differing attitudes towards risk lead to considerable divergence in the pricing actions of the risk averse and the risk neutral decision maker. Finally, results from a Monte Carlo experiment suggest that the BMM approach performs well in terms of recovering potential improvements in profits. Article We investigate the firm's dynamic nonlinear pricing problem when facing consumers whose tastes vary according to a scalar index. We relax the standard assumption that the firm knows the distribution of this index.
In general the firm should determine its marginal price schedule as if it were myopic, and produce information by lowering the price schedule; “bunching” consumers at positive purchase levels should be avoided. As a special case we also consider a market characterized by homogeneous consumers with a static, but unknown, demand curve. We show that when there are repeat purchases the forward-looking firm should tend towards penetration pricing; otherwise its strategy should tend towards skimming. We extend our insights to more general settings and discuss implications for pricing product lines. Article Learning curve effects, aspects of consumer demand models (e.g., reservation price distributions, intertemporal utility maximizing behavior), and competitive activity are reasons which have been offered to explain why prices of new durables decline over time. This paper presents an alternative rationale based on the buying behavior for products with overlapping replacement cycles (i.e., next generation products). A model for consumer sales of a new durable is developed by incorporating the replacement behavior of a previous generation product. Pricing strategies for two product generations are investigated analytically and with numerical methods. Results indicate that durable replacement behavior leads to a wider set of optimal pricing strategies than previously obtained. Several empirical illustrations of industry pricing practices for successive product generations are also shown to be consistent with the theoretical results. Finally, various areas for future research are outlined. Article This paper provides a mathematical framework for modeling demand and determining optimal price schedules in markets which have demand externalities and can sustain nonlinear pricing. These fundamental economic concepts appear in the marketplace in the form of mutual buyers' benefits and quantity discounts. 
The theory addressing these aspects is relevant to a wide variety of goods and services. Examples include tariffs for electronic communications services, pricing of franchises, and royalty fees for copyrighted material and patents. This paper builds on several previous results from microeconomics and extends nonlinear pricing to markets with demand externalities. The implications of this price structure are compared to results obtained for flat rates and two part tariffs in a similar context. A case study is described in which the results were applied to planning the startup of a new electronic communications service. Article We construct two models of the behavior of consumers in an environment where there is uncertainty about brand attributes. In our models, both usage experience and advertising exposure give consumers noisy signals about brand attributes. Consumers use these signals to update their expectations of brand attributes in a Bayesian manner. The two models are (1) a dynamic model with immediate utility maximization, and (2) a dynamic “forward-looking” model in which consumers maximize the expected present value of utility over a planning horizon. Given this theoretical framework, we derive from the Bayesian learning framework how brand choice probabilities depend on past usage experience and advertising exposures. We then form likelihood functions for the models and estimate them on Nielsen scanner data for detergent. We find that the functional forms for experience and advertising effects that we derive from the Bayesian learning framework fit the data very well relative to flexible ad hoc functional forms such as exponential smoothing, and also perform better at out-of-sample prediction. Another finding is that in the context of consumer learning of product attributes, although the forward-looking model fits the data statistically better at conventional significance levels, both models produce similar parameter estimates and policy implications. 
Our estimates indicate that consumers are risk-averse with respect to variation in brand attributes, which discourages them from buying unfamiliar brands. Using the estimated behavioral models, we perform various scenario evaluations to find how changes in marketing strategy affect brand choice both in the short and long run. A key finding obtained from the policy experiments is that advertising intensity has only weak short run effects, but a strong cumulative effect in the long run. The substantive content of the paper is potentially of interest to academics in marketing, economics and decision sciences, as well as product managers, marketing research managers and analysts interested in studying the effectiveness of marketing mix strategies. Our paper will be of particular interest to those interested in the long run effects of advertising. Note that our estimation strategy requires us to specify explicit behavioral models of consumer choice behavior, derive the implied relationships among choice probabilities, past purchases and marketing mix variables, and then estimate the behavioral parameters of each model. Such an estimation strategy is referred to as “structural” estimation, and econometric models that are based explicitly on the consumer's maximization problem and whose parameters are parameters of the consumers' utility functions or of their constraints are referred to as “structural” models. A key benefit of the structural approach is its potential usefulness for policy evaluation. The parameters of structural models are invariant to policy, that is, they do not change due to a change in the policy. In contrast, the parameters of reduced form brand choice models are, in general, functions of marketing strategy variables (e.g., consumer response to price may depend on pricing policy). 
As a result, the predictions of reduced form models for the outcomes of policy experiments may be unreliable, because in making the prediction one must assume that the model parameters are unaffected by the policy change. Since the agents in our models choose among many alternative brands, their choice probabilities take the form of higher-order integrals. We employ Monte-Carlo methods to approximate these integrals and estimate our models using simulated maximum likelihood. Estimation of the dynamic forward-looking model also requires that a dynamic programming problem be solved in order to form the likelihood function. For this we use a new approximation method based on simulation and interpolation techniques. These estimation techniques may be of interest to researchers and policy makers in many fields where dynamic choice among discrete alternatives is important, such as marketing, decision sciences, labor and health economics, and industrial organization. Article It is well known now that kernel density estimators are not consistent when estimating a density near the finite end points of the support of the density to be estimated. This is due to boundary effects that occur in nonparametric curve estimation problems. A number of proposals have been made in the kernel density estimation context with some success. As of yet there appears to be no single dominating solution that corrects the boundary problem for all shapes of densities. In this paper, we propose a new general method of boundary correction for univariate kernel density estimation. The proposed method generates a class of boundary corrected estimators. They all possess desirable properties such as local adaptivity and non-negativity. In simulation, it is observed that the proposed method performs quite well when compared with other existing methods available in the literature for most shapes of densities, showing a very important robustness property of the method.
The theory behind the new approach and the bias and variance of the proposed estimators are given. Results of a data analysis are also given. Article This paper studies how and how much active experimentation is used in discounted or finite-horizon optimization problems with an agent who chooses actions sequentially from a finite set of actions, with rewards depending on unknown parameters associated with the actions. Closed-form approximations are developed for the optimal rules in these ‘multi-armed bandit’ problems. Some refinements and modifications of the basic structure of these approximations also provide a nearly optimal solution to the long-standing problem of incorporating switching costs into multi-armed bandits. Article A class of simple adaptive allocation rules is proposed for the problem (often called the "multi-armed bandit problem") of sampling x_1, …, x_N sequentially from k populations with densities belonging to an exponential family, in order to maximize the expected value of the sum S_N = x_1 + … + x_N. These allocation rules are based on certain upper confidence bounds, which are developed from boundary crossing theory, for the k population parameters. The rules are shown to be asymptotically optimal as N → ∞ from both Bayesian and frequentist points of view. Monte Carlo studies show that they also perform very well for moderate values of the horizon N.
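The upper-confidence-bound allocation rules described above admit a very compact sketch. The snippet below implements the generic UCB1 index (a later, distribution-free variant of the Lai-Robbins/Agrawal sample-mean indices), not the exact rule from any of the cited papers; the three Bernoulli arms and their success probabilities are illustrative assumptions.

```python
import math
import random

def ucb1(arms, horizon):
    """Pull each arm once, then always pull the arm maximizing
    sample mean + sqrt(2 ln t / n_i), an upper confidence bound."""
    k = len(arms)
    counts = [0] * k
    sums = [0.0] * k
    for i in range(k):  # initialization: one pull per arm
        sums[i] += arms[i]()
        counts[i] += 1
    for t in range(k + 1, horizon + 1):
        i = max(range(k), key=lambda j: sums[j] / counts[j]
                + math.sqrt(2.0 * math.log(t) / counts[j]))
        sums[i] += arms[i]()
        counts[i] += 1
    return counts

random.seed(0)
# illustrative Bernoulli arms with success probabilities 0.3, 0.5, 0.7
arms = [lambda p=p: float(random.random() < p) for p in (0.3, 0.5, 0.7)]
pulls = ucb1(arms, horizon=5000)  # the 0.7 arm should dominate the pull counts
```

Because the confidence bonus shrinks like sqrt(ln t / n_i), suboptimal arms are sampled only on the order of ln t times, which is the sense in which regret "increases slowly with time" in the abstract above.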
Article
Corruption in the public sector erodes tax compliance and leads to higher tax evasion. Moreover, corrupt public officials abuse their public power to extort bribes from the private agents. In both types of interaction with the public sector, the private agents are bound to face uncertainty with respect to their disposable incomes. To analyse effects of this uncertainty, a stochastic dynamic growth model with the public sector is examined. It is shown that deterministic excessive red tape and corruption deteriorate the growth potential through income redistribution and public sector inefficiencies. Most importantly, it is demonstrated that the increase in corruption via higher uncertainty exerts adverse effects on capital accumulation, thus leading to lower growth rates.
Article
We consider the problem of pricing a single object when the seller has only minimal information about the true valuation of the buyer. Specifically, the seller only knows the support of the possible valuations and has no further distributional information. The seller is solving this choice problem under uncertainty by minimizing her regret. The pricing policy hedges against uncertainty by randomizing over a range of prices. The support of the pricing policy is bounded away from zero. Buyers with low valuations cannot generate substantial regret and are priced out of the market. We generalize the pricing policy without priors to encompass many buyers and many qualities. (c) 2008 by the European Economic Association.
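The gap between deterministic and randomized pricing under the regret criterion can be checked numerically. The sketch below assumes, purely for illustration, that the buyer's valuation lies in [0, 1]: the best deterministic price (1/2) suffers worst-case regret 1/2, while randomizing with density f(p) = 1/p on [1/e, 1] (a support bounded away from zero, as the abstract notes) caps worst-case regret at 1/e.

```python
import math

def regret_deterministic(price, v):
    # buyer with valuation v buys iff v >= price; benchmark revenue is v
    revenue = price if v >= price else 0.0
    return v - revenue

def expected_regret_randomized(v, lo=1.0 / math.e):
    # price drawn from density f(p) = 1/p on [lo, 1], which integrates to 1;
    # expected revenue at valuation v >= lo is the integral of p * (1/p)
    # over [lo, v], i.e. v - lo
    if v <= lo:
        return v           # buyer is priced out: regret equals the valuation
    return v - (v - lo)    # constant regret lo = 1/e on [lo, 1]

grid = [i / 1000.0 for i in range(1, 1001)]
worst_det = max(regret_deterministic(0.5, v) for v in grid)    # 0.5, at v = 1
worst_rand = max(expected_regret_randomized(v) for v in grid)  # about 1/e
```

Buyers with valuations below 1/e generate regret of at most their own (small) valuation, which is why pricing them out of the market is harmless under this criterion.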
Article
We present a model of entry and exit with Bayesian learning and price competition. A new product of initially unknown quality is introduced in the market, and purchases of the product yield information on its true quality. We assume that the performance of the new product is publicly observable. As agents learn from the experiments of others, informational externalities arise. We determine the Markov Perfect Equilibrium prices and allocations. In a single market, the combination of the informational externalities among the buyers and the strategic pricing by the sellers results in excessive experimentation. If the new product is launched in many distinct markets, the path of sales converges to the efficient path in the limit as the number of markets grows.
Article
We show that price-matching guarantees can facilitate monopoly pricing only if firms automatically match prices. If consumers must instead request refunds (thereby incurring hassle costs), we find that any increase in equilibrium prices due to firms' price-matching policies will be small; often, no price increase can be supported. In symmetric markets price-matching guarantees cannot support any rise in prices, even if hassle costs are arbitrarily small. In asymmetric markets, higher prices can be supported, but the prices fall well short of maximizing joint profits. Our model can explain why some firms adopt price-matching guarantees while others do not. Copyright (c) 1999 Massachusetts Institute of Technology.
Article
Temporary price reductions (sales) are common for many goods and naturally result in large increases in the quantity sold. Demand estimation based on temporary price reductions may mismeasure the long-run responsiveness to prices. In this paper we quantify the extent of the problem and assess its economic implications. We structurally estimate a dynamic model of consumer choice using two years of scanner data on the purchasing behavior of a panel of households. The results suggest that static demand estimates, which neglect dynamics, (i) overestimate own-price elasticities by 30 percent, (ii) underestimate cross-price elasticities by up to a factor of 5, and (iii) overestimate the substitution to the no-purchase or outside option by over 200 percent. This suggests that policy analysis based on static elasticity estimates will underestimate price-cost margins and underpredict the effects of mergers. Copyright The Econometric Society 2006.
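As a reminder of what is being estimated, the own-price elasticity in a static log-log demand model is simply the OLS slope of log quantity on log price. The snippet below is a self-contained illustration on synthetic data (the demand parameters a = 5 and b = 1.5 are made up); it does not reproduce the paper's dynamic model, whose point is precisely that this static slope is biased when price variation comes from temporary sales.

```python
import random

random.seed(1)
# synthetic static demand: log Q = a - b * log P + noise,
# so the own-price elasticity is the constant -b
a, b = 5.0, 1.5
obs = []
for _ in range(500):
    log_p = random.uniform(0.0, 1.0)
    log_q = a - b * log_p + random.gauss(0.0, 0.1)
    obs.append((log_p, log_q))

# OLS slope of log Q on log P recovers the elasticity (about -1.5 here)
n = len(obs)
mean_p = sum(x for x, _ in obs) / n
mean_q = sum(y for _, y in obs) / n
elasticity = (sum((x - mean_p) * (y - mean_q) for x, y in obs)
              / sum((x - mean_p) ** 2 for x, _ in obs))
```

If prices in the data move mainly during temporary reductions, this static slope conflates household stockpiling with genuine substitution, which is the overestimation the authors quantify.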
Article
This paper considers a problem of optimal learning by experimentation by a single decision maker. Most of the analysis is concerned with the characterisation of limit beliefs and actions. We take a two-stage approach to this problem: first, understand the case where the agent's payoff function is deterministic; then, address the additional issues arising when noise is present. Our analysis indicates that local properties of the payoff function (such as smoothness) are crucial in determining whether the agent eventually attains the true maximum payoff or not. The paper also makes a limited attempt at characterising optimal experimentation strategies.
Conference Paper
We consider price-setting algorithms for a simple market in which a seller has an unlimited supply of identical copies of some good, and interacts sequentially with a pool of n buyers, each of whom wants at most one copy of the good. In each transaction, the seller offers a price between 0 and 1, and the buyer decides whether or not to buy, by comparing the offered price to his privately-held valuation for the good. The price offered to a given buyer may be influenced by the outcomes of prior transactions, but each individual buyer participates only once. In this setting, what is the value of knowing the demand curve? In other words, how much revenue can an uninformed seller expect to obtain, relative to a seller with prior information about the buyers' valuations? The answer depends on how the buyers' valuations are modeled. We analyze three cases - identical, random, and worst-case valuations - in each case deriving upper and lower bounds which match within a sublogarithmic factor.
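A minimal version of this uninformed-seller problem can be simulated by treating a discretized grid of posted prices as bandit arms, where an arm's reward is the price times an indicator that the buyer's valuation meets it. The sketch below is a generic UCB policy under the simplifying assumption of i.i.d. uniform valuations; it is not the algorithm analyzed in the paper, which also covers identical and worst-case valuations.

```python
import math
import random

def posted_price_ucb(value_sampler, horizon, n_prices=20):
    """UCB over a price grid: arm i posts price (i + 1) / n_prices and
    earns that price iff the buyer's valuation is at least the price."""
    prices = [(i + 1) / n_prices for i in range(n_prices)]
    counts = [0] * n_prices
    sums = [0.0] * n_prices
    revenue = 0.0
    for t in range(1, horizon + 1):
        def index(i):
            if counts[i] == 0:
                return float("inf")  # try every price at least once
            return sums[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i])
        i = max(range(n_prices), key=index)
        v = value_sampler()
        r = prices[i] if v >= prices[i] else 0.0
        counts[i] += 1
        sums[i] += r
        revenue += r
    return revenue, prices, counts

random.seed(0)
# assumed demand: valuations uniform on [0, 1], so expected revenue at
# price p is p * (1 - p), peaking at p = 0.5 with value 0.25
rev, prices, counts = posted_price_ucb(random.random, horizon=20000)
avg_revenue = rev / 20000
```

Because the revenue curve p(1 - p) is flat near its peak, the policy spreads pulls across several near-optimal prices, and per-round revenue approaches the full-information benchmark of 0.25 only slowly; that slow convergence is exactly what the upper and lower bounds in the abstract quantify.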
W Baker, D Kiewell, G Winkler. Using big data to make better pricing decisions.
Y Aviv, A Pazcal. Pricing of short life-cycle products through active learning. Working paper.