Figure 1 - uploaded by Alexandre Xavier Ywata Carvalho

# Comparison of expected mean cumulative revenues for several pricing policies (optimal expected revenue per period under known parameter values is 8,906).

Source publication

This paper considers the problem of changing prices over time to maximize expected revenues in the presence of unknown demand distribution parameters. It provides and compares several methods that use the sequence of past prices and observed demands to set the price in the current period. A Taylor series expansion of the future reward function explicitly i...

## Context in source publication

**Context 1**

... simulations show that the unconstrained one-step ahead rules provide greater mean cumulative revenues Ê[CR(t)] than the other strategies. Figure 1 provides a comparison of a selected one-step ahead rule in which G(t) is piecewise linear, and the other rules. A comparison between several one-step pricing rules is shown in Figure 2. ...

## Similar publications

In an influential paper, Jesse Rothstein (2010) shows that standard value-added models (VAMs) suggest implausible and large future teacher effects on past student achievement. This is the basis of a falsification test that appears to indicate bias in typical VAM estimates of teacher contributions to student learning on standardized tests. We find t...

## Citations

... They showed via simulations that one-step lookahead policies could significantly outperform myopic policies. Various other semimyopic policies have also been investigated in Carvalho & Puterman (2005b). ...

In this paper we apply active learning algorithms for dynamic pricing in a prominent e-commerce website. Dynamic pricing involves changing the price of items on a regular basis, and uses the feedback from the pricing decisions to update prices of the items. Most popular approaches to dynamic pricing use a passive learning approach, where the algorithm uses historical data to learn various parameters of the pricing problem, and uses the updated parameters to generate a new set of prices. We show that one can use active learning algorithms such as Thompson sampling to more efficiently learn the underlying parameters in a pricing problem. We apply our algorithms to a real e-commerce system and show that the algorithms indeed improve revenue compared to pricing algorithms that use passive learning.
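The active-learning idea described above can be illustrated with a minimal Thompson-sampling sketch, assuming a small grid of candidate prices where demand at each price is Bernoulli with an unknown purchase probability tracked by a Beta posterior. The price grid, priors, and demand model here are illustrative assumptions, not details from the paper.

```python
import random

class ThompsonPricer:
    """Thompson sampling over a discrete grid of candidate prices."""

    def __init__(self, prices):
        self.prices = prices
        self.sales = {p: 0 for p in prices}   # observed purchases at each price
        self.passes = {p: 0 for p in prices}  # observed non-purchases at each price

    def choose_price(self):
        # Sample a purchase probability from each Beta posterior and pick
        # the price with the highest sampled expected revenue.
        def sampled_revenue(p):
            theta = random.betavariate(self.sales[p] + 1, self.passes[p] + 1)
            return p * theta
        return max(self.prices, key=sampled_revenue)

    def observe(self, price, sold):
        # Update the posterior for the price that was offered.
        if sold:
            self.sales[price] += 1
        else:
            self.passes[price] += 1
```

Because prices are sampled from the posterior rather than set to the point estimate, the policy keeps experimenting with plausible prices instead of locking onto an early, possibly wrong, estimate — the active-learning behavior the abstract contrasts with passive approaches.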

... mention Kalman filter estimates for time-series forecasting, but do not explicate how such estimates could be computed for revenue management. Carvalho and Puterman (2015) employ Kalman filters as a heuristic to develop a one-step-look-ahead strategy based on a second-degree Taylor expansion of future revenue. Kwon et al. (2009) use a Kalman filter to estimate demand parameters for competing service providers. ...

... Here, we extend research relying on Kalman filter equations such as Lobo and Boyd (2003) and Carvalho and Puterman (2015). We adapt the idea of demand evolving in the form of an auto-regressive process from Li et al. (2009) and Chung et al. (2012) to consecutive sales periods. ...

In recent years, revenue management research developed increasingly complex demand forecasts to model customer choice. While the resulting systems should easily outperform their predecessors, it appears difficult to achieve substantial improvement in practice. At the same time, interest in robust revenue maximization is growing. From this arises the challenge of creating versatile and computationally efficient approaches to estimate demand and quantify demand uncertainty. Motivated by this challenge, this paper introduces and benchmarks two filter-based demand estimators: the unscented Kalman filter and the particle filter. It documents a computational study, which is set in the airline industry and compares the estimators’ efficiency to that of sequential estimation and maximum-likelihood estimation. We quantify estimator efficiency through the posterior Cramér–Rao bound and compare revenue performance to the revenue opportunity. Both indicate that unscented Kalman filter and maximum-likelihood estimation outperform the alternatives. In addition, the Kalman filter requires comparatively little computational effort to update and quantifies demand uncertainty.
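For intuition about the filter-based estimation the abstract benchmarks, here is a minimal scalar Kalman filter that tracks a slowly drifting demand level from noisy sales observations. This is a linear sketch, not the unscented Kalman filter or particle filter studied in the paper, and the noise variances `q` and `r` are illustrative assumptions.

```python
class DemandKalmanFilter:
    """Scalar Kalman filter: demand follows a random walk, sales are noisy reads."""

    def __init__(self, initial_mean, initial_var, q=1.0, r=4.0):
        self.mean = initial_mean  # current demand estimate
        self.var = initial_var    # uncertainty (variance) of the estimate
        self.q = q                # process-noise variance (demand drift per period)
        self.r = r                # observation-noise variance (sales noise)

    def update(self, observed_sales):
        # Predict step: a random-walk demand means uncertainty grows by q.
        prior_var = self.var + self.q
        # Correct step: blend prediction and observation via the Kalman gain.
        gain = prior_var / (prior_var + self.r)
        self.mean = self.mean + gain * (observed_sales - self.mean)
        self.var = (1 - gain) * prior_var
        return self.mean
```

The update is a handful of arithmetic operations per period, which reflects the abstract's point that Kalman-style filters quantify demand uncertainty (via `var`) at very low computational cost.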

... A key feature of time-resolved uncertainties is that the resolution (or reduction) of the uncertainty provides an opportunity for updated design decision-making, if flexibility has been provided in the design. As an example of this process, future demand for a system is an epistemic uncertain variable as sudden changes in the future may occur due to the introduction of a new technology to the market, changes in government policy, or changes in customers' interests (Carvalho & Puterman, 2004). Future changes in demand, including changes in the type of demand or the number of users, are unknown when a system is being designed; however, when demand is observed, modifications can be made in response to the state of demand (i.e. the state of interest). ...

It is desirable for complex engineered systems to be resilient to various sources of uncertainty throughout their life cycle. Such systems are high in cost and complexity, and often incorporate highly sophisticated materials, components, design, and other technologies. There are many uncertainties such systems will face throughout their life cycles due to changes in internal and external conditions, or states of interest to the designer, such as technology readiness, market conditions, or system health. These states of interest affect the success of the system design with respect to the main objectives and application of the system, and are generally uncertain over the life cycle of the system. To address such uncertainties, we propose a resilient design approach for engineering systems. We utilize a Kalman filter approach to model the uncertain future states of interest. Then, based upon the modeled states, the optimal change in the design of the system is achieved to respond to the new states. This resilient method is applicable in systems when the ability to change is embedded in the system design. A design framework is proposed encompassing a set of definitions, metrics, and methodologies. A case study of a communication satellite system is presented to illustrate the features of the approach.

... In Carvalho and Puterman (2005b), this idea is applied to a binomial demand function whose expectation is a logit function of the price, whereas in Carvalho and Puterman (2005a), a log-normal demand model is considered. ...

Price experimentation is an important tool for firms to find the optimal selling price of their products. It should be conducted properly, since experimenting with selling prices can be costly. A firm, therefore, needs to find a pricing policy that optimally balances between learning the optimal price and gaining revenue. In this paper, we propose such a pricing policy, called controlled variance pricing (CVP). The key idea of the policy is to enhance the certainty equivalent pricing policy with a taboo interval around the average of previously chosen prices. The width of the taboo interval shrinks at an appropriate rate as the amount of data gathered gets large; this guarantees sufficient price dispersion. For a large class of demand models, we show that this procedure is strongly consistent, which means that eventually the value of the optimal price will be learned, and derive upper bounds on the regret, which is the expected amount of money lost due to not using the optimal price. Numerical tests indicate that CVP performs well on different demand models and time scales.
This paper was accepted by Assaf Zeevi, stochastic models and simulation.
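The taboo-interval mechanism at the heart of CVP can be sketched as a single pricing rule: use the certainty-equivalent price unless it falls inside a shrinking interval around the average of past prices, in which case push it to the nearer endpoint. The width schedule `c / t**0.25` below is a hypothetical choice of "appropriately shrinking" rate, not the one derived in the paper.

```python
def cvp_price(ce_price, past_prices, c=1.0):
    """Controlled-variance pricing sketch: enforce dispersion via a taboo interval."""
    t = len(past_prices)
    if t == 0:
        return ce_price  # no history yet, nothing to constrain
    avg = sum(past_prices) / t
    half_width = c / t ** 0.25  # taboo interval shrinks as data accumulates
    lo, hi = avg - half_width, avg + half_width
    if lo < ce_price < hi:
        # Inside the taboo interval: move to the nearer boundary so the new
        # price is guaranteed to differ from the historical average.
        return lo if ce_price <= avg else hi
    return ce_price
```

Keeping each price outside the interval guarantees a minimum amount of price dispersion, which is what lets the parameter estimates remain consistent; letting the interval shrink ensures the revenue cost of that dispersion vanishes over time.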

... Although this is not a focus of the current paper, we mention that there is a growing literature on this specific topic. See [31]- [34] for examples of recent work. All of these models are distinguished from ours in their objectives and in the specific demand and inventory situations treated. ...

We consider a multiarmed bandit problem where the expected reward of each arm is a linear function of an unknown scalar with a prior distribution. The objective is to choose a sequence of arms that maximizes the expected total (or discounted total) reward. We demonstrate the effectiveness of a greedy policy that takes advantage of the known statistical correlation structure among the arms. In the infinite horizon discounted reward setting, we show that the greedy and optimal policies eventually coincide, and both settle on the best arm. This is in contrast with the Incomplete Learning Theorem for the case of independent arms. In the total reward setting, we show that the cumulative Bayes risk after T periods under the greedy policy is at most O(log T), which is smaller than the lower bound of Ω(log² T) established by Lai for a general, but different, class of bandit problems. We also establish the tightness of our bounds. Theoretical and numerical results show that the performance of our policy scales independently of the number of arms.
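The greedy policy the abstract analyzes can be sketched as follows, assuming each arm i has expected reward u_i + v_i·z for a single unknown scalar z with a normal prior. The arm coefficients, priors, and Gaussian noise model are illustrative; the key point is that every pull updates the one shared posterior on z, so the arms are statistically correlated.

```python
class GreedyLinearBandit:
    """Greedy policy for arms with rewards linear in one unknown scalar z."""

    def __init__(self, arms, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
        self.arms = arms              # list of (u_i, v_i) coefficient pairs
        self.mean = prior_mean        # posterior mean of the shared scalar z
        self.prec = 1.0 / prior_var   # posterior precision of z
        self.noise_var = noise_var

    def choose(self):
        # Greedy: pull the arm with the highest posterior-mean reward.
        return max(range(len(self.arms)),
                   key=lambda i: self.arms[i][0] + self.arms[i][1] * self.mean)

    def observe(self, i, reward):
        # Conjugate normal update of z from reward_i = u_i + v_i * z + noise.
        u, v = self.arms[i]
        obs_prec = v * v / self.noise_var
        if obs_prec == 0:
            return  # an arm with v_i = 0 carries no information about z
        z_obs = (reward - u) / v
        self.mean = (self.prec * self.mean + obs_prec * z_obs) / (self.prec + obs_prec)
        self.prec += obs_prec
```

Because pulling any arm with v_i ≠ 0 sharpens the posterior on z, the greedy policy keeps learning even without deliberate exploration — the correlation structure that lets it escape the incomplete-learning behavior of independent-arm bandits.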

This paper presents a framework to compare the resiliency of different designs during conceptual design, when information about implementation details is unavailable. We apply the Inherent Behavioral Functional Model (IBFM) tool to develop an initial functional model for a system and simulate the failure behavior. The simulated failure scenarios provide information on the unique failure propagation paths and the end state/final behavior of the system assigned to each failure. Each failure path is caused by injecting one or multiple simultaneous faults into the functional model. Within this framework, we generate a population of functional models from a baseline seed model and evaluate their potential failure scenarios. We also develop a cost-risk model to compare the resiliency of different designs and produce a preference ranking to select the most resilient one, based upon the cost-risk objective. The risk is calculated based on the probability of having an undesired end state for each design, and a consequential cost is assigned to each failure to quantify the cost-risk for a given design. In this paper, we implement and demonstrate the proposed method on the design of a resilient mono-propellant system.

We study a dynamic pricing problem with multiple products and infinite inventories. The demand for these products depends on the selling prices and on parameters unknown to the seller. Their value can be learned from accumulating sales data using statistical estimation techniques. The quality of the parameter estimates is influenced by the amount of price dispersion; however, a large amount of variation in the selling prices can be costly since it means that suboptimal prices are used. The seller thus needs to balance optimizing the quality of the parameter estimates and optimizing instant revenue, i.e., exploration and exploitation.
In this study we propose a pricing policy for this dynamic pricing problem. The key idea is to use at each time period the price that is optimal with respect to current parameter estimates, with an additional constraint that ensures sufficient price dispersion. We measure the price dispersion by the smallest eigenvalue of the design matrix and show how a desired growth rate of this eigenvalue can be achieved by a simple quadratic constraint in the price-optimization problem. We study the performance of our pricing policy by providing bounds on the regret, which measures the expected revenue loss caused by using suboptimal prices.
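The dispersion measure used above — the smallest eigenvalue of the design matrix — can be computed directly for a single-product version of the problem, where the design matrix is the 2×2 sum of [1, p_t][1, p_t]ᵀ over past prices. The target growth rate `c·√t` below is a hypothetical stand-in for the desired rate discussed in the abstract.

```python
import math

def smallest_eigenvalue(prices):
    """Smallest eigenvalue of the 2x2 design matrix sum_t [1, p_t][1, p_t]^T."""
    a = len(prices)                      # sum of 1
    b = sum(prices)                      # sum of p_t
    d = sum(p * p for p in prices)       # sum of p_t^2
    # Eigenvalues of [[a, b], [b, d]] are ((a + d) +/- sqrt((a - d)^2 + 4 b^2)) / 2.
    disc = math.sqrt((a - d) ** 2 + 4 * b * b)
    return ((a + d) - disc) / 2

def dispersed_enough(prices, c=0.1):
    # Check whether dispersion has reached the (hypothetical) target rate c * sqrt(t).
    return smallest_eigenvalue(prices) >= c * math.sqrt(len(prices))
```

A constant price sequence makes the design matrix singular (smallest eigenvalue zero), so the unknown parameters are not identifiable; forcing the eigenvalue to grow is exactly what the paper's quadratic constraint in the price-optimization problem achieves.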

This paper focuses on joint dynamic pricing and demand learning in an oligopolistic market. Each firm seeks to learn the price-demand relationship for itself and its competitors, and to set optimal prices, taking into account its competitors’ likely moves. We follow a closed-loop approach to capture the transient aspect of the problem, that is, pricing decisions are updated dynamically over time, using the data acquired thus far. We formulate the problem faced at each time period by each firm as a Mathematical Program with Equilibrium Constraints (MPEC). We utilize variational inequalities to capture the game-theoretic aspect of the problem. We present computational results that provide insights on the model and illustrate the pricing policies this model gives rise to.
