November 2024
SIAM Journal on Control and Optimization
June 2024
This paper analyzes the stability of optimal policies in the long-run stochastic control framework with an averaged risk-sensitive criterion for discrete-time MDPs on a finite state-action space. In particular, we study the robustness of optimal controls under perturbations of the risk-aversion parameter, and investigate the Blackwell property, together with its link to the risk-sensitive vanishing discount approximation framework. Finally, we present examples that help to better understand the intricacies of the risk-sensitive control framework.
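For orientation, averaged risk-sensitive criteria of this kind are typically built from the entropic risk measure; a standard formulation (notation assumed here, not taken from the paper) reads:

```latex
% Risk-sensitive average reward for risk parameter \gamma \neq 0,
% initial state x and policy \pi:
J_\gamma(x,\pi) \;=\; \liminf_{T\to\infty} \frac{1}{T}\cdot\frac{1}{\gamma}
  \log \mathbb{E}_x^{\pi}\!\left[\exp\!\Big(\gamma \sum_{t=0}^{T-1} r(X_t,A_t)\Big)\right].
% As \gamma \to 0 this recovers the risk-neutral average reward, which is
% why stability of optimal policies under perturbations of \gamma is a
% natural robustness question.
```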
June 2024
Applied Mathematics & Optimization
We consider a finite number N of statistically equal agents, each moving on a finite set of states according to a continuous-time Markov Decision Process (MDP). Transition intensities of the agents and the generated rewards depend not only on the state and action of the agent itself, but also on the states of the other agents and the chosen action. Interactions like this are typical for a wide range of models in, e.g., biology, epidemics, finance, social science, and queueing systems, among others. The aim is to maximize the expected discounted reward of the system, i.e. the agents have to cooperate as a team. Computationally, this is a difficult task when N is large. Thus, we consider the limit for N→∞. In contrast to other papers, we treat this problem from an MDP perspective. This has the advantage that we need fewer regularity assumptions to construct asymptotically optimal strategies than when using viscosity solutions of HJB equations. The convergence rate is 1/N. We show how to apply our results using two examples: a machine replacement problem and a problem from epidemics. We also show that optimal feedback policies from the limiting problem are not necessarily asymptotically optimal.
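The mean-field idea behind the limit N→∞ can be illustrated with a toy example (not the paper's model, and all rates and parameters below are invented for illustration): N agents on two states, where each agent's switching rate depends on the empirical fraction of agents in state 1; as N grows, that fraction tracks a deterministic ODE.

```python
import random

# Toy illustration: N agents with states {0, 1}. An agent in state 0
# switches to 1 at rate a + b*m, where m is the empirical fraction of
# agents in state 1; an agent in state 1 switches back at rate c.
# As N -> infinity, m(t) tracks the ODE  m' = (1 - m)(a + b*m) - c*m.

def simulate(N, T=5.0, dt=0.01, a=0.3, b=0.5, c=0.4, seed=0):
    rng = random.Random(seed)
    states = [0] * N
    for _ in range(int(T / dt)):
        m = sum(states) / N                      # current empirical fraction
        for i in range(N):
            if states[i] == 0 and rng.random() < (a + b * m) * dt:
                states[i] = 1
            elif states[i] == 1 and rng.random() < c * dt:
                states[i] = 0
    return sum(states) / N

def mean_field(T=5.0, dt=0.01, a=0.3, b=0.5, c=0.4):
    m = 0.0                                      # Euler scheme for the ODE
    for _ in range(int(T / dt)):
        m += dt * ((1 - m) * (a + b * m) - c * m)
    return m

empirical = simulate(N=2000)
limit = mean_field()
print(empirical, limit)   # the gap shrinks as N grows
```

With these parameters the ODE has the stable fixed point m* = 0.6 (root of 0.3 - 0.2m - 0.5m² = 0), so both values are close to 0.6 at T = 5.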
May 2024
European Journal of Operational Research
April 2024
Mathematical Methods of Operations Research
The paper provides an overview of the theory and applications of risk-sensitive Markov decision processes. The term 'risk-sensitive' refers here to the use of the Optimized Certainty Equivalent as a means to measure expectation and risk. This comprises the well-known entropic risk measure and Conditional Value-at-Risk. We restrict our considerations to stationary problems with an infinite time horizon. Conditions are given under which optimal policies exist, and solution procedures are explained. We present both the theory for the case where the Optimized Certainty Equivalent is applied recursively and the case where it is applied to the cumulative reward. Discounted as well as non-discounted models are reviewed.
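The Optimized Certainty Equivalent and the two special cases named in the abstract can be checked numerically on an empirical distribution; this sketch (notation and helper names are my own, a crude grid search rather than any procedure from the paper) verifies both against their known closed forms.

```python
import math

# Optimized Certainty Equivalent (Ben-Tal/Teboulle) of a reward X:
#   OCE_u(X) = sup_eta { eta + E[ u(X - eta) ] }
# for a concave utility u with u(0) = 0. Two classic choices:
#   u(t) = min(t, 0) / (1 - alpha)  ->  reward-side Conditional Value-at-Risk
#   u(t) = (1 - exp(-g*t)) / g      ->  entropic risk, -(1/g) log E[exp(-g X)]

def oce(samples, u, grid):
    # crude grid search over eta on an empirical distribution
    return max(eta + sum(u(x - eta) for x in samples) / len(samples)
               for eta in grid)

samples = list(range(1, 11))             # X uniform on {1, ..., 10}
grid = [i / 100 for i in range(0, 1201)] # eta in [0, 12], step 0.01

alpha = 0.5
cvar = oce(samples, lambda t: min(t, 0) / (1 - alpha), grid)
# mean of the worst 50% of outcomes: (1+2+3+4+5)/5 = 3
print(cvar)                              # approx. 3.0

g = 1.0
entropic = oce(samples, lambda t: (1 - math.exp(-g * t)) / g, grid)
closed_form = -math.log(sum(math.exp(-g * x) for x in samples) / len(samples)) / g
print(abs(entropic - closed_form))       # small: grid search is approximate
```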
February 2024
Mathematics and Financial Economics
We consider the strategic interaction of n investors who are able to influence a stock price process and at the same time measure their utilities relative to the other investors. Our main aim is to find Nash equilibrium investment strategies in this setting in a financial market driven by a Brownian motion and investigate the influence the price impact has on the equilibrium. We consider both CRRA and CARA utility functions. Our findings show that the problem is well-posed as long as the price impact is at most linear. Moreover, numerical results reveal that the investors behave very aggressively when the price impact is close to a critical parameter.
January 2024
IEEE Transactions on Automatic Control
We investigate discrete-time mean-variance portfolio selection problems viewed as Markov decision processes. We transform the problems into a new model with a deterministic transition function for which the optimality equation holds. In this way, we can solve the problem recursively and obtain a time-consistent solution, that is, an optimal solution that satisfies the Bellman optimality principle. We apply our technique to solve a more general framework explicitly.
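The time-inconsistency that motivates such a transformation can be seen directly in the classical mean-variance criterion; schematically (notation assumed, not taken from the paper):

```latex
% Mean-variance criterion for terminal wealth X_T under policy \pi:
\max_{\pi}\; \mathbb{E}^{\pi}[X_T] - \lambda\,\mathrm{Var}^{\pi}(X_T),
\qquad \lambda > 0.
% Since \mathrm{Var}^{\pi}(X_T) = \mathbb{E}^{\pi}[X_T^2]
%   - \big(\mathbb{E}^{\pi}[X_T]\big)^2
% is not the expectation of a cumulative reward, the Bellman optimality
% principle fails on the original state process; lifting the problem to a
% model with a deterministic transition function restores an optimality
% equation and hence a time-consistent recursive solution.
```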
November 2023
International Journal of Theoretical and Applied Finance
This paper extends the utility maximization literature by combining partial information and (robust) regulatory constraints. Partial information is characterized by the fact that the stock price itself is observable by the optimizing financial institution, but the outcome of the market price of risk [Formula: see text] is unknown to the institution. The regulator develops either a congruent or a distinct perception of the market price of risk in comparison to the financial institution when imposing the Value-at-Risk (VaR) constraint. We also discuss a robust VaR constraint in which the regulator uses a worst-case measure. The solution to our optimization problem takes the same form as in the full information case: optimal wealth can be expressed as a decreasing function of the state price density. The optimal wealth is equal to the minimum regulatory financing requirement in the intermediate economic states. The key distinction lies in the fact that the price density in the final state depends on the overall evolution of the estimated market price of risk, denoted as [Formula: see text], or that the upper boundary of the intermediate region exhibits stochastic behavior.
July 2023
We consider a finite number N of statistically equal individuals, each moving on a finite set of states according to a continuous-time Markov Decision Process. Transition intensities of the individuals and the generated rewards depend not only on the state and action of the individual itself, but also on the states of the other individuals and the chosen action. Interactions like this are typical for a wide range of models in, e.g., biology, epidemics, finance, social science, and queueing systems, among others. The aim is to maximize the expected discounted reward of the system, i.e. the individuals have to cooperate as a team. Computationally, this is a difficult task when N is large. Thus, we consider the limit for N→∞. In contrast to other papers, we do not consider the so-called master equation. Instead, we define a 'limiting' (deterministic) optimization problem from the limiting differential equation for the path trajectories. This has the advantage that we need fewer assumptions and can apply Pontryagin's maximum principle to construct asymptotically optimal strategies. We show how to apply our results using two examples: a machine replacement problem and a problem from epidemics. We also show that optimal feedback policies are not necessarily asymptotically optimal.
April 2023
Applied Mathematics & Optimization
We consider mean-field control problems in discrete time with discounted reward, infinite time horizon and compact state and action space. The existence of optimal policies is shown and the limiting mean-field problem is derived when the number of individuals tends to infinity. Moreover, we consider the average reward problem and show that the optimal policy in this mean-field limit is ε-optimal for the discounted problem if the number of individuals is large and the discount factor is close to one. This result is very helpful because, in the special case where the reward depends only on the distribution of the individuals, we obtain a very interesting subclass of problems in which an average reward optimal policy can be obtained by first computing an optimal measure from a static optimization problem and then achieving it with Markov Chain Monte Carlo methods. We give two applications, avoiding congestion on a graph and optimal positioning on a market place, which we solve explicitly.
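The second step mentioned above, steering the population toward a prescribed measure with Markov Chain Monte Carlo, can be illustrated with a generic Metropolis-Hastings construction (a standard MCMC sketch on an invented small graph, not the paper's construction): build a chain on the nodes whose stationary distribution equals a given target measure.

```python
import random

# Generic Metropolis-Hastings sketch: a Markov chain on the nodes of a
# small undirected graph whose stationary distribution equals mu.
neighbors = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
mu = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}   # prescribed target measure

def step(x, rng):
    y = rng.choice(neighbors[x])         # propose uniformly over neighbors
    # MH acceptance ratio: mu(y) q(y,x) / (mu(x) q(x,y)), with q uniform
    ratio = (mu[y] * len(neighbors[x])) / (mu[x] * len(neighbors[y]))
    return y if rng.random() < min(1.0, ratio) else x

rng = random.Random(1)
steps = 200_000
counts = {s: 0 for s in mu}
x = 0
for _ in range(steps):
    x = step(x, rng)
    counts[x] += 1
freq = {s: counts[s] / steps for s in mu}
print(freq)                              # long-run frequencies close to mu
```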
... A particularly important class of stochastic control problems is that of risk-sensitive control (see Whittle, 1990; Bensoussan et al., 1998; Bielecki et al., 2022; Bäuerle & Rieder, 2014; Bäuerle & Jaśkiewicz, 2024), with linear exponential-of-quadratic Gaussian (LEQG) problems as a notable subclass (see Jacobson, 1973; Whittle, 1981; Bensoussan & van Schuppen, 1985; Bensoussan, 1992). LEQG problems are especially relevant in financial applications (see e.g. ...
April 2024
Mathematical Methods of Operations Research
... They find Nash equilibrium strategies for both cases in a general financial market. Bäuerle and Göll (2024) have recently extended their studies by analyzing the effect of the price impact on equilibrium strategies. Deng et al. (2024) introduced another extension of Lacker and Zariphopoulou (2019), in which they investigate a mean-field game strategy in a partially observable market. ...
February 2024
Mathematics and Financial Economics
... Uncertainty aversion with penalization proportional to discounted relative entropy with respect to "structured" models is studied in Hansen and Sargent (2022) in a setting where a decision-maker has ambiguity about a prior over the set of structured statistical models and fears that each of those models is misspecified. Bäuerle and Mahayni (2023) introduce smooth ambiguity into a portfolio optimization problem. ...
May 2024
European Journal of Operational Research
... Lifting MDPs to the space of probability measures has rarely been explored in the literature and is typically confined to specific settings. For instance, Bäuerle (2023) and Carmona et al. (2019) leverage the structure of Mean-Field MDPs (MFMDPs) to derive deterministic optimality equations similar to (2) in the absence of common noise. To the best of the authors' knowledge, the only exceptions to these setting-specific approaches are Shreve and Bertsekas (1979); Bertsekas and Shreve (1996) and this paper, where we develop a general framework applicable to any standard MDP. ...
April 2023
Applied Mathematics & Optimization
... The first step in our analysis is to produce closed form solutions for the master system (2) and (4). Specifically, we show that a solution U to (2) is given by ...
October 2022
Mathematical Methods of Operations Research
... The existence of partial information in both the insurance market and the financial market profoundly affects the decision-making of insurance companies, and sometimes the two markets also affect each other. Bäuerle and Leimcke [3] show that the correlation between stock prices and insurance claims deeply changes the investment decisions of insurers when the optimal investment and reinsurance strategies have to be determined. The COVID-19 crisis is one example of an event with impact on both financial and insurance risks, which shows that it makes sense to model interdependencies between the two. ...
April 2022
Statistics & Risk Modeling
... Unlike existing minimax regret methods, our approach directly optimises for learnability, instead of using an imperfect proxy for regret, leading to more effective training on our domains. Robust RL methods have the goal of improving an agent's robustness to environmental disturbances, and worst-case environment dynamics [33][34][35][36][37][38][39][40][41][42][43][44][45]. However, these methods generally consider continuous perturbations instead of a mix of discrete and continuous environment settings. ...
November 2021
Mathematics of Operations Research
... The problem of maximizing conditional value-at-risk (CVaR; Rockafellar et al., 2000), also known as average value-at-risk or expected shortfall, has received attention both in the context of risk-sensitive reinforcement learning (Bäuerle and Ott, 2011;Chow and Ghavamzadeh, 2014;Chow et al., 2015;Bäuerle and Glauner, 2021;Greenberg et al., 2022) and in non-sequential decision-making (Rockafellar et al., 2000). ...
August 2021
Mathematical Methods of Operations Research
... Related Work. While the study of distributionally robust optimization (DRO) problems and distributionally robust Markov decision problems has been an active research topic already in the past decade (compare, e.g., [5], [6], [15], [23], [28], [29], [30], [34], [35], [41], [45], [48], [49], [50], and [51]), the discussion and construction of Q-learning algorithms to solve these sequential decision-making problems numerically has only become an active research topic very recently. ...
June 2021
... In the more recent literature, many authors have attempted to overcome this issue with risk measures adapted to a dynamic setting. Among others, Bäuerle and Glauner (2021) propose iterated coherent risk measures, where they both derive risk-aware dynamic programming (DP) equations and provide policy iteration algorithms, Ahmadi et al. (2021) investigate bounded policy iteration algorithms for partially observable Markov decision processes, Kose and Ruszczynski (2021) prove the convergence of temporal difference algorithms optimising dynamic Markov coherent risk measures, and Cheng and Jaimungal (2022) derive a DP principle for Kusuoka-type conditional risk mappings. These works, however, require computing the value function for every possible state of the environment, limiting their applicability to problems with a small number of state-action pairs. ...
April 2021
European Journal of Operational Research