Nicole Bäuerle’s research while affiliated with Karlsruhe Institute of Technology and other places


Publications (156)


Blackwell Optimality and Policy Stability for Long-Run Risk-Sensitive Stochastic Control
  • Article

November 2024 · 4 Reads

SIAM Journal on Control and Optimization

Nicole Bäuerle


Blackwell optimality and policy stability for long-run risk sensitive stochastic control

June 2024 · 16 Reads

This paper analyzes the stability of optimal policies in the long-run stochastic control framework with an averaged risk-sensitive criterion for discrete-time MDPs with finite state and action spaces. In particular, we study the robustness of optimal controls under perturbations of the risk-aversion parameter and investigate the Blackwell property, together with its link to the risk-sensitive vanishing-discount approximation framework. Finally, we present examples that help to clarify the intricacies of the risk-sensitive control framework.
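For a fixed stationary policy on a finite, irreducible chain, the averaged risk-sensitive criterion can be evaluated through a Perron-Frobenius eigenvalue. The Python sketch below is illustrative only: the function name, the sign convention (gamma > 0 risk-seeking, gamma < 0 risk-averse) and the two-state example are assumptions, and it evaluates a single policy rather than optimizing over policies as the paper does.

```python
import numpy as np


def risk_sensitive_average_reward(P, r, gamma):
    """Long-run risk-sensitive average reward of a fixed stationary policy.

    Hypothetical helper. Assumes P is the irreducible transition matrix of the
    chain induced by the policy, r[x] is the one-step reward in state x and
    gamma != 0 is the risk parameter. The criterion
        lim_T 1/(gamma*T) * log E[exp(gamma * sum_{t<T} r(X_t))]
    equals log(rho)/gamma, where rho is the spectral radius of the
    nonnegative matrix M[x, y] = exp(gamma * r[x]) * P[x, y].
    """
    P = np.asarray(P, dtype=float)
    r = np.asarray(r, dtype=float)
    M = np.exp(gamma * r)[:, None] * P           # reward-tilted transition matrix
    rho = np.max(np.abs(np.linalg.eigvals(M)))   # Perron root of M
    return np.log(rho) / gamma


# Two-state example (hypothetical numbers): the stationary law is (5/6, 1/6),
# so the risk-neutral long-run average reward is 5/6.
P = [[0.9, 0.1], [0.5, 0.5]]
r = [1.0, 0.0]
for gamma in (-1.0, -0.1, 0.1, 1.0):
    print(gamma, risk_sensitive_average_reward(P, r, gamma))
```

The value is nondecreasing in gamma and tends to the risk-neutral average 5/6 as gamma approaches 0, which gives a simple way to see how the criterion reacts to perturbations of the risk parameter.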


Colourful lines: State trajectories $\mu_t^N(1)$ for $N = 100$ (red) and $N = 10000$ (green) agents in Example 5.4 when one agent starts in state 1. Black line: Deterministic limit process $\mu_t(1) = \left(\frac{2}{3}t\right)^{3/2}$ (Color figure online)
Left: State trajectories for different numbers N of machines executing the optimal control for (F). Right: Ten state trajectories for $N = 1000$ machines executing the asymptotically optimal control (Color figure online)
Transition intensities of one device between the possible states
State trajectories for $N = 1000$ devices under optimal control for $\lambda_{SI} = 0.6$, $\lambda_{SR} = \lambda_{IR} = 0.2$, $\bar{a} = 1$, $T = 10$ (Color figure online)
Transition intensities of one agent for the resource constraint problem

Continuous-Time Mean Field Markov Decision Models
  • Article
  • Full-text available

June 2024 · 35 Reads

Applied Mathematics & Optimization

We consider a finite number N of statistically equal agents, each moving on a finite set of states according to a continuous-time Markov Decision Process (MDP). Transition intensities of the agents and generated rewards depend not only on the state and action of the agent itself, but also on the states of the other agents as well as the chosen action. Interactions like this are typical for a wide range of models in, e.g., biology, epidemics, finance, social science and queueing systems. The aim is to maximize the expected discounted reward of the system, i.e. the agents have to cooperate as a team. Computationally this is a difficult task when N is large. Thus, we consider the limit $N \to \infty$. In contrast to other papers we treat this problem from an MDP perspective. This has the advantage that we need fewer regularity assumptions to construct asymptotically optimal strategies than approaches based on viscosity solutions of HJB equations. The convergence rate is $1/\sqrt{N}$. We show how to apply our results using two examples: a machine replacement problem and a problem from epidemics. We also show that optimal feedback policies from the limiting problem are not necessarily asymptotically optimal.
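To illustrate the limit $N \to \infty$, the sketch below simulates N interacting agents in a hypothetical, uncontrolled two-state infection model (susceptible agents become infected at rate beta times the current infected fraction, infected agents recover at rate delta) and compares the empirical infected fraction with the solution of the limiting ODE $m' = \beta m(1-m) - \delta m$. The model, its parameters and the function names are assumptions for illustration only; there is no control here, and this is not the epidemic example from the paper.

```python
import numpy as np


def simulate_infected_fraction(N, beta, delta, m0, T, rng):
    """Gillespie simulation of N exchangeable agents; returns the infected fraction at time T.

    Because the agents are statistically equal, it suffices to track the number k
    of infected agents: infections occur at total rate beta*k*(N-k)/N and
    recoveries at total rate delta*k.
    """
    k = int(round(m0 * N))
    t = 0.0
    while True:
        rate_inf = beta * k * (N - k) / N
        rate_rec = delta * k
        total = rate_inf + rate_rec
        if total == 0.0:                  # absorbing state: nobody infected
            return k / N
        t += rng.exponential(1.0 / total)
        if t > T:
            return k / N
        if rng.random() < rate_inf / total:
            k += 1
        else:
            k -= 1


def mean_field_fraction(beta, delta, m0, T):
    """Closed-form solution of the logistic ODE m' = beta*m*(1-m) - delta*m (assumes beta > delta)."""
    K = 1.0 - delta / beta                # equilibrium infected fraction
    rate = beta - delta
    return K / (1.0 + (K / m0 - 1.0) * np.exp(-rate * T))


rng = np.random.default_rng(0)
limit = mean_field_fraction(0.6, 0.2, 0.1, 10.0)
for N in (100, 1000, 10000):
    sim = simulate_infected_fraction(N, 0.6, 0.2, 0.1, 10.0, rng)
    print(N, sim, abs(sim - limit))
```

The printed deviations should typically shrink as N grows; in this toy model that behaviour is only observed empirically and is not a proof of the rate stated above.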



Markov decision processes with risk-sensitive criteria: an overview

April 2024 · 66 Reads · 9 Citations

Mathematical Methods of Operations Research

The paper provides an overview of the theory and applications of risk-sensitive Markov decision processes. The term 'risk-sensitive' refers here to the use of the Optimized Certainty Equivalent as a means to measure expectation and risk. This class includes the well-known entropic risk measure and the Conditional Value-at-Risk. We restrict our considerations to stationary problems with an infinite time horizon. Conditions are given under which optimal policies exist, and solution procedures are explained. We present the theory both for the case where the Optimized Certainty Equivalent is applied recursively and for the case where it is applied to the cumulated reward. Discounted as well as non-discounted models are reviewed.
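In one common convention, the Optimized Certainty Equivalent of a reward X with a concave utility u is $S_u(X) = \sup_{\eta \in \mathbb{R}} \{\eta + \mathbb{E}[u(X - \eta)]\}$. Choosing $u(t) = (1 - e^{-\gamma t})/\gamma$ gives the entropic risk measure $-\frac{1}{\gamma}\log \mathbb{E}[e^{-\gamma X}]$, and $u(t) = -\frac{1}{\alpha}(-t)^+$ gives the Conditional Value-at-Risk at level $\alpha$ (the mean of the worst $\alpha$-fraction of outcomes). The Python sketch below evaluates the OCE of a discrete reward by brute-force search over $\eta$; the helper names and the example distribution are hypothetical, and the grid search is chosen for transparency, not efficiency.

```python
import numpy as np


def oce(values, probs, u, n_grid=20001):
    """Optimized Certainty Equivalent sup_eta { eta + E[u(X - eta)] } of a discrete reward X.

    For a concave, nondecreasing u with u(0) = 0 and 1 as a supergradient at 0,
    the maximiser lies in [min X, max X], so a grid over that interval (plus the
    atoms of X, where piecewise-linear utilities have their kinks) suffices.
    """
    x = np.asarray(values, dtype=float)
    p = np.asarray(probs, dtype=float)
    etas = np.union1d(np.linspace(x.min(), x.max(), n_grid), x)
    # objective[i] = etas[i] + sum_j p[j] * u(x[j] - etas[i])
    objective = etas + (p[None, :] * u(x[None, :] - etas[:, None])).sum(axis=1)
    return objective.max()


def entropic_utility(gamma):
    return lambda t: (1.0 - np.exp(-gamma * t)) / gamma


def cvar_utility(alpha):
    return lambda t: -np.maximum(-t, 0.0) / alpha


x = [0.0, 1.0, 2.0, 10.0]
p = [0.1, 0.2, 0.3, 0.4]
print(oce(x, p, cvar_utility(0.2)))      # mean of the worst 20% of outcomes, 0.5
print(oce(x, p, entropic_utility(0.5)))  # agrees with -2 * log E[exp(-0.5 X)] up to grid error
```

Both values lie below the mean reward of 4.8, reflecting the risk aversion built into these utilities.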


Illustration of $\pi^{1,*}$ from Theorem 3.3 in terms of $\alpha \in (-0.04, \alpha_{\text{max}})$ for $n = 12$, $\mu = 0.03$, $\sigma = 0.2$ and $\alpha_{\text{max}} = n\sigma^2/8 = 0.06$. Further, $\theta_1 = 0.3$, $\delta_1 \in \{1, 4\}$, and the parameters $\theta_j$ and $\delta_j$, $j \ge 2$, are increasing from 0 to 1 with step size 0.1 and from 0.5 to 2.7 with step size 0.2, respectively. The dashed blue and orange horizontal lines represent the optimal investment without price impact, given by $\delta_1 \mu \sigma^{-2}$.
Illustration of the constant Nash equilibrium $(\pi^{1,*}, \pi^{2,*})$ in terms of $\gamma \in (0, 1]$ for the parameter choices $\mu = 0.03$, $\sigma = 0.2$, $\delta_1 = 1$, $\delta_2 = 2$, $\theta_1 = 0.5$, $\theta_2 = 0.7$, and $\alpha = 0.01$. The horizontal dashed lines represent the Nash equilibrium under linear price impact for comparison
Illustration of $\pi^{1,*}$ from Theorem 5.1 in terms of $\alpha \in (-0.02, \alpha_{\text{max}})$ for $n = 12$, $\mu = 0.03$, $\sigma = 0.2$, and $\alpha_{\text{max}} = n\sigma^2/8 = 0.06$. Further, $\theta_1 = 0.3$, $\delta_1 \in \{1, 4\}$, and the parameters $\theta_j$ and $\delta_j$, $j \ge 2$, are increasing from 0 to 1 with step size 0.1, and from 0.5 to 2.7 with step size 0.2, respectively. The dashed blue and orange horizontal lines represent the optimal investment fraction without price impact, given by $\delta_1 \mu \sigma^{-2}$.
Nash equilibria for relative investors with (non)linear price impact

February 2024 · 28 Reads · 3 Citations

Mathematics and Financial Economics

We consider the strategic interaction of n investors who are able to influence a stock price process and at the same time measure their utilities relative to the other investors. Our main aim is to find Nash equilibrium investment strategies in this setting in a financial market driven by a Brownian motion and investigate the influence the price impact has on the equilibrium. We consider both CRRA and CARA utility functions. Our findings show that the problem is well-posed as long as the price impact is at most linear. Moreover, numerical results reveal that the investors behave very aggressively when the price impact is close to a critical parameter.


Time-Consistency in the Mean–Variance Problem: A New Perspective

January 2024 · 21 Reads

IEEE Transactions on Automatic Control

We investigate discrete-time mean-variance portfolio selection problems viewed as a Markov decision process. We transform the problems into a new model with a deterministic transition function for which the optimality equation holds. In this way, we can solve the problem recursively and obtain a time-consistent solution, that is, an optimal solution that satisfies the Bellman optimality principle. We also apply our technique to solve a more general framework explicitly.


OPTIMAL INVESTMENT UNDER PARTIAL INFORMATION AND ROBUST VAR-TYPE CONSTRAINT

November 2023 · 10 Reads · 1 Citation

International Journal of Theoretical and Applied Finance

This paper extends the utility maximization literature by combining partial information and (robust) regulatory constraints. Partial information is characterized by the fact that the stock price itself is observable by the optimizing financial institution, but the outcome of the market price of risk [Formula: see text] is unknown to the institution. When imposing the Value-at-Risk (VaR) constraint, the regulator develops either a congruent or a distinct perception of the market price of risk in comparison to the financial institution. We also discuss a robust VaR constraint in which the regulator uses a worst-case measure. The solution to our optimization problem takes the same form as in the full information case: optimal wealth can be expressed as a decreasing function of the state price density, and it equals the minimum regulatory financing requirement in the intermediate economic states. The key distinction lies in the fact that the price density in the final state depends on the overall evolution of the estimated market price of risk, denoted as [Formula: see text], or that the upper boundary of the intermediate region exhibits stochastic behavior.
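Under partial information, the institution has to estimate the market price of risk from observed prices. As a hedged illustration only, assume a Black-Scholes stock $dS_t = S_t((r + \sigma\vartheta)\,dt + \sigma\,dW_t)$ with an unknown constant market price of risk $\vartheta$ and a Gaussian prior $\vartheta \sim N(m_0, v_0)$; the symbol $\vartheta$, the Gaussian prior and all parameter values are assumptions, since the abstract does not specify them. Then $Y_t = (\log(S_t/S_0) - (r - \sigma^2/2)t)/\sigma = \vartheta t + W_t$ is observable, and the posterior of $\vartheta$ given the price path is $N\big((m_0/v_0 + Y_t)/(1/v_0 + t),\ 1/(1/v_0 + t)\big)$. The Python sketch below computes this filter.

```python
import numpy as np


def posterior_market_price_of_risk(S_t, S_0, t, r, sigma, m0, v0):
    """Posterior mean and variance of a constant, unknown market price of risk.

    Hypothetical helper under illustrative assumptions:
    dS = S((r + sigma*theta) dt + sigma dW) and a Gaussian prior theta ~ N(m0, v0).
    Y_t = theta*t + W_t is a sufficient statistic, so the posterior is Gaussian
    with precision 1/v0 + t.
    """
    Y_t = (np.log(S_t / S_0) - (r - 0.5 * sigma**2) * t) / sigma
    precision = 1.0 / v0 + t
    return (m0 / v0 + Y_t) / precision, 1.0 / precision


# Simulate the price at each horizon (independent draws, for brevity) with a
# hypothetical "true" theta, then filter.
rng = np.random.default_rng(1)
r, sigma, theta_true, S_0 = 0.01, 0.2, 0.3, 100.0
for t in (1.0, 10.0, 100.0):
    W_t = np.sqrt(t) * rng.standard_normal()
    S_t = S_0 * np.exp((r + sigma * theta_true - 0.5 * sigma**2) * t + sigma * W_t)
    print(t, posterior_market_price_of_risk(S_t, S_0, t, r, sigma, m0=0.0, v0=1.0))
```

The posterior standard deviation $1/\sqrt{1/v_0 + t}$ decays only like $t^{-1/2}$, so in this toy setting the estimate remains noticeably uncertain even over long horizons.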


Continuous-time mean field Markov decision models

July 2023 · 9 Reads

We consider a finite number N of statistically equal individuals, each moving on a finite set of states according to a continuous-time Markov Decision Process. Transition intensities of the individuals and generated rewards depend not only on the state and action of the individual itself, but also on the states of the other individuals as well as the chosen action. Interactions like this are typical for a wide range of models in, e.g., biology, epidemics, finance, social science and queueing systems. The aim is to maximize the expected discounted reward of the system, i.e. the individuals have to cooperate as a team. Computationally this is a difficult task when N is large. Thus, we consider the limit $N \to \infty$. In contrast to other papers we do not consider the so-called Master equation. Instead, we define a 'limiting' (deterministic) optimization problem from the limiting differential equation for the path trajectories. This has the advantage that we need fewer assumptions and can apply Pontryagin's maximum principle to construct asymptotically optimal strategies. We show how to apply our results using two examples: a machine replacement problem and a problem from epidemics. We also show that optimal feedback policies are not necessarily asymptotically optimal.


Network with labelled nodes (left); Optimal stationary distribution (right)
Evolution of the individuals using the optimal randomized decision when all start in node 1, after $n = 2, 4, 8, 16, 32$ and 64 time steps (left to right, top to bottom)
Market place with ice cream vendor (left). Optimal distribution in example (right)
Mean Field Markov Decision Processes

April 2023 · 56 Reads · 11 Citations

Applied Mathematics & Optimization

We consider mean-field control problems in discrete time with discounted reward, infinite time horizon and compact state and action spaces. The existence of optimal policies is shown, and the limiting mean-field problem is derived when the number of individuals tends to infinity. Moreover, we consider the average reward problem and show that the optimal policy in this mean-field limit is $\varepsilon$-optimal for the discounted problem if the number of individuals is large and the discount factor is close to one. This result is very helpful, because in the special case where the reward depends only on the distribution of the individuals, we obtain a very interesting subclass of problems in which an average-reward optimal policy can be obtained by first computing an optimal measure from a static optimization problem and then achieving it with Markov Chain Monte Carlo methods. We give two applications, which we solve explicitly: avoiding congestion on a graph and optimal positioning on a market place.
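The second step of that procedure, realizing a given optimal measure as the stationary distribution of the individuals' randomized decisions, can be sketched with the Metropolis-Hastings algorithm, one standard MCMC method. In the hypothetical Python example below, an individual on a small undirected graph proposes a uniformly chosen neighbour and accepts the move with the Metropolis-Hastings probability; the graph, the target measure and the function name are illustrative assumptions, not the construction used in the paper.

```python
import numpy as np


def metropolis_kernel(adjacency, target):
    """Transition matrix on a graph whose stationary distribution is `target`.

    Assumes a symmetric 0/1 adjacency matrix without self-loops, a connected
    graph and a strictly positive target distribution. From node x a neighbour
    y is proposed with probability 1/deg(x) and accepted with probability
    min(1, target[y]*deg(x) / (target[x]*deg(y))); rejected proposals stay put.
    """
    A = np.asarray(adjacency, dtype=bool)
    mu = np.asarray(target, dtype=float)
    deg = A.sum(axis=1)
    n = len(mu)
    P = np.zeros((n, n))
    for x in range(n):
        for y in np.flatnonzero(A[x]):
            accept = min(1.0, mu[y] * deg[x] / (mu[x] * deg[y]))
            P[x, y] = accept / deg[x]
        P[x, x] = 1.0 - P[x].sum()   # probability of staying (rejected proposals)
    return P


# Path graph 0 - 1 - 2 - 3 with a hypothetical optimal measure mu_star.
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
mu_star = np.array([0.1, 0.2, 0.3, 0.4])
P = metropolis_kernel(A, mu_star)
print(mu_star @ P)                          # stationary: mu_star up to rounding
print(np.linalg.matrix_power(P, 500)[0])    # start in node 0; long-run law close to mu_star
```

The kernel satisfies detailed balance, $\mu^*(x)P(x,y) = \min(\mu^*(x)/\deg(x), \mu^*(y)/\deg(y))$, so $\mu^* P = \mu^*$; the rejection probability at node 3 makes the chain aperiodic, so each individual's distribution converges to $\mu^*$, and if individuals move independently, their empirical distribution approximates it when there are many of them.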


Citations (41)


... A particularly important class of stochastic control problems is that of risk-sensitive control (see Whittle, 1990; Bensoussan et al., 1998; Bielecki et al., 2022; Bäuerle & Rieder, 2014; Bäuerle & Jaśkiewicz, 2024), with linear exponential of quadratic Gaussian (LEQG) problems as a notable subclass (see Jacobson, 1973; Whittle, 1981; Bensoussan & van Schuppen, 1985; Bensoussan, 1992). LEQG problems are especially relevant in financial applications (see e.g. ...

Reference:

Exploratory Randomization for Discrete-Time Linear Exponential Quadratic Gaussian (LEQG) Problem
Markov decision processes with risk-sensitive criteria: an overview

Mathematical Methods of Operations Research

... They find Nash equilibrium strategies for both cases in a general financial market. Bäuerle and Göll (2024) have recently extended their studies by analyzing the effect of the price impact on equilibrium strategies. Deng et al. (2024) introduced another extension of Lacker and Zariphopoulou (2019), in which they investigate a mean-field game strategy in a partially observable market. ...

Nash equilibria for relative investors with (non)linear price impact

Mathematics and Financial Economics

... Uncertainty aversion with penalization proportional to discounted relative entropy with respect to "structured" models is studied in Hansen and Sargent (2022) in a setting where a decision-maker has ambiguity about a prior over the set of structured statistical models and fears that each of those models is misspecified. Bäuerle and Mahayni (2023) introduce smooth ambiguity into a portfolio optimization problem. ...

Optimal investment in ambiguous financial markets with learning
  • Citing Article
  • May 2024

European Journal of Operational Research

... Lifting MDPs to the space of probability measures has rarely been explored in the literature and is typically confined to specific settings. For instance, Bäuerle (2023) and Carmona et al. (2019) leverage the structure of Mean-Field MDPs (MFMDPs) to derive deterministic optimality equations similar to (2) in the absence of common noise. To the best of the authors' knowledge, the only exceptions to these setting-specific approaches are Shreve and Bertsekas (1979); Bertsekas and Shreve (1996) and this paper, where we develop a general framework applicable to any standard MDP. ...

Mean Field Markov Decision Processes

Applied Mathematics & Optimization

... The existence of partial information in both the insurance market and financial market profoundly affects the decision-making of insurance companies, and sometimes they also affect each other. Bäuerle and Leimcke [3] shows that the correlation between stock prices and insurance claims deeply changes the investment decisions of insurers when the optimal investment and reinsurance strategies have to be determined. For example, the COVID-19 crisis is one example for an event with impact on financial and insurance risks, which shows that it makes sense to add interdependencies between both. ...

Bayesian optimal investment and reinsurance with dependent financial and insurance risks
  • Citing Article
  • April 2022

Statistics & Risk Modeling

... Unlike existing minimax regret methods, our approach directly optimises for learnability, instead of using an imperfect proxy for regret, leading to more effective training on our domains. Robust RL methods have the goal of improving an agent's robustness to environmental disturbances, and worst-case environment dynamics [33][34][35][36][37][38][39][40][41][42][43][44][45]. However, these methods generally consider continuous perturbations instead of a mix of discrete and continuous environment settings. ...

Distributionally Robust Markov Decision Processes and Their Connection to Risk Measures
  • Citing Article
  • November 2021

Mathematics of Operations Research

... The problem of maximizing conditional value-at-risk (CVaR; Rockafellar et al., 2000), also known as average value-at-risk or expected shortfall, has received attention both in the context of risk-sensitive reinforcement learning (Bäuerle and Ott, 2011; Chow and Ghavamzadeh, 2014; Chow et al., 2015; Bäuerle and Glauner, 2021; Greenberg et al., 2022) and in non-sequential decision-making (Rockafellar et al., 2000). ...

Minimizing spectral risk measures applied to Markov decision processes

Mathematical Methods of Operations Research

... Related Work. While the study of distributionally robust optimization (DRO) problems and distributionally robust Markov decision problems and problems has been an active research topic already in the past decade (compare, e.g., [5], [6], [15], [23], [28], [29], [30], [34], [35], [41], [45], [48], [49], [50], and [51]), the discussion and construction of Q-learning algorithms to solve these sequential decision making problems numerically has only become an active research topic very recently. ...

Q-Learning for Distributionally Robust Markov Decision Processes
  • Citing Chapter
  • June 2021

... In the more recent literature, many authors have attempted to overcome this issue with risk measures adapted to a dynamic setting. Among others, Bäuerle and Glauner (2021) propose iterated coherent risk measures, where they both derive risk-aware dynamic programming (DP) equations and provide policy iteration algorithms, Ahmadi et al. (2021) investigate bounded policy iteration algorithms for partially observable Markov decision processes, Kose and Ruszczynski (2021) prove the convergence of temporal difference algorithms optimising dynamic Markov coherent risk measures, and Cheng and Jaimungal (2022) derive a DP principle for Kusuoka-type conditional risk mappings. These works, however, require computing the value function for every possible state of the environment, limiting their applicability to problems with a small number of state-action pairs. ...

Markov decision processes with recursive risk measures
  • Citing Article
  • April 2021

European Journal of Operational Research