Conference Paper

Point-based POMDP solvers for life-cycle cost minimization of deteriorating structures


Abstract

Optimized maintenance of operating aging infrastructure is of paramount importance to ensure safe and cost-effective operation during the original design lifetime and even beyond it. Modern answers to the problem should focus on automated planning and decision-making techniques that take advantage of informative but uncertain data becoming available during the structural life-cycle. In this paper such a solution framework is presented, based on partially observable Markov decision processes (POMDPs). In a POMDP framework, the evolution of the system is described by stochastic processes, real-time observation data update the system state estimates, and all possible future actions, about where, when and what type of inspection and repair should be performed, are taken into account in order to optimize the long-term life-cycle objectives. As a consequence of their advanced mathematical attributes, POMDP models are unfortunately hard to solve. In recent years, however, significant breakthroughs have been achieved, mainly due to the introduction of point-based value iteration algorithms. In this work, several POMDP point-based methods are examined, with various characteristics in the selection of the belief-space points/subset and the value function update procedures. To investigate the strengths and limitations of the various solution methods for structural maintenance problems of deteriorating infrastructure, and to draw conclusions regarding their efficiency and applicability to problems of this kind, a realistic nonstationary example is selected, concerning corrosion of the reinforcing bars of concrete structures in a spatial stochastic context.
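To make the belief mechanics concrete, the following minimal numpy sketch (our notation, not the paper's implementation) shows the discrete Bayesian belief update that point-based POMDP solvers operate on: after taking action a and receiving observation z, the belief is propagated through the transition model and reweighted by the observation likelihood.

import numpy as np

def belief_update(b, a, z, T, O):
    """Bayesian belief update: b'(s') ~ O[a, s', z] * sum_s T[a, s, s'] * b(s).
    b: (S,) belief; T: (A, S, S) transitions; O: (A, S, Z) observation model."""
    b_pred = b @ T[a]              # predict the state after action a
    b_new = O[a, :, z] * b_pred    # reweight by the observation likelihood
    return b_new / b_new.sum()     # normalize (assumes P(z | b, a) > 0)

Point-based solvers then approximate the optimal value function over a finite set of such beliefs, rather than over the full continuous belief simplex.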

... In [39,40], POMDPs are adopted for decision-making for highway pavements. POMDPs have also been applied in [41,42] for bridge inspection planning, whereas point-based solutions for stochastic deteriorating systems using POMDPs have been presented in [43,44,45,46]. A continuous formulation for problems described by linear and/or nonlinear transition functions is presented in [47], whereas specialized cases of mixed observability are also presented in [48]. ...
... A detailed overview of point-based solvers along with their application in various robotic tasks can be found in [37]. Their insights and application details in structural inspection and maintenance planning can be found in [48,46], where different point-based approaches are tested. Among them, the three most competitive are identified and used herein. ...
Preprint
Full-text available
Efficient integration of uncertain observations with decision-making optimization is key for prescribing informed intervention actions, able to preserve structural safety of deteriorating engineering systems. To this end, it is necessary that scheduling of inspection and monitoring strategies be objectively performed on the basis of their expected value-based gains that, among others, reflect quantitative metrics such as the Value of Information (VoI) and the Value of Structural Health Monitoring (VoSHM). In this work, we introduce and study the theoretical and computational foundations of the above metrics within the context of Partially Observable Markov Decision Processes (POMDPs), thus alluding to a broad class of decision-making problems of partially observable stochastic deteriorating environments that can be modeled as POMDPs. Step-wise and life-cycle VoI and VoSHM definitions are devised and their bounds are analyzed as per the properties stemming from the Bellman equation and the resulting optimal value function. It is shown that a POMDP policy inherently leverages the notion of VoI to guide observational actions in an optimal way at every decision step, and that the permanent or intermittent information provided by SHM or inspection visits, respectively, can only improve the cost of this policy in the long-term, something that is not necessarily true under locally optimal policies, typically adopted in decision-making of structures and infrastructure. POMDP solutions are derived based on point-based value iteration methods, and the various definitions are quantified in stationary and non-stationary deteriorating environments, with both infinite and finite planning horizons, featuring single- or multi-component engineering systems.
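As one concrete rendering of the step-wise metric (our notation; the paper's exact definitions may differ), the myopic VoI of an observation z collected under belief b can be written in preposterior form:

\mathrm{VoI}(b) \;=\; \min_{a}\,\mathbb{E}\!\left[c \mid b, a\right] \;-\; \mathbb{E}_{z \sim p(z \mid b)}\!\left[\min_{a}\,\mathbb{E}\!\left[c \mid b^{z}, a\right]\right] \;\ge\; 0,

where b^z denotes the Bayesian posterior after observing z. Nonnegativity follows because the first term is the minimum of an expectation, while the second averages minima taken after the information arrives.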
... The theoretical elegance of POMDPs is, however, not on par with the complexity of their accompanying solution techniques. Alleviating some of the emerging complexities, point-based value iteration has been successfully implemented for the optimization of medium-sized I&M problems (Papakonstantinou et al., 2016; Morato et al., 2022a). ...
Conference Paper
Full-text available
Inspection and maintenance (I&M) optimization entails many sources of computational complexity, among others, due to high-dimensional decision and state variables in multi-component systems, long planning horizons, stochasticity of objectives and constraints, and inherent uncertainties in measurements and models. This paper studies how the above can be addressed within the context of constrained Partially Observable Markov Decision Processes (POMDPs) and Deep Reinforcement Learning (DRL) in a unified fashion. Special emphasis is paid on how ordered action structuring of I&M actions can be exploited to decompose the respective policy parametrizations in actor-critic DRL schemes, resulting into fully decoupled maintenance and inspection actors. It is shown that the Value of Information (VoI) is naturally utilized in such POMDP control frameworks, as directly associated with the DRL advantage functions that emerge in the gradient computations of the inspection policy parameters. Overall, the presented approach, following the natural flow of engineering decisions, results in new architectural configurations for policy networks, facilitating more efficient training, while alleviating further the dimensionality burdens related to combinatorial definitions of I&M actions. The efficiency of the methodology is demonstrated in numerical experiments of a structural system subject to corrosion, where the optimization problem is formulated to concurrently account for state and model uncertainties as well as long-term probability of failure exceedance constraints. Results showcase that the obtained DRL policies considerably outperform standard decision rules.
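A hypothetical PyTorch sketch of the kind of decoupled actor parametrization described above: a shared belief encoder feeding separate maintenance and inspection policy heads (layer sizes and names are ours, not the paper's architecture).

import torch
import torch.nn as nn

class DecoupledActors(nn.Module):
    """Illustrative decoupled actors: one trunk encodes the belief, and two
    separate heads parametrize the maintenance and inspection policies."""
    def __init__(self, belief_dim, n_maint, n_insp, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(belief_dim, hidden), nn.ReLU())
        self.maint_head = nn.Linear(hidden, n_maint)
        self.insp_head = nn.Linear(hidden, n_insp)

    def forward(self, belief):
        h = self.trunk(belief)
        # Factorized policy: pi(a_m, a_i | b) = pi_m(a_m | b) * pi_i(a_i | b)
        return (torch.distributions.Categorical(logits=self.maint_head(h)),
                torch.distributions.Categorical(logits=self.insp_head(h)))

Factorizing the joint action this way avoids enumerating all combinatorial maintenance/inspection pairs in a single output layer.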
... The benefits offered by adaptive I&M policies are substantiated by Yang and Frangopol (2022), and Morato, Papakonstantinou, Andriotis, Nielsen, and Rigo (2022) have demonstrated that partially observable Markov decision processes (POMDPs) solved via point-based algorithms can efficiently determine optimal I&M policies. An overview of state-of-the-art point-based solvers and their applicability to infrastructure management can be found in Papakonstantinou, Andriotis, and Shinozuka (2017). ...
Article
Exposed to the cyclic action of wind and waves, offshore wind structures are subject to fatigue deterioration processes throughout their operational life, therefore constituting a structural failure risk. In order to control the risk of adverse events, physics-based deterioration models, which often contain significant uncertainties, can be updated with information collected from inspections, thus enabling decision-makers to dictate more optimal and informed maintenance interventions. The identified decision rules are, however, influenced by the deterioration model and failure criterion specified in the formulation of the pre-posterior decision-making problem. In this paper, fatigue failure criteria are integrated with Bayesian networks and Markov decision processes. The proposed methodology is implemented in the numerical experiments, specified with various crack growth models and failure criteria, for the optimal management of an offshore wind structural detail under fatigue deterioration. Within the experiments, the crack propagation, structural reliability estimates, and the optimal policies derived through heuristics and partially observable Markov decision processes (POMDPs) are thoroughly analysed, demonstrating the capability of failure assessment diagram to model the structural redundancy in offshore wind substructures, as well as the adaptability of POMDP policies.
... Dynamic Hidden Markov Models (DHMMs), equipped with the softmax function for transition probabilities, have been shown to provide sound probabilistic frameworks for sequential predictions (Bengio & Frasconi, 1996; Visser & Speekenbrink, 2010). Featuring, in a more arbitrary setting, hidden states that are not completely identifiable by Response Metrics (RMs), such dynamic models can provide transition probabilities among states given an Intensity Measure (IM) of interest, while at the same time preserving the Markovian property that is central in structural performance prediction frameworks and optimal maintenance and inspection planning (Papakonstantinou & Shinozuka, 2014a, 2014b; Papakonstantinou et al., 2016, 2018; Andriotis & Papakonstantinou, 2018c). ...
Conference Paper
Full-text available
Extended and generalized fragility functions support estimation of multiple damage state probabilities, based on intensity measure spaces of arbitrary dimensions and longitudinal state dependencies in time. The softmax function provides a consistent mathematical formulation for fragility analysis, thus, fragility functions are herein developed along the premises of softmax regression. In this context, the assumption that a lognormal or any other cumulative distribution function should be used to represent fragility functions is eliminated, multivariate data can be easily handled, and fragility crossings are avoided without the need for any parametric constraints. Adding to the above attributes, generalized fragility functions also provide probabilistic transitions among possible damage states, which can be either hidden or explicitly defined, thus allowing for long-term performance predictions. Long-term considerations enable the study and probabilistic quantification of the cumulative deterioration effects caused by multiple sequential events, while hidden damage states are described as states that are either not deterministically observed or determined, or that are initially even completely unknown and undefined based on relevant engineering demand parameters. Although hidden damage state cases are, therefore, frequently encountered in structural performance assessments, methods to untangle their longitudinal dynamics are elusive in the literature. In this work, various techniques are developed for fragility analysis with hidden damage states and long-term deterioration effects, from Markovian probabilistic graphical models to more flexible deep learning architectures with recurrent units.
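A minimal numpy sketch of the softmax-regression form of a generalized fragility function (parameter names are illustrative; the paper's models also handle hidden states and temporal dependencies, which are omitted here):

import numpy as np

def softmax_fragility(im, W, b):
    """P(damage state k | im) for K damage states via softmax regression.
    im: (D,) intensity-measure vector; W: (K, D), b: (K,) fitted parameters.
    Outputs lie in [0, 1] and sum to one by construction, so damage-state
    probabilities remain mutually consistent without parametric constraints."""
    z = W @ im + b
    e = np.exp(z - z.max())    # numerically stable softmax
    return e / e.sum()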
Article
Determination of inspection and maintenance policies for minimizing long-term risks and costs in deteriorating engineering environments constitutes a complex optimization problem. Major computational challenges include the (i) curse of dimensionality, due to exponential scaling of state/action set cardinalities with the number of components; (ii) curse of history, related to exponentially growing decision-trees with the number of decision-steps; (iii) presence of state uncertainties, induced by inherent environment stochasticity and variability of inspection/monitoring measurements; (iv) presence of constraints, pertaining to stochastic long-term limitations, due to resource scarcity and other infeasible/undesirable system responses. In this work, these challenges are addressed within a joint framework of constrained Partially Observable Markov Decision Processes (POMDP) and multi-agent Deep Reinforcement Learning (DRL). POMDPs optimally tackle (ii)-(iii), combining stochastic dynamic programming with Bayesian inference principles. Multi-agent DRL addresses (i), through deep function parametrizations and decentralized control assumptions. Challenge (iv) is herein handled through proper state augmentation and Lagrangian relaxation, with emphasis on life-cycle risk-based constraints and budget limitations. The underlying algorithmic steps are provided, and the proposed framework is found to outperform well-established policy baselines and facilitate adept prescription of inspection and intervention actions, in cases where decisions must be made in the most resource- and risk-aware manner.
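A minimal, self-contained toy sketch of the Lagrangian relaxation idea used for challenge (iv) (the objective, constraint, and all numbers are ours, purely for illustration): minimize cost(x) subject to risk(x) <= d by alternating a primal minimization of the Lagrangian with dual ascent on the multiplier.

cost = lambda x: (x - 2.0) ** 2          # illustrative primal objective
risk = lambda x: 1.0 / (1.0 + x)         # illustrative constrained quantity
d = 0.3                                  # long-term risk budget

x, lam = 0.0, 0.0
for _ in range(300):                     # dual ascent iterations
    for _ in range(100):                 # primal: minimize L(x, lam) over x
        x -= 0.05 * (2.0 * (x - 2.0) - lam / (1.0 + x) ** 2)
    lam = max(0.0, lam + 20.0 * (risk(x) - d))   # raise lam while infeasible

print(f"x={x:.3f}  lam={lam:.3f}  risk={risk(x):.3f}  (target d={d})")

In the paper's setting the primal step is the DRL policy update on the penalized objective and the constraint is a life-cycle risk or budget expectation, but the multiplier dynamics follow the same pattern.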
Article
Full-text available
Existing complexity bounds for point-based POMDP value iteration algorithms focus either on the curse of dimensionality or the curse of history. We derive a new bound that relies on both and uses the concept of discounted reachability; our conclusions may help guide future algorithm design. We also discuss recent improvements to our (point-based) heuristic search value iteration algorithm. Our new implementation calculates tighter initial bounds, avoids solving linear programs, and makes more effective use of sparsity.
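For flavor, the earlier and widely cited density-based bound of Pineau et al. (2003) has the form below, where \delta_B is the maximal distance from any reachable belief to the sample set B and \gamma the discount factor; the bound discussed in this abstract refines this picture by weighting beliefs via discounted reachability:

\lVert V^{B} - V^{*} \rVert_{\infty} \;\le\; \frac{(R_{\max} - R_{\min})\,\delta_{B}}{(1-\gamma)^{2}}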
Article
Full-text available
Learning methods based on dynamic programming (DP) are receiving increasing attention in artificial intelligence. Researchers have argued that DP provides the appropriate basis for compiling planning results into reactive strategies for real-time control, as well as for learning such strategies when the system being controlled is incompletely known. We introduce an algorithm based on DP, which we call Real-Time DP (RTDP), by which an embedded system can improve its performance with experience. RTDP generalizes Korf's Learning-Real-Time-A* algorithm to problems involving uncertainty. We invoke results from the theory of asynchronous DP to prove that RTDP achieves optimal behavior in several different classes of problems. We also use the theory of asynchronous DP to illuminate aspects of other DP-based reinforcement learning methods such as Watkins' Q-Learning algorithm. A secondary aim of this article is to provide a bridge between AI research on real-time planning and learning and relevant concepts and algorithms from control theory.
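A short numpy sketch of one RTDP trial (our simplification of the scheme described above, not the original implementation): act greedily with respect to the current value estimate, back up only the states actually visited, and sample successors stochastically.

import numpy as np

def rtdp_trial(V, s0, goal, T, R, gamma, rng, max_steps=200):
    """One RTDP trial. V: (S,) value estimates, initialized optimistically
    (admissible); T: (A, S, S) transitions; R: (A, S) expected rewards."""
    s = s0
    for _ in range(max_steps):
        if s == goal:
            break
        q = R[:, s] + gamma * T[:, s, :] @ V    # one-step lookahead per action
        a = int(np.argmax(q))                   # act greedily on current V
        V[s] = q[a]                             # asynchronous DP backup at s
        s = int(rng.choice(V.size, p=T[a, s]))  # sample the uncertain outcome
    return V

Trials are repeated (e.g., with rng = np.random.default_rng()) until the greedy policy stabilizes; asynchronous DP theory supplies the convergence guarantees the abstract refers to.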
Conference Paper
Full-text available
Motion planning in uncertain and dynamic environments is an essential capability for autonomous robots. Partially observable Markov decision processes (POMDPs) provide a principled mathematical framework for solving such problems, but they are often avoided in robotics due to high computational complexity. Our goal is to create practical POMDP algorithms and software for common robotic tasks. To this end, we have developed a new point-based POMDP algorithm that exploits the notion of optimally reachable belief spaces to improve computational efficiency. In simulation, we successfully applied the algorithm to a set of common robotic tasks, including instances of coastal navigation, grasping, mobile robot exploration, and target tracking, all modeled as POMDPs with a large number of states. In most of the instances studied, our algorithm substantially outperformed one of the fastest existing point-based algorithms. A software package implementing our algorithm will soon be released at
Conference Paper
Full-text available
Real-time dynamic programming (RTDP) is a heuristic search algorithm for solving MDPs. We present a modified algorithm called Focused RTDP with several improvements. While RTDP maintains only an upper bound on the long-term reward function, FRTDP maintains two-sided bounds and bases the output policy on the lower bound. FRTDP guides search with a new rule for outcome selection, focusing on parts of the search graph that contribute most to uncertainty about the values of good policies. FRTDP has modified trial termination criteria that should allow it to solve some problems (within ε) that RTDP cannot. Experiments show that for all the problems we studied, FRTDP significantly outperforms RTDP and LRTDP, and converges with up to six times fewer backups than the state-of-the-art HDP algorithm.
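A simplified caricature of the outcome-selection idea (our reduction; FRTDP's actual priority rule also accounts for trial depth and target bound widths): among the possible successor states, visit the one contributing most to uncertainty about the value, scored by transition probability times the bound gap.

import numpy as np

def pick_outcome(p_next, U, L):
    """Select the successor state with the largest probability-weighted gap
    between the upper (U) and lower (L) value bounds.
    p_next: (S,) successor distribution; U, L: (S,) bound vectors."""
    return int(np.argmax(p_next * (U - L)))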
Article
Operation and maintenance of an infrastructure system rely on information collected on its components, which can provide the decision maker with an accurate assessment of their condition states. However, resources to be invested in data gathering are usually limited and observations should be collected based on their Value of Information (VoI). Assessing the VoI is computationally intractable for most applications involving sequential decisions, such as long-term infrastructure maintenance. In this article, we propose an approach for integrating adaptive maintenance planning based on Partially Observable Markov Decision Process (POMDP) and inspection scheduling based on a tractable approximation of VoI. Two alternative myopic approaches, namely pessimistic and optimistic, are introduced, and compared theoretically and by numerical examples.
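A toy numerical illustration of the myopic (one-step preposterior) VoI computation for a single two-state component (the model and all numbers are ours): the component is inspected only if the VoI exceeds the inspection cost.

import numpy as np

# Belief over {intact, damaged}; actions: do-nothing vs. repair
b = np.array([0.7, 0.3])
cost = np.array([[0.0, 10.0],              # do-nothing cost per state
                 [2.0, 2.0]])              # repair cost per state
p = 0.9                                    # inspection accuracy
O = np.array([[p, 1 - p], [1 - p, p]])     # O[z, s] = P(z | s)

prior_cost = (cost @ b).min()              # best action without inspecting
pz = O @ b                                 # predictive observation probs
posteriors = (O * b) / pz[:, None]         # Bayes: b_z[s] = O[z,s] b[s] / P(z)
prepost_cost = sum(pz[z] * (cost @ posteriors[z]).min() for z in range(2))
print("VoI =", prior_cost - prepost_cost)  # here ~1.02: inspect if it costs less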
Article
The signs of deterioration in worldwide infrastructure and the associated socio-economic and environmental losses call for sustainable resource management and policy-making. To this end, this work presents an enhanced variant of partially observable Markov decision processes (POMDPs) for the life cycle assessment and maintenance planning of infrastructure. POMDPs comprise a method, commonly employed in the field of robotics, for decision-making on the basis of uncertain observations. In the work presented herein, a continuous-state POMDP formulation is presented which is adapted to the problem of decision-making for optimal management of civil structures. The aforementioned problem may comprise non-linear and non-deterministic action and observation models. The continuous-state POMDP is herein coupled with a normalised unscented transform (NUT) in order to deliver a framework able to tackle non-linearities that likely characterise action models. The capabilities of this enhanced framework and its applicability to the maintenance planning problem are presented via two applications. In a first illustrative example, the use of the NUT is demonstrated within the framework of the value iteration algorithm. Next, the proposed continuous-state framework is compared against a discrete-state formulation for implementation on a life cycle assessment problem.
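A numpy sketch of the standard unscented transform used to push a Gaussian belief through a nonlinear action model f (standard UT only; the normalised variant employed in the paper modifies the weighting scheme and is not reproduced here):

import numpy as np

def unscented_transform(mean, cov, f, kappa=1.0):
    """Propagate N(mean, cov) through a nonlinear function f via 2n+1 sigma
    points; returns the approximated mean and covariance of f(x)."""
    n = len(mean)
    S = np.linalg.cholesky((n + kappa) * cov)
    sigma = np.vstack([mean, mean + S.T, mean - S.T])   # 2n+1 sigma points
    w = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    Y = np.array([f(x) for x in sigma])                 # propagate each point
    m = w @ Y                                           # transformed mean
    P = (Y - m).T @ np.diag(w) @ (Y - m)                # transformed covariance
    return m, P

This avoids linearizing f, which is what enables the framework above to handle non-linear action models within value iteration.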
Article
Stochastic control methods have a history of implementation in risk management and life-cycle cost procedures for civil engineering structures. The synergy of stochastic control methods and Bayesian principles can result in Partially Observable Markov Decision Processes (POMDPs) that allow consideration of uncertainty within the entire domain of the model and expand available policy options in comparison to other state-of-the-art methods. The superior attributes of POMDPs enable optimum decisions which are based on the belief space, or otherwise only on the best knowledge that a decision-maker can have at each time. In this work the effort is mostly focused on modeling and solving the problem of finding optimal policies for the maintenance and management of aging structures through a POMDP framework with large state spaces that can adequately and sufficiently describe real-life problems. In order to form the POMDP framework, stochastic, physically based models can be used, and their connection to the control process is explained in detail. Specific examples of a corroded existing structure are presented, based on non-stationary POMDPs, for both infinite and finite horizon cases, with 332 and 14,009 states respectively. Results from both cases are compared and discussed, and the capabilities of the method become apparent.
Article
The overall objective of this two part study is to highlight the advanced attributes, capabilities and use of stochastic control techniques, and especially Partially Observable Markov Decision Processes (POMDPs), that can address the conundrum of planning optimum inspection/monitoring and maintenance policies based on stochastic models and uncertain structural data in real time. In this second part of the study a distinct, advanced, infinite horizon POMDP formulation with 332 states is cast and solved, related to a corroding reinforced concrete structure and its minimum life-cycle cost. The formation and solution of the problem modernize and extend relevant approaches and motivate use of POMDP methods in challenging practical applications. Apart from uncertain observations the presented framework can also support uncertain action outcomes, non-periodic inspections and choice availability of inspection/monitoring types and intervals, as well as maintenance actions and action times. It is thus no surprise that the estimated optimum policy consists of a complex combination of a variety of actions, which cannot be achieved by any other method. To be able to solve the problem we resort to a point-based value iteration solver and we evaluate its performance and solution quality for this type of applications. Simpler approximate solvers based on MDPs are also used and compared and the important notions of observation gathering actions and the value of information are briefly discussed.
Article
To address effectively the urgent societal need for safe structures and infrastructure systems under limited resources, science-based management of assets is needed. The overall objective of this two-part study is to highlight the advanced attributes, capabilities and use of stochastic control techniques, and especially Partially Observable Markov Decision Processes (POMDPs), that can address the conundrum of planning optimum inspection/monitoring and maintenance policies based on stochastic models and uncertain structural data in real time. Markov Decision Processes are in general controlled stochastic processes that move away from conventional optimization approaches in order to achieve minimum life-cycle costs and advise decision-makers to take optimum sequential decisions based on the actual results of the inspections or non-destructive testing they perform. In this first part of the study we exclusively describe, out of the vast and multipurpose stochastic control field, methods that are fitting for structural management, starting from simpler and moving to sophisticated techniques and modern solvers. We present Markov Decision Processes (MDPs), semi-MDP and POMDP methods in an overview framework, relate each of these to the others, and describe POMDP solutions in many forms, including both the problematic grid-based approximations that are routinely used in structural maintenance problems and the advanced point-based solvers capable of solving large-scale, realistic problems. Our approach in this paper is helpful for understanding the shortcomings of the currently used methods, related complications, possible solutions, and the significance the different solvers have, not only for the solution but also for the modeling choices of the problem. In the second part of the study we utilize almost all presented topics and notions in a very broad, infinite horizon, minimum life-cycle cost structural management example, and we focus on point-based solver implementation and comparison with simpler techniques, among others.
Article
The utilization of Markov decision processes as a sequential decision algorithm in the management actions of infrastructure (inspection, maintenance and repair) is discussed. The realistic issue of partial information from inspection is described, and the classic approach of partially observable Markov decision processes is then introduced. The use of this approach to determine optimal inspection strategies is described, as well as the role of deterioration and maintenance for steel structures. Discrete structural shapes and maintenance actions provide a tractable approach. In-service inspection incorporates Bayesian updating and leads to optimal operation and initial design. Finally, the concept of management policy is described with strategy vectors.
Article
A partially observable Markov decision process (POMDP) model is presented that extends beyond completely observable approaches by recognizing that inspections do not yield perfect estimates of the true internal state of system components. The approach permits the exact solution of problems whose output is inspection and maintenance policies that prescribe when to inspect, how to inspect, when to repair and how to repair, so as to minimize discounted life-cycle costs. The extension to accommodate partial observability does, however, exact a significant computational demand. The model is demonstrated with a one-lane, two-girder bridge inspection, maintenance, and repair application.
Article
Our knowledge to model, analyse, design, maintain, monitor, manage, predict and optimise the life-cycle performance of structures and infrastructures under uncertainty is continually growing. However, in many countries, including the United States, the civil infrastructure is no longer within desired levels of performance and safety. Decisions regarding civil infrastructure systems should be supported by an integrated reliability-based life-cycle multi-objective optimisation framework by considering, among other factors, the likelihood of successful performance and the total expected cost accrued over the entire life-cycle. The primary objective of this paper is to highlight recent accomplishments in the life-cycle performance assessment, maintenance, monitoring, management and optimisation of structural systems under uncertainty. Challenges are also identified.
Article
In comparison with the well-researched field of analysis and design of structural systems, the life-cycle performance prediction of these systems under no maintenance as well as under various maintenance scenarios is far more complex, and is a rapidly emergent field in structural engineering. As structures become older and maintenance costs become higher, different agencies and administrations in charge of civil infrastructure systems are facing challenges related to the implementation of structure maintenance and management systems based on life-cycle cost considerations. This article reviews the research to date related to probabilistic models for maintaining and optimizing the life-cycle performance of deteriorating structures and formulates future directions in this field.
Article
Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent's belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other point-based methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to dealing with continuous action spaces. Experimental results show the potential of Perseus in large-scale POMDP problems.
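A compact numpy sketch of one Perseus backup stage for a discrete POMDP (array shapes and structure are ours, not the authors' code), including the standard point-based Bellman backup it relies on:

import numpy as np

def backup(b, alphas, T, O, R, gamma):
    """Point-based Bellman backup at belief b, in alpha-vector form.
    T[a,s,s'], O[a,s',z], R[a,s]; alphas is the current list of alpha-vectors."""
    best = None
    for a in range(T.shape[0]):
        g = R[a].astype(float)
        for z in range(O.shape[2]):
            # g_az[i](s) = sum_s' T[a,s,s'] O[a,s',z] alphas[i](s')
            gaz = np.array([T[a] @ (O[a, :, z] * al) for al in alphas])
            g = g + gamma * gaz[int(np.argmax(gaz @ b))]
        if best is None or g @ b > best @ b:
            best = g
    return best

def perseus_stage(B, alphas, T, O, R, gamma, rng):
    """One randomized Perseus stage: back up beliefs sampled from the
    still-unimproved subset of B; a single new alpha-vector often improves
    the value of many belief points at once."""
    V = lambda b, A: max(al @ b for al in A)
    new, todo = [], list(B)
    while todo:
        b = todo.pop(rng.integers(len(todo)))          # random belief point
        alpha = backup(b, alphas, T, O, R, gamma)
        # Keep the new vector only if it improves b; else keep the old best
        new.append(alpha if alpha @ b >= V(b, alphas)
                   else max(alphas, key=lambda al: al @ b))
        # Retain only beliefs whose value has not yet improved this stage
        todo = [bb for bb in todo if V(bb, new) < V(bb, alphas)]
    return new

Repeating such stages until the value function stabilizes yields the Perseus approximation; the random subset selection is what keeps the per-stage cost low relative to backing up every belief point.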