David M. Bossens’s research while affiliated with Institute of High Performance Computing and other places

Publications (29)


The Digital Ecosystem of Beliefs: does evolution favour AI over humans?
  • Preprint

December 2024 · 3 Reads

David M. Bossens · Shanshan Feng · Yew-Soon Ong
As AI systems are integrated into social networks, there are AI safety concerns that AI-generated content may dominate the web, e.g. in popularity or impact on beliefs. To understand such questions, this paper proposes the Digital Ecosystem of Beliefs (Digico), the first evolutionary framework for controlled experimentation with multi-population interactions in simulated social networks. The framework models a population of agents which change their messaging strategies due to evolutionary updates following a Universal Darwinism approach, interact via messages, influence each other's beliefs through dynamics based on a contagion model, and maintain their beliefs through cognitive Lamarckian inheritance. Initial experiments with an abstract implementation of Digico show that: a) when AIs have faster messaging, evolution, and more influence in the recommendation algorithm, they get 80% to 95% of the views, depending on the size of the influence benefit; b) AIs designed for propaganda can typically convince 50% of humans to adopt extreme beliefs, and up to 85% when agents believe only a limited number of channels; c) a penalty for content that violates agents' beliefs reduces propaganda effectiveness by up to 8%. We further discuss implications for control (e.g. legislation) and Digico as a means of studying evolutionary principles.
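As a rough illustration of the kind of dynamics Digico models, a toy contagion-plus-evolution loop can be sketched as follows (the update rules, names, and parameters here are illustrative assumptions, not the authors' implementation):

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy population: scalar beliefs in [0, 1]; the messaging rate is the evolvable "strategy".
    n_agents = 100
    beliefs = rng.uniform(size=n_agents)
    msg_rate = rng.uniform(0.1, 1.0, size=n_agents)   # probability of posting per step
    influence = 0.05                                   # contagion step size

    def step(beliefs, msg_rate):
        posters = rng.uniform(size=n_agents) < msg_rate       # agents that post this step
        if not posters.any():
            return beliefs, msg_rate
        # Each agent reads one random message (a stand-in for a recommender system).
        msgs = rng.choice(np.where(posters)[0], size=n_agents)
        # Contagion-style update: readers' beliefs drift toward the received message.
        beliefs = beliefs + influence * (beliefs[msgs] - beliefs)
        # Evolutionary update: fitness ~ views received; strategies of fitter agents spread.
        views = np.bincount(msgs, minlength=n_agents).astype(float)
        parents = rng.choice(n_agents, size=n_agents, p=views / views.sum())
        msg_rate = np.clip(msg_rate[parents] + rng.normal(0, 0.01, n_agents), 0.01, 1.0)
        return beliefs, msg_rate

    for _ in range(1000):
        beliefs, msg_rate = step(beliefs, msg_rate)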


Quantum Policy Gradient in Reproducing Kernel Hilbert Space

November 2024 · 3 Reads

Parametrised quantum circuits offer expressive and data-efficient representations for machine learning. Due to quantum states residing in a high-dimensional complex Hilbert space, parametrised quantum circuits have a natural interpretation in terms of kernel methods. The representation of quantum circuits in terms of quantum kernels has been studied widely in quantum supervised learning, but has been overlooked in the context of quantum reinforcement learning. This paper proposes parametric and non-parametric policy gradient and actor-critic algorithms with quantum kernel policies in quantum environments. This approach, implemented with both numerical and analytical quantum policy gradient techniques, allows exploiting the many advantages of kernel methods, including available analytic forms for the gradient of the policy and tunable expressiveness. The proposed approach is suitable for vector-valued action spaces and each of the formulations demonstrates a quadratic reduction in query complexity compared to their classical counterparts. Two actor-critic algorithms, one based on stochastic policy gradient and one based on deterministic policy gradient (comparable to the popular DDPG algorithm), demonstrate additional query complexity reductions compared to quantum policy gradient algorithms under favourable conditions.
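The kernel-policy idea can be illustrated with a purely classical sketch (this is not the paper's quantum formulation; the RBF kernel, Gaussian policy, and parameters are assumptions for illustration): the policy mean is a kernel expansion over a set of centres, and the policy gradient has an analytic form.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative RBF kernel policy: a Gaussian policy whose mean is a kernel
    # expansion over a fixed set of centres (non-parametric, tunably expressive).
    centres = rng.uniform(-1, 1, size=(20, 2))   # kernel centres in a 2-D state space
    weights = np.zeros(len(centres))             # expansion coefficients
    sigma_k, sigma_pi, lr = 0.5, 0.2, 0.05       # kernel width, policy noise, step size

    def features(s):
        # RBF kernel evaluations k(s, c_i) for all centres c_i.
        return np.exp(-np.sum((centres - s) ** 2, axis=1) / (2 * sigma_k ** 2))

    def policy(s):
        mu = features(s) @ weights               # kernel-based action mean
        return rng.normal(mu, sigma_pi), mu      # sampled action, mean

    def reinforce_update(episode, weights):
        # episode: list of (state, action, mean, return-to-go) tuples.
        grad = np.zeros_like(weights)
        for s, a, mu, G in episode:
            # Analytic score function of the Gaussian kernel policy.
            grad += G * (a - mu) / sigma_pi ** 2 * features(s)
        return weights + lr * grad / max(len(episode), 1)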




Mapping the Complexity of Legal Challenges for Trustworthy Drones on Construction Sites in the United Kingdom
  • Article
  • Full-text available

May 2024 · 149 Reads · ACM Journal on Responsible Computing

David Bossens · [...] · Shane Windsor

Drones, unmanned aircraft controlled remotely and equipped with cameras, have seen widespread deployment across military, industrial, and commercial domains. The commercial sector, in particular, has experienced rapid growth, outpacing regulatory developments due to substantial financial incentives. The UK construction sector exemplifies a case where the regulatory framework for drones remains unclear. This article investigates the state of UK legislation on commercial drone use in construction through a thematic analysis of peer-reviewed literature. Four main themes (opportunities, safety risks, privacy risks, and the regulatory context) were identified, along with twenty-one sub-themes such as noise and falling materials. Findings reveal a fragmented regulatory landscape, combining byelaws, national laws, and EU regulations, creating business uncertainty. Our study recommends the establishment of specific national guidelines for commercial drone use, addressing uncertainties and building public trust, especially in anticipation of the integration of ‘autonomous’ drones. This research contributes to the responsible computing domain by uncovering regulatory gaps and issues in UK drone law, particularly within the often-overlooked context of the construction sector. The insights provided aim to inform future responsible computing practices and policy development in the evolving landscape of commercial drone technology.


Lifetime policy reuse and the importance of task capacity

October 2023 · 5 Reads · 1 Citation · AI Communications

A long-standing challenge in artificial intelligence is lifelong reinforcement learning, where learners are given many tasks in sequence and must transfer knowledge between tasks while avoiding catastrophic forgetting. Policy reuse and other multi-policy reinforcement learning techniques can learn multiple tasks but may generate many policies. This paper presents two novel contributions, namely 1) Lifetime Policy Reuse, a model-agnostic policy reuse algorithm that avoids generating many policies by optimising a fixed number of near-optimal policies through a combination of policy optimisation and adaptive policy selection; and 2) the task capacity, a measure for the maximal number of tasks that a policy can accurately solve. Comparing two state-of-the-art base-learners, the results demonstrate the importance of Lifetime Policy Reuse and task-capacity-based pre-selection on an 18-task partially observable Pacman domain and a Cartpole domain of up to 125 tasks.
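A minimal sketch of the reuse-and-select loop (the base-learner and its training step are abstracted away; the epsilon-greedy selection rule and running value estimate are illustrative assumptions, not the published algorithm verbatim):

    import numpy as np

    rng = np.random.default_rng(0)

    # A fixed library of K policies is reused across a sequence of tasks: for each
    # task, select the policy with the highest estimated return (epsilon-greedy),
    # run it, keep training it with the base-learner, and update the estimate.
    K, epsilon, alpha = 4, 0.1, 0.1

    class PolicyLibrary:
        def __init__(self, n_tasks):
            self.value = np.zeros((n_tasks, K))   # running return estimate per (task, policy)

        def select(self, task):
            if rng.uniform() < epsilon:
                return int(rng.integers(K))        # occasionally explore the library
            return int(np.argmax(self.value[task]))

        def update(self, task, k, episode_return):
            # Adaptive policy selection: move the estimate toward the observed return.
            # (Policy k would also be trained on the task by the base-learner.)
            self.value[task, k] += alpha * (episode_return - self.value[task, k])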


Robust Lagrangian and Adversarial Policy Gradient for Robust Constrained Markov Decision Processes

August 2023 · 8 Reads

The robust constrained Markov decision process (RCMDP) is a recent task-modelling framework for reinforcement learning that incorporates behavioural constraints and that provides robustness to errors in the transition dynamics model through the use of an uncertainty set. Simulating RCMDPs requires computing the worst-case dynamics based on value estimates for each state, an approach which has previously been used in the Robust Constrained Policy Gradient (RCPG). Highlighting potential downsides of RCPG such as not robustifying the full constrained objective and the lack of incremental learning, this paper introduces two algorithms, called RCPG with Robust Lagrangian and Adversarial RCPG. RCPG with Robust Lagrangian modifies RCPG by taking the worst-case dynamics based on the Lagrangian rather than either the value or the constraint. Adversarial RCPG also formulates the worst-case dynamics based on the Lagrangian but learns this directly and incrementally as an adversarial policy through gradient descent rather than indirectly and abruptly through constrained optimisation on a sorted value list. A theoretical analysis first derives the Lagrangian policy gradient for the policy optimisation of both proposed algorithms and then the adversarial policy gradient to learn the adversary for Adversarial RCPG. Empirical experiments injecting perturbations in inventory management and safe navigation tasks demonstrate the competitive performance of both algorithms compared to traditional RCPG variants as well as non-robust and non-constrained ablations. In particular, Adversarial RCPG ranks among the top two performing algorithms on all tests.
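To illustrate the key step of RCPG with Robust Lagrangian, the sketch below selects worst-case dynamics from a hypothetical finite uncertainty set by minimising the Lagrangian V - lambda * (C - d), rather than using the value or the constraint alone (the interfaces value_fn and cost_fn are assumptions for illustration, not the authors' implementation):

    import numpy as np

    def worst_case_model(candidate_models, value_fn, cost_fn, lam, budget):
        # Among candidate transition models (a finite, hypothetical uncertainty set),
        # pick the dynamics that are worst for the constrained objective, i.e. that
        # minimise the Lagrangian V - lam * (C - budget), instead of minimising the
        # value or maximising the constraint cost in isolation.
        # value_fn(P) / cost_fn(P) evaluate the current policy's return and constraint
        # cost under model P (e.g. via rollouts or dynamic programming).
        lagrangians = [value_fn(P) - lam * (cost_fn(P) - budget) for P in candidate_models]
        return candidate_models[int(np.argmin(lagrangians))]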


Low Variance Off-policy Evaluation with State-based Importance Sampling

December 2022 · 4 Reads

In off-policy reinforcement learning, a behaviour policy performs exploratory interactions with the environment to obtain state-action-reward samples which are then used to learn a target policy that optimises the expected return. This leads to a problem of off-policy evaluation, where one needs to evaluate the target policy from samples collected by the often unrelated behaviour policy. Importance sampling is a traditional statistical technique that is often applied to off-policy evaluation. While importance sampling estimators are unbiased, their variance increases exponentially with the horizon of the decision process due to computing the importance weight as a product of action probability ratios, yielding estimates with low accuracy for domains involving long-term planning. This paper proposes state-based importance sampling (SIS), which drops the action probability ratios of sub-trajectories with "negligible states" -- roughly speaking, those for which the chosen actions have no impact on the return estimate -- from the computation of the importance weight. Theoretical results show that this results in a reduction of the exponent in the variance upper bound as well as improving the mean squared error. An automated search algorithm based on covariance testing is proposed to identify a negligible state set which has minimal MSE when performing state-based importance sampling. Experiments are conducted on a lift domain, which includes "lift states" where the action has no impact on the following state and reward. The results demonstrate that using the search algorithm, SIS yields reduced variance and improved accuracy compared to traditional importance sampling, per-decision importance sampling, and incremental importance sampling.
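A minimal sketch of the state-based importance weight (the interfaces are hypothetical; in the paper the negligible-state set would be identified by the covariance-testing search rather than supplied by hand):

    import numpy as np

    def sis_weight(trajectory, pi, b, negligible_states):
        # State-based importance sampling weight for one trajectory.
        # trajectory: iterable of (state, action) pairs; pi(a, s) and b(a, s) return
        # target and behaviour action probabilities. Ratios at negligible states
        # (where the chosen action does not affect the return estimate) are dropped.
        w = 1.0
        for s, a in trajectory:
            if s in negligible_states:
                continue                      # skip the ratio for negligible states
            w *= pi(a, s) / b(a, s)
        return w

    def sis_estimate(trajectories, returns, pi, b, negligible_states):
        # Ordinary importance-sampling estimate of the target policy's expected return.
        weights = np.array([sis_weight(t, pi, b, negligible_states) for t in trajectories])
        return float(np.mean(weights * np.array(returns)))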



Trust in Language Grounding: a new AI challenge for human-robot teams

September 2022 · 34 Reads

The challenge of language grounding is to fully understand natural language by grounding language in real-world referents. While AI techniques are available, the widespread adoption and effectiveness of such technologies for human-robot teams relies critically on user trust. This survey provides three contributions relating to the newly emerging field of trust in language grounding, including a) an overview of language grounding research in terms of AI technologies, data sets, and user interfaces; b) six hypothesised trust factors relevant to language grounding, which are tested empirically on a human-robot cleaning team; and c) future research directions for trust in language grounding.


Citations (12)


... Doubly robust (DR) estimators (e.g., Jiang & Li (2016); Farajtabar et al. (2018)) combine model-based DM and model-free IS for OPE but may fail to reduce variance when both DM and IS have high variance. Various methods have been developed to refine estimation accuracy in IS, such as truncating importance weights and estimating weights from steady-state visitation distributions (Liu et al., 2018a; Xie et al., 2019; Doroudi et al., 2017; Bossens & Thomas, 2024). ...

Reference:

Concept-driven Off Policy Evaluation
Low Variance Off-policy Evaluation with State-based Importance Sampling
  • Citing Conference Paper
  • June 2024

... This could still be problematic in real-world power systems because NN-based approximation is unreliable at the early stage, which takes effect when collecting enough data. Two possible remedies are 1) training the policy on digital twin simulators to collect initial data, and 2) applying transfer learning or sim-to-real techniques [107] to generate initial sample data. ...

Robust Lagrangian and Adversarial Policy Gradient for Robust Constrained Markov Decision Processes
  • Citing Conference Paper
  • June 2024

... While these simulations could be subsumed under the category of applied algorithms, we distinguish between the two to highlight the significant attention that safer agent learning receives. Research in this area encompasses various approaches, including: constrained Markov Decision Processes [88,121,122], enforcement of hard constraints [123,124], model-based reinforcement learning for safe exploration [72,125,126], reward learning and inverse reinforcement learning [34,60,127], multi-agent reinforcement learning with a focus on safety and cooperation [128][129][130], and reinforcement learning with human oversight or feedback mechanisms for enhanced safety [33,131,132]. ...

Explicit Explore, Exploit, or Escape (E4): near-optimal safety-constrained reinforcement learning in polynomial time
  • Citing Article
  • January 2022

... Uncertainty-based safety approaches address variabilities that are either due to inherent stochasticity of the system or to stochasticity in the MDP parameters [11]. To account for uncertainty, the agent must be ready for the worst case when the model is inaccurate [24], by formulating a worst-case optimization criterion [11]. For example, to verify system properties with linear temporal logic, Wolff et al. [25] generated a control policy to maximize the worst-case probability of satisfying the linear temporal logic based on their defined specifications to account for the uncertainties in the transition dynamics. ...

Explicit Explore, Exploit, or Escape (E^4): near-optimal safety-constrained reinforcement learning in polynomial time

Machine Learning

... As the complexity of these missions grows, ensuring the reliability and safety of the robots becomes paramount [5]. One critical aspect of ensuring the safety and reliability of ARMs is the timely and accurate detection of anomalies [6], which can arise from various sources such as hardware faults [7], software faults [8], environmental change [9], or unexpected interactions with other systems [10][11][12][13]. ...

Resilient Robot Teams: a Review Integrating Decentralised Control, Change-Detection, and Learning

Current Robotics Reports

... In the simplest variant, the behavior function b_1 is set as an identity function, i.e., ∀w, b_1(w) = w, with B_1 = W. Portfolio weight vectors w simultaneously serve as both genotypes and phenotypes. Something similar can be seen in some of the approaches used to tackle the Rastrigin function benchmark [29] through QD algorithms [30,31]. ...

Quality-Diversity Meta-Evolution: Customizing Behavior Spaces to a Meta-Objective
  • Citing Article
  • February 2022

IEEE Transactions on Evolutionary Computation

... In a variety of applications, end-users may be particularly interested in behaviour spaces that are custom-made to a particular meta-objective, which expresses desirable properties for the final archive, such as a high number of solutions, generalisation towards particular problems, ability to form meaningful behavioural sequences, etc. In this context, we propose the newly emerging framework of quality-diversity meta-evolution [22,23], or QD-Meta for short, to evolve a population of QD algorithms, each with their own behaviour space and optionally some other representational or algorithmic properties. The framework adapts a feature-map to define the behaviour space and, through the use of a large database with the solutions generated so far, allows new archives to be rapidly constructed based on the new behaviour space. ...

On the use of feature-maps for improved quality-diversity meta-evolution
  • Citing Conference Paper
  • July 2021

... The recent stream of open-ended agent/environment co-evolution works (e.g. [32,33,34]) was kickstarted by the POET [35,36] algorithm. The "UED" term itself originated in PAIRED [8], which uses the performance of an "antagonist" agent to define the curriculum for the main (protagonist) agent. ...

QED: Using Quality-Environment-Diversity to Evolve Resilient Robot Swarms
  • Citing Article
  • November 2020

IEEE Transactions on Evolutionary Computation