Stephen Hailes’s research while affiliated with University College London and other places


Publications (181)


Figure 1: Diagrammatic representation of FSNID.
Figure 4: Temporal complexity of FSNID and comparators with respect to the number of features.
Figure 5: Comparison of three neural architectures' ability to incorporate temporal dependencies into the feature selection and subsequent classification tasks.
Feature Selection for Network Intrusion Detection
  • Preprint
  • File available

November 2024 · 10 Reads

Charles Westphal · Stephen Hailes

Network Intrusion Detection (NID) remains a key area of research within the information security community, while also being relevant to Machine Learning (ML) practitioners. The latter generally aim to detect attacks using network features, typically extracted from raw network data using dimensionality-reduction methods such as principal component analysis (PCA). However, PCA cannot assess the relevance of features for the task at hand. Consequently, the features available are of varying quality, with some being entirely non-informative. Two major drawbacks follow. First, trained and deployed models have to process large amounts of unnecessary data, draining potentially costly resources. Second, the noise introduced by irrelevant features can, in some cases, impede a model's ability to detect an attack. To deal with these challenges, we present Feature Selection for Network Intrusion Detection (FSNID), a novel information-theoretic method that facilitates the exclusion of non-informative features when detecting network intrusions. The proposed method is based on function approximation using a neural network, which enables a version of our approach that incorporates a recurrent layer and thereby uniquely enables the integration of temporal dependencies. Through an extensive set of experiments, we demonstrate that the proposed method selects a significantly reduced feature set while maintaining NID performance. Code will be made available upon publication.
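The abstract states that code will be released only upon publication, so the snippet below is not the authors' implementation. It is a minimal sketch, assuming a MINE-style (Donsker-Varadhan) neural estimator of mutual information driving backward feature elimination; the network size, training schedule, and `tol` threshold are illustrative choices, and the recurrent variant described in the abstract would replace the MLP with a recurrent layer over traffic sequences.

```python
# Minimal sketch (assumptions, not FSNID's released code): a MINE-style
# neural MI estimator used for backward feature elimination.
import math
import torch
import torch.nn as nn

class StatNet(nn.Module):
    """Statistics network T(x, y) for the Donsker-Varadhan bound on I(X; Y)."""
    def __init__(self, n_features: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features + 1, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # x: (n, n_features), y: (n, 1) -> per-sample score of shape (n,)
        return self.net(torch.cat([x, y], dim=1)).squeeze(1)

def estimate_mi(x: torch.Tensor, y: torch.Tensor,
                epochs: int = 200, lr: float = 1e-3) -> float:
    """I(X; Y) >= E_p(x,y)[T] - log E_p(x)p(y)[exp T], maximised over T."""
    n = x.shape[0]
    t = StatNet(x.shape[1])
    opt = torch.optim.Adam(t.parameters(), lr=lr)
    for _ in range(epochs):
        y_shuf = y[torch.randperm(n)]  # break pairing -> samples from p(x)p(y)
        bound = t(x, y).mean() - (torch.logsumexp(t(x, y_shuf), 0) - math.log(n))
        opt.zero_grad()
        (-bound).backward()            # gradient ascent on the lower bound
        opt.step()
    with torch.no_grad():
        y_shuf = y[torch.randperm(n)]
        return (t(x, y).mean()
                - (torch.logsumexp(t(x, y_shuf), 0) - math.log(n))).item()

def select_features(x: torch.Tensor, y: torch.Tensor, tol: float = 0.05) -> list:
    """Drop a feature when the MI estimate for the remaining set stays
    within `tol` nats of the full-set baseline."""
    keep = list(range(x.shape[1]))
    baseline = estimate_mi(x, y)
    for f in range(x.shape[1]):
        if f not in keep or len(keep) == 1:
            continue
        trial = [k for k in keep if k != f]
        if estimate_mi(x[:, trial], y) >= baseline - tol:
            keep = trial
    return keep
```

With `x` a float tensor of network features and `y` an (n, 1) float-encoded label, `select_features(x, y)` returns the indices of the retained features.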


Mutual Information Preserving Neural Network Pruning

October 2024 · 9 Reads

Model pruning is attracting increasing interest because of its positive implications for resource consumption and cost. A variety of methods have been developed in recent years. In particular, structured pruning techniques discern the importance of nodes in neural networks (NNs) and of filters in convolutional neural networks (CNNs). Global versions of these rank all nodes in a network and select the top-k, offering an advantage over local methods that rank nodes only within individual layers. By evaluating all nodes simultaneously, global techniques provide greater control over the network architecture, which improves performance. However, the ranking and selection process carried out during global pruning has several major drawbacks. First, the ranking is not updated in real time as pruning proceeds, so it cannot account for inter-node interactions. Second, it is not uncommon for whole layers to be removed from a model, which leads to untrainable networks. Lastly, global pruning methods offer no guarantees regarding re-training. To address these issues, we introduce Mutual Information Preserving Pruning (MIPP). The fundamental principle of our method is to select nodes such that the mutual information (MI) between the activations of adjacent layers is maintained. We evaluate MIPP on an array of vision models and datasets, including a pre-trained ResNet50 on ImageNet, where we demonstrate its ability to outperform state-of-the-art methods. The implementation of MIPP will be made available upon publication.
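The implementation of MIPP is likewise unreleased; the following is a rough sketch, assuming a crude histogram (plug-in) MI estimate and a 1-D mean summary of the next layer's activations in place of the authors' estimator. It illustrates the stated principle of ranking nodes by the mutual information their activations share with the adjacent layer.

```python
# Rough sketch (our assumptions, not the MIPP estimator): histogram-based
# MI between each node's activations and a summary of the next layer.
import numpy as np

def binned_mi(a: np.ndarray, b: np.ndarray, bins: int = 16) -> float:
    """Plug-in MI estimate (nats) between two 1-D activation streams."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p = joint / joint.sum()
    pa = p.sum(axis=1, keepdims=True)   # marginal over rows
    pb = p.sum(axis=0, keepdims=True)   # marginal over columns
    nz = p > 0                          # p > 0 implies pa * pb > 0, so log is safe
    return float((p[nz] * np.log(p[nz] / (pa @ pb)[nz])).sum())

def score_nodes(layer_acts: np.ndarray, next_acts: np.ndarray) -> np.ndarray:
    """Score node i by MI between its activations over a batch (rows) and a
    crude 1-D summary of the next layer's activations."""
    summary = next_acts.mean(axis=1)
    return np.array([binned_mi(layer_acts[:, i], summary)
                     for i in range(layer_acts.shape[1])])

# Pruning loop: remove the lowest-scoring node, then re-score, so the ranking
# is updated as pruning proceeds (addressing the drawback of static global
# rankings that the abstract highlights).
```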


Dynamics of Moral Behavior in Heterogeneous Populations of Learning Agents

October 2024 · 2 Reads

Growing concerns about safety and alignment of AI systems highlight the importance of embedding moral capabilities in artificial agents: a promising solution is the use of learning from experience, i.e., Reinforcement Learning. In multi-agent (social) environments, complex population-level phenomena may emerge from interactions between individual learning agents. Many of the existing studies rely on simulated social dilemma environments to study the interactions of independent learning agents; however, they tend to ignore the moral heterogeneity that is likely to be present in societies of agents in practice. For example, at different points in time a single learning agent may face opponents who are consequentialist (i.e., focused on maximizing outcomes over time), norm-based (i.e., conforming to specific norms), or virtue-based (i.e., considering a combination of different virtues). The extent to which agents' co-development may be impacted by such moral heterogeneity in populations is not well understood. In this paper, we present a study of the learning dynamics of morally heterogeneous populations interacting in a social dilemma setting. Using an Iterated Prisoner's Dilemma environment with a partner selection mechanism, we investigate the extent to which the prevalence of diverse moral agents in populations affects individual agents' learning behaviors and emergent population-level outcomes. We observe several types of non-trivial interactions between pro-social and anti-social agents, and find that certain types of moral agents are able to steer selfish agents towards more cooperative behavior.
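As a toy illustration of this setting (the environment, reward magnitudes, and learner are assumptions, not the paper's setup), the sketch below pits stateless tabular Q-learners with heterogeneous moral rewards against one another in an Iterated Prisoner's Dilemma with random partner selection.

```python
# Toy sketch (assumed rewards and hyperparameters): heterogeneous moral
# Q-learners in an Iterated Prisoner's Dilemma with random partner selection.
import random

C, D = 0, 1
PAYOFF = {(C, C): (3, 3), (C, D): (0, 4), (D, C): (4, 0), (D, D): (1, 1)}

def moral_reward(kind, my_act, opp_act, my_payoff, total_payoff):
    if kind == "selfish":      return my_payoff      # game reward only
    if kind == "utilitarian":  return total_payoff   # consequentialist: joint outcome
    if kind == "deontological":                      # norm: do not betray a cooperator
        return -3.0 if (my_act == D and opp_act == C) else 0.0
    raise ValueError(kind)

class QAgent:
    def __init__(self, kind, eps=0.1, alpha=0.1):
        self.kind, self.eps, self.alpha = kind, eps, alpha
        self.q = {C: 0.0, D: 0.0}                    # stateless for brevity

    def act(self):
        if random.random() < self.eps:
            return random.choice((C, D))
        return max(self.q, key=self.q.get)

    def learn(self, act, reward):
        self.q[act] += self.alpha * (reward - self.q[act])

agents = [QAgent(k) for k in ("selfish", "selfish", "utilitarian", "deontological")]
for _ in range(20_000):
    i, j = random.sample(range(len(agents)), 2)      # random partner selection
    ai, aj = agents[i].act(), agents[j].act()
    pi, pj = PAYOFF[(ai, aj)]
    agents[i].learn(ai, moral_reward(agents[i].kind, ai, aj, pi, pi + pj))
    agents[j].learn(aj, moral_reward(agents[j].kind, aj, ai, pj, pi + pj))
```

Tracking each agent's cooperation rate over training gives a minimal analogue of the population-level outcomes studied in the paper.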


Figure 8: Versions of the IPD test-time prompt used in additional analyses. At test time, as reported in Section 5 in the paper, we use new symbols for the actions in each game: action3 and action4 (prompt a in the Figure). We also run additional test-time evaluations with a prompt using the original action tokens but varying the order of presentation of the payoffs (b), or reversing the meaning of the original action tokens (c).
Figure 12: Action types displayed during fine-tuning on the Iterated Prisoner's Dilemma (IPD) game against four fixed-strategy opponents and an LLM opponent. For each episode, we plot the actions of the LLM player M given the last move of their opponent O.
Figure 13: Analysis of generalization of the fine-tuned agents' learned morality to other matrix game environments. We present results for models fine-tuned against an LLM opponent, to complement the results for fine-tuning versus a TFT opponent presented in the main paper (Figure 4).
Figure 17: Analysis of generalization of the fine-tuned agents' learned morality to other matrix game environments, with the meaning of action tokens in the prompt as in the original training procedure (here, action1=Cooperate, action2=Defect) but payoff matrix presented in a different order within the prompt (i.e., prompt b in Figure 8).
Moral Alignment for LLM Agents

October 2024 · 31 Reads

Decision-making agents based on pre-trained Large Language Models (LLMs) are increasingly being deployed across various domains of human activity. While their applications are currently rather specialized, several research efforts are under way to develop more generalist agents. As LLM-based systems become more agentic, their influence on human activity will grow and the transparency of that influence will decrease. Consequently, developing effective methods for aligning them to human values is vital. The prevailing practice in alignment often relies on human preference data (e.g., in RLHF or DPO), in which values are implicit and are essentially deduced from relative preferences over different model outputs. In this work, instead of relying on human feedback, we introduce the design of reward functions that explicitly encode core human values for Reinforcement Learning-based fine-tuning of foundation agent models. Specifically, we use intrinsic rewards for the moral alignment of LLM agents. We evaluate our approach using the traditional philosophical frameworks of Deontological Ethics and Utilitarianism, quantifying moral rewards for agents in terms of actions and consequences in the Iterated Prisoner's Dilemma (IPD) environment. We also show how moral fine-tuning can be deployed to enable an agent to unlearn a previously developed selfish strategy. Finally, we find that certain moral strategies learned on the IPD game generalize to several other matrix game environments. In summary, we demonstrate that fine-tuning with intrinsic rewards is a promising general solution for aligning LLM agents to human values, and it might represent a more transparent and cost-effective alternative to currently predominant alignment techniques.
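The sketch below shows how intrinsic moral rewards of this kind might be computed from the action tokens visible in the figure captions above (action1/action2); the parser and reward magnitudes are assumptions rather than the paper's exact design, and the resulting scalar would feed a PPO-style RL fine-tuning step rather than be used on its own.

```python
# Assumed token scheme and reward scales (not the paper's exact design):
# intrinsic moral rewards computed from IPD action tokens emitted by an LLM.
COOPERATE, DEFECT = "action1", "action2"
PAYOFF = {
    (COOPERATE, COOPERATE): (3, 3), (COOPERATE, DEFECT): (0, 4),
    (DEFECT, COOPERATE):    (4, 0), (DEFECT, DEFECT):    (1, 1),
}

def parse_action(completion: str) -> str:
    """Hypothetical parser mapping a raw completion to an action token."""
    return DEFECT if DEFECT in completion else COOPERATE

def moral_reward(framework: str, my_act: str, opp_act: str) -> float:
    mine, theirs = PAYOFF[(my_act, opp_act)]
    if framework == "utilitarian":     # value the consequences for both players
        return float(mine + theirs)
    if framework == "deontological":   # penalise defecting against a cooperator
        return -4.0 if (my_act == DEFECT and opp_act == COOPERATE) else 0.0
    if framework == "selfish":         # baseline: the game payoff alone
        return float(mine)
    raise ValueError(framework)
```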


Figure 1: PIDF at a glance. The diagram shows how the information-theoretic interactions (left) between the FOI, FNOI, and target are converted into an interpretable representation by means of a bar graph (right) using PIDF.
Figure 2: Comparison of feature importance indicators using synthetic datasets for the analysis of RVQ, SVQ and MSQ.
Figure 11: Feature duplication experiments. We apply PIDF to a modified version of the California housing, Abalone, and Whitewine datasets. These modified datasets are obtained by adding a duplicate feature to the original ones. The duplicate features we used in our experiments are longitude, diameter and density, respectively.
Partial Information Decomposition for Data Interpretability and Feature Selection

May 2024 · 3 Reads

In this paper, we introduce Partial Information Decomposition of Features (PIDF), a new paradigm for simultaneous data interpretability and feature selection. Contrary to traditional methods that assign a single importance value, our approach is based on three metrics per feature: the mutual information shared with the target variable, the feature's contribution to synergistic information, and the amount of this information that is redundant. In particular, we develop a novel procedure based on these three metrics, which reveals not only how features are correlated with the target but also the additional and overlapping information provided by considering them in combination with other features. We extensively evaluate PIDF using both synthetic and real-world data, demonstrating its potential applications and effectiveness through case studies from genetics and neuroscience.
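For discrete, integer-coded data, pairwise interaction information gives a simple stand-in for the three per-feature metrics described above. The sketch below is our construction rather than the PIDF procedure: it credits the positive part of each pairwise interaction information to synergy and the negative part to redundancy.

```python
# Stand-in sketch (not the PIDF procedure): per-feature MI, synergy, and
# redundancy from pairwise interaction information on discrete data.
import numpy as np
from sklearn.metrics import mutual_info_score

def pair_code(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Encode two integer-coded columns as one variable for joint MI."""
    return a * (b.max() + 1) + b

def pidf_style_metrics(X: np.ndarray, y: np.ndarray):
    d = X.shape[1]
    mi = np.array([mutual_info_score(X[:, i], y) for i in range(d)])
    syn, red = np.zeros(d), np.zeros(d)
    for i in range(d):
        for j in range(i + 1, d):
            joint = mutual_info_score(pair_code(X[:, i], X[:, j]), y)
            ii = joint - mi[i] - mi[j]   # interaction information
            if ii > 0:                   # extra information only in combination
                syn[i] += ii; syn[j] += ii
            else:                        # overlapping (redundant) information
                red[i] -= ii; red[j] -= ii
    return mi, syn, red                  # three bars per feature, as in Figure 1
```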


Large Language Models are Effective Priors for Causal Graph Discovery

May 2024 · 10 Reads

Causal structure discovery from observations can be improved by integrating background knowledge provided by an expert to reduce the hypothesis space. Recently, Large Language Models (LLMs) have begun to be considered as sources of prior information, given the low cost of querying them relative to a human expert. In this work, we first propose a set of metrics for assessing LLM judgments for causal graph discovery independently of the downstream algorithm. Second, we systematically study a set of prompting designs that allows the model to specify priors about the structure of the causal graph. Finally, we present a general methodology for integrating LLM priors into graph discovery algorithms, finding that they improve performance on common-sense benchmarks, especially when used to assess edge directionality. Our work highlights the potential as well as the shortcomings of using LLMs in this problem space.
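A minimal sketch of the soft integration of LLM priors might look as follows. Here `llm_edge_belief` is a hypothetical callable wrapping whatever prompt design is used; it returns a probability for each directed edge, and the resulting log-prior is added, weighted by lambda, to the data score inside any score-based structure search.

```python
# Sketch under assumptions: `llm_edge_belief(a, b)` is a hypothetical wrapper
# around an LLM prompt that returns an estimate of P(a -> b) in [0, 1].
import math
from itertools import permutations

def build_log_prior(variables, llm_edge_belief, floor=1e-3):
    """Pairwise LLM judgments -> clipped log-prior over directed edges."""
    prior = {}
    for a, b in permutations(variables, 2):
        p = min(max(llm_edge_belief(a, b), floor), 1 - floor)
        prior[(a, b)] = math.log(p)
    return prior

def penalised_score(graph_edges, data_score, log_prior, lam=1.0):
    """Data fit plus a lambda-weighted LLM log-prior; usable as the objective
    of any score-based graph discovery algorithm."""
    return data_score + lam * sum(log_prior[e] for e in graph_edges)
```

Clipping beliefs away from 0 and 1 keeps a single confidently wrong LLM judgment from vetoing a graph outright, which matches the spirit of using the LLM as a soft rather than hard constraint.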



Exploring the Security Culture of Operational Technology (OT) Organisations: the Role of External Consultancy in Overcoming Organisational Barriers

August 2023 · 12 Reads · 6 Citations

We conducted 33 interviews on the subject of security culture development with professionals in security-related roles across a range of OT sectors in the UK. This work identified three key organisational barriers to the development of a security culture: governance structures, a lack of communication between functions, and a lack of OT cybersecurity expertise. We then highlight the role of consultants and security solution vendors in helping organisations overcome these barriers.


Modeling Moral Choices in Social Dilemmas with Multi-Agent Reinforcement Learning

August 2023 · 19 Reads · 11 Citations

Practical uses of Artificial Intelligence (AI) in the real world have demonstrated the importance of embedding moral choices into intelligent agents. They have also highlighted that defining top-down ethical constraints on AI according to any one type of morality is extremely challenging and can pose risks. A bottom-up learning approach may be more appropriate for studying and developing ethical behavior in AI agents. In particular, we believe that an interesting and insightful starting point is the analysis of emergent behavior of Reinforcement Learning (RL) agents that act according to a predefined set of moral rewards in social dilemmas. In this work, we present a systematic analysis of the choices made by intrinsically motivated RL agents whose rewards are based on moral theories. We aim to design reward structures that are simplified yet representative of a set of key ethical systems. Therefore, we first define moral reward functions that distinguish between consequence- and norm-based agents, between morality based on societal norms or internal virtues, and between single- and mixed-virtue (i.e., multi-objective) methodologies. Then, we evaluate our approach by modeling repeated dyadic interactions between learning moral agents in three iterated social dilemma games (Prisoner's Dilemma, Volunteer's Dilemma and Stag Hunt). We analyze the impact of different types of morality on the emergence of cooperation, defection or exploitation, and the corresponding social outcomes. Finally, we discuss the implications of these findings for the development of moral agents in artificial and mixed human-AI societies.
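To make the taxonomy concrete, the sketch below gives assumed functional forms (not the paper's exact definitions) for a virtue valuing equality, a norm of reciprocity, and a mixed-virtue (multi-objective) combination; consequence-based agents would instead score outcomes, for example as a utilitarian sum of payoffs.

```python
# Assumed reward shapes illustrating the taxonomy (not the paper's formulas).
C, D = 0, 1

def virtue_equality(my_payoff: float, opp_payoff: float) -> float:
    """Virtue agent valuing equality: negative absolute payoff gap."""
    return -abs(my_payoff - opp_payoff)

def norm_reciprocity(my_act: int, opp_last_act: int) -> float:
    """Norm-based agent: penalised for deviating from reciprocity."""
    return 0.0 if my_act == opp_last_act else -2.0

def mixed_virtue(my_act, opp_last_act, my_payoff, opp_payoff, w=(0.5, 0.5)):
    """Mixed-virtue (multi-objective) agent: weighted sum of virtues."""
    return (w[0] * virtue_equality(my_payoff, opp_payoff)
            + w[1] * norm_reciprocity(my_act, opp_last_act))
```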


Map of the study area at the edge of the City of Cape Town, South Africa (inset). The map is colored by habitat type and shows the different land uses. The fence, designed to keep the baboons from entering vineyard farming areas, is indicated in black; the solid line represents the part of the fence that was live across the full study period, while the dashed line represents parts of the fence that were newly erected (south) in year two. The fences surrounding Farm C and the northern edge of Farm B were not operational and are therefore not shown. The solid gray lines represent the boundaries of each land property.
Change in baboon management strategies (maps (a)–(d)) and baboons' responses ((e) and (f)). In (a) and (b), areas where baboons are likely to be deterred (higher space-restriction scores) are shown in "warmer" colors, and areas where baboons are likely to be left alone in "colder" colors. In (c) and (d), rangers tend to agree on a common strategy (yellow) or disagree (red). Panels (e) and (f) show troop space use defined with kernel density estimates; the colors indicate the kernel volume, with darker areas representing more intensely used space (core areas) and lighter areas representing less used space (95% contours). Contours represent habitat features, with vertical hashes representing trees, horizontal hashes representing meadows, and light gray polygons indicating urban areas. The dark gray polygon is an identified urban foraging hotspot. The red line indicates the location of the baboon fence and the orange line the buffer zone around urban areas before recommendations (e) and after (f).
Baboon foraging behavior. Distributions of foraging locations for adult males (one colored line per male) are given for year one (n = 6, (a)) and year two (n = 7, (b)). The ranging pattern of the troop is represented as a gray polygon. Distributions are represented as kernel densities using a bandwidth of 50 m.
Time budget of adult baboons in the troop (bars) or adult males (boxplots) in year one (light gray) and year two (dark gray). Time budgets are given as the percentage of behavioral observations made during scan (troop) or focal sampling (males, n = 10 in year one, and n = 8 in year two).
Using behavioral studies to adapt management decisions and reduce negative interactions between humans and baboons in Cape Town, South Africa

May 2023 · 218 Reads · 1 Citation

Gaelle Fehlmann · [...]

Understanding the behavioral ecology of wildlife that experiences negative interactions with humans, and the outcome of any wildlife management intervention, is essential. In the Cape Peninsula, South Africa, chacma baboons (Papio ursinus) search for anthropogenic food sources in both urban and agricultural areas. In response, the city of Cape Town and private farmers employ “rangers” to keep baboons within the Table Mountain National Park. In this study, we investigated the success of the rangers' intervention in keeping baboons in their natural habitat. Based on our findings in year one, we recommended adjustments to the rangers' management strategy in year two: improved consensus of actions toward baboons (that is, when/where to herd them), and the construction of a baboon‐proof fence around one of the farms that provided a corridor to urban areas. During the two months following these recommendations, the combined interventions resulted in a significant reduction in the time baboons spent in both urban and agricultural land. Our case study illustrates the importance of integrating research findings into ongoing management actions, through an adaptive management framework, to improve both human livelihoods and baboon conservation. We expect similar approaches to be beneficial across a wide range of species and contexts.


Citations (65)


... Certain types of information, often labeled 'top secret' or 'secret', carry higher levels of sensitivity and are strictly controlled [63]. Industry perspectives on the handling of this data have been partially studied, focusing on various stakeholders such as IT professionals, developers, and end-users [15,22,45]. Although the risks of information leakage are well understood, particularly for end-users and AI practitioners, there is a lack of research on other professional groups. ...

Reference:

"I Always Felt that Something Was Wrong.": Understanding Compliance Risks and Mitigation Strategies when Professionals Use Large Language Models
Exploring the Security Culture of Operational Technology (OT) Organisations: the Role of External Consultancy in Overcoming Organisational Barriers
  • Citing Conference Paper
  • August 2023

... These in silico simulations can be seen as a starting point to develop more concrete hypotheses from real-world patterns that can then be tested in laboratory studies. An example is decision-making, which can be studied using simulated agents as a starting point for experiments involving humans [97,98]. The recent advent of large language models/foundational models [99] is introducing new opportunities in terms of realistic simulation of human behaviour. ...

Modeling Moral Choices in Social Dilemmas with Multi-Agent Reinforcement Learning
  • Citing Conference Paper
  • August 2023

... This effect may have then been further strengthened by maintaining a small number of active beehives within the study site. This conditioned fear response has been previously studied on an array of species including rats (Rattus norvegicus [24]), rabbits (Oryctolagus cuniculus [25]), and chacma baboons (Papio ursinus [26]). There is also some evidence from Kenya that elephants may show a conditioned fear response to empty beehives protecting crop fields following previous exposure to active beehives [27]. ...

Using behavioral studies to adapt management decisions and reduce negative interactions between humans and baboons in Cape Town, South Africa

... Recent work has also sought to formulate the graph exploration task explicitly as a Markov decision process, using domain-specific node features and novelty rewards [50,51]. GNNs, in combination with RL, have also been used to build and rewire graphs such that they possess high values of specific features of interest [52,53]. ...

Dynamic Network Reconfiguration for Entropy Maximization using Deep Reinforcement Learning
  • Citing Conference Paper
  • December 2022

... A mechanism for doing so is designing the simulation policy that is used to sample actions (i.e., graph edges in this context) outside of the search tree, which may yield substantially better solutions than the default uniform random sampling of actions. The simulation policy can generally be hand-engineered [16,10] or learned from interactions [15,35,1]. ...

Planning spatial networks with Monte Carlo tree search

... Thijssen and Kappen [46] propose adaptive Monte Carlo sampling that is accelerated using importance sampling. This approach has been successfully applied to the control of 10-20 autonomous quadrotors engaged in coordinated control tasks, such as flying with minimal velocity in a restricted area without collision, or a pursuit task in which multiple 'cats' must catch a mouse that tries to escape [18]. ...

Real-Time Stochastic Optimal Control for Multi-Agent Quadrotor Systems
  • Citing Article
  • March 2016

Proceedings of the International Conference on Automated Planning and Scheduling

... For example, graph algorithms like Dijkstra's for shortest paths have been successfully applied in logistics and transportation optimization [26]. Max flow algorithms have been used in network capacity optimization, and graph neural networks (GNNs) have shown promise in predictive modeling and recommendation systems [27]- [29]. However, a comprehensive framework that integrates these approaches for yield optimization across multiple industries is still lacking. ...

Graph Neural Modeling of Network Flows
  • Citing Preprint
  • September 2022

... We have collected the relevant research papers on attacks on ICS, and the impact of those attacks, between 2014 and 2023 for the analysis presented in this paper. Evripidou et al. [15] performed an SLR to review the security culture followed in organizations and analyzed the factors affecting system security. Scott Steele Buchanan [16] examined several cyber-attacks on ICS, including Stuxnet, and discussed the evolution of cyber attacks in the ICS domain. ...

Security Culture in Industrial Control Systems Organisations: A Literature Review

IFIP Advances in Information and Communication Technology

... [31] employed a deep reinforcement learning method to alter network topology to improve the performance of a communication network. [32] proposed a framework for improving the resilience of complex networks by introducing a network-construction Markov decision process. Inspired by the successes of RL in the study of complex networks, RL can be considered a potential method for altering complex networks to achieve different synchronization patterns. ...

Goal-directed graph construction using reinforcement learning

... As these systems shift from isolated infrastructure to networked environments, they become susceptible to cyberattacks targeting various system components. [25] highlights that IoT integration expands the attack surface, allowing adversaries to infiltrate through multiple entry points such as sensors, controllers, and communication networks. [14] similarly suggests that, in legacy sectors like urban water management, IoT integration creates multiple entry points for adversaries to exploit. ...

A Systematic Review of the State of Cyber-Security in Water Systems

Water