Andrew Stephen McGough

Durham University | DU · School of Engineering and Computing Sciences

About

121
Publications
23,510
Reads
2,229
Citations

Publications

Publications (121)
Preprint
Full-text available
A change from a high-carbon emitting electricity power system to one based on renewables would aid in the mitigation of climate change. Decarbonization of the electricity grid would allow for low-carbon heating, cooling and transport. Investments in renewable energy must be made over a long time horizon to maximise return on investment of these lo...
Article
Full-text available
We present NUFEB (Newcastle University Frontiers in Engineering Biology), a flexible, efficient, and open source software for simulating the 3D dynamics of microbial communities. The tool is based on the Individual-based Modelling (IbM) approach, where microbes are represented as discrete units and their behaviour changes over time due to a variety...
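The Individual-based Modelling idea described here, where each microbe is a discrete agent whose state is updated over time, can be sketched minimally. The growth rule, parameter values and division threshold below are illustrative assumptions, not NUFEB's actual implementation.

```python
import numpy as np

# Minimal individual-based model sketch: each microbe is a discrete agent
# with a position and a biomass; biomass grows with local nutrient
# (Monod-style) and cells divide above a threshold. Parameters are assumed.
rng = np.random.default_rng(0)
mu_max, K_s, division_mass, dt = 1.0, 0.5, 2.0, 0.1  # illustrative values

cells = [{"pos": rng.uniform(0, 10, size=3), "mass": 1.0} for _ in range(5)]

def nutrient_at(pos):
    # Placeholder nutrient field; a real simulation would solve
    # diffusion-reaction equations for the substrate.
    return 1.0

for step in range(100):
    new_cells = []
    for cell in cells:
        s = nutrient_at(cell["pos"])
        growth = mu_max * s / (K_s + s)        # Monod growth rate
        cell["mass"] *= 1.0 + growth * dt      # update biomass
        if cell["mass"] >= division_mass:      # divide into two daughters
            cell["mass"] /= 2.0
            new_cells.append({"pos": cell["pos"] + rng.normal(0, 0.1, 3),
                              "mass": cell["mass"]})
    cells.extend(new_cells)

print(f"population after 100 steps: {len(cells)}")
```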
Preprint
Full-text available
With the recent growth in the number of malicious activities on the internet, cybersecurity research has seen a boost in the past few years. However, as certain variants of malware can provide highly lucrative opportunities for bad actors, significant resources are dedicated to innovations and improvements by vast criminal organisations. Among thes...
Article
Full-text available
Graph embeddings have become a key and widely used technique within the field of graph mining, proving to be successful across a broad range of domains including social, citation, transportation and biological. Unsupervised graph embedding techniques aim to automatically create a low-dimensional representation of a given graph, which captures key s...
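As an illustration of the general idea, a low-dimensional vector per node that preserves structural similarity, one simple unsupervised baseline is to factorise the adjacency matrix. This is only a sketch of one embedding technique on a toy graph, not the specific methods surveyed in the paper.

```python
import numpy as np

# Toy graph: adjacency matrix of a 6-node graph (two loosely connected triangles).
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)

# Truncated SVD of the adjacency matrix gives a d-dimensional vector per node:
# nodes with similar connectivity land close together in the embedding space.
d = 2
U, S, _ = np.linalg.svd(A)
embedding = U[:, :d] * S[:d]

for node, vec in enumerate(embedding):
    print(node, np.round(vec, 3))
```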
Preprint
Full-text available
Graphs have become a crucial way to represent large, complex and often temporal datasets across a wide range of scientific disciplines. However, when graphs are used as input to machine learning models, this rich temporal information is frequently disregarded during the learning process, resulting in suboptimal performance on certain temporal infer...
Preprint
Full-text available
Newly emerging variants of ransomware pose an ever-growing threat to computer systems governing every aspect of modern life through the handling and analysis of big data. While various recent security-based approaches have focused on detecting and classifying ransomware at the network or system level, easy-to-use post-infection ransomware classific...
Conference Paper
Full-text available
Due to the threat of climate change, a transition from a fossil-fuel based system to one based on zero-carbon is required. However, this is not as simple as instantaneously closing down all fossil fuel energy generation and replacing them with renewable sources -- careful decisions need to be taken to ensure rapid but stable progress. To aid decisi...
Conference Paper
Full-text available
Impacts on natural and human systems have already been observed due to anthropogenic greenhouse gas emissions [17]. To reduce these emissions, a transition to a low-carbon economy is required. Carbon taxes can be used as a tool for pricing in the negative externalities of pollution and enabling a more rapid transition to a low-carbon economy. This...
Preprint
Full-text available
We present NUFEB, a flexible, efficient, and open source software for simulating the 3D dynamics of microbial communities. The tool is based on the Individual-based Modelling (IbM) approach, where microbes are represented as discrete units and their behaviour changes over time due to a variety of processes. This approach allows us to study populati...
Conference Paper
Full-text available
Handling large corpuses of documents is of significant importance in many fields, nowhere more so than in the areas of crime investigation and defence, where an organisation may be presented with a large volume of scanned documents which need to be processed in a finite time. However, this problem is exacerbated both by the volume, in terms of scanned d...
Preprint
Deep learning is rapidly becoming a go-to tool for many artificial intelligence problems due to its ability to outperform other approaches and even humans at many problems. Despite its popularity we are still unable to accurately predict the time it will take to train a deep learning network to solve a given problem. This training time can be seen...
Preprint
State-space models (SSMs) provide a flexible framework for modelling time-series data. Consequently, SSMs are ubiquitously applied in areas such as engineering, econometrics and epidemiology. In this paper we provide a fast approach for approximate Bayesian inference in SSMs using the tools of deep learning and variational inference.
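A state-space model separates a latent state that evolves over time from noisy observations of it. The linear-Gaussian simulator below is a generic illustration of that structure, with assumed parameter values; it does not show the deep-learning or variational-inference machinery proposed in the paper.

```python
import numpy as np

# Generic linear-Gaussian state-space model:
#   x_t = a * x_{t-1} + process noise,   y_t = x_t + observation noise.
# Simulating it shows the latent/observed split that SSM inference targets.
rng = np.random.default_rng(1)
a, q, r, T = 0.9, 0.1, 0.5, 200   # transition coeff., process var., obs. var. (assumed)

x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = a * x[t - 1] + rng.normal(0, np.sqrt(q))
    y[t] = x[t] + rng.normal(0, np.sqrt(r))

print("latent std:", x.std().round(3), "observed std:", y.std().round(3))
```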
Preprint
Full-text available
Graphs are a commonly used construct for representing relationships between elements in complex high dimensional datasets. Many real-world phenomena are dynamic in nature, meaning that any graph used to represent them is inherently temporal. However, many of the machine learning models designed to capture knowledge about the structure of these gra...
Conference Paper
Full-text available
Natural Language Inference (NLI) is a fundamental step towards natural language understanding. The task aims to detect whether a premise entails or contradicts a given hypothesis. NLI contributes to a wide range of natural language understanding applications such as question answering, text summarization and information extraction. Recently, the pu...
Preprint
High Throughput Computing (HTC) provides a convenient mechanism for running thousands of tasks. Many HTC systems exploit computers which are provisioned for other purposes by utilising their idle time - volunteer computing. This has great advantages as it gives access to vast quantities of computational power for little or no cost. The downside is...
Chapter
Full-text available
Clinical data is usually observed and recorded at irregular intervals and includes: evaluations, treatments, vital sign and lab test results. These provide an invaluable source of information to help diagnose and understand medical conditions. In this work, we introduce the largest patient records dataset in diabetes research: King Abdullah Interna...
Chapter
Full-text available
Dropout is a crucial regularization technique for the Recurrent Neural Network (RNN) models of Natural Language Inference (NLI). However, dropout has not been evaluated for its effectiveness at different layers and dropout rates in NLI models. In this paper, we propose a novel RNN model for NLI and empirically evaluate the effect of applying dropou...
Conference Paper
Full-text available
Dropout is a crucial regularization technique for the Recurrent Neural Network (RNN) models of Natural Language Inference (NLI). However, dropout has not been evaluated for its effectiveness at different layers and dropout rates in NLI models. In this paper, we propose an RNN model for NLI and empirically evaluate the effect of applying dropout at...
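The evaluation described in these two papers, applying dropout at different points around a recurrent encoder, can be illustrated with a small PyTorch sketch. The layer sizes, dropout placements and rates below are assumptions for illustration, not the models from the papers.

```python
import torch
import torch.nn as nn

class SentenceEncoder(nn.Module):
    """Toy recurrent sentence encoder with dropout applied at the embedding
    layer and before the classifier; the rates are illustrative, not tuned."""
    def __init__(self, vocab=1000, emb=64, hidden=128, p_emb=0.2, p_out=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.drop_emb = nn.Dropout(p_emb)     # dropout after the embeddings
        self.rnn = nn.LSTM(emb, hidden, batch_first=True)
        self.drop_out = nn.Dropout(p_out)     # dropout before the classifier
        self.classify = nn.Linear(hidden, 3)  # entail / contradict / neutral

    def forward(self, tokens):
        e = self.drop_emb(self.embed(tokens))
        _, (h, _) = self.rnn(e)
        return self.classify(self.drop_out(h[-1]))

model = SentenceEncoder()
logits = model(torch.randint(0, 1000, (4, 12)))  # batch of 4 toy sentences
print(logits.shape)  # torch.Size([4, 3])
```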
Preprint
Full-text available
Graph embeddings have become a key and widely used technique within the field of graph mining, proving to be successful across a broad range of domains including social, citation, transportation and biological. Graph embedding techniques aim to automatically create a low-dimensional representation of a given graph, which captures key structural ele...
Conference Paper
Full-text available
In order to reliably generate electricity to meet the demands of the customer base, it is essential to match supply with demand. Short-term load forecasting is utilised in both real-time scheduling of electricity, and load-frequency control. This paper aims to improve the accuracy of load-forecasting by using machine learning techniques to predict...
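A minimal version of short-term load forecasting with machine learning is to regress the next period's demand on recent lagged demand. The synthetic demand series and the choice of a linear model below are purely illustrative, not the techniques evaluated in the paper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic half-hourly demand with a daily cycle plus noise (illustrative only).
rng = np.random.default_rng(2)
t = np.arange(48 * 60)
load = 100 + 20 * np.sin(2 * np.pi * t / 48) + rng.normal(0, 2, t.size)

# Features: the previous 48 half-hours; target: the next half-hour's load.
lags = 48
X = np.array([load[i - lags:i] for i in range(lags, load.size)])
y = load[lags:]

split = int(0.8 * len(y))
model = LinearRegression().fit(X[:split], y[:split])
pred = model.predict(X[split:])
print(f"test MAE: {np.mean(np.abs(pred - y[split:])):.2f}")
```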
Conference Paper
High Throughput Computing allows workloads of many thousands of tasks to be performed efficiently over many distributed resources and frees the user from the laborious process of managing task deployment, execution and result collection. However, in many cases the High Throughput Computing system is comprised from volunteer computational resources...
Article
Full-text available
Parameter inference for stochastic differential equations is challenging due to the presence of a latent diffusion process. Working with an Euler-Maruyama discretisation for the diffusion, we use variational inference to jointly learn the parameters and the diffusion paths. We use a standard mean-field variational approximation of the parameter pos...
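The Euler-Maruyama discretisation mentioned here replaces the continuous diffusion with small Gaussian increments. The sketch below simulates an Ornstein-Uhlenbeck process that way; the choice of SDE and its parameters are assumptions for illustration, and the variational inference step is not shown.

```python
import numpy as np

# Euler-Maruyama discretisation of dX = theta * (mu - X) dt + sigma dW
# (an Ornstein-Uhlenbeck process, chosen purely for illustration).
rng = np.random.default_rng(3)
theta, mu, sigma = 1.5, 0.0, 0.3   # assumed parameters
dt, n_steps = 0.01, 1000

x = np.empty(n_steps + 1)
x[0] = 1.0
for i in range(n_steps):
    drift = theta * (mu - x[i]) * dt
    diffusion = sigma * np.sqrt(dt) * rng.normal()
    x[i + 1] = x[i] + drift + diffusion

print("final value:", round(x[-1], 3), "sample mean:", round(x.mean(), 3))
```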
Article
Full-text available
Accurate predictive modelling of the growth of microbial communities requires the credible representation of the interactions of biological, chemical and mechanical processes. However, although biological and chemical processes are represented in a number of Individual-based Models (IbMs) the interaction of growth and mechanics is limited. Converse...
Data
Growth of a biofilm without flow. If there is no flow field the shape is determined by the nutrient level. (WMV)
Data
Growth of a biofilm at a shear rate of 0.01 s-1. If the biofilm grows in a flow field, the shape is determined by both nutrient level and flow fields. (WMV)
Data
Biofilm growth, deformation, and detachment at shear rate of 0.2 s-1. The biofilm is initially grown for 4.6 days without flows and then shear flow is applied on the grown biofilm for another 4.6 days. It is seen that the detached clusters agglomerate with one another. (WMV)
Data
Supporting information (SI). (DOCX)
Conference Paper
Full-text available
Successful Cybersecurity depends on the processing of vast quantities of data from a diverse range of sources such as police reports, blogs, intelligence reports, security bulletins, and news sources. This results in large volumes of unstructured text data that is difficult to manage or investigate manually. In this paper we introduce a tool that s...
Conference Paper
When performing a trace-driven simulation of a High Throughput Computing system we are limited to the knowledge which should be available to the system at the current point within the simulation. However, the trace-log contains information we would not be privy to during the simulation. Through the use of Machine Learning we can extract the latent...
Conference Paper
The problem of how to compare empirical graphs is an area of great interest within the field of network science. The ability to accurately but efficiently compare graphs has a significant impact in such areas as temporal graph evolution, anomaly detection and protein comparison. The comparison problem is compounded when working with massive graphs...
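One cheap baseline for comparing graphs, in the spirit of the efficiency concerns raised here, is to compare summary statistics such as degree distributions. This is only an illustrative baseline on random toy graphs, not the comparison method developed in the paper.

```python
import numpy as np

def degree_histogram(adj, bin_edges):
    """Normalised degree histogram of a graph given as an adjacency matrix."""
    hist, _ = np.histogram(adj.sum(axis=1), bins=bin_edges, density=True)
    return hist

def graph_distance(adj_a, adj_b, bins=10):
    """L1 distance between degree histograms: a crude graph similarity score."""
    max_deg = max(adj_a.sum(axis=1).max(), adj_b.sum(axis=1).max())
    edges = np.linspace(0, max_deg + 1, bins + 1)
    return np.abs(degree_histogram(adj_a, edges)
                  - degree_histogram(adj_b, edges)).sum()

rng = np.random.default_rng(4)
A = (rng.random((50, 50)) < 0.1).astype(float)
A = np.triu(A, 1); A = A + A.T          # sparse random undirected graph
B = (rng.random((50, 50)) < 0.3).astype(float)
B = np.triu(B, 1); B = B + B.T          # denser random undirected graph

print("distance(A, A) =", graph_distance(A, A))
print("distance(A, B) =", round(graph_distance(A, B), 3))
```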
Conference Paper
Existing approaches to energy management of large scale distributed systems are ill-equipped to handle the challenges introduced by the dynamic and self-adaptive nature of the Internet of Things. In this position paper we motivate the need for energy-aware modelling and simulation approaches for IoT infrastructures, to facilitate what-if analyses,...
Conference Paper
In this paper we present a novel approach to spam filtering and demonstrate its applicability with respect to SMS messages. Our approach requires minimal feature engineering and a small set of labelled data samples. Features are extracted using topic modelling based on latent Dirichlet allocation, and then a comprehensive data model is created u...
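The feature-extraction step described here, topic proportions from latent Dirichlet allocation used as message features, can be sketched with scikit-learn. The toy messages and parameter values are assumptions, and the downstream data model from the paper is not reproduced.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# A handful of toy SMS-like messages (illustrative only).
messages = [
    "win a free prize call now",
    "free entry claim your cash prize",
    "are we still meeting for lunch today",
    "see you at lunch then we can talk",
]

# Bag-of-words counts, then LDA topic proportions as low-dimensional features.
counts = CountVectorizer().fit_transform(messages)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_features = lda.fit_transform(counts)

for msg, feats in zip(messages, topic_features):
    print(feats.round(2), msg)
```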
Article
Full-text available
The insider threat problem is a significant and ever present issue faced by any organisation. While security mechanisms can be put in place to reduce the chances of external agents gaining access to a system, either to steal assets or alter records, the issue is more complex in tackling insider threat. If an employee already has legitimate access r...
Conference Paper
Full-text available
In this paper, we present the concept of "Ben-ware" as a beneficial software system capable of identifying anomalous human behaviour within a 'closed' organisation's IT infrastructure. We note that this behaviour may be malicious (for example, an employee is seeking to act against the best interest of the organisation by stealing confidential infor...
Chapter
Full-text available
Recent technological advances in modern health-care have led to the ability to collect a vast wealth of patient monitoring data. This data can be utilised for patient diagnosis but it also holds the potential for use within medical research. However, these datasets often contain errors which limit their value to medical research, with one study fi...
Article
Full-text available
Checkpointing is a fault-tolerance mechanism commonly used in High Throughput Computing (HTC) environments to allow the execution of long-running computational tasks on compute resources subject to hardware or software failures as well as interruptions from resource owners and more important tasks. Until recently many researchers have focused on th...
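As background to the trade-off described here, a widely used rule of thumb is Young's approximation for the checkpoint interval, which balances checkpoint overhead against expected rework after a failure or interruption. The numbers below are illustrative, and this is not the policy developed in the paper.

```python
import math

def young_checkpoint_interval(checkpoint_cost_s, mtbf_s):
    """Young's first-order approximation of the optimal checkpoint interval:
    sqrt(2 * checkpoint cost * mean time between failures)."""
    return math.sqrt(2 * checkpoint_cost_s * mtbf_s)

# Example: a 30-second checkpoint on a resource that fails (or is reclaimed
# by its owner) roughly every 4 hours on average.
interval = young_checkpoint_interval(checkpoint_cost_s=30, mtbf_s=4 * 3600)
print(f"checkpoint roughly every {interval / 60:.1f} minutes")
```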
Conference Paper
High Throughput Computing (HTC) is a powerful paradigm allowing vast quantities of independent work to be performed simultaneously. However, until recently little evaluation has been performed on the energy impact of HTC. Many organisations now seek to minimise energy consumption across their IT infrastructure though it is unclear how this will aff...
Conference Paper
A host uses servers hired from a Cloud in order to offer certain services to paying customers. It must decide dynamically when and how many servers to hire, and when to release them, so as to minimize both the job holding costs and the server costs. Under certain assumptions, the problem can be formulated in terms of a semi-Markov decision process...
Article
Volunteer computing systems provide an easy mechanism for users who wish to perform large amounts of High Throughput Computing work. However, if the Volunteer Computing system is deployed over a shared set of computers where interactive users can seize back control of the computers this can lead to wasted computational effort and hence wasted energ...
Article
The Cloud provides impartial access to computer services on a pay-per-use basis, a fact that has encouraged many researchers to adopt the Cloud for the processing of large computational jobs and data storage. It has been used in the past for single research endeavours or as a mechanism for coping with excessive load on conventional computational re...
Conference Paper
Full-text available
High Throughput Computing (HTC) is a powerful paradigm allowing vast quantities of independent work to be performed simultaneously. However, until recently little evaluation has been performed on the energy impact of HTC. Many organisations now seek to minimise energy consumption across their IT infrastructure though it is unclear how this will aff...
Article
Full-text available
Urban flood risk modelling is a highly topical example of intensive computational processing. Such processing is increasingly required by a range of organisations including local government, engineering consultancies and the insurance industry to fulfil statutory requirements and provide professional services. As the demands for this type of work b...
Conference Paper
Full-text available
As our expectations of what computer systems can do and our ability to capture data improves, the desire to perform ever more computationally intensive tasks increases. Often these tasks, comprising vast numbers of repeated computations, are highly interdependent on each other - a closely coupled problem. The process of Landscape-Evolution Modellin...
Conference Paper
Exploiting computational resources within an organisation for more than their primary task offers great benefits - making better use of capital expenditure and provides a pool of computational power. This can be achieved through the deployment of a cycle stealing distributed system, where tasks execute during the idle time on computers. However, if...
Conference Paper
The Cloud provides highly democratic access to computer services on a pay-per-use basis, a fact that has encouraged many researchers to adopt the Cloud for the processing of large computational tasks and data storage. This has been used in the past for single research endeavours or as a mechanism for coping with excessive load on conventional computa...
Conference Paper
Reduction of power consumption for any computer system is now an important issue, although this should be done in a manner that is not detrimental to the user base. We present a number of policies that can be applied to multi-use clusters where computers are shared between interactive users and high-throughput computing. We evaluate policies by tra...
Article
Existing Landscape-Evolution Models (LEMs) have tended to be applied at relatively coarse spatial resolution and over comparatively short timescales (years-centuries). Extending these models to encompass landscape evolution at the scale of, for example, an entire river basin and over important landscape-forming timescales (i.e. tens of thousands of...
Article
Full-text available
This document specifies the semantics and structure of the Job Submission Description Language (JSDL). JSDL is used to describe the requirements of computational jobs for submission to resources, particularly in Grid environments, though not restricted to the latter. The document includes the normative XML Schema for the JSDL, along with examples o...
Article
Full-text available
Imperial College e-Science Networked Infrastructure (ICENI) is an end-to-end Grid middleware developed to allow transparent usage of the Grid. It consists of both a service-oriented Grid middleware and an application toolkit, using a component-programming model to represent Grid applications. We utilise this infrastructure in the e-Protein project,...
Article
There is a growing tension within large organisations such as universities between the desire to perform vast amounts of computational processing and the desire to reduce power consumption by switching off computers. This situation will only worsen as computational problems get larger and the desire to save energy escalates. Through careful managem...
Conference Paper
The Condor matchmaker provides a powerful mechanism for optimally matching between user task and resource provider requirements, making the Condor system a good choice for use as a meta-scheduler within the Grid. Integrating Condor within a wider Grid context is possible but only through modification to the Condor source code as each new mechanism...
Article
Full-text available
In this paper we describe the architecture and operation of the Real Time Monitor (RTM), developed by the Grid team in the HEP group at Imperial College London. This is arguably the most popular dissemination tool within the EGEE [1] Grid. Having been used, on many occasions including GridFest and LHC inauguration events held at CERN in October 200...
Chapter
Full-text available
Computing Grids are hardware and software infrastructures that support secure sharing and concurrent access to distributed services by a large number of competing users from different virtual organizations. Concurrency can easily lead to overload and resource shortcomings in large-scale Grid infrastructures, as today’s Grids do not offer differenti...
Article
Full-text available
Initial work in the Grid ENabled Integrated Earth system model (GENIE) project involved a series of parameter sweep experiments using a Grid infrastructure consisting of a flocked Condor pool, a web service oriented data management system and a web portal. In this paper we introduce the Imperial College E-Science Networked Infrastructure (ICENI) Gri...
Article
The Grid holds great potential for users, software developers and resource owners. Users of the Grid are abstracted away from its complexity, while software developers are being provided with a rich middleware in which to develop their applications without the need for a large investment in resources. Resource owners are able to expose their resour...
Article
Grid computing infrastructures embody a cost-effective computing paradigm that virtualises heterogeneous system resources to meet the dynamic needs of critical business and scientific applications. These applications range from batch processes and long-running tasks to real-time and even transactional applications. Grid computing environments are i...
Article
Full-text available
As the main computing paradigm for resource-intensive scientific applications, Grid(1) enables resource sharing and dynamic allocation of computational resources. Large-scale grids are normally composed of huge numbers of components from different sites, which increases the requirements of workflows and quality of service (QoS) upon these workflows...
Article
As the main computing paradigm for resource-intensive scientific applications, Grid[1] enables resource sharing and dynamic allocation of computational resources. Large-scale grids are normally composed of huge numbers of components from different sites, which increases the requirements of workflows and quality of service (QoS) upon these workflows...
Article
As the main computing paradigm for resource-intensive scientific applications, Grid[1] enables resource sharing and dynamic allocation of computational resources. It promotes access to distributed data, operational flexibility and collaboration, and allows service providers to be distributed both conceptually and physically to meet different requirements. Large-scale grids are normal...
Article
Full-text available
The Grid is a concept which allows for the sharing of resources between a distributed community allowing each to progress towards potentially different goals. As adoption of the Grid increases so are the activities that people wish to conduct through it. The GRIDCC project is a European Union funded project addressing the issues of integrating inst...
Article
Full-text available
The over-arching aim of Grid computing is to move computational resources from individual institutions where they can only be used for in-house work, to a more open vision of vast online ubiquitous `virtual computational' resources which support individuals and collaborative projects. A major step towards realizing this vision is the provision of i...
Conference Paper
Grid Infrastructures are inherently dynamic and unpredictable environments where resource management and scheduling play an important part in ensuring that Grid applications execute while satisfying defined cost and performance constraints. Traditional Grid Scheduling approaches have focused on the scheduling and optimisation of single applications...
Chapter
Full-text available
Performing large-scale science is becoming increasingly complex. Scientists have resorted to the use of computing tools to enable and automate their experimental process. As acceptance of the technology grows, it will become commonplace that computational experiments will involve larger data sets, more computational resources, and scientists (often...
Conference Paper
Full-text available
Grid computing infrastructures embody a cost-effective computing paradigm that virtualises heterogeneous system resources to meet the dynamic needs of critical business and scientific applications. These applications range from batch processes and long-running tasks to more real-time and even transactional applications. Grid schedulers aim to make...
Conference Paper
The grid can be seen as a collection of services each of which performs some functionality. Users of the grid seek to use combinations of these services to perform the overall task they need to achieve. In general this can be seen as a set of services with a workflow document describing how these services should be combined. The user may also have...
Conference Paper
Full-text available
Many Grid architectures have been developed in recent years. These range from the large community Grids such as LCG and EGEE to single site deployments such as Condor. However, these Grid architectures have tended to focus on the single or batch submission of executable jobs. Application scientists are now seeking to manage and use physical instrum...