About
Publications: 32
Reads: 5,147
Citations: 453
Introduction
My research interests lie at the intersection of scalable data systems and deep learning.
I have worked on the design and implementation of data-parallel processing systems that best utilise hardware accelerators with applications to deep learning and data streaming.
I have also worked on complex event processing for home network management and on routing algorithms for wireless sensor networks, among other topics.
Skills and Expertise
Education
October 2005 - December 2009
September 2004 - October 2005
Publications (32)
Artificial Intelligence (AI) assists recruiting and job searching. Such systems can be biased against certain characteristics. This results in potential misrepresentations and consequent inequalities related to people with mental health disorders. Hence occupational and mental health bias in existing Natural Language Processing (NLP) models used in...
This paper uses computational methods to simultaneously investigate the epistemological effects of misinformation on communities of rational agents, while also contributing to the philosophical debate on ‘higher-order’ evidence (i.e. evidence that bears on the quality and/or import of one’s evidence). Modelling communities as networks of individual...
In this paper, we situate our computational approach to philosophy relative to other digital humanities and computational social science practices, based on reflections stemming from our research on the PolyGraphs project in social epistemology. We begin by describing PolyGraphs. An interdisciplinary project funded by the Academies (BA, RS, and RAE...
The number of colour names varies across languages. To augment colour communication between speakers of different languages, we need a multilingual method to map how we perceive colours to the words we use to describe them. We evaluate the performance of a supervised colour naming model, Rotated Split Trees (RST), trained by responses from a crowds...
Speakers group colour stimuli into categories that are commonly referred to by a name (e.g. pink, peach, and pale green). While the number of colour names in wide cultural use may vary across different languages, it has been shown that most languages have a small set of basic colour terms (BCTs) that are typically shared by speakers. The cognitive...
There is a deluge of AI-assisted decision-making systems, where our data serve as a proxy for our actions, as suggested by AI. The closer we investigate our data (raw input, or their learned representations, or the suggested actions), the more “bugs” we begin to discover. Outside of their controlled test environments, AI systems may encounter situations investig...
We explore synonyms in colour naming within and across three languages, British English, Estonian and Greek, using data collected from a crowdsourcing experiment. We identified 30 common lexical colour categories in British English, 41 in Estonian and 29 in Greek, where no one category was fully contained within others. The synonymy analysis within...
Attention based language models have become a critical component in state-of-the-art natural language processing systems. However, these models have significant computational requirements, due to long training times, dense operations and large parameter count. In this work we demonstrate a set of modifications to the structure of a Transformer laye...
Window aggregation queries are a core part of streaming applications. To support window aggregation efficiently, stream processing engines face a trade-off between exploiting parallelism (at the instruction/multi-core levels) and incremental computation (across overlapping windows and queries). Existing engines implement ad-hoc aggregation and para...
Deep learning (DL) systems expose many tuning parameters ("hyper-parameters") that affect the performance and accuracy of trained models. Increasingly, users struggle to configure hyper-parameters, and a substantial portion of time is spent tuning them empirically. We argue that future DL systems should be designed to help manage hyper-parameters. W...
Deep learning models are trained on servers with many GPUs, and training must scale with the number of GPUs. Systems such as TensorFlow and Caffe2 train models with parallel synchronous stochastic gradient descent: they process a batch of training data at a time, partitioned across GPUs, and average the resulting partial gradients to obtain an upda...
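As a rough illustration of the parallel synchronous stochastic gradient descent described in the abstract above, the following sketch partitions one batch across simulated workers, computes a partial gradient per partition, and averages them to update the model. It is not taken from the paper: the 1-D linear model, the function names `partial_gradient` and `sync_sgd_step`, and the even-split requirement are all assumptions made for the example, and plain Python lists stand in for GPUs.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair for a toy 1-D linear model y ≈ w * x


def partial_gradient(w: float, shard: List[Sample]) -> float:
    """Mean gradient of the loss 0.5 * (w*x - y)^2 over one shard of the batch."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def sync_sgd_step(w: float, batch: List[Sample], num_workers: int, lr: float) -> float:
    """One synchronous step: partition the batch, average partial gradients, update w."""
    if num_workers <= 0 or len(batch) % num_workers != 0:
        # Assumption for this sketch: shards must be equal-sized so that the
        # average of shard means equals the mean over the whole batch.
        raise ValueError("batch size must be a positive multiple of num_workers")
    shard_size = len(batch) // num_workers
    shards = [batch[i * shard_size:(i + 1) * shard_size] for i in range(num_workers)]
    # Each list entry plays the role of one GPU computing its partial gradient.
    grads = [partial_gradient(w, shard) for shard in shards]
    avg_grad = sum(grads) / num_workers
    return w - lr * avg_grad
```

With equal-sized shards, the averaged gradient matches what a single worker would compute on the full batch, so the number of workers changes only how the work is divided.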
The computation of sliding window aggregates is one of the core functionalities of stream processing systems. Presently, there are two classes of approaches to evaluating them. The first is non-incremental, i.e., every window is evaluated in isolation even if overlapping windows provide opportunities for work-sharing. While not algorithmically effi...
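To make the contrast in the abstract above concrete, here is a minimal sketch of a sliding-window sum computed two ways. The first function evaluates every window in isolation, as the abstract describes for the non-incremental class. The second shares work between overlapping windows by keeping a running total; it is only an assumed example of what a work-sharing alternative can look like, not the approach proposed in the paper. Count-based windows with a slide of one element, and the names `window_sums_naive` and `window_sums_incremental`, are assumptions for illustration.

```python
from collections import deque
from typing import Deque, Iterable, List


def window_sums_naive(values: List[float], size: int) -> List[float]:
    """Non-incremental: recompute each window from scratch (O(n * size) work)."""
    return [sum(values[i:i + size]) for i in range(len(values) - size + 1)]


def window_sums_incremental(values: Iterable[float], size: int) -> List[float]:
    """Incremental: reuse the previous total, adding the new value and evicting the oldest."""
    window: Deque[float] = deque()
    running = 0.0
    out: List[float] = []
    for v in values:
        window.append(v)
        running += v
        if len(window) > size:
            running -= window.popleft()  # subtract the value that left the window
        if len(window) == size:
            out.append(running)
    return out
```

The incremental version relies on the aggregate being invertible (a sum can be "un-added"); aggregates such as max or min need a different structure to share work across windows.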
Modern servers have become heterogeneous, often combining multi-core CPUs with many-core GPGPUs. Such heterogeneous architectures have the potential to improve the performance of data-intensive stream processing applications, but they are not supported by current relational stream processing engines. For an engine to exploit a heterogeneous archite...
Heterogeneous architectures that combine multi-core CPUs with many-core GPGPUs have the potential to improve the performance of data-intensive stream processing applications. Yet, a stream processing engine must execute streaming SQL queries with sufficient data-parallelism to fully utilise the available heterogeneous processors, and decide how to...
In this paper, we study the problem of multi-resource fairness in systems with multiple users. Each user needs to run one or more complex jobs that consist of multiple interconnected tasks. A job is considered finished when all its corresponding tasks have been executed in the system. Tasks can have different resource requirements. Because of sp...
Home wireless networks are difficult to manage and comprehend because of evolving locality, co-locality, connectivity and interaction. We define formal models of home wireless network infrastructure and policies and investigate how they can be used in a network management system designed to provide user-oriented support. We model spatial and tempor...
There is increasing demand for complex event processing of ever-expanding volumes of data in an ever-growing number of application domains. Traditional complex event processing technologies, based upon either stream database management systems or publish/subscribe systems, are adept at handling many of these applications. However, a growing number...
This paper presents a user driven redesign of the domestic network infrastructure that draws upon a series of ethnographic studies of home networks. We present an infrastructure based around a purpose built access point that has modified the handling of protocols and services to reflect the interactive needs of the home. The developed infrastructur...
The challenge is solved using Glasgow automata, concise complex event processing engines executable in the context of a topic-based publish/subscribe cache of event streams and relations. The imperative programming style of the Glasgow Automaton Programming Language (GAPL) enables multiple, efficient realisations of the two challenge queries.
Wireless home networks are increasingly deployed in people's homes worldwide. Unfortunately, home networks have evolved using protocols designed for backbone and enterprise networks, which are quite different in scale and character to home networks. We believe this evolution is at the heart of widely observed problems experienced by users managing...
Two phases of the SICSA Multi-core Challenge have now passed. The first challenge was to produce concordances of books for sequences of words up to length N; and the second to simulate the motion of N celestial bodies under gravity. We took both challenges on the SCC, using C and the Linux Shell. This paper is an account of the experiences gained. I...
The Homework project has examined redesign of existing home network infrastructures to better support the needs and requirements of actual home users. Integrating results from several ethnographic studies, we have designed and built a home networking platform providing detailed per-flow measurement and management capabilities supporting several nov...
Home networks have evolved to become small-scale versions of enterprise networks. The tools for visualizing and managing such networks are primitive and continue to require networked systems expertise on the part of the home user. As a result, non-expert home users must manually manage non-obvious aspects of the network - e.g., MAC address filterin...
Despite several advantages inherent in mobile-agent-based approaches to network management as compared to traditional SNMP-based approaches, industry is reluctant to adopt the mobile agent paradigm as a replacement for the existing manager-agent model; the management community requires an evolutionary, rather than a revolutionary, use of mobile ag...
The ready availability of integrated circuits for sensing (MEMS), processing and wireless communication has resulted in burgeoning interest in the design, implementation, deployment, and operation of environmental sensor networks. The maintenance and control of such systems is essential to ensure efficient use of resources for appropriate informa...
The routing problem (finding an optimal route from one point in a computer network to another) is surrounded by impossibility results. These results are usually expressed as lower and upper bounds on the set of nodes (or the set of links) of a network and represent the complexity of a solution to the routing problem (a routing function). The routin...