
David Lillis
University College Dublin | UCD · School of Computer Science
BA, HDip, MSc, PhD
About
81
Publications
27,370
Reads
633
Citations
Additional affiliations
January 2014 - present
Publications (81)
Recent developments in the field of data fusion have seen a focus on techniques that use training queries to estimate the probability that various documents are relevant to a given query and use that information to assign scores to those documents on which they are subsequently ranked. This paper introduces SlideFuse, which builds on these techniqu...
Modern computing systems require powerful software frameworks to ease their development and manage their complexity. These issues are addressed within both Component-Based Software Engineering and Agent-Oriented Software Engineering, although few integrated solutions exist. This paper discusses a novel integration strategy, which builds upon both p...
Information Retrieval (IR) forms the basis of many information management tasks. Information management itself has become
an extremely important area as the amount of electronically available information increases dramatically. There are numerous
methods of performing the IR task both by utilising different techniques and through using different re...
Modularising requirements is a classic problem of software engineering; concerns often overlap, requiring multiple dimensions of decomposition to achieve separation. Whenever complete modularity is unachievable, it is important to provide principled approaches to the decoupling of concerns. To this end, this paper discusses the Social...
The contextual word embedding model, BERT, has proved its ability on downstream tasks with limited quantities of annotated data. BERT and its variants help to reduce the burden of complex annotation work in many interdisciplinary research areas, for example, legal argument mining in digital humanities. Argument mining aims to develop text analysis...
Existing Zero-Shot Learning (ZSL) techniques for text classification typically assign a label to a piece of text by building a matching model to capture the semantic similarity between the text and the label descriptor. This is expensive at inference time as it requires the text paired with every label to be passed forward through the matching mode...
The growing research field of argumentation mining (AM) in the past ten years has made it a popular topic in Natural Language Processing. However, there are still limited studies focusing on AM in the context of legal text (Legal AM), despite the fact that legal text analysis more generally has received much attention as an interdisciplinary field...
In recent years, the task of mining important information from social media posts during crises has become a focus of research for the purposes of assisting emergency response (ES). The TREC Incident Streams (IS) track is a research challenge organised for this purpose. The track asks participating systems to both classify a stream of crisis-relate...
Social media has enabled people to circulate information in a timely fashion, thus motivating people to post messages seeking help during crisis situations. These messages can contribute to the situational awareness of emergency responders, who have a need for them to be categorised according to information types (i.e. the type of aid services the...
User-generated content (UGC) on social media can act as a key source of information for emergency responders in crisis situations. However, due to the volume concerned, computational techniques are needed to effectively filter and prioritise this content as it arises during emerging events. In the literature, these techniques are trained using anno...
The Incident Streams (IS) track is a research challenge aimed at finding important information from social media during crises for emergency response purposes. More specifically, given a stream of crisis-related tweets, the IS challenge asks a participating system to 1) classify the types of users' concerns or needs expressed in each tweet...
Swift response to the detection of endangered minors is an ongoing concern for law enforcement. Many child-focused investigations hinge on digital evidence discovery and analysis. Automated age estimation techniques are needed to aid in these investigations to expedite this evidence discovery process, and decrease investigator exposure to traumatic...
Maintaining traceability links of software systems is a crucial task for software management and development. Unfortunately, dealing with traceability links is typically treated as an afterthought due to time pressure. Some studies attempt to use information retrieval-based methods to automate this task, but they only concentrate on calculating the tex...
In this paper, we describe our approach in the shared task: COVID-19 event extraction from Twitter. The objective of this task is to extract answers from COVID-related tweets to a set of predefined slot-filling questions. Our approach treats the event extraction task as a question answering task by leveraging the transformer-based T5 text-to-text m...
This paper presents University College Dublin's (UCD) work at TREC 2019-B Incident Streams (IS) track. The purpose of the IS track is to find actionable messages and estimate their priority among a stream of crisis-related tweets. Based on the track's requirements, we break down the task into two sub-tasks. One is defined as a multi-label classific...
This paper describes an agent programming language agnostic implementation of the Multi-Agent MicroServices (MAMS) model - an approach to integrating agents within microservices-based architectures. In this model, agents, deployed within microservices, expose aspects of their state as virtual resources that are externally accessible using REpresent...
Achieving high performance for facial age estimation with subjects on the borderline between adulthood and non-adulthood has always been a challenge. Several studies have applied different approaches to subjects ranging from babies to elderly adults, and different datasets have been employed, with the reported mean absolute error (MAE) ranging from 1.47 to 8 yea...
This paper explores the intersection between microservices and Multi-Agent Systems (MAS), introducing the notion of a new approach to building MAS known as Multi-Agent MicroServices (MAMS). Our approach is illustrated through a worked example of a Vickrey Auction implemented as a microservice.
Perhaps the most common task encountered by digital forensic investigators consists of searching through a seized device for pertinent data. Frequently, an investigator will be in possession of a collection of “known-illegal” files (e.g. a collection of child pornographic images) and will seek to find whether copies of these are stored on the seize...
In today's world, closed circuit television, cellphone photographs and videos, open-source intelligence (i.e., social media and web data mining), and other sources of photographic evidence are commonly used by police forces to identify suspects and victims of both online and offline crimes. Human characteristics such as age, height, weight, gender,...
Combining the results of different search engines in order to improve upon their performance has been the subject of many research papers. This has become known as the "Data Fusion" task, and has great promise in dealing with the vast quantity of unstructured textual data that is a feature of many Big Data scenarios. However, no universally-accepte...
Bytewise approximate matching algorithms have in recent years shown significant promise in detecting files that are similar at the byte level. This is very useful for digital forensic investigators, who are regularly faced with the problem of searching through a seized device for pertinent data. A common scenario is where an investigator is in po...
Since their inception, Multi Agent Systems (MASs) have been championed as a solution for the increasing problem of software complexity. Communities of distributed autonomous computing entities that are capable of collaborating, negotiating and acting to solve complex organisational and system management problems are an attractive proposition. Centr...
Perhaps the most common task encountered by digital forensic investigators consists of searching through a seized device for pertinent data. Frequently, an investigator will be in possession of a collection of “known-illegal” files (e.g. a collection of child pornographic images) and will seek to find whether copies of these are stored on the sei...
Education and training in digital forensics requires a variety of suitable challenge corpora containing realistic features including regular wear-and-tear, background noise, and the actual digital traces to be discovered during investigation. Typically, the creation of these challenges requires overly arduous effort on behalf of the educator to ens...
In this paper we describe our efforts for the TREC OpenSearch task. Our goal for this year is to evaluate the effectiveness of: (1) a ranking method using information crawled from an authoritative search engine; (2) search ranking based on clickthrough data taken from user feedback; and (3) a unified modeling method that combines knowledge from t...
Given the ever-increasing prevalence of technology in modern life, there is a corresponding increase in the likelihood of digital devices being pertinent to a criminal investigation or civil litigation. As a direct consequence, the number of investigations requiring digital forensic expertise is resulting in huge digital evidence backlogs being enc...
Many jurisdictions suffer from lengthy evidence processing backlogs in digital forensics investigations. This has negative consequences for the timely incorporation of digital evidence into criminal investigations, while also affecting the timelines required to bring a case to court. Modern technological advances, in particular the move towards clo...
Agent-Oriented Programming (AOP) researchers have successfully developed a range of agent programming languages that bridge the gap between theory and practice. Unfortunately, despite the in-community success of these languages, they have proven less compelling to the wider software engineering community. One of the main problems facing AOP languag...
Autonomically managing energy within the home is a formidable challenge as any solution needs to interoperate with a decidedly heterogeneous network of sensors and appliances, not just in terms of technologies and protocols but also by managing smart as well as “dumb” appliances. Furthermore, as studies have shown that simply providing energy usage...
To accommodate the proliferation of heterogeneous network models and protocols, the use of semantic technologies to enable an abstract treatment of networks is proposed. Network adapters are employed to lift network specific data into a semantic representation. Semantic reasoning integrates the disparate network models and protocols into a common d...
Wireless Sensor Networks (WSNs) are comprised of thousands of nodes that are embedded with limited energy resources. Clustering is a well-known technique that can be used to extend the lifetime of such a network. However, user adaption is one criterion that is not taken into account by current clustering algorithms. Here, the term "user" refers to...
The issue of situating intelligent agents within an environment, either virtual or physical, is an important research question in the area of Multi Agent Systems. In addition, the deployment of agents within Wireless Sensor Networks has also received some attention.
This paper proposes an architecture to augment the reasoning capabilities of agents wit...
With the increasing availability of sensors within smartphones and within the world at large, a question arises about how this sensor data can be leveraged by Augmented Reality (AR) devices. AR devices have traditionally been limited by the capability of a given device's unique set of sensors. Connecting sensors from multiple devices using a Sensor...
We increasingly live in a world where sensors have become truly ubiquitous in nature. Many of these sensors are an integral part of devices such as smartphones, which contain sufficient sensors to allow for their use as Augmented Reality (AR) devices. This AR experience is limited by the precision and functionality of an individual device's sensors...
Pervasive sensing is characterized by heterogeneity across a number of dimensions. This raises significant problems for those designing, implementing and deploying sensor networks, irrespective of application domain. Such problems include for example, issues of data provenance and integrity, security, and privacy amongst others. Thus engineering a...
With the vast number of sensors on current and future mobile computing devices, as well as within our environment, a revolution in HCI is taking place. Devices with multiple sensors enable navigation applications, location-based searches, touch-based interfaces with haptic feedback and the promise of Augmented Reality, with devices such as Google G...
The Agent Conversation Reasoning Engine (ACRE) is intended to aid agent developers to improve the management and reliability of agent communication. To evaluate its effectiveness, a problem scenario was created that could be used to compare code written with and without the use of ACRE by groups of test subjects.
This paper describes the requiremen...
The Supervised Machine Learning task of classification has parallels with Information Retrieval (IR): in each case, items
(documents in the case of IR) are required to be categorised into discrete classes (relevant or non-relevant). Thus a parallel
can also be drawn between classifier ensembles, where evidence from multiple classifiers is combined...
Within Multi Agent Systems, communication by means of Agent Communication Languages (ACLs) has a key role to play in the co-operation,
co-ordination and knowledge-sharing between agents. Despite this, complex reasoning about agent messaging, and specifically
about conversations between agents, tends not to have widespread support amongst general-pu...
This is the second year in which a team from University College Dublin has participated in the Multi Agent Contest (http://www.multiagentcontest.org/2009). This paper describes the system that was created to participate in the contest, along with observations of the team’s experiences
in the contest. The system itself was built using the AFAPL agen...
The SIFT (Segmented Information Fusion Techniques) group in UCD is dedicated to researching Data Fusion in Information Retrieval. This area of research involves the merging of multiple sets of results into a single result set that is presented to the user. As a means of both evaluating the effectiveness of this work and comparing it against other...
Data Fusion is the combination of a number of independent search results, relating to the same document collection, into a single result to be presented to the user. A number of probabilistic data fusion models have been shown to be effective in empirical studies. These typically attempt to estimate the probability that particular documents will be...
Within Multi Agent Systems, communication by means of Agent Communication Languages has a key role to play in the co-operation, co-ordination and knowledge-sharing between agents. Despite this, complex reasoning about agent messaging, and specifically about conversations between agents, tends not to have widespread support amongst general-purpose ag...
The SIFT (Segmented Information Fusion Techniques) group in UCD is dedicated to researching Data Fusion in Information Retrieval. This area of research involves the merging of multiple sets of results into a single result set that is presented to the user. As a means of both evaluating the effectiveness of this work and comparing it against other...
The design, implementation and testing of Multi Agent Systems is typically a very complex task. While a number of specialist
agent programming languages and toolkits have been created to aid in the development of such systems, the provision of associated
development tools still lags behind those available for other programming paradigms. This inclu...