
David LillisUniversity College Dublin | UCD · School of Computer Science
David Lillis
BA, HDip, MSc, PhD
About
98
Publications
36,579
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
951
Citations
Introduction
Additional affiliations
Publications
Publications (98)
Recent developments in the field of data fusion have seen a focus on techniques that use training queries to estimate the probability that various documents are relevant to a given query and use that information to assign scores to those documents on which they are subsequently ranked. This paper introduces SlideFuse, which builds on these techniqu...
Modern computing systems require powerful software frameworks to ease their development and manage their complexity. These issues are addressed within both Component-Based Software Engineering and Agent-Oriented Software Engineering, although few integrated solutions exist. This paper discusses a novel integration strategy, which builds upon both p...
Information Retrieval (IR) forms the basis of many information management tasks. Information management itself has become
an extremely important area as the amount of electronically available information increases dramatically. There are numerous
methods of performing the IR task both by utilising different techniques and through using different re...
peer-reviewed Modularising requirements is a classic problem of software engineering; concerns often overlap, requiring multiple dimensions of decomposition to achieve separation. Whenever complete modularity is unachievable, it is important to provide principled approaches to the decoupling of concerns. To this end, this paper discusses the Social...
The recent advancements in Generative Artificial intelligence (GenAI) technology have been transformative for the field of education. Large Language Models (LLMs) such as ChatGPT and Bard can be leveraged to automate boilerplate tasks, create content for personalised teaching, and handle repetitive tasks to allow more time for creative thinking. Ho...
A survey conducted in 2023 surveyed 3,017 high school and college students. It found that almost one-third of them confessed to using ChatGPT for assistance with their homework. The rise of Large Language Models (LLMs) such as ChatGPT and Gemini has led to a surge in academic misconduct. Students can now complete their assignments and exams just by...
Microservices have been seen as a solution to open systems, within which microservices can behave arbitrarily. This requires the system to have strong trust management. However, existing microservices trust models cannot fully support open systems. In this paper, we propose a reputation-based trust model designed for open microservices that groups...
The rise of Large Language Models (LLMs) such as ChatGPT and Gemini has posed new challenges for the academic community. With the help of these models, students can easily complete their assignments and exams, while educators struggle to detect AI-generated content. This has led to a surge in academic misconduct, as students present work generated...
Microservices constitute the state of the art for implementing distributed systems and have been seen as a potential solution towards open systems. The characteristics of open systems require structured microservice management, including grouping microservices that are functionally similar. Microservices use RESTful APIs, often documented via OpenA...
Spatiotemporal object detection and activity recognition are essential components in the advancement of computer vision, with broad applications spanning surveillance, autonomous driving, and smart stores. This chapter offers a comprehensive overview of the techniques and applications associated with these concepts. Beginning with an introduction t...
This study presents the development and evaluation of machine learning models to predict winter wheat crop yield using heterogeneous soil and weather data sets. A concept of an error stabilisation stopping mechanism is introduced in an LSTM model specifically designed for heterogeneous datasets. The comparative analysis of this model against an LST...
Western countries rely heavily on wheat, and yield prediction is crucial. Time-series deep learning models, such as Long Short Term Memory (LSTM), have already been explored and applied to yield prediction. Existing literature reports that they perform better than traditional Machine Learning (ML) models. However, the existing LSTM cannot handle he...
Western countries rely heavily on wheat, and yield prediction is crucial. Time-series deep learning models, such as Long Short Term Memory (LSTM), have already been explored and applied to yield prediction. Existing literature reported that they perform better than traditional Machine Learning (ML) models. However, the existing LSTM cannot handle h...
Winter wheat is one of the most important crops in the United Kingdom, and crop yield prediction is essential for the nation's food security. Several studies have employed machine learning (ML) techniques to predict crop yield on a county or farm-based level. The main objective of this study is to predict winter wheat crop yield using ML models on...
The microservices architecture (MSA) is a form of distributed systems architecture that has been widely adopted in large-scale software systems in recent years. As with other distributed system architectures, one of the challenges that MSA faces is establishing trust between the microservices, particularly in the context of open systems. The bounda...
The contextual word embedding model, BERT, has proved its ability on downstream tasks with limited quantities of annotated data. BERT and its variants help to reduce the burden of complex annotation work in many interdisciplinary research areas, for example, legal argument mining in digital humanities. Argument mining aims to develop text analysis...
Existing Zero-Shot Learning (ZSL) techniques for text classification typically assign a label to a piece of text by building a matching model to capture the semantic similarity between the text and the label descriptor. This is expensive at inference time as it requires the text paired with every label to be passed forward through the matching mode...
The contextual word embedding model, BERT, has proved its ability on downstream tasks with limited quantities of annotated data. BERT and its variants help to reduce the burden of complex annotation work in many interdisciplinary research areas, for example, legal argument mining in digital humanities. Argument mining aims to develop text analysis...
The growing research field of argumentation mining (AM) in the past ten years has made it a popular topic in Natural Language Processing. However, there are still limited studies focusing on AM in the context of legal text (Legal AM), despite the fact that legal text analysis more generally has received much attention as an interdisciplinary field...
In recent years, the task of mining important information from social media posts during crises has become a focus of research for the purposes of assisting emergency response (ES). The TREC Incident Streams (IS) track is a research challenge organised for this purpose. The track asks participating systems to both classify a stream of crisis-relate...
Social media has enabled people to circulate information in a timely fashion, thus motivating people to post messages seeking help during crisis situations. These messages can contribute to the situational awareness of emergency responders, who have a need for them to be categorised according to information types (i.e. the type of aid services the...
User-generated content (UGC) on social media can act as a key source of information for emergency responders in crisis situations. However, due to the volume concerned, computational techniques are needed to effectively filter and prioritise this content as it arises during emerging events. In the literature, these techniques are trained using anno...
The Incident streams (IS) track is a research challenge aimed at finding important information from social media during crises for emergency response purposes. More specifically, given a stream of crisis-related tweets, the IS challenge asks a participating system to 1) classify what the types of users' concerns or needs are expressed in each tweet...
Social media has enabled people to circulate information in a timely fashion, thus motivating people to post messages seeking help during crisis situations. These messages can contribute to the situational awareness of emergency responders, who have a need for them to be categorised according to information types (i.e. the type of aid services the...
User-generated content (UGC) on social media can act as a key source of information for emergency responders in crisis situations. However, due to the volume concerned, computational techniques are needed to effectively filter and prioritise this content as it arises during emerging events. In the literature, these techniques are trained using anno...
Swift response to the detection of endangered minors is an ongoing concern for law enforcement. Many child-focused investigations hinge on digital evidence discovery and analysis. Automated age estimation techniques are needed to aid in these investigations to expedite this evidence discovery process, and decrease investigator exposure to traumatic...
Maintaining traceability links of software systems is a crucial task for software management and development. Unfortunately, dealing with traceability links are typically taken as afterthought due to time pressure. Some studies attempt to use information retrieval-based methods to automate this task, but they only concentrate on calculating the tex...
In this paper, we describe our approach in the shared task: COVID-19 event extraction from Twitter. The objective of this task is to extract answers from COVID-related tweets to a set of predefined slot-filling questions. Our approach treats the event extraction task as a question answering task by leveraging the transformer-based T5 text-to-text m...
Swift response to the detection of endangered minors is an ongoing concern for law enforcement. Many child-focused investigations hinge on digital evidence discovery and analysis. Automated age estimation techniques are needed to aid in these investigations to expedite this evidence discovery process, and decrease investigator exposure to traumatic...
This paper presents University College Dublin's (UCD) work at TREC 2019-B Incident Streams (IS) track. The purpose of the IS track is to find actionable messages and estimate their priority among a stream of crisis-related tweets. Based on the track's requirements, we break down the task into two sub-tasks. One is defined as a multi-label classific...
This paper describes an agent programming language agnostic implementation of the Multi-Agent MicroServices (MAMS) model - an approach to integrating agents within microservices-based architectures. In this model, agents, deployed within microservices, expose aspects of their state as virtual resources that are externally accessible using REpresent...
Achieving high performance for facial age estimation with subjects in the borderline between adulthood and non-adulthood has always been a challenge. Several studies have used different approaches from the age of a baby to an elder adult and different datasets have been employed to measure the mean absolute error (MAE) ranging between 1.47 to 8 yea...
Achieving high performance for facial age estimation with subjects in the borderline between adulthood and non-adulthood has always been a challenge. Several studies have used different approaches from the age of a baby to an elder adult and different datasets have been employed to measure the mean absolute error (MAE) ranging between 1.47 to 8 yea...
This paper explores the intersection between microservices and Multi-Agent Systems (MAS), introducing the notion of a new approach to building MAS known as Multi-Agent MicroServices (MAMS). Our approach is illustrated through a worked example of a Vickrey Auction implemented as a microservice.
Perhaps the most common task encountered by digital forensic investigators consists of searching through a seized device for pertinent data. Frequently, an investigator will be in possession of a collection of “known-illegal” files (e.g. a collection of child pornographic images) and will seek to find whether copies of these are stored on the seize...
In today's world, closed circuit television, cellphone photographs and videos, open-source intelligence (i.e., social media and web data mining), and other sources of photographic evidence are commonly used by police forces to identify suspects and victims of both online and offline crimes. Human characteristics such as age, height, weight, gender,...
Combining the results of different search engines in order to improve upon their performance has been the subject of many research papers. This has become known as the "Data Fusion" task, and has great promise in dealing with the vast quantity of unstructured textual data that is a feature of many Big Data scenarios. However, no universally-accepte...
Combining the results of different search engines in order to improve upon their performance has been the subject of many research papers. This has become known as the "Data Fusion" task, and has great promise in dealing with the vast quantity of unstructured textual data that is a feature of many Big Data scenarios. However, no universally-accepte...
Bytewise approximate matching algorithms have in recent years shown significant promise in de- tecting files that are similar at the byte level. This is very useful for digital forensic investigators, who are regularly faced with the problem of searching through a seized device for pertinent data. A common scenario is where an investigator is in po...
Bytewise approximate matching algorithms have in recent years shown significant promise in de- tecting files that are similar at the byte level. This is very useful for digital forensic investigators, who are regularly faced with the problem of searching through a seized device for pertinent data. A common scenario is where an investigator is in po...
Since their inception, Multi Agent Systems (MASs) have been championed as a solution for the increasing problem of software complexity. Communities of distributed autonomous computing entities that are capable of collaborating, negotiating and acting to solve complex organisational and system management problems are an attractive proposition. Centr...
Perhaps the most common task encountered by digital forensic investigators consists of searching through a seized device for pertinent data. Frequently, an investigator will be in possession of a collection of " known-illegal " files (e.g. a collection of child pornographic images) and will seek to find whether copies of these are stored on the sei...
Education and training in digital forensics requires a variety of suitable challenge corpora containing realistic features including regular wear-and-tear, background noise, and the actual digital traces to be discovered during investigation. Typically, the creation of these challenges requires overly arduous effort on the part of the educator to e...
Education and training in digital forensics requires a variety of suitable challenge corpora containing realistic features including regular wear-and-tear, background noise, and the actual digital traces to be discovered during investigation. Typically, the creation of these challenges requires overly arduous effort on behalf of the educator to ens...
In this paper we describe our efforts for the TREC OpenSearch task. Our goal for this year is to evaluate the effectiveness of: (1) a ranking method using information crawled from an authoritative search engine; (2) search rank- ing based on clickthrough data taken from user feedback; and (3) a unified modeling method that combines knowledge from t...
Given the ever-increasing prevalence of technology in modern life, there is a corresponding increase in the likelihood of digital devices being pertinent to a criminal investigation or civil litigation. As a direct consequence, the number of investigations requiring digital forensic expertise is resulting in huge digital evidence backlogs being enc...
Given the ever-increasing prevalence of technology in modern life, there is a corresponding increase in the likelihood of digital devices being pertinent to a criminal investigation or civil litigation. As a direct consequence, the number of investigations requiring digital forensic expertise is resulting in huge digital evidence backlogs being enc...
Many jurisdictions suffer from lengthy evidence processing backlogs in digital forensics investigations. This has negative consequences for the timely incorporation of digital evidence into criminal investigations, while also affecting the timelines required to bring a case to court. Modern technological advances, in particular the move towards clo...
Many jurisdictions suffer from lengthy evidence processing backlogs in digital forensics investigations. This has negative consequences for the timely incorporation of digital evidence into criminal investigations, while also affecting the timelines required to bring a case to court. Modern technological advances, in particular the move towards clo...
Agent-Oriented Programming (AOP) researchers have successfully developed a range of agent programming languages that bridge the gap between theory and practice. Unfortunately, despite the in-community success of these languages, they have proven less compelling to the wider software engineering community. One of the main problems facing AOP languag...
Agent-Oriented Programming (AOP) researchers have successfully developed a range of agent programming languages that bridge the gap between theory and practice. Unfortunately, despite the in-community success of these languages, they have proven less compelling to the wider software engineering community. One of the main problems facing AOP languag...
Autonomically managing energy within the home is a formidable challenge as any solution needs to interoperate with a decidedly heterogeneous network of sensors and appliances, not just in terms of technologies and protocols but also by managing smart as well as “dumb” appliances. Furthermore, as studies have shown that simply providing energy usage...
To accommodate the proliferation of heterogeneous network models and protocols, the use of semantic technologies to enable an abstract treatment of networks is proposed. Network adapters are employed to lift network specific data into a semantic representation. Semantic reasoning integrates the disparate network models and protocols into a common d...
Wireless Sensor Networks (WSNs) are comprised of thousands of nodes that are embedded with limited energy resources. Clustering is a well-known technique that can be used to extend the lifetime of such a network. However, user adaption is one criterion that is not taken into account by current clustering algorithms. Here, the term "user" refers to...
The issue of situating intelligent agents within an environment, either virtual or physical, is an important research question in the area of Multi Agent Systems. In addition, the deployment of agents within Wireless Sensor Networks has received some focus also.
This paper proposes an architecture to augment the reasoning capabilities of agents wit...
With the increasing availability of sensors within smartphones and within the world at large, a question arises about how this sensor data can be leveraged by Augmented Reality (AR) devices. AR devices have traditionally been limited by the capability of a given device's unique set of sensors. Connecting sensors from multiple devices using a Sensor...
We increasingly live in a world where sensors have become truly ubiquitous in nature. Many of these sensors are an integral part of devices such as smartphones, which contain sufficient sensors to allow for their use as Augmented Reality (AR) devices. This AR experience is limited by the precision and functionality of an individual device's sensors...
Pervasive sensing is characterized by heterogeneity across a number of dimensions. This raises significant problems for those designing, implementing and deploying sensor networks, irrespective of application domain. Such problems include for example, issues of data provenance and integrity, security, and privacy amongst others. Thus engineering a...
With the vast number of sensors on current and future mobile computing devices, as well as within our environment, a revolution in HCI is taking place. Devices with multiple sensors enable navigation applications, location-based searches, touch-based interfaces with haptic feedback and the promise of Augmented Reality, with devices such as Google G...