About
75
Publications
84,398
Reads
2,237
Citations
Publications (75)
Ransomware attacks have increased year-on-year, resulting in organizations of all sizes facing threats to their IT infrastructure and business operations. Companies cannot guarantee absolute protection from a ransomware attack, with social engineering being a significant weakness in an organization's cybersecurity structure. We examine the ransomwa...
The integrity of electronic evidence is essential for judicial proceedings. In this context, the role of the First Responder for discovery, identification and preservation is considered to be one of the most critical short-term challenges. While the number of devices to be collected was reasonably small and the items were easily identifiable in the pa...
Although keeping some basic concepts inherited from FAT32, the exFAT file system introduces many differences, such as the new mapping scheme of directory entries. The combination of exFAT mapping scheme with the allocation of bitmap files and the use of FAT leads to new forensic possibilities. The recovery of deleted files, including fragmented one...
In this paper, we present the ADMIRE architecture; a new framework for developing novel and innovative data mining techniques to deal with very large and distributed heterogeneous datasets in both commercial and academic applications. The main ADMIRE components are detailed as well as its interfaces allowing the user to efficiently develop and impl...
In recent years, topic modeling has become an established method in the analysis of text corpora, with probabilistic techniques such as latent Dirichlet allocation (LDA) commonly employed for this purpose. However, it might be argued that adequate attention is often not paid to the issue of topic coherence, the semantic interpretability of the top...
In addition to hosting user-generated video content, YouTube provides recommendation services, where sets of related and recommended videos are presented to users, based on factors such as co-visitation count and prior viewing history. This article is specifically concerned with extreme right (ER) video content, portions of which contravene hate la...
The Syria conflict has been described as the most socially mediated in history, with online social media playing a particularly important role. At the same time, the ever-changing landscape of the conflict leads to difficulties in applying analytical approaches taken by other studies of online political activism. Therefore, in this paper, we use an...
Many extreme right groups have had an online presence for some time through the use of dedicated websites. This has been accompanied by increased activity in social media platforms in recent years, enabling the dissemination of extreme right content to a wider audience. In this paper, we present an analysis of the activity of a selection of such gr...
The growth of medical and clinical textual datasets has fostered research interests in methods for storing, retrieving and extracting of pertinent data. In more recent years, shared tasks and more comprehensive data sharing agreements have seen a further growth in the research area spanning Natural Language Processing (NLP) and Information Retrieva...
Due to its status as the most popular video sharing platform, YouTube plays
an important role in the online strategy of extreme right groups, where it is
often used to host associated content such as music and other propaganda. In
this paper, we develop a categorization suitable for the analysis of extreme
right channels found on YouTube. By combin...
With the rapid growth of global cloud adoption in private and public sectors, cloud computing environments are becoming a new battlefield for cyber crime. In this paper, the researcher presents the results and analysis of a survey that was widely circulated among digital forensic experts and practitioners internationally on cloud forensics and criti...
Recent years have seen increased interest in the online presence of extreme
right groups. Although originally composed of dedicated websites, the online
extreme right milieu now spans multiple networks, including popular social
media platforms such as Twitter, Facebook and YouTube. Ideally therefore, any
contemporary analysis of online extreme righ...
In this paper, researchers provide a preliminary analysis on the forensic implications of cloud computing reference architecture, on the segregation of duties of cloud actors in cloud investigations, forensic artifacts on all layers of cloud system stack, cloud actors interaction scenarios in cloud investigations, and forensic implications of all c...
In this paper we present a shortened version of the Cloud Forensic Maturity Model (CFMM). It is composed of two inter-related parts, i.e., the Cloud Forensic Investigative Architecture (CFIA) and the Cloud Forensic Capability Matrix (CFCM). The CFMM is developed in order to create a reference model to evaluate and improve cloud forensic maturity. It i...
The growth of digital clinical data has raised questions as to how best to leverage this data to aid the world of healthcare. Promising application areas include Information Retrieval and Question-Answering systems. Such systems require an in-depth understanding of the texts that are processed. One aspect of this understanding is knowing if a medic...
Like other social media websites, YouTube is not immune from the attention of
spammers. In particular, evidence can be found of attempts to attract users to
malicious third-party websites. As this type of spam is often associated with
orchestrated campaigns, it has a discernible network signature, based on
networks derived from comments posted by u...
As the popularity of content sharing websites such as YouTube and Flickr has
increased, they have become targets for spam, phishing and the distribution of
malware. On YouTube, the facility for users to post comments can be used by
spam campaigns to direct unsuspecting users to bogus e-commerce websites. In
this paper, we demonstrate how such campa...
As cloud adoption grows, the importance of preparing for forensic investigations in cloud environments also grows. A recent survey of digital forensic professionals identified that missing terms and conditions regarding forensic activities in service level agreements between cloud providers and cloud consumers is a significant challenge for cloud f...
When encrypted material is discovered during a digital investigation and the investigator cannot decrypt the material then he or she is faced with the problem of how to determine the evidential value of the material. This research is proposing a methodology titled Cryptopometry. Cryptopometry extracts probative value from the encrypted file of a hy...
Cloud computing may well become one of the most transformative technologies in the history of computing. Cloud service providers and customers have yet to establish adequate forensic capabilities that could support investigations of criminal activities in the cloud. This paper discusses the emerging area of cloud forensics, and highlights its chall...
Cloud computing is estimated to be one of the most transformative technologies in the history of computing. Cloud organizations, including the providers and customers of cloud services, have yet to establish a well-defined forensic capability. Without this they are unable to ensure the robustness and suitability of their services to support investi...
Unsolicited Bulk Email (UBE) has become a large problem in recent years. The number of mass mailers in existence is increasing dramatically. Automatically detecting UBE has become a vital area of current research. Many email clients (such as Outlook and Thunderbird) already have junk filters built in. Mass mailers are continually evolving and overc...
The ability to correctly classify sentences that describe events is an important task for many natural language applications
such as Question Answering (QA) and Text Summarisation. In this paper, we treat event detection as a sentence level text classification
problem. Overall, we compare the performance of discriminative versus generative approach...
This paper introduces an approach to classifying emails into phishing/non-phishing categories using the C5.0 algorithm which achieves very high precision and an ensemble of other classifiers that achieve high recall. The representation of instances used in this paper is very small consisting of only five features. Results of an evaluation of this s...
The teaching of practical sessions in Computer Science frequently tends to follow a standard pattern: large numbers of students work in isolation on a particular assignment, enlisting help from whichever demonstrator is available at the relevant time. This model has a number of inherent difficulties. In some cases, each demonstrator may not have th...
When encrypted material is discovered during a digital investigation and the investigator cannot decrypt the material then
s/he is faced with the problem of how to determine the evidential value of the material. This research is proposing a methodology
of extracting probative value from the encrypted file of a hybrid cryptosystem. The methodology a...
The ability to correctly classify sentences that describe events is an important task for many natural language applications such as Question Answering (QA) and Summarisation. In this paper, we treat event detection as a sentence level text classification problem. We compare the performance of two approaches to this task: a Support Vector Mac...
In this paper we present a classifier for Recognising Textual Entailment (RTE) and Semantic Equivalence. We evaluate the performance
of this classifier using an evaluation framework provided by the PASCAL RTE Challenge Workshop. Sentence–pairs are represented
as a set of features, which are used by our decision tree classifier to determine if an en...
We investigate the use of clustering methods for the task of grouping the text spans in a news article that refer to the same event. We provide evidence that the order in which events are described is structured in a way that can be exploited during clustering. We evaluate our approach on a corpus of news articles describing events that have occurr...
Readability refers to all characteristics of a document that contribute to its ‘ease of understanding or comprehension due
to the style of writing’ [1]. The readability of a text is dependent on a number of factors, including but not limited
to its legibility, syntactic difficulty, semantic difficulty and the organization of the text [2]. As m...
We describe Semantic Equivalence and Textual Entailment Recognition, and outline a system which uses a number of lexical,
syntactic and semantic features to classify pairs of sentences as “semantically equivalent”. We describe an experiment to
show how syntactic and semantic features improve the performance of an earlier system, which used only lex...
In large telecommunication networks, alarms are usually useful for identifying faults and, therefore solving them. However, for large systems the number of alarms produced is so large that the current management systems are overloaded. One way of overcoming this problem is to analyse and interpret these alarms before faults can be located. Two diff...
With the proliferation of news articles from thousands of different sources now available on the Web, summarization of such information is becoming increasingly important. Our research focuses on merging descriptions of news events from multiple sources, to provide a concise description that combines the information from each source. Specificall...
Our systems (ndsc and ndsc+) use linguistic features as training data for a decision tree classifier. These features are derived from the text–hypothesis pairs under examination. The classifier uses the features to decide if the given hypothesis can be entailed from the text. This decision has an associated confidence value, which is a functi...
In this paper, we present the HybridTrim system, which uses a machine learning technique to combine linguistic, statistical and positional information to identify topic labels for headlines in a text. We compare our system with the Topiary system which, in contrast, uses a statistical learning approach to finding topic descriptors for headlines. Bo...
In this paper we compare a number of Topiary-style headline generation systems. The Topiary system, developed at the University of Maryland with BBN, was the top performing headline generation system at DUC 2004. Topiary-style headlines consist of a number of general topic labels followed by a compressed version of the lead sentence of a news story...
In this paper we compare two parse-and-trim style headline generation systems. The Topiary system uses a statistical learning
approach to finding topic labels for headlines, while our approach, the LexTrim system, identifies key summary words by analysing
the lexical cohesion structure of a text. The performance of these systems is evaluated using...
This report outlines the approach taken by members of the IIRG at University College Dublin in the PASCAL Textual Entailment Challenge 2005. Our technique measures the semantic equivalence of each text/hypothesis pair by examining both linguistic and statistical features in these sentences using a decision tree classifier.
Database selection, also known as resource selection, server selection and query routing is an important topic in distributed
information retrieval research. Several approaches to database selection use document frequency data to rank servers. Many
researchers have shown that the effectiveness of these algorithms depends on database size and conten...
This document describes the creation of an automatic text-to-scene conversion system, AVis (Automatic Visualizer), for accident reports. Such reports vary from short text passages to long, complex documents describing the chain of events. The visualization of accidents is an important tool in Accident and Incident Analysis [Johnson 2002]. For exampl...
We describe an experiment to determine the quality of different similarity metrics, with regard to redundancy removal. The three metrics under examination are WordNet distance, Cosine Similarity (Vector-space model) and Latent Semantic Indexing.
In this paper we present a machine learning approach to generating very short news story summaries (i.e. no more than 75 bytes long). Our technique uses a decision tree classifier to establish which phrases in a text should be included in the resultant summary. Our ROUGE evaluation results for task 1 (English text summarisation) and task 3 (transla...
In this paper, we describe a News Story Gisting system that generates a 10-word short summary of a news story. This system uses a machine learning technique to combine linguistic, statistical and positional information in order to generate an appropriate summary. We also present the results of an automatic evaluation of this system with respect to...
In this paper we compare the performance of three distinct approaches to lexical cohesion based text segmentation. Most work in this area has focused on the discovery of textual units that discuss subtopic structure within documents. In contrast our segmentation task requires the discovery of topical units of text i.e., distinct news stories from b...
We present a comparative study of lexical chain-based summarisation techniques. The aim of this paper is to highlight the effect of lexical chain scoring metrics and sentence extraction techniques on summary generation. We present our own lexical chain-based summarisation system and compare it to other chain based summarisation systems. We also com...
In this paper we describe an extractive method of creating very short summaries or gists that capture the essence of a news story using a linguistic technique called lexical chaining. The recent interest in robust gisting and title generation techniques originates from a need to improve the indexing and browsing capabilities of interactive digital...
This paper describes novel research involving the development of Textual CBR techniques and applying them to the problem
of Incident Report Retrieval. Incident Report Retrieval is a relatively new research area in the domain of Accident Reporting
and Analysis. We describe T-Ret, an Incident Report Retrieval system that incorporates textual CBR tec...
This paper describes research into the use of lexical chains to build effective Topic Tracking systems and compares the performance
with a simple keyword-based approach. Lexical chaining is a method of grouping lexically related terms into so called lexical
chains, using simple natural language processing techniques. Topic tracking involves trackin...
We present a comparative study of lexical chain-based summarisation techniques. The aim of this paper is to highlight the effect of lexical chain scoring metrics and sentence extraction techniques on summary generation. We present our own lexical chain-based summarisation system and compare it to other chain-based summarisation systems. We also com...
In this paper we describe an extractive method of creating very short summaries or gists that capture the essence of a news story using a linguistic technique called lexical chaining. The recent interest in robust gisting and title generation techniques originates from a need to improve the indexing and browsing capabilities of interactive digital multimedi...
In this paper, we explore the effects of data fusion on First Story Detection [1] in a broadcast news domain. The data fusion element of this experiment involves the combination of evidence derived from two distinct representations of document content in a single cluster run. Our composite document representation consists of a concept representatio...
Incident reporting is becoming increasingly important in large organizations. Legislation is progressively being introduced to deal with this information. One example is the European Directive No. 94/95/EC, which obliges airlines and national bodies to collect and collate reports of incidents. Typically these organizations use manual files and stan...
Incident Management Systems can play a crucial role in helping to reduce the number of workplace accidents by providing support
for incident analysis. In particular, the retrieval of relevant similar incident reports can help safety personnel to identify
factors and patterns that have contributed or might potentially contribute to accidents. Incide...
Describes research into the use of lexical chains to build effective topic tracking systems. Lexical chaining is a method of grouping lexically related terms into so called lexical chains, using simple natural language processing techniques. Topic tracking involves tracking a given news event in a stream of news stories i.e. finding all subsequent...
In this paper we propose a coarse-grained NLP approach to text segmentation based on the analysis of lexical cohesion within text. Most work in this area has focused on the discovery of textual units that discuss subtopic structure within documents. In contrast our segmentation task requires the discovery of topical units of text i.e. distinct news...
In this paper we describe a type of data fusion involving the combination of evidence derived from multiple document representations. Our aim is to investigate if a composite representation can improve the online detection of novel events in a stream of broadcast news stories. This classification process otherwise known as first story detection FSD...
This paper discusses a system for online new event detection in the domain of news articles on the web. This area is related
to the Topic Detection and Tracking initiative. We evaluate two benchmark systems: The first like most current web retrieval
systems, relies on term repetition to calculate document relatedness. The second attempts to perfor...
The last winner of the Salton Award, Tefko Saracevic, gave an acceptance address at SIGIR in Philadelphia in 1997. Previous winners were William Cooper (1994), Cyril Cleverdon (1991), Karen Sparck Jones (1988) and Gerard Salton himself (1985).
In this ...
This paper describes research into the development of techniques to build effective Topic Tracking systems. Topic tracking involves tracking a given news event in a stream of news stories i.e. finding all subsequent stories in the news stream that discuss the given event. This research has grown out of the Topic Detection and Tracking (TDT) initiat...
This paper discusses a system for online new event detection as part of the Topic Detection and Tracking (TDT) initiative. Our approach uses a single-pass clustering algorithm, which includes a time-based selection model and a thresholding model. We evaluate two benchmark systems: The first indexes documents by keywords and the second attempts to p...
In this paper, we present the HybridTrim system which uses a machine learning technique to combine linguistic, statistical and positional information to identify topic labels for headlines in a text. We compare our system with the Topiary system which, in contrast, uses a statistical learning approach to finding topic descriptors for headlines. The...
This paper describes the general design and architecture of an intelligent recommendation system aimed mainly at supporting a user in her navigation through the massive amounts of information that she has to cope with in order to find the right information. Alternative recommender system techniques are needed to retrieve quickly high quality recomm...
In this paper, we present several approaches to the retrieval of medical visits in response to user queries on patient demographics. A visit is comprised of one or more medical reports. Given a data collection of medical reports, TREC Medical Track participants had the opportunity to either preprocess the documents concatenating reports into vis...