
John Dunnion
- BSc, MSc
- University College Dublin
76 Publications
13,605 Reads
624 Citations
Publications (76)
Background
Herd fertility in pasture-based dairy farms is a key driver of farm economics. Models for predicting nulliparous reproductive outcomes are rare, but age, genetics, weight, and BCS have been identified as factors influencing heifer conception. The aim of this study was to create a simulation model of heifer conception to service with thor...
Modelling of binary and categorical events is a commonly used tool to simulate epidemiological processes in veterinary research. Logistic and multinomial regression, naïve Bayes, decision trees and support vector machines are popular data mining techniques used to predict the probabilities of events with two or more outcomes. Thorough evaluation of...
The aim of this study was to build and compare predictive models of calving difficulty in dairy heifers and cows for the purpose of decision support and simulation modeling. Models to predict 3 levels of calving difficulty (unassisted, slight assistance, and considerable or veterinary assistance) were created using 4 machine learning techniques: mu...
Reproductive performance in pasture-based production systems has a fundamentally important effect on economic efficiency. The individual factors affecting the probability of submission and conception are multifaceted and have been extensively researched. The present study analyzed some of these factors in relation to service-level probability of co...
The growth of digital clinical data has raised questions as to how best to leverage this data to aid the world of healthcare. Promising application areas include Information Retrieval and Question-Answering systems. Such systems require an in-depth understanding of the texts that are processed. One aspect of this understanding is knowing if a medic...
The Supervised Machine Learning task of classification has parallels with Information Retrieval (IR): in each case, items (documents in the case of IR) are required to be categorised into discrete classes (relevant or non-relevant). Thus a parallel can also be drawn between classifier ensembles, where evidence from multiple classifiers is combined...
The SIFT (Segmented Information Fusion Techniques) group in UCD is dedicated to researching Data Fusion in Information Retrieval. This area of research involves the merging of multiple sets of results into a single result set that is presented to the user. As a means of both evaluating the effectiveness of this work and comparing it against other...
Data Fusion is the combination of a number of independent search results, relating to the same document collection, into a single result to be presented to the user. A number of probabilistic data fusion models have been shown to be effective in empirical studies. These typically attempt to estimate the probability that particular documents will be...
This paper describes the IIRG's first implementation of a system for automatic Knowledge Base Population (KBP). The Text Analysis Conference (TAC), first organised by NIST in 2008, promotes further research in Natural Language Technologies. In 2009, NIST added a Knowledge Base Population Track to TAC; the goal of this track was to promote researc...
The SIFT (Segmented Information Fusion Techniques) group in UCD is dedicated to researching Data Fusion in Information Retrieval. This area of research involves the merging of multiple sets of results into a single result set that is presented to the user. As a means of both evaluating the effectiveness of this work and comparing it against other...
Recent developments in the field of data fusion have seen a focus on techniques that use training queries to estimate the probability that various documents are relevant to a given query and use that information to assign scores to those documents on which they are subsequently ranked. This paper introduces SlideFuse, which builds on these techniqu...
Data fusion is the process of combining the output of a number of Information Retrieval (IR) algorithms into a single result set, to achieve greater retrieval performance. ProbFuse is a data fusion algorithm that uses the history of the underlying IR algorithms to estimate the probability that subsequent result sets include relevant documents in pa...
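The abstract is cut off before it spells out the estimate, so what follows is only a minimal Python sketch of segment-based probabilistic fusion under stated assumptions: each input system returns a ranked list of fixed depth split into equal segments, unjudged documents count as non-relevant, and a document's fused score is the sum over systems of P(relevant | segment k) / k. The function names (`segment_index`, `train_segment_probs`, `fuse`) and the toy runs are hypothetical illustrations, not the authors' code.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set


def segment_index(rank: int, depth: int, num_segments: int) -> int:
    """Map a 0-based rank in a list of length `depth` to a 1-based segment number."""
    return rank * num_segments // depth + 1


def train_segment_probs(
    training_runs: Sequence[List[str]],  # one ranked list per training query
    qrels: Sequence[Set[str]],           # relevant doc ids for each training query
    depth: int,
    num_segments: int,
) -> List[float]:
    """Estimate, for one input system, the probability that a document in
    segment k is relevant, averaged over the training queries."""
    totals = [0.0] * num_segments
    for ranking, relevant in zip(training_runs, qrels):
        hits = [0] * num_segments
        sizes = [0] * num_segments
        for rank, doc in enumerate(ranking[:depth]):
            k = segment_index(rank, depth, num_segments) - 1
            sizes[k] += 1
            hits[k] += int(doc in relevant)  # assumption: unjudged docs are non-relevant
        for k in range(num_segments):
            if sizes[k]:
                totals[k] += hits[k] / sizes[k]
    return [t / len(training_runs) for t in totals]


def fuse(
    runs: Dict[str, List[str]],     # system name -> ranked list for a new query
    probs: Dict[str, List[float]],  # system name -> trained segment probabilities
    depth: int,
    num_segments: int,
) -> List[str]:
    """Score each document as the sum over systems of P(segment) / segment number."""
    scores: Dict[str, float] = defaultdict(float)
    for system, ranking in runs.items():
        for rank, doc in enumerate(ranking[:depth]):
            k = segment_index(rank, depth, num_segments)
            scores[doc] += probs[system][k - 1] / k
    # Highest score first; ties broken alphabetically so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    depth, segs = 4, 2
    probs = {
        "A": train_segment_probs([["d1", "d2", "d3", "d4"]], [{"d1", "d3"}], depth, segs),
        "B": train_segment_probs([["d5", "d6", "d7", "d8"]], [{"d5", "d6"}], depth, segs),
    }
    runs = {"A": ["x", "y", "z", "w"], "B": ["z", "w", "x", "y"]}
    print(probs)                           # {'A': [0.5, 0.5], 'B': [1.0, 0.0]}
    print(fuse(runs, probs, depth, segs))  # ['w', 'z', 'x', 'y']
```

Dividing by the segment number k favours documents that several systems place in early, historically reliable segments; in the toy run, `z` and `w` win because system B's first segment was fully relevant in training.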
In this paper we present a classifier for Recognising Textual Entailment (RTE) and Semantic Equivalence. We evaluate the performance of this classifier using an evaluation framework provided by the PASCAL RTE Challenge Workshop. Sentence pairs are represented as a set of features, which are used by our decision tree classifier to determine if an en...
Readability refers to all characteristics of a document that contribute to its ‘ease of understanding or comprehension due to the style of writing’ [1]. The readability of a text is dependent on a number of factors, including, but not constrained to, its legibility, syntactic difficulty, semantic difficulty and the organization of the text [2]. As m...
We describe Semantic Equivalence and Textual Entailment Recognition, and outline a system which uses a number of lexical, syntactic and semantic features to classify pairs of sentences as “semantically equivalent”. We describe an experiment to show how syntactic and semantic features improve the performance of an earlier system, which used only lex...
Information Retrieval (IR) forms the basis of many information management tasks. Information management itself has become an extremely important area as the amount of electronically available information increases dramatically. There are numerous methods of performing the IR task both by utilising different techniques and through using different re...
Data fusion is the combination of the results of independent searches on a document collection into one single output result set. It has been shown in the past that this can greatly improve retrieval effectiveness over that of the individual results. This paper presents probFuse, a probabilistic approach to data fusion. ProbFuse assumes that the perf...
Our systems (ndsc and ndsc+) use linguistic features as training data for a decision tree classifier. These features are derived from the text–hypothesis pairs under examination. The classifier uses the features to decide if the given hypothesis can be entailed from the text. This decision has an associated confidence value, which is a functi...
In this paper, we present the HybridTrim system, which uses a machine learning technique to combine linguistic, statistical and positional information to identify topic labels for headlines in a text. We compare our system with the Topiary system which, in contrast, uses a statistical learning approach to finding topic descriptors for headlines. Bo...
This paper describes an extensible and scalable approach to indexing documents that is utilized within the Highly Organised Team of Agents for Information Retrieval (HOTAIR) architecture.
This paper describes a scalable mathematical model for dynamically calculating the number of agents to optimally handle the current load within the Highly Organised Team of Agents for Information Retrieval (HOTAIR) architecture.
In this paper we describe a system that performs Topic Detection, a sub-task of the Topic Detection and Tracking (TDT) Project. We describe the Topic Detection task and present initial results for both a baseline system and a set of extensions that we have implemented in our system that attempt to model events and reportage in the news domain. We de...
In this paper we compare a number of Topiary-style headline generation systems. The Topiary system, developed at the University of Maryland with BBN, was the top performing headline generation system at DUC 2004. Topiary-style headlines consist of a number of general topic labels followed by a compressed version of the lead sentence of a news story...
With the huge number of documents becoming available in electronic form, finding relevant information in a large corpus is becoming an increasingly important, but difficult, task. We believe that semantic processing is required in order to achieve more accurate information retrieval. This paper describes a framework for the creation of semantic mar...
In this paper we compare two parse-and-trim style headline generation systems. The Topiary system uses a statistical learning approach to finding topic labels for headlines, while our approach, the LexTrim system, identifies key summary words by analysing the lexical cohesion structure of a text. The performance of these systems is evaluated using...
This report outlines the approach taken by members of the IIRG at University College Dublin in the PASCAL Textual Entailment Challenge 2005. Our technique measures the semantic equivalence of each text/hypothesis pair by examining both linguistic and statistical features in these sentences using a decision tree classifier.
In this paper we present a novel system for automatically marking up text documents into XML. The system uses the techniques of the Self-Organising Map (SOM) algorithm in conjunction with an inductive learning algorithm, C5.0. The SOM algorithm clusters the XML marked-up documents on a two-dimensional map such that documents having similar content...
We discuss Topic Detection, a sub-task of the Topic Detection and Tracking (TDT) Project, and present a system that uses domain-informed techniques to group news reports into clusters that capture the narrative of events in the news domain. We present an initial evaluation of this system, and describe an application of these techniques for the clus...
This paper describes a technique which uses research into the use of existing linguistic resources (VerbNet and WordNet) to construct conceptual graph representations of texts. We use a two-step approach, firstly identifying the semantic roles in a sentence, and then using these roles, together with semi-automatically compiled domain-specific knowl...
This paper proposes that there is a substantial relative difference in the performance of information-filtering algorithms as they are applied to different datasets, and that these performance differences can be leveraged to form the basis of an Adaptive Information Filtering System. We classify five different datasets based on metrics such as spar...
We describe an experiment to determine the quality of different similarity metrics, with regard to redundancy removal. The three metrics under examination are WordNet distance, Cosine Similarity (Vector-space model) and Latent Semantic Indexing.
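To illustrate just one of the three metrics, here is a minimal Python sketch of cosine similarity over raw term-frequency vectors, applied to greedy redundancy removal. The tokenizer, the absence of IDF weighting, the 0.5 threshold, and the helper names (`term_vector`, `cosine`, `remove_redundant`) are assumptions for illustration; the abstract does not state how the experiment was configured, and WordNet distance and Latent Semantic Indexing are not shown.

```python
import math
import re
from collections import Counter
from typing import List


def term_vector(text: str) -> Counter:
    """Raw term-frequency vector over lower-cased alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def remove_redundant(sentences: List[str], threshold: float = 0.5) -> List[str]:
    """Greedily keep a sentence only if its similarity to every sentence
    already kept is below `threshold` (a hypothetical cut-off)."""
    kept: List[str] = []
    kept_vectors: List[Counter] = []
    for sentence in sentences:
        vec = term_vector(sentence)
        if all(cosine(vec, other) < threshold for other in kept_vectors):
            kept.append(sentence)
            kept_vectors.append(vec)
    return kept


if __name__ == "__main__":
    sentences = [
        "The cat sat on the mat.",
        "The cat sat on the mat!",     # same tokens as the first sentence: dropped
        "Stocks fell sharply today.",  # no shared tokens: kept
    ]
    print(remove_redundant(sentences))
```

Swapping `cosine` for another similarity function with the same signature is how the three metrics could be compared within one redundancy filter.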
In this paper we present a machine learning approach to generating very short news story summaries (i.e. no more than 75 bytes long). Our technique uses a decision tree classifier to establish which phrases in a text should be included in the resultant summary. Our ROUGE evaluation results for task 1 (English text summarisation) and task 3 (transla...
Information is becoming increasingly available in digital formats such as Web Pages, MP3 files and many others. This puts more emphasis on the need for reliable information filtering techniques. New recommendation algorithms are continuously being developed to deal with the problem of information overload. In this paper we present a new, regression...
We present a comparative study of lexical chain-based summarisation techniques. The aim of this paper is to highlight the effect of lexical chain scoring metrics and sentence extraction techniques on summary generation. We present our own lexical chain-based summarisation system and compare it to other chain-based summarisation systems. We also com...
We present a novel system for automatically marking up text documents into XML and discuss the benefits of XML markup for intelligent information retrieval. The system uses the Self-Organizing Map (SOM) algorithm to arrange XML marked-up documents on a two-dimensional map so that similar documents appear closer to each other. It then employs an indu...
This paper describes novel research involving the development of Textual CBR techniques and applying them to the problem of Incident Report Retrieval. Incident Report Retrieval is a relatively new research area in the domain of Accident Reporting and Analysis. We describe T-Ret, an Incident Report Retrieval system that incorporates textual CBR tec...
We discuss Topic Detection, a sub-task of the Topic Detection and Tracking (TDT) Project, and present a system that uses the linguistic and temporal features of news reportage to enhance the discovery of events in a collection of news articles. We describe an online application of these techniques that constructs topical clusters from live news fee...
We present a comparative study of lexical chain-based summarisation techniques. The aim of this paper is to highlight the effect of lexical chain scoring metrics and sentence extraction techniques on summary generation. We present our own lexical chain-based summarisation system and compare it to other chain-based summarisation systems. We also com...
We discuss Topic Detection, a sub-task of the Topic Detection and Tracking (TDT) Project, and present initial results for both a baseline system and a set of extensions that attempt to model events and reportage in the news domain. Moreover, we describe a system that clusters live news feeds and presents this data to the user in a variety of formats. We conclude that this system produces interesting and useful c...
With the huge number of documents becoming available in electronic form, finding the right information in a large corpus is becoming an increasingly important and difficult task. We believe that semantic processing is required for accurate information retrieval. This paper describes a framework for the automatic creation of semantic markup and its i...
In this paper, we present a system for marking up text documents into XML on a Self-Organising Map (SOM). The system organises pre-tagged XML documents on the Self-Organising Map such that the documents similar in content are placed closer to each other. Then, by employing the inductive learning algorithm C5.0, the system learns markup rules from t...
This paper describes a system that uses a combination of existing linguistic resources, namely VerbNet and WordNet, to construct conceptual graph representations of texts. We use a two-step approach, firstly identifying the semantic roles in a sentence, and then using these roles, together with semi-automatically compiled domain-specific knowledge,...
In this paper, we describe a News Story Gisting system that generates a 10-word short summary of a news story. This system uses a machine learning technique to combine linguistic, statistical and positional information in order to generate an appropriate summary. We also present the results of an automatic evaluation of this system with respect to...
In this paper we present a novel system which automatically converts text documents into XML by extracting information from previously tagged XML documents. The system uses the Self-Organizing Map (SOM) learning algorithm to arrange tagged documents on a two-dimensional map such that nearby locations contain similar documents. It then employs the i...
Incident reporting is becoming increasingly important in large organizations. Legislation is progressively being introduced to deal with this information. One example is the European Directive No. 94/95/EC, which obliges airlines and national bodies to collect and collate reports of incidents. Typically these organizations use manual files and stan...
Incident Management Systems can play a crucial role in helping to reduce the number of workplace accidents by providing support for incident analysis. In particular, the retrieval of relevant similar incident reports can help safety personnel to identify factors and patterns that have contributed or might potentially contribute to accidents. Incide...
Information filtering techniques are becoming more widely used as available information spaces grow exponentially larger. New techniques for filtering information are being developed to tackle the information overload problem. This paper presents an assessment of the performance of three popular recommendation strategies over a range of diverse da...
In this paper we present a novel system that can automatically mark up text documents into XML. The system uses the Self-Organizing Map (SOM) algorithm to organize marked documents on a map so that similar documents are placed on nearby locations. Then by using the inductive learning algorithm C5, it automatically generates and applies the markup r...
In this paper we present a novel two-stage automatic XML markup system. The system uses Kohonen's Self-Organizing Map (SOM) learning algorithm to arrange marked-up documents on a two-dimensional map such that nearby locations contain similar documents. It then employs an inductive learning algorithm (C5/See5) to automatically extract and apply mark...
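The map-building stage can be sketched in pure Python as a small Kohonen SOM: each grid node holds a weight vector, the best-matching node for a document is the one with the nearest weights, and that node and its grid neighbours are pulled toward the document. The grid size, the linear decay of learning rate and radius, the Gaussian neighbourhood, the helper names, and the three-term toy document vectors are all assumptions for illustration, not the system's published configuration; the C5/See5 rule-learning stage is not shown.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Node = Tuple[int, int]  # (row, col) position on the 2-D map


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights: Dict[Node, List[float]], x: Sequence[float]) -> Node:
    """Grid node whose weight vector is closest (Euclidean) to document vector x."""
    return min(weights, key=lambda node: sq_dist(weights[node], x))


def train_som(
    data: List[List[float]],
    rows: int,
    cols: int,
    epochs: int = 100,
    lr0: float = 0.5,
    seed: int = 0,
) -> Dict[Node, List[float]]:
    """Train a rows x cols map with a Gaussian neighbourhood whose radius and
    learning rate both shrink linearly over training (assumed schedule)."""
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = max(rows, cols) / 2
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            radius = max(radius0 * (1.0 - frac), 0.5)
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Neighbourhood strength depends on distance on the grid, not in input space.
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * radius ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


if __name__ == "__main__":
    # Hypothetical term-weight vectors for two groups of similar documents.
    docs = {
        "a1": [1.0, 0.0, 0.0], "a2": [0.9, 0.1, 0.0],
        "b1": [0.0, 0.0, 1.0], "b2": [0.1, 0.0, 0.9],
    }
    som = train_som(list(docs.values()), rows=3, cols=3)
    for name, vec in docs.items():
        print(name, best_matching_unit(som, vec))
```

After training, documents from the same group should land on the same or neighbouring grid cells, which is the property a later rule-learning stage could exploit when choosing example documents.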
This paper describes the application of Machine Learning (ML) techniques to the problem of Information Retrieval. Specifically, it presents a system which incorporates machine learning techniques in determining the subject(s) of a piece of text. This system is part of a much larger information management system which provides software support for t...
Minstrel-ODM is a basic office data model that supports the filing and retrieval of office data. It is based on a modelling paradigm where office data is viewed as existing in the form of office objects and the modelling formalism is of the semantic (hierarchy) data model kind. In developing the model, ideas have been drawn from work by others on s...
With the complexity of computer systems increasing with time, the need for systems that are capable of managing themselves has become an important consideration in the Information Technology industry. In this paper, we discuss HOTAIR: a scalable, autonomic Multi-Agent Information Retrieval System. In particular, we focus on the incorporation of s...
In this paper, we present the HybridTrim system which uses a machine learning technique to combine linguistic, statistical and positional information to identify topic labels for headlines in a text. We compare our system with the Topiary system which, in contrast, uses a statistical learning approach to finding topic descriptors for headlines. The...
This paper describes a non-statistical approach for semantic annotation of documents by analysing their syntax and by using semantic/syntactic behaviour patterns described in VerbNet. We use a two-stage approach, firstly identifying the semantic roles in a sentence, and then using these roles to represent some of the relations between the conce...
We introduce a novel two-stage automatic XML mark-up system, which combines the WEBSOM approach to document categorisation with the C5 inductive learning algorithm. The WEBSOM method clusters the XML marked-up documents such that semantically similar documents lie close together on a Self-Organising Map (SOM). The C5 algorithm automa...
This paper describes the general design and architecture of an intelligent recommendation system aimed mainly at supporting a user in her navigation through the massive amounts of information that she has to cope with in order to find the right information. Alternative recommender system techniques are needed to quickly retrieve high-quality recomm...
The increasing acceptance of XML as a standard for document markup promises to provide solutions for the problems of document management and retrieval. However, existing documents must be converted into XML. In this paper we present the AutoTag system, which automatically converts text documents into XML. The system has a hybrid architecture, arran...
We discuss Topic Detection, a sub-task of the Topic Detection and Tracking (TDT) Project, and present initial results for both a baseline system and a set of extensions that attempt to model events and reportage in the news domain. Moreover, we describe a system that clusters live news feeds and presents this data to the user in a variety of format...
In this paper we present a system which automatically converts text documents into XML by extracting information from previously tagged XML documents. The system uses the Self-Organizing Map (SOM) learning algorithm to arrange tagged documents on a two-dimensional map such that nearby locations contain similar documents. It then employs an inductiv...
In this paper, we present several approaches to the retrieval of medical visits in response to user queries on patient demographics. A visit comprises one or more medical reports. Given a data collection of medical reports, TREC Medical Track participants had the opportunity to either preprocess the documents concatenating reports into vis...