About
107
Publications
11,145
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
2,571
Citations
Citations since 2017
Introduction
Publications
Publications (107)
Background:
Assessment of the quality of medical evidence available online is a critical step in the systematic review of clinical evidence. Existing tools that automate parts of this task validate the quality of individual studies but not of entire bodies of evidence, and focus on a restricted set of quality criteria.
Objective:
We propose a qu...
BACKGROUND
Assessment of the quality of medical evidence available on the web is a critical step in the preparation of systematic reviews. Existing tools that automate parts of this task validate the quality of individual studies but not of entire bodies of evidence and focus on a restricted set of quality criteria.
OBJECTIVE
We proposed a quality...
The COVID-19 pandemic has driven ever-greater demand for tools which enable efficient exploration of biomedical literature. Although semi-structured information resulting from concept recognition and detection of the defining elements of clinical trials (e.g. PICO criteria) has been commonly used to support literature search, the contributions of t...
We have created a corpus for the extraction of information related to diagnosis from scientific literature focused on eye diseases. It was shown that the annotation of entities has a relatively large agreement among annotators, which translates into strong performance of the trained methods, mostly BioBERT. Furthermore it was observed that relation...
We present COVID-SEE, a system for medical literature discovery based on the concept of information exploration, which builds on several distinct text analysis and natural language processing methods to structure and organise information in publications, and augments search through a visual overview of a collection enabling exploration to identify...
The focus of this paper is to address the knowledge acquisition bottleneck for Named Entity Recognition (NER) of mutations, by analysing different approaches to build manually-annotated data. We address first the impact of using a single annotator vs two annotators, in order to measure whether multiple annotators are required. Once we evaluate the...
Current Artificial Intelligence (AI) methods, most based on deep learning, have facilitated progress in several fields, including computer vision and natural language understanding. The progress of these AI methods is measured using benchmarks designed to solve challenging tasks, such as visual question answering. A question remains of how much und...
We present COVID-SEE, a system for medical literature discovery based on the concept of information exploration, which builds on several distinct text analysis and natural language processing methods to structure and organise information in publications, and augments search by providing a visual overview supporting exploration of a collection to id...
IBM Watson for Drug Discovery (WDD) is a cognitive computing software platform for early-stage pharmaceutical research. WDD extracts and cross-references life sciences information from very large scale structured and unstructured data, identifying connections and correlations in an unbiased manner, and enabling more informed decision making through...
Objective:
Text and data mining play an important role in obtaining insights from Health and Hospital Information Systems. This paper presents a text mining system for detecting admissions marked as positive for several diseases: Lung Cancer, Breast Cancer, Colon Cancer, Secondary Malignant Neoplasm of Respiratory and Digestive Organs, Multiple My...
Background:
The ShARe/CLEF eHealth challenge lab aims to stimulate development of natural language processing and information retrieval technologies to aid patients in understanding their clinical reports. In clinical text, acronyms and abbreviations, also referenced as short forms, can be difficult for patients to understand. For one of three sha...
The International Classification of Diseases (ICD) is a type of meta-data found in many Electronic Patient Records. Research to explore the utility of these codes in medical Information Retrieval (IR) applications is new, and many areas of investigation remain, including the question of how reliable the assignment of the codes has been. This paper...
paper presents a text mining system using Support Vector Machines for detecting lung cancer admissions. Performance of the system using different clinical data sources is evaluated. We use radiology reports as an initial data source and add other sources, such as pathology reports, patient demographic information, and hospital admission information...
Invasive fungal diseases (IFDs) are associated with considerable health and economic costs. Surveillance of the more diagnostically challenging invasive fungal diseases, specifically of the sino-pulmonary system, is not feasible for many hospitals because case finding is a costly and labour intensive exercise. We developed text classifiers for dete...
This paper reports on the 3rd CLEFeHealth evaluation lab, which continues our evaluation resource building activities for the medical domain. In this edition of the lab, we focus on easing patients and nurses in authoring, understanding, and accessing eHealth information. The 2015 CLEFeHealth evaluation lab was structured into two tasks, focusing o...
Objective
We address the task of extracting information from free-text pathology reports, focusing on staging information encoded by the TNM (tumour-node-metastases) and ACPS (Australian clinico-pathological stage) systems. Staging information is critical for diagnosing the extent of cancer in a patient and for planning individualised treatment. Ex...
Purpose
Prospective surveillance of invasive mold diseases (IMDs) in haematology patients should be standard of care but is hampered by the absence of a reliable laboratory prompt and the difficulty of manual surveillance. We used a high throughput technology, natural language processing (NLP), to develop a classifier based on machine learning tech...
This report outlines the Task 1 of the ShARe/CLEF eHealth evaluation lab pilot. This task focused on identification (1a) and normalization (1b) of diseases and disorders in clinical reports. It used annotations from the ShARe corpus. A total of 22 teams competed in Task 1a and 17 of them also participated Task 1b. The best systems had an F1 score o...
Objective The ShARe/CLEF eHealth 2013 Evaluation Lab Task 1 was organized to evaluate the state of the art on the clinical text in (i) disorder mention identification/recognition based on Unified Medical Language System (UMLS) definition (Task 1a) and (ii) disorder mention normalization to an ontology (Task 1b). Such a community evaluation has not...
This paper reports the use of a document distance-based approach to automatically expand the number of available relevance judgements when these are limited and reduced to only positive judgements. This may happen, for example, when the only available judgements are extracted from a list of references in a published review paper. We compare the res...
Most of the information in Electronic Health Records (EHRs) is represented in free textual form. Practitioners searching EHRs need to phrase their queries carefully, as the record might use synonyms or other related words. In this paper we show that an automatic query expansion method based on the Unified Medicine Language System (UMLS) Metathesaur...
This paper proposes a document distance-based approach to automatically expand the number of available relevance judgements when those are limited and reduced to only positive judgements. This may happen, for example, when the only available judgements are extracted from a list of references in a published clinical systematic review. We show that e...
Discharge summaries and other free-text reports in healthcare transfer information between working shifts and geographic locations. Patients are likely to have difficulties in understanding their content, because of their medical jargon, non-standard abbreviations, and ward-specific idioms. This paper reports on an evaluation lab with an aim to sup...
This work describes a system for identifying event mentions in bio-molecular research abstracts that are either speculative (e.g. analysis of IkappaBalpha phosphorylation, where it is not specified whether phosphorylation did or did not occur) or negated (e.g. inhibition of IkappaBalpha phosphorylation, where phosphorylation did not occur). The dat...
Invasive fungal diseases (IFDs) cause more than 1,000 deaths in hos-pitals and cost the health system more than AUD100m in Australia each year. The most common life-threatening IFD is aspergillosis and a patient with this IFD typically has 12 days prolonged in-patient time in hospital and an 8% mor-tality rate. Surveillance and detection of IFDs ir...
We introduce an approach to question answering in the biomedical domain that utilises similarity matching of question/answer pairs in a document, or a set of background documents, to select the best answer to a multiple-choice question. We explored a range of possible similarity matching methods, ranging from simple word overlap, to dependency grap...
This work describes a system for identifying event mentions in bio-molecular text that are either speculative (e.g. "analysis of IkappaBalpha phosphorylation", where it is not specified whether phosphorylation did or did not occur) or negated (e.g. "inhibition of IkappaBalpha phosphorylation", where phosphorylation did not occur). Our system combin...
As more health data becomes available, information extraction aims to make an impact on the workflows of hospitals and care centers. One of the targeted areas is the management of pathology reports, which are employed for cancer diagnosis and staging. In this work we integrate text mining tools in the workflow of the Royal Melbourne Hospital, to ex...
Medical Subject Headings (MeSH) are used to index the majority of databases generated by the National Library of Medicine. Essentially, MeSH terms are designed to make information, such as scientific articles, more retrievable and assessable to users of systems such as PubMed. This paper proposes a novel method for automating the assignment of biom...
Given a set of pre-defined medical categories used in Evidence Based Medicine, we aim to automatically annotate sentences in medical abstracts with these labels.
We constructed a corpus of 1,000 medical abstracts annotated by hand with specified medical categories (e.g. Intervention, Outcome). We explored the use of various features based on lexica...
This paper describes a method for detecting event trigger words in biomedical text based on a word sense disambiguation (WSD) approach. We first investigate the applicability of existing WSD techniques to trigger word disambiguation in the BioNLP 2009 shared task data, and find that we are able to outperform a traditional CRF-based approach for cer...
As more health data becomes available, information extraction promises to make an impact on the workflows of hospitals and care centers. One of the targeted areas is the management of pathology reports, which are employed for cancer diagnosis and staging. In this work we integrate text mining tools in the workflow of the Royal Melbourne Hospital, t...
NICTA (National ICT Australia) participated in the Medical Records track of TREC 2011 with seven automatic runs. The main techniques used in our submissions involved using Boolean retrieval for filtering, query transformation, and query expansion. Evaluation of our best run ranks our submissions higher than the median of all systems for this track,...
We combine several techniques to participate in Medical TREC 2011 and later we decompose our combined methodologies to gain a thorough understanding of the effects of each individual technique. In this paper we focus on Information Extraction and Expansion to find the best setting for an ideal IR system. Results suggest that Information Expansion i...
Dataset for paper:
Improving MeSH classification of biomedical articles using citation contexts
B Aljaber, D Martinez, N Stokes, J Bailey
Journal of biomedical informatics 44 (5), 881-896
AIM Given a set of pre-defined medical categories used in Evidence Based Medicine, we aim to automatically annotate sentences in medical abstracts with these labels. METHOD We construct a corpus of 1,000 medical ab-stracts annotated by hand with medical categories (e.g. "In-tervention", "Outcome"). We explore the use of various fea-tures based on l...
This paper explores visualizations of document collections, which we call topic maps. Our topic maps are based on a topic model of the document collection, where the topic model is used to determine the semantic content of each document. Using two collections of search results, we show how topic maps reveal the semantic structure of a collection an...
This paper reconsiders the task of MRD-based word sense disambiguation, in extending the basic Lesk algorithm to investigate the impact on WSD performance of different tokenisation schemes and methods of definition extension. In experimentation over the Hinoki Sensebank and the Japanese Senseval-2 dictionary task, we demonstrate that sense-sensitiv...
Pathology reports are used to store information about cells and tissues of a patient, and they are crucial to monitor the health of individuals and population groups. In this work we present an evaluation of supervised text classification models for the prediction of relevant categories in pathology reports. Our aim is to integrate automatic classi...
We propose an alternative to conventional information retrieval over Linux forum data, based on thread-, post- and user-level analysis, interfaced with an information retrieval engine via reranking.
In this paper, we present an innovative chart mining technique for improving parse coverage based on partial parse outputs from precision grammars. The general approach of mining features from partial analyses is applicable to a range of lexical acquisition tasks, and is particularly suited to domain-specific lexical tuning and lexical acquisition...
This work describes a system for the tasks of identifying events in biomedical text and marking those that are speculative or negated. The architecture of the system relies on both Machine Learning (ML) approaches and hand-coded precision grammars. We submitted the output of our approach to the event extraction shared task at BioNLP 2009, where our...
Information extraction and text mining are receiving growing attention as useful techniques for addressing the crucial information bottleneck in the biomedical domain. We investigate the challenge of extracting information about genetic mutations from tables, an important source of information in scientific papers. We use various machine learning a...
Like text in other domains, biomedical documents contain a range of terms with more than one possible meaning. These ambiguities form a significant obstacle to the automatic processing of biomedical texts. Previous approaches to resolving this problem have made use of various sources of information including linguistic features of the context in wh...
This article focuses on Word Sense Disambiguation (WSD), which is a Natural Language Processing task that is thought to be important for many Language Technology applications, such as Information Retrieval, Information Extraction, or Machine Translation. One of the main issues preventing the deployment of WSD technology is the lack of training exam...
To date, parsers have made limited use of semantic information, but there is evidence to suggest that semantic features can enhance parse disambiguation. This paper shows that semantic classes help to obtain significant improvement in both parsing and PP attachment tasks. We devise a gold-standard sense- and parse tree-annotated dataset based on th...
Like text in other domains, biomedical documents contain a range of terms with more than one possible meaning. These ambiguities form a significant obstacle to the automatic processing of biomedical texts. Previous approaches to resolving this problem have made use of a variety of knowledge sources including linguistic information (from the context...
Searching and selecting articles to be included in systematic reviews is a real challenge for healthcare agencies responsible for publishing these reviews. The current practice of manually reviewing all papers returned by complex hand-crafted boolean queries is human labour-intensive and difficult to maintain. We demonstrate a two-stage searching s...
This paper reconsiders the task of MRD-based word sense disambiguation, in extending the basic Lesk algorithm to investigate the impact on WSD performance of different tokenisation schemes, scoring mechanisms, methods of gloss extension and filtering methods. In experimentation over the Lexeed Sensebank and the Japanese Senseval-2 dictionary task,...
In this chapter, the supervised approach to word sense disambiguation is presented, which consists of automatically inducing
classification models or rules from annotated examples. We start by introducing the machine learning framework for classification
and some important related concepts. Then, a review of the main approaches in the literature is...
This paper describes the joint submission of two systems to the all-words WSD sub-task of SemEval-2007 task 17. The main goal of this work was to build a competitive unsupervised system by combining heterogeneous algorithms. As a secondary goal, we explored the integration of unsupervised predictions into a supervised system by different means.
In this paper we describe the MELB-MKB system, as entered in the SemEval-2007 lexical substitution task. The core of our system was the "Relatives in Context" unsupervised approach, which ranked the candidate substitutes by web-lookup of the word sequences built combining the target context and each substitute. Our system ranked third in the final...
We experiment with text classification of threads from Linux web user forums, in the context of improving information access to the problems and solutions described in the threads. We specifically focus on classifying threads according to: (1) them describing a specific problem vs. containing a more general discussion; (2) the completeness of the i...
In this paper we describe the MELB-MKB system, as entered in the SemEval-2007 lexical substitution task. The core of our system was the "Relatives in Context" unsupervised approach, which ranked the candidate substitutes by web-lookup of the word sequences built combining the target context and each substitute. Our system ranked third in the final...
We present an unsupervised approach to Word Sense Disambiguation (WSD). We automatically acquire English sense examples using an English-Chinese bilingual dictionary, Chinese monolingual corpora and Chinese-English machine translation software. We then train machine learning classifiers on these sense examples and test them on two gold standard Eng...
This paper explores the use of two graph algorithms for unsupervised induction and tagging of nominal word senses based on corpora. Our main contribution is the op- timization of the free parameters of those algorithms and its evaluation against pub- licly available gold standards. We present a thorough evaluation comprising super- vised and unsupe...
Véronis (2004) has recently proposed an innovative unsupervised algorithm for word sense disambiguation based on small-world graphs called HyperLex. This paper explores two sides of the algorithm. First, we extend Véronis' work by optimizing the free parameters (on a set of words which is different to the target set). Second, given that the empiric...
This paper explores the split of features sets in order to obtain better wsd systems through combinations of classifiers learned over each of the split feature sets. Our results show that only k-NN is able to profit from the combination of split features, and that simple voting is not enough for that. Instead we propose combining all k-NN subsystem...