David Martinez

Doctor Evidence

PhD

About

107 Publications
11,145 Reads
2,571 Citations
Citations since 2017
11 Research Items
914 Citations
Additional affiliations
  • January 2011 - December 2012: University of Melbourne
  • January 2008 - present: Universidad del País Vasco / Euskal Herriko Unibertsitatea
  • January 2008 - present: The University of Sheffield

Publications (107)
Article
Background: Assessment of the quality of medical evidence available online is a critical step in the systematic review of clinical evidence. Existing tools that automate parts of this task validate the quality of individual studies but not of entire bodies of evidence, and focus on a restricted set of quality criteria. Objective: We propose a qu...
Preprint
BACKGROUND Assessment of the quality of medical evidence available on the web is a critical step in the preparation of systematic reviews. Existing tools that automate parts of this task validate the quality of individual studies but not of entire bodies of evidence and focus on a restricted set of quality criteria. OBJECTIVE We proposed a quality...
Preprint
Full-text available
The COVID-19 pandemic has driven ever-greater demand for tools which enable efficient exploration of biomedical literature. Although semi-structured information resulting from concept recognition and detection of the defining elements of clinical trials (e.g. PICO criteria) has been commonly used to support literature search, the contributions of t...
Preprint
We have created a corpus for the extraction of information related to diagnosis from scientific literature focused on eye diseases. It was shown that the annotation of entities has a relatively large agreement among annotators, which translates into strong performance of the trained methods, mostly BioBERT. Furthermore, it was observed that relation...
Chapter
We present COVID-SEE, a system for medical literature discovery based on the concept of information exploration, which builds on several distinct text analysis and natural language processing methods to structure and organise information in publications, and augments search through a visual overview of a collection enabling exploration to identify...
Preprint
Full-text available
The focus of this paper is to address the knowledge acquisition bottleneck for Named Entity Recognition (NER) of mutations, by analysing different approaches to build manually-annotated data. We address first the impact of using a single annotator vs two annotators, in order to measure whether multiple annotators are required. Once we evaluate the...
Preprint
Full-text available
Current Artificial Intelligence (AI) methods, most based on deep learning, have facilitated progress in several fields, including computer vision and natural language understanding. The progress of these AI methods is measured using benchmarks designed to solve challenging tasks, such as visual question answering. A question remains of how much und...
Preprint
We present COVID-SEE, a system for medical literature discovery based on the concept of information exploration, which builds on several distinct text analysis and natural language processing methods to structure and organise information in publications, and augments search by providing a visual overview supporting exploration of a collection to id...
Article
IBM Watson for Drug Discovery (WDD) is a cognitive computing software platform for early-stage pharmaceutical research. WDD extracts and cross-references life sciences information from very large scale structured and unstructured data, identifying connections and correlations in an unbiased manner, and enabling more informed decision making through...
Article
Objective: Text and data mining play an important role in obtaining insights from Health and Hospital Information Systems. This paper presents a text mining system for detecting admissions marked as positive for several diseases: Lung Cancer, Breast Cancer, Colon Cancer, Secondary Malignant Neoplasm of Respiratory and Digestive Organs, Multiple My...
Article
Full-text available
Background: The ShARe/CLEF eHealth challenge lab aims to stimulate development of natural language processing and information retrieval technologies to aid patients in understanding their clinical reports. In clinical text, acronyms and abbreviations, also referenced as short forms, can be difficult for patients to understand. For one of three sha...
Article
The International Classification of Diseases (ICD) is a type of meta-data found in many Electronic Patient Records. Research to explore the utility of these codes in medical Information Retrieval (IR) applications is new, and many areas of investigation remain, including the question of how reliable the assignment of the codes has been. This paper...
Conference Paper
This paper presents a text mining system using Support Vector Machines for detecting lung cancer admissions. Performance of the system using different clinical data sources is evaluated. We use radiology reports as an initial data source and add other sources, such as pathology reports, patient demographic information, and hospital admission information...
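The bag-of-words classification pipeline this abstract describes can be sketched in miniature. In this sketch a simple perceptron stands in for the SVM actually used in the paper, and the toy "report" snippets are invented, not data from the study.

```python
# Sketch: bag-of-words featurisation plus a linear classifier for flagging
# lung cancer admissions from report text. A perceptron stands in for the
# paper's SVM; the toy report snippets below are invented examples.
from collections import Counter

def featurise(text):
    """Lower-cased bag-of-words counts."""
    return Counter(text.lower().split())

def train_perceptron(examples, epochs=10):
    """examples: list of (text, label) with label in {+1, -1}."""
    weights = Counter()
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            feats = featurise(text)
            score = bias + sum(weights[w] * c for w, c in feats.items())
            if label * score <= 0:  # misclassified: nudge weights toward label
                for w, c in feats.items():
                    weights[w] += label * c
                bias += label
    return weights, bias

def predict(weights, bias, text):
    feats = featurise(text)
    score = bias + sum(weights[w] * c for w, c in feats.items())
    return 1 if score > 0 else -1

# Invented toy data standing in for radiology-report snippets.
train = [
    ("mass in right upper lobe suspicious for carcinoma", 1),
    ("spiculated pulmonary nodule biopsy confirms malignancy", 1),
    ("clear lung fields no acute disease", -1),
    ("normal chest radiograph no nodule seen", -1),
]
model = train_perceptron(train)
print(predict(*model, "suspicious pulmonary mass noted"))  # prints 1
```

A production system would add tokenisation, tf-idf weighting, and a proper max-margin learner, but the featurise-train-predict shape is the same.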
Article
Full-text available
Invasive fungal diseases (IFDs) are associated with considerable health and economic costs. Surveillance of the more diagnostically challenging invasive fungal diseases, specifically of the sino-pulmonary system, is not feasible for many hospitals because case finding is a costly and labour intensive exercise. We developed text classifiers for dete...
Conference Paper
Full-text available
This paper reports on the 3rd CLEFeHealth evaluation lab, which continues our evaluation resource building activities for the medical domain. In this edition of the lab, we focus on easing patients and nurses in authoring, understanding, and accessing eHealth information. The 2015 CLEFeHealth evaluation lab was structured into two tasks, focusing o...
Article
Objective We address the task of extracting information from free-text pathology reports, focusing on staging information encoded by the TNM (tumour-node-metastases) and ACPS (Australian clinico-pathological stage) systems. Staging information is critical for diagnosing the extent of cancer in a patient and for planning individualised treatment. Ex...
Article
Full-text available
Purpose Prospective surveillance of invasive mold diseases (IMDs) in haematology patients should be standard of care but is hampered by the absence of a reliable laboratory prompt and the difficulty of manual surveillance. We used a high throughput technology, natural language processing (NLP), to develop a classifier based on machine learning tech...
Article
Full-text available
This report outlines Task 1 of the ShARe/CLEF eHealth evaluation lab pilot. This task focused on identification (1a) and normalization (1b) of diseases and disorders in clinical reports. It used annotations from the ShARe corpus. A total of 22 teams competed in Task 1a and 17 of them also participated in Task 1b. The best systems had an F1 score o...
Article
Full-text available
Objective The ShARe/CLEF eHealth 2013 Evaluation Lab Task 1 was organized to evaluate the state of the art on the clinical text in (i) disorder mention identification/recognition based on Unified Medical Language System (UMLS) definition (Task 1a) and (ii) disorder mention normalization to an ontology (Task 1b). Such a community evaluation has not...
Conference Paper
Full-text available
This paper reports the use of a document distance-based approach to automatically expand the number of available relevance judgements when these are limited and reduced to only positive judgements. This may happen, for example, when the only available judgements are extracted from a list of references in a published review paper. We compare the res...
Article
Most of the information in Electronic Health Records (EHRs) is represented in free textual form. Practitioners searching EHRs need to phrase their queries carefully, as the record might use synonyms or other related words. In this paper we show that an automatic query expansion method based on the Unified Medical Language System (UMLS) Metathesaur...
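The expansion idea can be illustrated with a small sketch. The synonym table below is a hand-made stand-in for Metathesaurus lookups, not real UMLS content, and the records are invented.

```python
# Sketch: synonym-based query expansion over free-text records.
# SYNONYMS is an invented stand-in for UMLS Metathesaurus lookups.
SYNONYMS = {
    "heart attack": ["myocardial infarction"],
    "high blood pressure": ["hypertension"],
}

def expand_query(query):
    """Return the query plus synonym variants of any phrases it contains."""
    variants = [query.lower()]
    for phrase, syns in SYNONYMS.items():
        if phrase in query.lower():
            for syn in syns:
                variants.append(query.lower().replace(phrase, syn))
    return variants

def search(records, query):
    """Match a record if any expanded query variant occurs in its text."""
    expanded = expand_query(query)
    return [r for r in records if any(v in r.lower() for v in expanded)]

records = [
    "Patient admitted with acute myocardial infarction.",
    "History of high blood pressure, otherwise well.",
    "Routine follow-up, no complaints.",
]
print(search(records, "heart attack"))
```

Without expansion the query "heart attack" misses the first record; with it, the synonym variant matches. A real system would map query terms to concept identifiers rather than doing substring replacement.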
Conference Paper
Full-text available
This paper proposes a document distance-based approach to automatically expand the number of available relevance judgements when those are limited and reduced to only positive judgements. This may happen, for example, when the only available judgements are extracted from a list of references in a published clinical systematic review. We show that e...
Conference Paper
Discharge summaries and other free-text reports in healthcare transfer information between working shifts and geographic locations. Patients are likely to have difficulties in understanding their content, because of their medical jargon, non-standard abbreviations, and ward-specific idioms. This paper reports on an evaluation lab with an aim to sup...
Article
Full-text available
This work describes a system for identifying event mentions in bio-molecular research abstracts that are either speculative (e.g. analysis of IkappaBalpha phosphorylation, where it is not specified whether phosphorylation did or did not occur) or negated (e.g. inhibition of IkappaBalpha phosphorylation, where phosphorylation did not occur). The dat...
Article
Invasive fungal diseases (IFDs) cause more than 1,000 deaths in hospitals and cost the health system more than AUD100m in Australia each year. The most common life-threatening IFD is aspergillosis and a patient with this IFD typically has 12 days prolonged in-patient time in hospital and an 8% mortality rate. Surveillance and detection of IFDs ir...
Article
We introduce an approach to question answering in the biomedical domain that utilises similarity matching of question/answer pairs in a document, or a set of background documents, to select the best answer to a multiple-choice question. We explored a range of possible similarity matching methods, ranging from simple word overlap, to dependency grap...
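The simplest of the matching methods this abstract mentions, plain word overlap, can be sketched in a few lines. The question, candidate answers, and document sentences here are invented examples, not material from the paper.

```python
# Sketch: multiple-choice answer selection by word overlap. Each candidate
# is scored by the overlap between the question+answer pair and the best-
# matching sentence of a background document. Toy data is invented.

def overlap(a, b):
    """Number of distinct lower-cased tokens shared by two strings."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def best_answer(question, candidates, document_sentences):
    """Pick the candidate whose question+answer pair best matches a sentence."""
    def score(candidate):
        pair = question + " " + candidate
        return max(overlap(pair, s) for s in document_sentences)
    return max(candidates, key=score)

doc = [
    "Aspirin inhibits platelet aggregation by blocking cyclooxygenase.",
    "Warfarin acts on vitamin K dependent clotting factors.",
]
q = "Which drug inhibits platelet aggregation"
answers = ["Aspirin", "Warfarin", "Insulin"]
print(best_answer(q, answers, doc))  # prints "Aspirin"
```

The richer methods the abstract lists (e.g. dependency-graph matching) replace the `overlap` function with structure-aware similarity while keeping the same argmax-over-candidates shape.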
Conference Paper
This work describes a system for identifying event mentions in bio-molecular text that are either speculative (e.g. "analysis of IkappaBalpha phosphorylation", where it is not specified whether phosphorylation did or did not occur) or negated (e.g. "inhibition of IkappaBalpha phosphorylation", where phosphorylation did not occur). Our system combin...
Article
As more health data becomes available, information extraction aims to make an impact on the workflows of hospitals and care centers. One of the targeted areas is the management of pathology reports, which are employed for cancer diagnosis and staging. In this work we integrate text mining tools in the workflow of the Royal Melbourne Hospital, to ex...
Article
Medical Subject Headings (MeSH) are used to index the majority of databases generated by the National Library of Medicine. Essentially, MeSH terms are designed to make information, such as scientific articles, more retrievable and assessable to users of systems such as PubMed. This paper proposes a novel method for automating the assignment of biom...
Article
Full-text available
Given a set of pre-defined medical categories used in Evidence Based Medicine, we aim to automatically annotate sentences in medical abstracts with these labels. We constructed a corpus of 1,000 medical abstracts annotated by hand with specified medical categories (e.g. Intervention, Outcome). We explored the use of various features based on lexica...
Article
Full-text available
This paper describes a method for detecting event trigger words in biomedical text based on a word sense disambiguation (WSD) approach. We first investigate the applicability of existing WSD techniques to trigger word disambiguation in the BioNLP 2009 shared task data, and find that we are able to outperform a traditional CRF-based approach for cer...
Conference Paper
As more health data becomes available, information extraction promises to make an impact on the workflows of hospitals and care centers. One of the targeted areas is the management of pathology reports, which are employed for cancer diagnosis and staging. In this work we integrate text mining tools in the workflow of the Royal Melbourne Hospital, t...
Conference Paper
Full-text available
NICTA (National ICT Australia) participated in the Medical Records track of TREC 2011 with seven automatic runs. The main techniques used in our submissions involved using Boolean retrieval for filtering, query transformation, and query expansion. Evaluation of our best run ranks our submissions higher than the median of all systems for this track,...
Article
Full-text available
We combine several techniques to participate in Medical TREC 2011 and later we decompose our combined methodologies to gain a thorough understanding of the effects of each individual technique. In this paper we focus on Information Extraction and Expansion to find the best setting for an ideal IR system. Results suggest that Information Expansion i...
Data
Dataset for paper: Improving MeSH classification of biomedical articles using citation contexts B Aljaber, D Martinez, N Stokes, J Bailey Journal of biomedical informatics 44 (5), 881-896
Article
Full-text available
AIM Given a set of pre-defined medical categories used in Evidence Based Medicine, we aim to automatically annotate sentences in medical abstracts with these labels. METHOD We construct a corpus of 1,000 medical abstracts annotated by hand with medical categories (e.g. "Intervention", "Outcome"). We explore the use of various features based on l...
Article
Full-text available
This paper explores visualizations of document collections, which we call topic maps. Our topic maps are based on a topic model of the document collection, where the topic model is used to determine the semantic content of each document. Using two collections of search results, we show how topic maps reveal the semantic structure of a collection an...
Article
Full-text available
This paper reconsiders the task of MRD-based word sense disambiguation, in extending the basic Lesk algorithm to investigate the impact on WSD performance of different tokenisation schemes and methods of definition extension. In experimentation over the Hinoki Sensebank and the Japanese Senseval-2 dictionary task, we demonstrate that sense-sensitiv...
Conference Paper
Full-text available
Pathology reports are used to store information about cells and tissues of a patient, and they are crucial to monitor the health of individuals and population groups. In this work we present an evaluation of supervised text classification models for the prediction of relevant categories in pathology reports. Our aim is to integrate automatic classi...
Conference Paper
Full-text available
We propose an alternative to conventional information retrieval over Linux forum data, based on thread-, post- and user-level analysis, interfaced with an information retrieval engine via reranking.
Conference Paper
Full-text available
In this paper, we present an innovative chart mining technique for improving parse coverage based on partial parse outputs from precision grammars. The general approach of mining features from partial analyses is applicable to a range of lexical acquisition tasks, and is particularly suited to domain-specific lexical tuning and lexical acquisition...
Conference Paper
Full-text available
This work describes a system for the tasks of identifying events in biomedical text and marking those that are speculative or negated. The architecture of the system relies on both Machine Learning (ML) approaches and hand-coded precision grammars. We submitted the output of our approach to the event extraction shared task at BioNLP 2009, where our...
Conference Paper
Full-text available
Information extraction and text mining are receiving growing attention as useful techniques for addressing the crucial information bottleneck in the biomedical domain. We investigate the challenge of extracting information about genetic mutations from tables, an important source of information in scientific papers. We use various machine learning a...
Article
Full-text available
Like text in other domains, biomedical documents contain a range of terms with more than one possible meaning. These ambiguities form a significant obstacle to the automatic processing of biomedical texts. Previous approaches to resolving this problem have made use of various sources of information including linguistic features of the context in wh...
Article
Full-text available
This article focuses on Word Sense Disambiguation (WSD), which is a Natural Language Processing task that is thought to be important for many Language Technology applications, such as Information Retrieval, Information Extraction, or Machine Translation. One of the main issues preventing the deployment of WSD technology is the lack of training exam...
Conference Paper
Full-text available
To date, parsers have made limited use of semantic information, but there is evidence to suggest that semantic features can enhance parse disambiguation. This paper shows that semantic classes help to obtain significant improvement in both parsing and PP attachment tasks. We devise a gold-standard sense- and parse tree-annotated dataset based on th...
Conference Paper
Full-text available
Like text in other domains, biomedical documents contain a range of terms with more than one possible meaning. These ambiguities form a significant obstacle to the automatic processing of biomedical texts. Previous approaches to resolving this problem have made use of a variety of knowledge sources including linguistic information (from the context...
Conference Paper
Full-text available
Searching and selecting articles to be included in systematic reviews is a real challenge for healthcare agencies responsible for publishing these reviews. The current practice of manually reviewing all papers returned by complex hand-crafted boolean queries is human labour-intensive and difficult to maintain. We demonstrate a two-stage searching s...
Article
Full-text available
This paper reconsiders the task of MRD-based word sense disambiguation, in extending the basic Lesk algorithm to investigate the impact on WSD performance of different tokenisation schemes, scoring mechanisms, methods of gloss extension and filtering methods. In experimentation over the Lexeed Sensebank and the Japanese Senseval-2 dictionary task,...
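The basic Lesk algorithm that this work extends can be sketched compactly: choose the sense whose dictionary gloss shares the most words with the target word's context. The two-sense inventory below is invented for illustration, not taken from the Lexeed Sensebank.

```python
# Sketch: basic Lesk word sense disambiguation. The sense inventory is an
# invented toy example; real systems use dictionary or sensebank glosses.

def lesk(context, senses):
    """senses: dict mapping sense id -> gloss string. Returns best sense id."""
    ctx = set(context.lower().split())
    def score(sense_id):
        return len(ctx & set(senses[sense_id].lower().split()))
    return max(senses, key=score)

senses = {
    "bank/finance": "an institution that accepts deposits and lends money",
    "bank/river": "sloping land beside a body of water such as a river",
}
context = "she sat on the grassy bank of the river watching the water"
print(lesk(context, senses))  # prints "bank/river"
```

The extensions the abstract studies (tokenisation schemes, scoring mechanisms, gloss extension, filtering) all modify how `ctx` and the glosses are built and compared, while keeping this overlap-maximisation core.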
Chapter
In this chapter, the supervised approach to word sense disambiguation is presented, which consists of automatically inducing classification models or rules from annotated examples. We start by introducing the machine learning framework for classification and some important related concepts. Then, a review of the main approaches in the literature is...
Conference Paper
Full-text available
This paper describes the joint submission of two systems to the all-words WSD sub-task of SemEval-2007 task 17. The main goal of this work was to build a competitive unsupervised system by combining heterogeneous algorithms. As a secondary goal, we explored the integration of unsupervised predictions into a supervised system by different means.
Conference Paper
Full-text available
In this paper we describe the MELB-MKB system, as entered in the SemEval-2007 lexical substitution task. The core of our system was the "Relatives in Context" unsupervised approach, which ranked the candidate substitutes by web-lookup of the word sequences built combining the target context and each substitute. Our system ranked third in the final...
Conference Paper
Full-text available
We experiment with text classification of threads from Linux web user forums, in the context of improving information access to the problems and solutions described in the threads. We specifically focus on classifying threads according to: (1) them describing a specific problem vs. containing a more general discussion; (2) the completeness of the i...
Conference Paper
In this paper we describe the MELB-MKB system, as entered in the SemEval-2007 lexical substitution task. The core of our system was the "Relatives in Context" unsupervised approach, which ranked the candidate substitutes by web-lookup of the word sequences built combining the target context and each substitute. Our system ranked third in the final...
Conference Paper
Full-text available
We present an unsupervised approach to Word Sense Disambiguation (WSD). We automatically acquire English sense examples using an English-Chinese bilingual dictionary, Chinese monolingual corpora and Chinese-English machine translation software. We then train machine learning classifiers on these sense examples and test them on two gold standard Eng...
Conference Paper
Full-text available
This paper explores the use of two graph algorithms for unsupervised induction and tagging of nominal word senses based on corpora. Our main contribution is the optimization of the free parameters of those algorithms and its evaluation against publicly available gold standards. We present a thorough evaluation comprising supervised and unsupe...
Conference Paper
Full-text available
Véronis (2004) has recently proposed an innovative unsupervised algorithm for word sense disambiguation based on small-world graphs called HyperLex. This paper explores two sides of the algorithm. First, we extend Véronis' work by optimizing the free parameters (on a set of words which is different to the target set). Second, given that the empiric...
Article
Full-text available
This paper explores the split of features sets in order to obtain better wsd systems through combinations of classifiers learned over each of the split feature sets. Our results show that only k-NN is able to profit from the combination of split features, and that simple voting is not enough for that. Instead we propose combining all k-NN subsystem...
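The split-and-combine idea can be sketched as follows. The feature splits, toy data, and simple majority vote are illustrative stand-ins; note that the abstract itself argues simple voting is not enough, so the vote shown here is only the baseline the paper improves on.

```python
# Sketch: one nearest-neighbour classifier per feature subset, combined by
# majority vote. Feature groups and toy examples are invented stand-ins.
from collections import Counter

def knn_predict(train, x, features, k=1):
    """k-NN over the given feature indices (Hamming distance on values)."""
    def dist(a, b):
        return sum(a[i] != b[i] for i in features)
    neighbours = sorted(train, key=lambda ex: dist(ex[0], x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

def split_knn_vote(train, x, feature_splits):
    """Majority vote across k-NN subsystems, one per feature split."""
    votes = [knn_predict(train, x, split) for split in feature_splits]
    return Counter(votes).most_common(1)[0][0]

# Toy data: features = (local context, topical context, domain), label = sense.
train = [
    (("verb_left", "finance_topic", "news"), "sense_a"),
    (("noun_left", "nature_topic", "fiction"), "sense_b"),
    (("verb_left", "nature_topic", "news"), "sense_a"),
]
splits = [(0,), (1,), (2,)]  # one subsystem per feature group
print(split_knn_vote(train, ("verb_left", "finance_topic", "news"), splits))
```

The paper's proposed combination replaces the plain majority vote with a richer way of merging the k-NN subsystems' outputs.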