• Home
  • TU Wien
  • Institute of Software Technology and Interactive Systems
  • Allan Hanbury
Allan Hanbury

Allan Hanbury
TU Wien | TU Wien · Institute of Software Technology and Interactive Systems

About

311
Publications
70,411
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
6,564
Citations

Publications

Publications (311)
Preprint
Full-text available
Robust test collections are crucial for Information Retrieval research. Recently there is a growing interest in evaluating retrieval systems for domain-specific retrieval tasks, however these tasks often lack a reliable test collection with human-annotated relevance assessments following the Cranfield paradigm. In the medical domain, the TripClick...
Chapter
The majority of recent research has approached the Medical Concept Normalization (MCN) task as supervised text classification. However, combining all of the currently available training datasets for this task (CADEC, PsyTAR, COMETA) only covers a small fraction of the concepts contained in the Systematized Nomenclature of Medical-Clinical Terms (SN...
Preprint
Recently, several dense retrieval (DR) models have demonstrated competitive performance to term-based retrieval that are ubiquitous in search systems. In contrast to term-based matching, DR projects queries and documents into a dense vector space and retrieves results via (approximate) nearest neighbor search. Deploying a new system, such as DR, in...
Preprint
Recent progress in neural information retrieval has demonstrated large gains in effectiveness, while often sacrificing the efficiency and interpretability of the neural model compared to classical approaches. This paper proposes ColBERTer, a neural retrieval model using contextualized late interaction (ColBERT) with enhanced reduction. Along the ef...
Preprint
Full-text available
Dense passage retrieval (DPR) models show great effectiveness gains in first stage retrieval for the web domain. However in the web domain we are in a setting with large amounts of training data and a query-to-passage or a query-to-document retrieval task. We investigate in this paper dense document-to-document retrieval with limited labelled targe...
Preprint
We present strong Transformer-based re-ranking and dense retrieval baselines for the recently released TripClick health ad-hoc retrieval collection. We improve the - originally too noisy - training data with a simple negative sampling policy. We achieve large gains over BM25 in the re-ranking task of TripClick, which were not achieved with the orig...
Article
In English patent document information retrieval, Multi Word Terms (MWTs) are an important factor in determining how relevant a patent document is for a particular search query. Detecting the correct boundaries for these MWTs is no trivial task and often complicated by the special writing style of the patent domain. In this paper we describe a meth...
Preprint
We describe our workflow to create an engaging remote learning experience for a university course, while minimizing the post-production time of the educators. We make use of ubiquitous and commonly free services and platforms, so that our workflow is inclusive for all educators and provides polished experiences for students. Our learning materials...
Preprint
Full-text available
In this paper, we present our approaches for the case law retrieval and the legal case entailment task in the Competition on Legal Information Extraction/Entailment (COLIEE) 2021. As first stage retrieval methods combined with neural re-ranking methods using contextualized language models like BERT achieved great performance improvements for inform...
Preprint
Full-text available
Domain-specific contextualized language models have demonstrated substantial effectiveness gains for domain-specific downstream tasks, like similarity matching, entity recognition or information retrieval. However successfully applying such models in highly specific language domains requires domain adaptation of the pre-trained models. In this pape...
Preprint
An emerging recipe for achieving state-of-the-art effectiveness in neural document re-ranking involves utilizing large pre-trained language models - e.g., BERT - to evaluate all individual passages in the document and then aggregating the outputs by pooling or additional Transformer layers. A major drawback of this approach is high query latency du...
Preprint
A vital step towards the widespread adoption of neural retrieval models is their resource efficiency throughout the training, indexing and query workflows. The neural IR community made great advancements in training effective dual-encoder dense retrieval (DR) models recently. A dense text retrieval model uses a single vector representation per quer...
Chapter
Supervised machine learning models and their evaluation strongly depends on the quality of the underlying dataset. When we search for a relevant piece of information it may appear anywhere in a given passage. However, we observe a bias in the position of the correct answer in the text in two popular Question Answering datasets used for passage re-r...
Conference Paper
Supervised machine learning models and their evaluation strongly depends on the quality of the underlying dataset. When we search for a relevant piece of information it may appear anywhere in a given passage. However, we observe a bias in the position of the correct answer in the text in two popular Question Answering datasets used for passage re-r...
Chapter
Domain specific search has always been a challenging information retrieval task due to several challenges such as the domain specific language, the unique task setting, as well as the lack of accessible queries and corresponding relevance judgements. In the last years, pretrained language models – such as BERT – revolutionized web and news search....
Preprint
Full-text available
Supervised machine learning models and their evaluation strongly depends on the quality of the underlying dataset. When we search for a relevant piece of information it may appear anywhere in a given passage. However, we observe a bias in the position of the correct answer in the text in two popular Question Answering datasets used for passage re-r...
Preprint
Full-text available
Domain specific search has always been a challenging information retrieval task due to several challenges such as the domain specific language, the unique task setting, as well as the lack of accessible queries and corresponding relevance judgements. In the last years, pretrained language models, such as BERT, revolutionized web and news search. Na...
Conference Paper
Full-text available
The use of stopwords has been thoroughly studied in traditional Information Retrieval systems, but remains unexplored in the context of neural models. Neural re-ranking models take the full text of both the query and document into account. Naturally, removing tokens that do not carry relevance information provides us with an opportunity to improve...
Preprint
The latency of neural ranking models at query time is largely dependent on the architecture and deliberate choices by their designers to trade-off effectiveness for higher efficiency. This focus on low query latency of a rising number of efficient ranking architectures make them feasible for production deployment. In machine learning an increasingl...
Preprint
There are many existing retrieval and question answering datasets. However, most of them either focus on ranked list evaluation or single-candidate question answering. This divide makes it challenging to properly evaluate approaches concerned with ranking documents and providing snippets or answers for a given query. In this work, we present FiRA:...
Article
Full-text available
The number of biomedical image analysis challenges organized per year is steadily increasing. These international competitions have the purpose of benchmarking algorithms on common data sets, typically to identify the best method for a given problem. Recent research, however, revealed that common practice related to challenge reporting does not all...
Preprint
The success of crowdsourcing based annotation of text corpora depends on ensuring that crowdworkers are sufficiently well-trained to perform the annotation task accurately. To that end, a frequent approach to train annotators is to provide instructions and a few example cases that demonstrate how the task should be performed (referred to as the CON...
Preprint
Neural networks, particularly Transformer-based architectures, have achieved significant performance improvements on several retrieval benchmarks. When the items being retrieved are documents, the time and memory cost of employing Transformers over a full sequence of document terms can be prohibitive. A popular strategy involves considering only th...
Chapter
The effective extraction of ranked disease-symptom relationships is a critical component in various medical tasks, including computer-assisted medical diagnosis or the discovery of unexpected associations between diseases. While existing disease-symptom relationship extraction methods are used as the foundation in the various medical tasks, no coll...
Chapter
In this paper we look beyond metrics-based evaluation of Information Retrieval systems, to explore the reasons behind ranking results. We present the content-focused Neural-IR-Explorer, which empowers users to browse through retrieval results and inspect the inner workings and fine-grained results of neural re-ranking models. The explorer includes...
Preprint
Search engines operate under a strict time constraint as a fast response is paramount to user satisfaction. Thus, neural re-ranking models have a limited time-budget to re-rank documents. Given the same amount of time, a faster re-ranking model can incorporate more documents than a less efficient one, leading to a higher effectiveness. To utilize t...
Preprint
Full-text available
The effective extraction of ranked disease-symptom relationships is a critical component in various medical tasks, including computer-assisted medical diagnosis or the discovery of unexpected associations between diseases. While existing disease-symptom relationship extraction methods are used as the foundation in the various medical tasks, no coll...
Preprint
In this paper we look beyond metrics-based evaluation of Information Retrieval systems, to explore the reasons behind ranking results. We present the content-focused Neural-IR-Explorer, which empowers users to browse through retrieval results and inspect the inner workings and fine-grained results of neural re-ranking models. The explorer includes...
Preprint
The usage of neural network models puts multiple objectives in conflict with each other: Ideally we would like to create a neural model that is effective, efficient, and interpretable at the same time. However, in most instances we have to choose which property is most important to us. We used the opportunity of the TREC 2019 Deep Learning track to...
Article
Evidence synthesis is a key element of evidence-based medicine. However, it is currently hampered by being labour intensive meaning that many trials are not incorporated into robust evidence syntheses and that many are out of date. To overcome this, a variety of techniques are being explored, including using automation technology. Here, we describe...
Chapter
Full-text available
The Cranfield paradigm has dominated information retrieval evaluation for almost 50 years. It has had a major impact on the entire domain of information retrieval since the 1960s and, compared with systematic evaluation in other domains, is very well developed and has helped very much to advance the field. This chapter summarizes some of the shortc...
Chapter
The CLEF–IP evaluation lab ran between 2009 and 2013 with a two-fold expressed purpose: (a) to encourage research in the area of patent retrieval with a focus on cross language retrieval, and (b) to provide a large and clean data set of patent related data, in the three main European languages, for experimentation. In its first year, CLEF–IP organi...
Chapter
Ground truth is a crucial resource for the creation of effective question-answering (Q-A) systems. When no appropriate ground truth is available, as it is often the case in domain-specific Q-A systems (e.g. customer-support, tourism) or in languages other than English, new ground truth can be created by human annotation. The annotation process in w...
Conference Paper
Low-frequency terms are a recurring challenge for information retrieval models, especially neural IR frameworks struggle with adequately capturing infrequently observed words. While these terms are often removed from neural models - mainly as a concession to efficiency demands - they traditionally play an important role in the performance of IR mod...
Preprint
Full-text available
Establishing a docker-based replicability infrastructure offers the community a great opportunity: measuring the run time of information retrieval systems. The time required to present query results to a user is paramount to the users satisfaction. Recent advances in neural IR re-ranking models put the issue of query latency at the forefront. They...
Article
Full-text available
The most successful approaches to video understanding and video matching use local spatio-temporal features as a sparse representation for video content. In the last decade, a great interest in evaluation of local visual features in the domain of images is observed. The aim is to provide researchers with guidance when selecting the best approaches...
Preprint
Low-frequency terms are a recurring challenge for information retrieval models, especially neural IR frameworks struggle with adequately capturing infrequently observed words. While these terms are often removed from neural models - mainly as a concession to efficiency demands - they traditionally play an important role in the performance of IR mod...
Chapter
The training and use of word embeddings for information retrieval has recently gained considerable attention, showing competitive performance across various domains. In this study, we explore the use of word embeddings for patent retrieval, a challenging domain, especially for methods based on distributional semantics. We hypothesize that the previ...
Article
Full-text available
In the original version of this Article the values in the rightmost column of Table 1 were inadvertently shifted relative to the other columns. This has now been corrected in the PDF and HTML versions of the Article.
Preprint
Full-text available
Recent advances in word embedding provide significant benefit to various information processing tasks. Yet these dense representations and their estimation of word-to-word relatedness remain difficult to interpret and hard to analyze. As an alternative, explicit word representations i.e. vectors with clearly-defined dimensions, which can be words,...
Article
Full-text available
International challenges have become the standard for validation of biomedical image analysis methods. Given their scientific impact, it is surprising that a critical analysis of common practices related to the organization of challenges has not yet been performed. In this paper, we present a comprehensive analysis of biomedical image analysis chal...
Article
Full-text available
Every information retrieval (IR) model embeds in its scoring function a form of term frequency (TF) quantification. The contribution of the term frequency is determined by the properties of the function of the chosen TF quantification, and by its TF normalization. The first defines how independent the occurrences of multiple terms are, while the se...
Conference Paper
Full-text available
In this paper, we proposed a framework to evaluate information retrieval systems in presence of multidimensional relevance. This is an important problem in tasks such as consumer health search, where the understandability and trustworthiness of information greatly influence people's decisions based on the search engine results, but common topicalit...
Article
Full-text available
Since the first MICCAI grand challenge was organized in 2007 [1], the impact of biomedical image analysis challenges on both the research field as well as on individual careers has been steadily growing. For example, the acceptance of a journal article today often depends on the performance of a new algorithm being assessed against the state-of-the...
Article
Full-text available
Evaluation in empirical computer science is essential to show progress and assess technologies developed. Several research domains such as information retrieval have long relied on systematic evaluation to measure progress: here, the Cranfield paradigm of creating shared test collections, defining search tasks, and collecting ground truth for these...
Preprint
Full-text available
International challenges have become the standard for validation of biomedical image analysis methods. Given their scientific impact, it is surprising that a critical analysis of common practices related to the organization of challenges has not yet been performed. In this paper, we present a comprehensive analysis of biomedical image analysis chal...
Preprint
Full-text available
BACKGROUND Understandability plays a key role in ensuring that people accessing health information are capable of gaining insights that can assist them with their health concerns and choices. The access to unclear or misleading information has been shown to negatively impact the health decisions of the general public. OBJECTIVE The aim of this stu...
Article
Full-text available
Local features and descriptors that perform well in the case of photographic images are often unable to capture the content of binary technical drawings due to their different characteristics. Motivated by this, a new local feature representation, the contextual local primitives, is proposed in this paper. It is based on the detection of the juncti...
Article
In this paper, an identification approach for the Population (e.g. patients with headache), the Intervention (e.g. aspirin) and the Comparison (e.g. vitamin C) in Randomized Controlled Trials (RCTs) is proposed. Contrary to previous approaches, the identification is done on a word level, rather than on a sentence level. Additionally, we classify th...
Article
Full-text available
We explore the use of unsupervised methods in Cross-Lingual Word Sense Disambiguation (CL-WSD) with the application of English to Persian. Our proposed approach targets the languages with scarce resources (low-density) by exploiting word embedding and semantic similarity of the words in context. We evaluate the approach on a recent evaluation bench...
Conference Paper
Full-text available
Every year more than 25 test collections are built among the main Information Retrieval (IR) evaluation campaigns. They are extremely important in IR because they become the evaluation praxis for the forthcoming years. Test collections are built mostly using the pooling method. The main advantage of this method is that it drastically reduces the nu...
Conference Paper
Exploitation of term relatedness provided by word embedding has gained considerable attention in recent IR literature. However, an emerging question is whether this sort of relatedness fits to the needs of IR with respect to retrieval effectiveness. While we observe a high potential of word embedding as a resource for related terms, the incidence o...
Article
Full-text available
Recent advances in neural word embedding provide significant benefit to various information retrieval tasks. However as shown by recent studies, adapting the embedding models for the needs of IR tasks can bring considerable further improvements. The embedding models in general define the term relatedness by exploiting the terms' co-occurrences in s...
Chapter
Information retrieval on the (social) web moves from a pure term-frequency-based approach to an enhanced method that includes conceptual multimodal features on a semantic level. In this paper, we present an approach for semantic-based keyword search and focus especially on its optimization to scale it to real-world sized collections in the social m...
Chapter
Full-text available
This chapter provides an overview of the metrics used in the VISCERAL segmentation benchmarks, namely Anatomy 1, 2 and 3. In particular, it provides an overview of 20 evaluation metrics for segmentation, from which four metrics were selected to be used in VISCERAL benchmarks. It also provides an analysis of these metrics in three ways: first by ana...