Craig Macdonald’s research while affiliated with University of Glasgow and other places


Publications (342)


Table: PyTerrier operators for combining transformers.
On Precomputation and Caching in Information Retrieval Experiments with Pipeline Architectures
  • Preprint
  • File available

April 2025

Sean MacAvaney · Craig Macdonald

Modern information retrieval systems often rely on multiple components executed in a pipeline. In a research setting, this can lead to substantial redundant computations (e.g., retrieving the same query multiple times for evaluating different downstream rerankers). To overcome this, researchers take cached "result" files as inputs, which represent the output of another pipeline. However, these result files can be brittle and can cause a disconnect between the conceptual design of the pipeline and its logical implementation. To overcome both the redundancy problem (when executing complete pipelines) and the disconnect problem (when relying on intermediate result files), we describe our recent efforts to improve the caching capabilities in the open-source PyTerrier IR platform. We focus on two main directions: (1) automatic implicit caching of common pipeline prefixes when comparing systems and (2) explicit caching of operations through a new extension package, pyterrier-caching. These approaches allow for the best of both worlds: pipelines can be fully expressed end-to-end, while also avoiding redundant computations between pipelines.
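To make the pipeline-plus-caching idea concrete, the sketch below composes a BM25 retriever with a neural reranker using PyTerrier's pipeline operators and wraps the reranker in an explicit cache. The dataset, the MonoT5 reranker from the pyterrier_t5 plugin, and the exact ScorerCache call from pyterrier-caching are assumptions for illustration rather than the paper's own experimental setup; consult the pyterrier-caching documentation for the definitive API.

```python
# Minimal sketch, assuming the pyterrier_t5 and pyterrier-caching extensions are
# installed; the cache construction below follows the package description and is
# an assumption, not necessarily the exact API.
import os
import pyterrier as pt
from pyterrier_t5 import MonoT5ReRanker
from pyterrier_caching import ScorerCache

if not pt.started():
    pt.init()

dataset = pt.get_dataset('irds:vaswani')
indexref = pt.IterDictIndexer(os.path.abspath('./vaswani-index')).index(dataset.get_corpus_iter())

bm25 = pt.BatchRetrieve(indexref, wmodel="BM25")
monot5 = MonoT5ReRanker()                               # stands in for any expensive neural scorer
cached_monot5 = ScorerCache('./monot5.cache', monot5)   # persists (query, doc) scores across runs

# >> composes transformers end-to-end; % keeps only the top results of the left side
pipeline = (bm25 % 100) >> pt.text.get_text(dataset, 'text') >> cached_monot5

pt.Experiment(
    [bm25, pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    eval_metrics=["map", "ndcg_cut_10"],
    names=["BM25", "BM25 >> cached MonoT5"],
)
```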


KiRAG: Knowledge-Driven Iterative Retriever for Enhancing Retrieval-Augmented Generation

February 2025 · 4 Reads

Iterative retrieval-augmented generation (iRAG) models offer an effective approach for multi-hop question answering (QA). However, their retrieval process faces two key challenges: (1) it can be disrupted by irrelevant documents or factually inaccurate chains of thought; (2) the retrievers are not designed to dynamically adapt to the evolving information needs in multi-step reasoning, making it difficult to identify and retrieve the missing information required at each iterative step. Therefore, we propose KiRAG, which uses a knowledge-driven iterative retriever model to enhance the retrieval process of iRAG. Specifically, KiRAG decomposes documents into knowledge triples and performs iterative retrieval with these triples to enable a factually reliable retrieval process. Moreover, KiRAG integrates reasoning into the retrieval process to dynamically identify and retrieve knowledge that bridges information gaps, effectively adapting to the evolving information needs. Empirical results show that KiRAG significantly outperforms existing iRAG models, with an average improvement of 9.40% in R@3 and 5.14% in F1 on multi-hop QA.
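The iterative, knowledge-driven loop described above can be pictured with a deliberately tiny toy: documents are assumed to be already decomposed into (subject, relation, object) triples, retrieval is plain word overlap, and the "reasoning" step is reduced to expanding the query with the last retrieved object. Everything here (the triple store, the scoring, the expansion heuristic) is a placeholder standing in for KiRAG's learned components.

```python
# Purely illustrative toy of a knowledge-driven iterative retrieval loop.
# The triple store, the overlap-based scoring, and the query-expansion heuristic
# are placeholders; they only illustrate the control flow, not KiRAG's models.
import re

TRIPLES = [
    ("ECIR 2024", "was held in", "Glasgow"),
    ("Glasgow", "is located in", "Scotland"),
    ("Scotland", "is part of", "the United Kingdom"),
]

def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def score(query_words: set[str], triple: tuple[str, str, str]) -> int:
    # naive relevance: word overlap between the query and the triple's text
    return len(query_words & words(" ".join(triple)))

def iterative_retrieve(question: str, max_steps: int = 3) -> list[tuple[str, str, str]]:
    evidence, query = [], question
    for _ in range(max_steps):
        candidates = [t for t in TRIPLES if t not in evidence]
        if not candidates:
            break
        best = max(candidates, key=lambda t: score(words(query), t))
        evidence.append(best)
        # expand the query with the newly retrieved object so the next step can
        # "bridge" towards knowledge that is still missing
        query = question + " " + best[2]
    return evidence

print(iterative_retrieve("In which country was ECIR 2024 held?"))
```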


Enhancing Recommender Systems: Deep Modality Alignment with Large Multi-Modal Encoders

February 2025 · 6 Reads · ACM Transactions on Recommender Systems

Zixuan Yi · Zijun Long · Iadh Ounis · [...] · Richard McCreadie

In recent years, the rapid growth of online multimedia services, such as e-commerce platforms, has necessitated the development of personalised recommendation approaches that can encode diverse content about each item. Indeed, modern multi-modal recommender systems exploit diverse features obtained from raw images and item descriptions to enhance the recommendation performance. However, existing multi-modal recommender systems primarily depend on features extracted individually from different media through pre-trained modality-specific encoders, and exhibit only shallow alignments between different modalities, thereby limiting these systems' ability to capture the underlying relationships between the modalities. In this paper, we enhance the deep alignment of large multi-modal encoders to address the shallow alignment of modalities in multi-modal recommender systems. These encoders have previously demonstrated state-of-the-art effectiveness in ranking items across various domains. Specifically, we investigate the use of three state-of-the-art large multi-modal encoders for recommendation tasks: the dual-stream CLIP and the unified VLMo and BEiT-3. We explore their benefits for recommendation using a range of strategies, including the use of pre-trained and fine-tuned encoders, as well as the evaluation of the end-to-end training of these encoders. We show that pre-trained large multi-modal encoders generate more aligned and effective user/item representations compared to existing modality-specific encoders across four existing multi-modal recommendation datasets. Furthermore, we show that fine-tuning these encoders further improves the recommendation performance, with end-to-end training emerging as the most effective paradigm, significantly outperforming both the pre-trained and the fine-tuned encoders. We also demonstrate the effectiveness of large multi-modal encoders in facilitating modality alignment by evaluating the contribution of each modality separately. Finally, we show that the dual-stream approach, specifically CLIP, is the most effective architecture for these large multi-modal encoders, outperforming the unified approaches (i.e., VLMo and BEiT-3) in terms of effectiveness and efficiency.
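As a rough sketch of how a pre-trained dual-stream encoder such as CLIP yields aligned image and text representations for an item, the snippet below uses the HuggingFace transformers implementation of CLIP. The checkpoint name, the stand-in item, and the simple mean fusion are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal sketch: obtain aligned image/text embeddings for one item with a
# pre-trained CLIP checkpoint (assumed here to be openai/clip-vit-base-patch32),
# then fuse them into a single multi-modal item vector.
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.new("RGB", (224, 224), "red")         # stand-in for the item image
text = "red leather ankle boots with a block heel"  # stand-in item description

inputs = processor(text=[text], images=[image], return_tensors="pt", padding=True)
with torch.no_grad():
    image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                       attention_mask=inputs["attention_mask"])

# both modalities live in the same embedding space, so a simple fusion
# (here: the mean of the L2-normalised vectors) gives one item representation
item_repr = (F.normalize(image_emb, dim=-1) + F.normalize(text_emb, dim=-1)) / 2
```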




Improving Effectiveness by Reducing Overconfidence in Large Catalogue Sequential Recommendation with gBCE loss

October 2024 · 3 Reads · ACM Transactions on Recommender Systems

A large catalogue size is one of the central challenges in training recommendation models: a large number of items makes it inefficient, in both memory and computation, to compute scores for all items during training, forcing these models to deploy negative sampling. However, negative sampling increases the proportion of positive interactions in the training data, and therefore models trained with negative sampling tend to overestimate the probabilities of positive interactions, a phenomenon we call overconfidence. While the absolute values of the predicted scores/probabilities are not important for the ranking of retrieved recommendations, overconfident models may fail to estimate nuanced differences in the top-ranked items, resulting in degraded performance. In this paper, we show that overconfidence explains why the popular SASRec model underperforms when compared to BERT4Rec. This is contrary to the BERT4Rec authors' explanation that the difference in performance is due to the bi-directional attention mechanism. To mitigate overconfidence, we propose a novel Generalised Binary Cross-Entropy Loss function (gBCE) and theoretically prove that it can mitigate overconfidence. We further propose the gSASRec model, an improvement over SASRec that deploys an increased number of negatives and the gBCE loss. Through detailed experiments on three datasets, we show that gSASRec does not exhibit the overconfidence problem. As a result, gSASRec can outperform BERT4Rec (e.g. +9.47% NDCG on the MovieLens-1M dataset), while requiring less training time (e.g. -73% training time on MovieLens-1M). Moreover, in contrast to BERT4Rec, gSASRec is suitable for large datasets that contain more than 1 million items. Finally, we show how addressing overconfidence can improve model calibration, i.e. the ability of a model to predict actual interaction probabilities accurately. By applying gBCE to the SASRec model on the MovieLens-1M dataset, we reduce the model's expected calibration error by 98.9% (from 0.966 to 0.01).
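A minimal PyTorch sketch of the generalised loss idea, assuming (as in the gSASRec formulation) that the positive item's predicted probability is raised to a power beta before the log, while sampled negatives keep the standard BCE term; the paper derives how beta should be set from the negative sampling rate and a calibration parameter, so beta is left as a plain hyperparameter here. This is an illustration, not the authors' implementation.

```python
# Sketch of a generalised BCE (gBCE-style) loss: the positive probability is
# raised to a power beta, i.e. log(sigmoid(s_pos) ** beta) = beta * logsigmoid(s_pos),
# while sampled negatives contribute the usual log(1 - sigmoid(s_neg)) term.
import torch
import torch.nn.functional as F

def gbce_loss(pos_scores: torch.Tensor,   # shape: (batch,)
              neg_scores: torch.Tensor,   # shape: (batch, num_negatives)
              beta: float = 0.9) -> torch.Tensor:
    # positive term: beta * logsigmoid(s_pos)
    pos_term = beta * F.logsigmoid(pos_scores)
    # negative term: log(1 - sigmoid(s_neg)) == logsigmoid(-s_neg)
    neg_term = F.logsigmoid(-neg_scores).sum(dim=-1)
    return -(pos_term + neg_term).mean()

# example call with random scores: a batch of 4 positives, 16 sampled negatives each
loss = gbce_loss(torch.randn(4), torch.randn(4, 16))
```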


Enhancing Sequential Music Recommendation with Personalized Popularity Awareness

September 2024 · 67 Reads

In the realm of music recommendation, sequential recommender systems have shown promise in capturing the dynamic nature of music consumption. Nevertheless, traditional Transformer-based models, such as SASRec and BERT4Rec, while effective, encounter challenges due to the unique characteristics of music listening habits. In fact, existing models struggle to create a coherent listening experience due to rapidly evolving preferences. Moreover, music consumption is characterized by a prevalence of repeated listening, i.e., users frequently return to their favourite tracks, an important signal that could be framed as individual or personalized popularity. This paper addresses these challenges by introducing a novel approach that incorporates personalized popularity information into sequential recommendation. By combining user-item popularity scores with model-generated scores, our method effectively balances the exploration of new music with the satisfaction of user preferences. Experimental results demonstrate that a Personalized Most Popular recommender, a method solely based on user-specific popularity, outperforms existing state-of-the-art models. Furthermore, augmenting Transformer-based models with personalized popularity awareness yields superior performance, showing improvements ranging from 25.2% to 69.8%. The code for this paper is available at https://github.com/sisinflab/personalized-popularity-awareness.
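A toy sketch of the combination step described above: per-user play counts act as a personalised popularity score and are linearly interpolated with the model's scores. The linear interpolation and the weight alpha are assumptions for illustration; the paper's exact combination strategy may differ.

```python
# Illustrative blend of personalised popularity with model-generated scores.
# The interpolation rule and alpha are assumptions, not the paper's exact method.
from collections import Counter

def personalized_popularity(history: list[str]) -> Counter:
    """Per-user play counts, normalised so the most-played track scores 1.0."""
    counts = Counter(history)
    top = max(counts.values()) if counts else 1
    return Counter({track: c / top for track, c in counts.items()})

def blend(model_scores: dict[str, float], history: list[str], alpha: float = 0.5):
    pop = personalized_popularity(history)
    return {t: alpha * pop.get(t, 0.0) + (1 - alpha) * s
            for t, s in model_scores.items()}

history = ["track_a", "track_a", "track_b", "track_a", "track_c"]
model_scores = {"track_a": 0.2, "track_b": 0.7, "track_d": 0.9}
print(sorted(blend(model_scores, history).items(), key=lambda kv: -kv[1]))
```

With alpha set to 1 the blend ranks the candidate items purely by personalised popularity, loosely mirroring the Personalized Most Popular baseline mentioned in the abstract.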


Table: Hardware configuration.
Table: Efficiency analysis of item scoring methods. mRT is the Median Response Time, measured in milliseconds; SAS is the SASRec model and BERT is the gBERT4Rec model.
Efficient Inference of Sub-Item Id-based Sequential Recommendation Models with Millions of Items

August 2024 · 20 Reads

Transformer-based recommender systems, such as BERT4Rec or SASRec, achieve state-of-the-art results in sequential recommendation. However, it is challenging to use these models in production environments with catalogues of millions of items: scaling Transformers beyond a few thousand items is problematic for several reasons, including high model memory consumption and slow inference. In this respect, RecJPQ is a state-of-the-art method of reducing the models' memory consumption; RecJPQ compresses item catalogues by decomposing item IDs into a small number of shared sub-item IDs. Despite reporting a reduction of memory consumption by a factor of up to 50x, the original RecJPQ paper did not report inference efficiency improvements over the baseline Transformer-based models. Upon analysing RecJPQ's scoring algorithm, we find that its efficiency is limited by its use of score accumulators for each item, which prevents parallelisation. In contrast, LightRec (a non-sequential method that uses a similar idea of sub-item IDs) reported large inference efficiency improvements using an algorithm we call PQTopK. We show that it is also possible to improve the inference efficiency of RecJPQ-based models using the PQTopK algorithm. In particular, we speed up RecJPQ-enhanced SASRec by a factor of 4.5x compared to the original SASRec's inference method, and by a factor of 1.56x compared to the method implemented in the RecJPQ code, on the large-scale Gowalla dataset with more than a million items. Further, using simulated data, we show that PQTopK remains efficient with catalogues of up to tens of millions of items, removing one of the last obstacles to using Transformer-based models in production environments with large catalogues.
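The scoring scheme described above can be sketched in a few tensor operations: each item is identified by one sub-item ID per split, sub-ID scores are computed once per split, and item scores are the sums of the corresponding sub-ID scores, which can then be fed to a top-k. Shapes, names, and the way the query is split are illustrative assumptions, not RecJPQ's or PQTopK's actual implementation.

```python
# Sketch of PQTopK-style scoring over a large catalogue of sub-ID-coded items.
# All shapes and tensors are synthetic; this only illustrates the vectorised idea.
import torch

num_items, num_splits, codebook_size, dim = 1_000_000, 8, 256, 64

codes = torch.randint(codebook_size, (num_items, num_splits))   # one sub-item id per split per item
sub_embeddings = torch.randn(num_splits, codebook_size, dim)    # shared sub-id embeddings (one codebook per split)
query = torch.randn(num_splits, dim)                            # sequence embedding, split into per-codebook slices

# 1) score every sub-id once per split: (num_splits, codebook_size)
sub_scores = torch.einsum("sd,scd->sc", query, sub_embeddings)

# 2) an item's score is the sum of its sub-ids' scores across splits;
#    a gather + sum does this for all items at once
item_scores = sub_scores.gather(1, codes.T).sum(dim=0)          # (num_items,)

# 3) final recommendation: top-k items by score
top_scores, top_items = torch.topk(item_scores, k=10)
```

The per-item work here is a gather and a sum over precomputed sub-ID scores, rather than a per-item score accumulator, which is what makes this style of scoring easy to parallelise over very large catalogues.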


Report on the 46th European Conference on Information Retrieval (ECIR 2024)

August 2024 · 25 Reads · ACM SIGIR Forum

The 46th European Conference on Information Retrieval (ECIR 2024) was held in Glasgow, Scotland, during 24th-28th March 2024. The conference brought together over four hundred researchers from the UK, Europe and abroad. ECIR 2024 was a fully in-person conference with a total of 417 attendees, the largest number of in-person attendees of any ECIR. The conference received over 700 submissions, not including submissions to the workshops (280 full-paper and 184 short-paper submissions). ECIR 2024 introduced a number of novelties, including a new Findings track, an IR4Good track, a new innovation called the "Collab-athon" to foster collaborations within the community, and the Keith van Rijsbergen Award to recognise researchers who have made significant contributions in using theory to advance the field of information retrieval. This report details the conference programme and events. Date: 24-28 March 2024. Website: https://www.ecir2024.org.


Citations (62)


... This external knowledge can come from various sources, such as external corpora [13], knowledge graphs (KGs) [16], or even knowledge stored within large language models (LLMs) [14]. These methods of augmenting PLMs have shown remarkable performance in several natural language processing (NLP) tasks such as question answering [4] and conversation generation [19]. However, leveraging external knowledge for enhancing information retrieval systems has not been fully explored. ...

Reference:

KEIR @ ECIR 2025: The Second Workshop on Knowledge-Enhanced Information Retrieval
TRACE the Evidence: Constructing Knowledge-Grounded Reasoning Chains for Retrieval-Augmented Generation
  • Citing Conference Paper
  • January 2024

... In addition, recent studies [1,3,22] have highlighted the importance of integrating transformer-based and frequency-based methods. In particular, frequency-based methods can enhance user-specific recommendations by effectively capturing repeated items, which are crucial for NBR tasks. ...

Enhancing Sequential Music Recommendation with Personalized Popularity Awareness
  • Citing Conference Paper
  • October 2024

... By organizing nodes and edges along with their textual attributes, these structured repositories allow RAG systems to efficiently access additional topological information, supporting advanced operations such as entity retrieval, relationship discovery, and complex multi-hop queries. BioRED [121], QALD-9-plus [147], OpenbookQA [128], CREAK [139], TriviaQA [92], HotpotQA [202], Mintaka [156], MedQA [90], TUDataset [132], CWQ [167], Beyond I.I.D. [58], CommonsenseQA [168], SocialIQA [153], PIQA [15], RiddleSense [108], Freebase [84], ATOMIC [152], FactKG [95], MultiHop-RAG [174], T-REx [42], DBpedia [7], Yago [162] KGs generated from texts Domain-specific, easily updated LM-dependent, computationally intensive GraphRAG [41], GRBK [32], ATLANTIC [133], GNN-Ret [105], HippoRAG [63], DALK [100], KGP [189], OpenSCR [66], MindMap [192], FABULA [149], GER [196], FoodGPT [148], ChatKBQA [120], MultiQG [97], HSGE [164], ReTraCk [24], RNG-KBQA [204], ArcaneQA [59], HybridRAG [154], EWEK-QA [31], KG-FiD [208], REANO [46], MedGraphRAG [193], MINERVA [29] ...

REANO: Optimising Retrieval-Augmented Reader Models through Knowledge Graph Generation
  • Citing Conference Paper
  • January 2024

... Inspired by the effectiveness of SASRec [20], numerous sequential recommenders adopt self-attention encoder [46] as the backbone sequence model [6,39,41]. Among them, all CDSR models instantiate multiple self-attention encoders for modeling sequences from different domains [4,5,8,27,52]. ...

gSASRec: Reducing Overconfidence in Sequential Recommendation Trained with Negative Sampling (Extended Abstract)
  • Citing Conference Paper
  • August 2024

... This approach discards low-quality pages from the search index. As an example of this, Chang et al. recently proposed a neural quality estimator that approximates semantic quality and has proven strongly effective for static indexing pruning [1]. Meanwhile, to speed up the retrieval of high-quality results at query processing time, search engines can implement tiered indexing [3]. ...

Neural Passage Quality Estimation for Static Pruning
  • Citing Conference Paper
  • July 2024

... However, the input query concepts usually do not occur directly in communications due to the writing preference of suspects or anti-forensic measures. Therefore, the innate semantic conceptual view in communications needs to be modeled, e.g., by knowledge graph [24] or probabilistic model [25]. Subsequently, a possible semantic path from investigators to concepts to (recommended) tokens can be revealed by minimizing the difference between these two semantic views, e.g., by contrastive learning. ...

Knowledge Graph Cross-View Contrastive Learning for Recommendation
  • Citing Chapter
  • March 2024

Lecture Notes in Computer Science

... Meanwhile, methods such as LexBoost [13] focus on improving lexical retrieval with graph-based score formulations. In addition, weighted bipartite query-document graphs have been used to include user query characteristics in graph navigation [6]. In contrast with these existing methods that use lexical or semantic signals to construct a corpus graph, we propose GRIT, which uses user click/train annotation data to build a product-product similarity graph. ...

Effective Adhoc Retrieval Through Traversal of a Query-Document Graph
  • Citing Chapter
  • March 2024

Lecture Notes in Computer Science

... Therefore, several recommendation paradigms exist that leverage social connections among users to augment the modeling of user-item interactions with supplementary information sources. These methods (e.g., SFRec [4], SGP [5]) explicitly capture the inter-user influence or cross-user impact of social preferences in recommendations. ...

A Social-aware Gaussian Pre-trained model for effective cold-start recommendation
  • Citing Article
  • March 2024

Information Processing & Management

... It extends the query by utilizing information about the entities in the linked knowledge base such as attributes, categories, associated entities, etc. Tran and Yates [31] effectively improve retrieval by fusing the text representation and the representation of different entity views. KGPR [10] uses knowledge graphs (KGs) as additional inputs to provide background knowledge. It uses an entity linking tool to identify entities in queries and passages and then utilizes subgraphs extracted from large KGs that are relevant to queries and passages to improve retrieval performance. ...

KGPR: Knowledge Graph Enhanced Passage Ranking
  • Citing Conference Paper
  • October 2023