Julian McAuley's research while affiliated with the University of California, San Diego, and other places

What is this page?


This page lists works of an author who doesn't have a ResearchGate profile or hasn't added the works to their profile yet. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.

Publications (406)


GENNEXT: The Next Generation of IR and Recommender Systems with Language Agents, Generative Models, and Conversational AI
  • Conference Paper
  • Full-text available

July 2025 · 72 Reads · Enrico Palumbo · [...] · Julian McAuley

We present GENNEXT, a workshop dedicated to exploring the integration of language agents, generative models, and conversational AI within information retrieval (IR) and recommender systems (RS). Building on the success of our recent RecSys'24 workshop, GENNEXT aims to advance discussions on the applications of language agents powered by Large Language Models (LLMs). The workshop will focus on enhancing interactivity between users and systems through multi-turn dialogues, improving creative content generation, advancing personalization, and enabling multifaceted, context-aware decision-making. For example, a language agent could respond to a query like "Suggest an eco-friendly food tour for a weekend in my city" by using a recommendation API to identify eateries specializing in sustainable or organic cuisine and a pollution API to ensure the selected routes have low air pollution levels. GENNEXT will bring together leading researchers and practitioners through keynotes, paper presentations, and a panel discussion. We invite full papers, short papers, and extended abstracts covering theoretical advancements, practical applications, and evaluation strategies for generative technologies in IR and RS. The workshop will address key themes such as conversational adaptation, generative content creation, and agentic tool usage, while tackling challenges like bias, data privacy, and hallucination risks. Overall, our main ambition is to foster dialogue on creating ethical, sustainable, and innovative systems while addressing emerging opportunities and risks in modern IR and RS.

Modern Information Retrieval (IR) and Recommender Systems (RS) are experiencing a profound change with the advent of Large Language Models (LLMs) [4, 5]. Traditional algorithms often rely on static features or past user-item interactions, whereas language-agent systems dynamically integrate world knowledge, language understanding, reasoning, and planning abilities to improve and expand the capabilities of IR and RS in a tangible manner [7, 16-18]. At a high level, language agents typically include an LLM component and a set of tools that the LLM can use. The LLM component can reason about complex queries, model nuanced user taste profiles, generate text to explain results, ask follow-up questions, and refine queries in conversational setups. Crucially, the LLM also decides which tool(s) to invoke (e.g., traditional recommender and search systems, general-purpose APIs) for improved accuracy and relevance. For instance, given a complex request such as "Find a good jacket for me to buy for my trip to New York next week" issued to a language agent, the LLM component may first invoke a weather API to check the forecast for New York City in the following week, then reason about what type of jacket is most appropriate for that weather, and finally invoke a recommendation API to suggest a curated set of products. This pipeline shows how LLM-driven tool usage can support users through more personalized and context-sensitive recommendations.
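
The jacket example above describes a plan-then-act loop: the LLM inspects the request, chooses a tool, reasons over the result, and repeats until it can answer. Below is a minimal sketch of such a loop in Python; the JSON action protocol, prompt wording, and tool names ("weather_api", "recommend_api") are illustrative assumptions, not a specific system from the workshop.

```python
import json

def run_language_agent(llm, tools, user_query, max_steps=4):
    """Sketch of a plan-then-act tool loop. `llm` is any text-completion
    callable; `tools` maps names (e.g. "weather_api", "recommend_api",
    both hypothetical) to functions. The action format is an assumption."""
    transcript = f"User: {user_query}"
    for _ in range(max_steps):
        decision = llm(
            f"{transcript}\nAvailable tools: {list(tools)}.\n"
            'Reply with JSON {"tool": name, "args": {...}} to call a tool, '
            'or {"answer": text} to respond to the user.'
        )
        action = json.loads(decision)  # a real agent would guard this parse
        if "answer" in action:         # the LLM decided it can answer now
            return action["answer"]
        result = tools[action["tool"]](**action.get("args", {}))
        transcript += f"\nTool {action['tool']} returned: {result}"
    return "Sorry, I could not complete the request."
```

In the jacket scenario, the first iteration would select the weather tool, the transcript would grow with the forecast, and a later iteration would call the recommender with weather-appropriate attributes.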


Figure 1: (a) Unidirectional transitions in models like RNNs [15, 18] process items in strict temporal order. (b) Bidirectional models such as BERT4Rec [45] allow context flow in both directions, but only between adjacent items. (c) Undirected item graph built from user sequences. Solid lines show the initial symmetric graph S′. Orange dashed lines indicate higher-order relationships learned via multi-hop diffusion (Eq. 2) on S′, which aggregates weighted sequential paths (α and d). The strength of sequence-based connections between items, derived from the aggregated weighted paths of the diffusion process, is reflected in these symmetric links, making the graph ready for spectral filtering. This sequence is a running example to illustrate graph construction and transition modeling.
Figure 3: Multi-hop diffusion (Eq. 2) on the symmetric base graph S′ (derived from initial directed sequences) with α = 0.5. Top: S′. Middle: powers (S′)^k. Bottom: S^(3), encoding decayed proximity.
Figure 4: Frequency response curves of different filter configurations across all datasets. The low-pass filter (blue) preserves signals in the lower 30% of the spectrum. The band-pass filter (green) selectively amplifies mid-frequency components associated with personalized patterns. The combined filter response (purple) illustrates how the dual filter captures complementary signals across the spectrum.
Table: Statistics of evaluation datasets.
Table: Filter parameters across datasets.


GSPRec: Temporal-Aware Graph Spectral Filtering for Recommendation

May 2025 · 1 Read

Graph-based recommendation systems are effective at modeling collaborative patterns but often suffer from two limitations: overreliance on low-pass filtering, which suppresses user-specific signals, and omission of sequential dynamics in graph construction. We introduce GSPRec, a graph spectral model that integrates temporal transitions through sequentially-informed graph construction and applies frequency-aware filtering in the spectral domain. GSPRec encodes item transitions via multi-hop diffusion to enable the use of symmetric Laplacians for spectral processing. To capture user preferences, we design a dual-filtering mechanism: a Gaussian bandpass filter to extract mid-frequency, user-level patterns, and a low-pass filter to retain global trends. Extensive experiments on four public datasets show that GSPRec consistently outperforms baselines, with an average improvement of 6.77% in NDCG@10. Ablation studies show the complementary benefits of both sequential graph augmentation and bandpass filtering.
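
The abstract names two concrete mechanisms: multi-hop diffusion that turns directed item transitions into a symmetric graph suitable for spectral processing, and a dual filter pairing a low-pass response with a Gaussian band-pass. Here is a minimal NumPy sketch of how these pieces might compose, based only on the abstract and the figure captions above; the parameter choices (α, the 30% low-pass cutoff, the Gaussian center and width) are assumptions, not the authors' released code.

```python
import numpy as np

def multi_hop_diffusion(S_sym, alpha=0.5, hops=3):
    """Aggregate weighted k-hop paths: S = sum_k alpha^k (S_sym)^k (cf. Fig. 3)."""
    S = np.zeros_like(S_sym, dtype=float)
    P = np.eye(S_sym.shape[0])
    for k in range(1, hops + 1):
        P = P @ S_sym              # k-hop transition weights
        S += (alpha ** k) * P      # decay longer paths by alpha^k
    return S

def dual_spectral_filter(S, low_cut=0.3, band_mu=0.5, band_sigma=0.1, w=0.5):
    """Combine a low-pass and a Gaussian band-pass response on the graph spectrum."""
    d = S.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d)
    d_inv_sqrt[d > 0] = d[d > 0] ** -0.5
    # Symmetric normalized Laplacian of the diffused item graph.
    L = np.eye(len(d)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    lam, U = np.linalg.eigh(L)
    lam_n = lam / lam.max()                      # normalize spectrum to [0, 1]
    low_pass = (lam_n <= low_cut).astype(float)  # keep lower 30% (cf. Fig. 4)
    band_pass = np.exp(-((lam_n - band_mu) ** 2) / (2 * band_sigma ** 2))
    h = low_pass + w * band_pass                 # combined frequency response
    return U @ np.diag(h) @ U.T                  # filtered propagation operator
```

The combined response keeps smooth global structure via the low-pass term while the Gaussian term re-amplifies the mid-frequency components the abstract associates with user-specific patterns.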


Fast Text-to-Audio Generation with Adversarial Post-Training

May 2025 · 22 Reads

Text-to-audio systems, while increasingly performant, are slow at inference time, making their latency impractical for many creative applications. We present Adversarial Relativistic-Contrastive (ARC) post-training, the first adversarial acceleration algorithm for diffusion/flow models not based on distillation. While past adversarial post-training methods have struggled to match their expensive distillation counterparts, ARC post-training is a simple procedure that (1) extends a recent relativistic adversarial formulation to diffusion/flow post-training and (2) combines it with a novel contrastive discriminator objective to encourage better prompt adherence. We pair ARC post-training with a number of optimizations to Stable Audio Open and build a model capable of generating ≈12 s of 44.1 kHz stereo audio in ≈75 ms on an H100, and in ≈7 s on a mobile edge device, the fastest text-to-audio model to our knowledge.
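
The abstract decomposes ARC into two losses: a relativistic adversarial term and a contrastive discriminator term for prompt adherence. Below is a hedged PyTorch sketch of one plausible reading of those components; the exact pairing, weighting, and where mismatched prompts are sampled are assumptions, not the paper's objective.

```python
import torch.nn.functional as F

def relativistic_d_loss(d_real, d_fake):
    """Relativistic discriminator loss: real samples should out-score fakes."""
    return -F.logsigmoid(d_real - d_fake).mean()

def relativistic_g_loss(d_real, d_fake):
    """Generator objective: make fakes out-score their paired real samples."""
    return -F.logsigmoid(d_fake - d_real).mean()

def contrastive_d_loss(d_matched, d_mismatched):
    """Contrastive term (assumption): audio scored with its own prompt should
    out-score the same audio paired with a shuffled, mismatched prompt,
    pushing the discriminator to judge prompt adherence, not just realism."""
    return -F.logsigmoid(d_matched - d_mismatched).mean()
```

A plausible total discriminator objective would be the relativistic term plus a weighted contrastive term; the relative weight is another free choice not stated in the abstract.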


Figure 2: Illustration of our workflow for defending against indirect prompt injection.
CachePrune: Neural-Based Attribution Defense Against Indirect Prompt Injection Attacks

April 2025 · 3 Reads

Large Language Models (LLMs) are susceptible to indirect prompt injection attacks, where the model undesirably deviates from user-provided instructions by executing tasks injected into the prompt context. This vulnerability stems from LLMs' inability to distinguish between data and instructions within a prompt. In this paper, we propose CachePrune, which defends against this attack by identifying and pruning task-triggering neurons from the KV cache of the input prompt context. By pruning such neurons, we encourage the LLM to treat the text spans of the input prompt context as pure data rather than as indicators of instructions to follow. These neurons are identified via feature attribution with a loss function induced from an upper bound of the Direct Preference Optimization (DPO) objective. We show that such a loss function enables effective feature attribution with only a few samples. We further improve the quality of feature attribution by exploiting an observed triggering effect in instruction following. Our approach does not impose any formatting on the original prompt or introduce extra test-time LLM calls. Experiments show that CachePrune significantly reduces attack success rates without compromising response quality. Note: this paper aims to defend against indirect prompt injection attacks, with the goal of developing more secure and robust AI systems.
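
Mechanically, the defense operates on the KV cache built from the untrusted context: entries flagged as task-triggering by the attribution step are zeroed before generation. A minimal sketch of that pruning step follows, assuming attribution scores have already been computed with the DPO-derived loss; the granularity (value stream only, a fixed prune ratio) is an illustrative assumption.

```python
import torch

def prune_kv_cache(kv_cache, attributions, prune_ratio=0.02):
    """Zero KV-cache features flagged as task-triggering (a sketch).

    kv_cache:     list of (key, value) tensors, one pair per layer.
    attributions: matching list of score tensors, same shape as each value,
                  assumed precomputed from the DPO-derived attribution loss.
    """
    pruned = []
    for (key, value), score in zip(kv_cache, attributions):
        flat = score.flatten()
        k_top = max(1, int(prune_ratio * flat.numel()))
        thresh = flat.topk(k_top).values.min()   # attribution cutoff
        mask = (score < thresh).to(value.dtype)  # keep low-attribution entries
        pruned.append((key, value * mask))       # zero the flagged neurons
    return pruned
```

Because only the cached context representation is edited, the user prompt itself is untouched and no extra LLM calls are needed at test time, matching the constraint stated in the abstract.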


Symbolic Representation for Any-to-Any Generative Tasks

April 2025

We propose a symbolic generative task description language and a corresponding inference engine capable of representing arbitrary multimodal tasks as structured symbolic flows. Unlike conventional generative models that rely on large-scale training and implicit neural representations to learn cross-modal mappings, often at high computational cost and with limited flexibility, our framework introduces an explicit symbolic representation comprising three core primitives: functions, parameters, and topological logic. Leveraging a pre-trained language model, our inference engine maps natural language instructions directly to symbolic workflows in a training-free manner. Our framework successfully performs over 12 diverse multimodal generative tasks, demonstrating strong performance and flexibility without the need for task-specific tuning. Experiments show that our method not only matches or outperforms existing state-of-the-art unified models in content quality, but also offers greater efficiency, editability, and interruptibility. We believe that symbolic task representations provide a cost-effective and extensible foundation for advancing the capabilities of generative AI.
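
The three primitives named in the abstract (functions, parameters, and topological logic) suggest a workflow graph whose nodes are typed function calls and whose edges carry data flow. A hypothetical Python sketch of what such a symbolic representation could look like; the primitive names and the example task are illustrative, not the paper's actual language.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One step in a symbolic workflow: a function plus its parameters."""
    fn: str                      # e.g. "image_to_text" (illustrative name)
    params: dict = field(default_factory=dict)

@dataclass
class Workflow:
    """Topological logic: named nodes plus directed data-flow edges."""
    nodes: dict
    edges: list                  # (producer, consumer) pairs

# Hypothetical flow for "narrate this image and set the narration to music":
wf = Workflow(
    nodes={
        "caption": Node("image_to_text", {"style": "narration"}),
        "speech":  Node("text_to_speech", {"voice": "neutral"}),
        "music":   Node("text_to_music", {"mood": "match_caption"}),
        "mix":     Node("audio_mix",     {"ducking": True}),
    },
    edges=[("caption", "speech"), ("caption", "music"),
           ("speech", "mix"), ("music", "mix")],
)
```

An inference engine of the kind described would have a pre-trained LLM emit such a structure directly from the instruction, then execute the nodes in topological order, which is what makes the workflow editable and interruptible without retraining.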


Fig. 1. Three Paradigms of FM-Powered Recommender Systems
Fig. 4. An illustration of the generative paradigm for recommendation. User preference inputs (e.g., the profile description, behavior prompts, and task instructions) are utilized to guide the pre-trained foundation models (FM) for RS. The model can be leveraged in a non-tuning manner by directly utilizing its capabilities or via fine-tuning for specific recommendation tasks, producing various forms of generated recommendations such as item generation, explanation generation, and conversation generation.
A Survey of Foundation Model-Powered Recommender Systems: From Feature-Based, Generative to Agentic Paradigms

April 2025 · 119 Reads

Recommender systems (RS) have become essential for filtering information and personalizing content for users. RS techniques have traditionally relied on modeling interactions between users and items, as well as content features, using models specific to each task. The emergence of foundation models (FMs), large-scale models trained on vast amounts of data such as GPT, LLaMA, and CLIP, is reshaping the recommendation paradigm. This survey provides a comprehensive overview of Foundation Models for Recommender Systems (FM4RecSys), covering their integration in three paradigms: (1) feature-based augmentation of representations, (2) generative recommendation approaches, and (3) agentic interactive systems. We first review the data foundations of RS, from traditional explicit or implicit feedback to multimodal content sources. We then introduce FMs and their capabilities for representation learning, natural language understanding, and multimodal reasoning in RS contexts. The core of the survey discusses how FMs enhance RS under different paradigms. Afterward, we examine FM applications in various recommendation tasks. Through an analysis of recent research, we highlight key opportunities that have been realized as well as challenges encountered. Finally, we outline open research directions and technical challenges for next-generation FM4RecSys. This survey not only reviews state-of-the-art methods but also provides a critical analysis of the trade-offs among the feature-based, generative, and agentic paradigms, outlining key open issues and future research directions.



In-context Ranking Preference Optimization

April 2025 · 4 Reads

Recent developments in Direct Preference Optimization (DPO) allow large language models (LLMs) to function as implicit ranking models by maximizing the margin between preferred and non-preferred responses. In practice, user feedback on such lists typically involves identifying a few relevant items in context rather than providing detailed pairwise comparisons for every possible item pair. Moreover, many complex information retrieval tasks, such as conversational agents and summarization systems, critically depend on ranking the highest-quality outputs at the top, emphasizing the need to support natural and flexible forms of user feedback. To address the challenge of limited and sparse pairwise feedback in the in-context setting, we propose an In-context Ranking Preference Optimization (IRPO) framework that directly optimizes LLMs based on ranking lists constructed during inference. To further capture flexible forms of feedback, IRPO extends the DPO objective by incorporating both the relevance of items and their positions in the list. Modeling these aspects jointly is non-trivial, as ranking metrics are inherently discrete and non-differentiable, making direct optimization difficult. To overcome this, IRPO introduces a differentiable objective based on positional aggregation of pairwise item preferences, enabling effective gradient-based optimization of discrete ranking metrics. We further provide theoretical insights showing that IRPO (i) automatically emphasizes items with greater disagreement between the model and the reference ranking, and (ii) links its gradient to an importance sampling estimator, yielding an unbiased estimator with reduced variance. Empirical results show IRPO outperforms standard DPO approaches in ranking performance, highlighting its effectiveness in aligning LLMs with direct in-context ranking preferences.
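
The key technical move is making a discrete ranking metric differentiable by aggregating smooth pairwise preferences with positional weights. The sketch below shows one such objective in PyTorch, assuming binary relevance feedback and a DCG-style discount; IRPO's exact aggregation and gradient form are in the paper, so treat this only as an illustration of the general construction.

```python
import torch

def listwise_ranking_loss(scores, relevance, beta=1.0):
    """Differentiable listwise objective in the spirit described above.

    scores:    1-D tensor of model scores for the candidate list
               (e.g. per-item log-prob margins from the LLM).
    relevance: 1-D 0/1 tensor marking the items the user identified
               as relevant in context.
    """
    n = scores.numel()
    # Smooth pairwise preference P(i ranked above j) via a sigmoid margin.
    pref = torch.sigmoid(beta * (scores[:, None] - scores[None, :]))
    # Pairs where i is relevant and j is not should approach probability 1.
    target = (relevance[:, None] > relevance[None, :]).float()
    # DCG-style positional discount (assumption): earlier positions matter more.
    discount = 1.0 / torch.log2(torch.arange(n, dtype=scores.dtype) + 2.0)
    weight = discount[:, None] * target
    return -(weight * torch.log(pref + 1e-8)).sum() / weight.sum().clamp(min=1)
```

Because the sigmoid saturates, pairs the model already orders confidently contribute little gradient, which is consistent with the abstract's claim that the objective emphasizes items where the model and the reference ranking disagree.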


Figure 3: Performance metrics for LLaMA3-1B across the Redial and Inspired datasets.
Figure 5: Performance metrics for Gemma2-2B across the Redial and Inspired datasets.
Figure 8: Comparison of model performance when trained on in-domain data versus synthetic data generated via active augmentation.
From Reviews to Dialogues: Active Synthesis for Zero-Shot LLM-based Conversational Recommender System

April 2025 · 3 Reads

Conversational recommender systems (CRS) typically require extensive domain-specific conversational datasets, yet high costs, privacy concerns, and data-collection challenges severely limit their availability. Although Large Language Models (LLMs) demonstrate strong zero-shot recommendation capabilities, practical applications often favor smaller, internally managed recommender models due to scalability, interpretability, and data privacy constraints, especially in sensitive or rapidly evolving domains. However, training these smaller models effectively still demands substantial domain-specific conversational data, which remains challenging to obtain. To address these limitations, we propose an active data augmentation framework that synthesizes conversational training data by leveraging black-box LLMs guided by active learning techniques. Specifically, our method utilizes publicly available non-conversational domain data, including item metadata, user reviews, and collaborative signals, as seed inputs. By employing active learning strategies to select the most informative seed samples, our approach efficiently guides LLMs to generate synthetic, semantically coherent conversational interactions tailored explicitly to the target domain. Extensive experiments validate that conversational data generated by our proposed framework significantly improves the performance of LLM-based CRS models, effectively addressing the challenges of building CRS in no- or low-resource scenarios.
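
The framework's two moving parts are an active-learning selector over non-conversational seed data and a black-box LLM that turns selected seeds into dialogues. A schematic sketch follows, assuming uncertainty sampling as the selection criterion and a free-form synthesis prompt; both choices are illustrative, not the paper's exact strategy.

```python
def select_informative_seeds(seeds, uncertainty_fn, budget=100):
    """Uncertainty sampling over non-conversational seeds (one possible
    active-learning criterion; the paper's strategy may differ)."""
    return sorted(seeds, key=uncertainty_fn, reverse=True)[:budget]

def synthesize_dialogue(llm, seed):
    """Prompt a black-box LLM to turn seed metadata/reviews into a dialogue.
    The prompt wording is illustrative."""
    prompt = (
        "Using the item metadata and user review below, write a natural "
        "two-party conversation where a seeker describes their preferences "
        "and a recommender suggests and justifies the item.\n\n"
        f"Seed:\n{seed}"
    )
    return llm(prompt)
```

The synthesized dialogues then serve as training data for the smaller in-house CRS model, which is how the approach sidesteps collecting real conversational data in low-resource domains.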


Improving In-Context Learning with Reasoning Distillation

April 2025

Language models rely on semantic priors to perform in-context learning, which leads to poor performance on tasks involving inductive reasoning. Instruction-tuning methods based on imitation learning can superficially enhance the in-context learning performance of language models, but they often fail to improve the model's understanding of the underlying rules that connect inputs and outputs in few-shot demonstrations. We propose ReDis, a reasoning distillation technique designed to improve the inductive reasoning capabilities of language models. Through a careful combination of data augmentation, filtering, supervised fine-tuning, and alignment, ReDis achieves significant performance improvements across a diverse range of tasks, including 1D-ARC, List Function, ACRE, and MiniSCAN. Experiments on three language model backbones show that ReDis outperforms equivalent few-shot prompting baselines across all tasks and even surpasses the teacher model, GPT-4o, in some cases. ReDis, based on the LLaMA-3 backbone, achieves relative improvements of 23.2%, 2.8%, and 66.6% over GPT-4o on 1D-ARC, ACRE, and MiniSCAN, respectively, within a similar hypothesis search space. The code, dataset, and model checkpoints will be made available at https://github.com/NafisSadeq/reasoning-distillation.git.
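
The pipeline described above (augment with teacher traces, filter, then fine-tune and align) hinges on the correctness filter: only teacher rationales that reach the right answer become training data. A minimal sketch of that collection-and-filtering step, with the prompt format and answer parsing as assumptions for illustration.

```python
def extract_answer(trace: str) -> str:
    """Assumes the trace ends with a line like 'Answer: <value>' (illustrative)."""
    for line in reversed(trace.strip().splitlines()):
        if line.lower().startswith("answer:"):
            return line.split(":", 1)[1].strip()
    return ""

def build_distillation_set(teacher, tasks, samples_per_task=8):
    """Collect teacher reasoning traces and keep only correct ones.

    teacher: text-completion callable (e.g. a GPT-4o wrapper, per the abstract).
    tasks:   dicts with a few-shot 'prompt' and a ground-truth 'target'.
    """
    kept = []
    for task in tasks:
        for _ in range(samples_per_task):
            trace = teacher(task["prompt"])               # rationale + answer
            if extract_answer(trace) == task["target"]:   # correctness filter
                kept.append({"prompt": task["prompt"], "completion": trace})
    return kept
```

The filtered set then feeds supervised fine-tuning and alignment of the student, which is how the student can occasionally surpass the teacher: it trains only on the teacher's verified-correct reasoning.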


Citations (19)


... Conversational recommendation has emerged as a promising framework for e-commerce and digital entertainment, offering an interactive channel for users to express their preferences through natural language [8,18,19,31,62]. Unlike traditional recommendation approaches that rely primarily on past interactions, conversational recommender systems aim to find relevant and personalized items by engaging in a dialogue with users. ...

Reference:

LaViC: Adapting Large Vision-Language Models to Visually-Aware Conversational Recommendation
Reindex-Then-Adapt: Improving Large Language Models for Conversational Recommendation
  • Citing Conference Paper
  • March 2025

... In the past few years, researchers have also addressed a variety of additional research questions in the broader context of calibration techniques. In terms of the features to be used for calibration, i.e., the calibration target, several works focus on specific item features such as movie genres, characteristics of musical tracks, or item categories [19,20,43,50,51]. A number of other works, in contrast, address problems of potential biases in the recommendations and use calibration as a countermeasure to mitigate these biases. ...

Calibration-Disentangled Learning and Relevance-Prioritized Reranking for Calibrated Sequential Recommendation
  • Citing Conference Paper
  • October 2024

... A primary challenge lies in the brevity of typical conversations, which often consist of only a few sentences and lack sufficient contextual information for accurate user preference understanding. To tackle this issue, prior work introduces external knowledge sources, such as knowledge graphs [8,24,40], large language models (LLMs) [13,36], and conversational recommendation corpora [8,34]. Building on these resources, they design specialized alignment strategies, such as prompt learning [8] and instruction tuning [36], to integrate the additional knowledge for improved user preference understanding and item recommendation. ...

Neighborhood-Based Collaborative Filtering for Conversational Recommendation
  • Citing Conference Paper
  • October 2024

... This shift has made it feasible to construct recommendation systems [2] that can provide personalized experiences in real time, with far greater precision and relevance. This paper aims to use a generative AI model [5] to build a clothing recommendation framework capable of interpreting detailed user inputs such as physical characteristics, fashion preferences, and aesthetic tastes, in order to provide customized outfit recommendations. By analyzing these personal traits, the framework can generate personalized suggestions that improve a user's appearance and help them discover styles aligned with their unique preferences. ...

Recommendation with Generative Models

Foundations and Trends® in Information Retrieval

... Retrieval models play a role in handling databases by providing dependable and timely information essential for knowledge-centric activities. The combination of retrieval models with language models has led to the development of Retrieval-Augmented Generation (RAG), in light of the increasing prevalence of AI-generated content (AIGC) [3,4]. This advancement boosts models by integrating retrieved data to enhance the overall quality of generated content [5]. ...

CoRAL: Collaborative Retrieval-Augmented Large Language Models Improve Long-tail Recommendation
  • Citing Conference Paper
  • August 2024

... This tutorial will address the potential societal impacts of integrating generative models into recommender systems, including ethical considerations, privacy concerns, and the need for transparency and fairness. It will also cover evaluation and benchmarks, as well as toolkits/systems for reproducing Gen-RecSys research [8]. By promoting responsible AI practices, the tutorial aims to contribute to the development of more equitable and effective systems within the landscape of Gen-RecSys. ...

The 1st International Workshop on Risks, Opportunities, and Evaluation of Generative Models in Recommendation (ROEGEN)

... This personalization can be achieved through various techniques, including user feedback, user modeling, and reinforcement learning, among others [302]. For example, ChatGPT utilizes prior user input to generate more customized responses [152,175,303,304]. The level of customization varies according to the context, goals, and learning techniques employed. ...

First Workshop on Generative AI for Recommender Systems and Personalization
  • Citing Conference Paper
  • August 2024

... Modern Information Retrieval (IR) and Recommender Systems (RS) are experiencing a profound change with the advent of Large Language Models (LLMs) [4,5]. Traditional algorithms often rely on static features or past user-item interactions, whereas language-agent systems dynamically integrate world knowledge, language understanding, reasoning, and planning abilities to improve and expand the capabilities of IR and RS in a tangible manner [7,16-18]. ...

A Review of Modern Recommender Systems Using Generative Models (Gen-RecSys)