Lina Yao

CSIRO Data61 and UNSW

Doctor of Philosophy

About

540
Publications
184,791
Reads
14,996
Citations
Introduction
I lead my research group, the Data Dynamics Lab (D2 Lab), founded in 2016. We strive to develop generalizable, explainable, and data-efficient data mining, machine learning, and deep learning algorithms, and to design systems and interfaces, that enable novel forms of human-machine interaction. This includes an improved understanding of challenges such as robustness, trust, explainability, and resilience, with the aim of improving human-autonomy partnership.

Publications (540)
Article
Full-text available
Understanding and recognizing human activities is a fundamental research topic for a wide range of important applications such as fall detection and remote health monitoring and intervention. Despite active research in human activity recognition over the past years, existing approaches based on computer vision or wearable sensor technologies presen...
Preprint
Full-text available
A common challenge for most current recommender systems is the cold-start problem. Due to the lack of user-item interactions, the fine-tuned recommender systems are unable to handle situations with new users or new items. Recently, some works introduce the meta-optimization idea into the recommendation scenarios, i.e. predicting the user preference...
Preprint
Full-text available
In light of the emergence of deep reinforcement learning (DRL) in recommender systems research and several fruitful results in recent years, this survey aims to provide a timely and comprehensive overview of the recent trends of deep reinforcement learning in recommender systems. We start with the motivation of applying DRL in recommender systems....
Preprint
Full-text available
Conditional Neural Processes (CNPs) bridge neural networks with probabilistic inference to approximate functions of Stochastic Processes under meta-learning settings. Given a batch of non-i.i.d. function instantiations, CNPs are jointly optimized for in-instantiation observation prediction and cross-instantiation meta-representation adaptation...
Preprint
Full-text available
The standard approaches to neural network implementation yield powerful function approximation capabilities but are limited in their abilities to learn meta representations and reason probabilistic uncertainties in their predictions. Gaussian processes, on the other hand, adopt the Bayesian learning scheme to estimate such uncertainties but are con...
Preprint
In offline reinforcement learning-based recommender systems (RLRS), learning effective state representations is crucial for capturing user preferences that directly impact long-term rewards. However, raw state representations often contain high-dimensional, noisy information and components that are not causally relevant to the reward. Additionally,...
Article
Full-text available
With the advent and progression of Natural Language Processing (NLP) methodologies, the domain of automatic citation function classification has gained popularity and considerable research efforts have been contributed to this task. Automatic citation function classification has a joint computational linguistic and bibliometrics background. However...
Article
Objective. Functional magnetic resonance imaging (fMRI) is often modeled as networks of Regions of Interest (ROIs) and their functional connectivity to study brain functions and mental disorders. Limited fMRI data due to high acquisition costs hampers recognition model performance. We aim to address this issue using generative diffusion models for...
Article
Intelligent dialogue systems, which aim to communicate with humans naturally through language, play an important role in advancing human-machine interaction in the era of artificial intelligence. As human-computer interaction requirements grow more complex, it is difficult for traditional text-based dialogue systems to meet...
Preprint
Full-text available
Graphical User Interface (GUI) agents, powered by Large Foundation Models, have emerged as a transformative approach to automating human-computer interaction. These agents autonomously interact with digital systems or software applications via GUIs, emulating human actions such as clicking, typing, and navigating visual elements across diverse plat...
Preprint
Full-text available
Recent advancements in text-to-speech (TTS) systems, such as FastSpeech and StyleSpeech, have significantly improved speech generation quality. However, these models often rely on duration generated by external tools like the Montreal Forced Aligner, which can be time-consuming and lack flexibility. The importance of accurate duration is often unde...
Preprint
Full-text available
Diffusion-based Generative AI has gained significant attention for its superior performance over other generative techniques like Generative Adversarial Networks and Variational Autoencoders. While it has achieved notable advancements in fields such as computer vision and natural language processing, its application in speech generation remains under-...
Preprint
Compositional Zero-Shot Learning (CZSL) aims to recognize unseen combinations of seen attributes and objects. Current CLIP-based methods in CZSL, despite their advancements, often fail to effectively understand and link the attributes and objects due to inherent limitations in CLIP's pretraining mechanisms. To address these shortcomings, this paper...
Preprint
We investigate whether the pre-trained knowledge of vision-language models (VLMs), such as CLIP, can be retained or even enhanced during continual learning (CL) while absorbing knowledge from a data stream. Existing methods often rely on additional reference data, isolated components for distribution or domain predictions, leading to high training...
Preprint
Full-text available
Recent cross-domain recommendation (CDR) studies assume that disentangled domain-shared and domain-specific user representations can mitigate domain gaps and facilitate effective knowledge transfer. However, achieving perfect disentanglement is challenging in practice, because user behaviors in CDR are highly complex, and the true underlying user p...
Preprint
Full-text available
As the significance of understanding the cause-and-effect relationships among variables increases in the development of modern systems and algorithms, learning causality from observational data has become a preferred and efficient approach over conducting randomized control trials. However, purely observational data could be insufficient to reconst...
Preprint
Full-text available
We propose LightLLM, a model that fine tunes pre-trained large language models (LLMs) for light-based sensing tasks. It integrates a sensor data encoder to extract key features, a contextual prompt to provide environmental information, and a fusion layer to combine these inputs into a unified representation. This combined input is then processed by...
Article
Trajectory prediction is fundamental to various intelligent technologies, such as autonomous driving and robotics. The motion prediction of pedestrians and vehicles helps emergency braking, reduces collisions, and improves traffic safety. Current trajectory prediction research faces problems of complex social interactions, high dynamics and multi-m...
Preprint
Offline evaluation of LLMs is crucial in understanding their capacities, though current methods remain underexplored in existing research. In this work, we focus on the offline evaluation of the chain-of-thought capabilities and show how to optimize LLMs based on the proposed evaluation method. To enable offline feedback with rich knowledge and rea...
Preprint
Sequential recommender systems (SRSs) aim to predict the subsequent items which may interest users by comprehensively modeling users' complex preferences embedded in the sequence of user-item interactions. However, most existing SRSs model users' single low-level preference based on item ID information while ignoring the high-level prefere...
Preprint
Sequential recommendation aims to predict the next item which interests users via modeling their interest in items over time. Most of the existing works on sequential recommendation model users' dynamic interest in specific items while overlooking users' static interest revealed by some static attribute information of items, e.g., category, or bran...
Preprint
Recent advancements in diffusion models have shown promising results in sequential recommendation (SR). However, current diffusion-based methods still exhibit two key limitations. First, they implicitly model the diffusion process for target item embeddings rather than the discrete target item itself, leading to inconsistency in the recommendation...
Conference Paper
Full-text available
Continual learning (CL) aims to help deep neural networks to learn new knowledge while retaining what has been learned. Recently, pre-trained vision-language models such as CLIP [67], with powerful generalizability, have been gaining traction as practical CL candidates. However, the domain mismatch between the pre-training and the downstream CL tas...
Preprint
Full-text available
Personalized text-to-image diffusion models have grown popular for their ability to efficiently acquire a new concept from user-defined text descriptions and a few images. However, in the real world, a user may wish to personalize a model on multiple concepts but one at a time, with no access to the data from previous concepts due to storage/privac...
Preprint
Diffusion models are cutting-edge generative models adept at producing diverse, high-quality images. Despite their effectiveness, these models often require significant computational resources owing to their numerous sequential denoising steps and the significant inference cost of each step. Recently, Neural Architecture Search (NAS) techniques hav...
Preprint
Large language models are rapidly gaining popularity and have been widely adopted in real-world applications. While the quality of training data is essential, privacy concerns arise during data collection. Federated learning offers a solution by allowing multiple clients to collaboratively train LLMs without sharing local data. However, FL introduc...
Article
Recent years have witnessed the remarkable success of recommendation systems (RSs) in alleviating the information overload problem. As a new paradigm of RSs, session-based recommendation (SR) specializes in users’ short-term preferences and aims to provide a more dynamic and timely recommendation based on ongoing interactions. This survey presents...
Preprint
Full-text available
Functional magnetic resonance imaging (fMRI) is an emerging neuroimaging modality that is commonly modeled as networks of Regions of Interest (ROIs) and their connections, named functional connectivity, for understanding the brain functions and mental disorders. However, due to the high cost of fMRI data acquisition and labeling, the amount of fMRI...
Preprint
Full-text available
Multimodal large language models (MLLMs) equip pre-trained large-language models (LLMs) with visual capabilities. While textual prompting in LLMs has been widely studied, visual prompting has emerged for more fine-grained and free-form visual instructions. This paper presents the first comprehensive survey on visual prompting methods in MLLMs, focu...
Preprint
Recent advances in machine learning algorithms have garnered growing interest in developing versatile Embodied AI systems. However, current research in this domain reveals opportunities for improvement. First, the direct adoption of RNNs and Transformers often overlooks the specific differences between Embodied AI and traditional sequential data mo...
Preprint
Large Language Models (LLMs) have demonstrated remarkable efficiency in tackling various tasks based on human instructions, but recent studies reveal that these models often fail to achieve satisfactory results on questions involving reasoning, such as mathematics or physics questions. This phenomenon is usually attributed to the uncertainty regard...
Preprint
Disentangled Representation Learning aims to improve the explainability of deep learning methods by training a data encoder that identifies semantically meaningful latent variables in the data generation process. Nevertheless, there is no consensus regarding a universally accepted definition for the objective of disentangled representation learning...
Preprint
Recent years have witnessed the remarkable success of recommendation systems (RSs) in alleviating the information overload problem. As a new paradigm of RSs, session-based recommendation (SR) specializes in users' short-term preference capture and aims to provide a more dynamic and timely recommendation based on the ongoing interacted actions. In t...
Preprint
Full-text available
This paper introduces StyleSpeech, a novel Text-to-Speech (TTS) system that enhances the naturalness and accuracy of synthesized speech. Building upon existing TTS technologies, StyleSpeech incorporates a unique Style Decorator structure that enables deep learning models to simultaneously learn style and phoneme features, improving adaptability and...
Preprint
Instruction tuning in multimodal large language models (MLLMs) aims to smoothly integrate a backbone LLM with a pre-trained feature encoder for downstream tasks. The major challenge is how to efficiently find the synergy through cooperative learning where LLMs adapt their reasoning abilities in downstream tasks while feature encoders adjust their e...
Preprint
Generating human mobility trajectories is of great importance to solve the lack of large-scale trajectory data in numerous applications, which is caused by privacy concerns. However, existing mobility trajectory generation methods still require real-world human trajectories centrally collected as the training data, where there exists an inescapable...
Preprint
In Reinforcement Learning-based Recommender Systems (RLRS), the complexity and dynamism of user interactions often result in high-dimensional and noisy state spaces, making it challenging to discern which aspects of the state are truly influential in driving the decision-making process. This issue is exacerbated by the evolving nature of user prefe...
Preprint
Large Language Models (LLMs) such as ChatGPT that exhibit generative AI capabilities are facing accelerated adoption and innovation. The increased presence of Generative AI (GAI) inevitably raises concerns about the risks and safety associated with these models. This article provides an up-to-date survey of recent trends in AI safety research of GAI...
Article
Subject-independent Electroencephalography (EEG) recognition remains challenging due to inherent variability of brain anatomy across different subjects. Such variability is further complicated by the Volume Conduction Effect (VCE) that introduces channel-interference noise, exacerbating subject-specific biases in the recorded EEG signals. Existing...
Article
Content delivery networks (CDNs) play a pivotal role in the modern internet infrastructure by enabling efficient content delivery across diverse geographical regions. As an essential component of CDNs, the edge caching scheme directly influences the user experience by determining the caching and eviction of content on edge servers. With the emergen...
Preprint
The advancements in disentangled representation learning significantly enhance the accuracy of counterfactual predictions by granting precise control over instrumental variables, confounders, and adjustable variables. An appealing method for achieving the independent separation of these factors is mutual information minimization, a task that presen...
Article
Generative adversarial network (GAN) has achieved remarkable success in generating high-quality synthetic data by learning the underlying distributions of target data. Recent efforts have been devoted to utilizing optimal transport (OT) to tackle the gradient vanishing and instability issues in GAN. They use the Wasserstein distance as a metric to...
Preprint
Reinforcement learning-based recommender systems have recently gained popularity. However, due to the typical limitations of simulation environments (e.g., data inefficiency), most of the work cannot be broadly applied in all domains. To counter these challenges, recent advancements have leveraged offline reinforcement learning methods, notable for...
Preprint
While Positive-Unlabeled (PU) learning is vital in many real-world scenarios, its application to graph data remains under-explored. We show that a critical challenge for PU learning on graphs lies in edge heterophily, which directly violates the irreducibility assumption for Class-Prior Estimation (class prior is essential for building P...
Preprint
Full-text available
Recent studies indicate that large multimodal models (LMMs) are highly robust against natural distribution shifts, often surpassing previous baselines. Despite this, domain-specific adaptation is still necessary, particularly in specialized areas like healthcare. Due to the impracticality of fine-tuning LMMs given their vast parameter space, this w...
Article
Reinforcement learning serves as a potent tool for modeling dynamic user interests within recommender systems, garnering increasing research attention of late. However, a significant drawback persists: its poor data efficiency, stemming from its interactive nature. The training of reinforcement learning-based recommender systems demands expensive o...
Article
Human activity recognition is a well‐established research problem in ubiquitous computing. The increased dependency on various smart devices in our daily lives allows us to investigate the sensor data world produced by multimodal sensors embedded in smart devices. However, the raw sensor data are often unlabeled and annotating this vast amount of d...
Article
Full-text available
Truth discovery is the fundamental technique for resolving the conflicts between the information provided by different data sources by detecting the true values. Traditional methods assume that each data item has only one true value and therefore cannot deal with the circumstances where one data item has multiple true values (i.e., multi-value trut...
Article
Full-text available
As a significant application of machine learning in financial scenarios, loan default risk prediction aims to evaluate the client’s default probability. However, most existing deep learning solutions treat each application as an independent individual, neglecting the explicit connections among different application records. Besides, these attempts...
Article
Full-text available
Instrumental variables (IVs), widely applied in economics and healthcare, enable consistent counterfactual prediction in the presence of hidden confounding factors, effectively addressing endogeneity issues. The prevailing IV-based counterfactual prediction methods typically rely on the availability of valid IVs (satisfying Relevance, Exclusivity,...
Preprint
Full-text available
Neurodevelopmental disorders, such as Attention Deficit/Hyperactivity Disorder (ADHD) and Autism Spectrum Disorder (ASD), are characterized by comorbidity and heterogeneity. Identifying distinct subtypes within these disorders can illuminate the underlying neurobiological and clinical characteristics, paving the way for more tailored treatments. We...
Preprint
Full-text available
Causal graph recovery is essential in the field of causal inference. Traditional methods are typically knowledge-based or statistical estimation-based, which are limited by data collection biases and individuals' knowledge about factors affecting the relations between variables of interests. The advance of large language models (LLMs) provides oppo...
Article
As a fundamental aspect of human life, two-person interactions contain meaningful information about people’s activities, relationships, and social settings. Human action recognition serves as the foundation for many smart applications, with a strong focus on personal privacy. However, recognizing two-person interactions poses more challenges due to...
Article
Full-text available
The right to be forgotten (RTBF) allows individuals to request the removal of personal information from online platforms. Researchers have proposed machine unlearning algorithms as a solution for erasing specific data from trained models to support RTBF. However, these methods modify how data are fed into the model and how training is done, which m...
Article
Domain Generalization (DG) endeavors to create machine learning models that excel in unseen scenarios by learning invariant features. In DG, the prevalent practice of constraining models to a fixed structure or uniform parameterization to encapsulate invariant features can inadvertently blend specific aspects. Such an approach struggles with nuance...
Chapter
Unmanned Aerial Vehicles (UAVs) possess significant advantages in terms of mobility and range compared to traditional surveillance cameras. Human action detection from UAV images has the potential to assist in various fields, including search and rescue operations. However, UAV images present challenges such as varying heights, angles, and the pres...
Article
Recent advances in recommender systems have proved the potential of reinforcement learning (RL) to handle the dynamic evolution processes between users and recommender systems. However, learning to train an optimal RL agent is generally impractical with commonly sparse user feedback data in the context of recommender systems. To circumvent the lack...
Chapter
Multivariate time series classification is crucial for various applications such as activity recognition, disease diagnosis, and brain-computer interfaces. Deep learning methods have recently achieved promising performance thanks to their powerful representation learning capacity. However, existing deep learning-based classifiers rely solely on tem...
Conference Paper
Ascertaining counterfactual questions, for instance, “Would individuals with diabetes have exhibited better if they had opted for a different medication?”, is a frequent pursuit in research. Observational studies have become increasingly significant in addressing such queries due to their extensive availability and ease of acquisition relative to R...
Article
The task of Open-World Compositional Zero-Shot Learning (OW-CZSL) is to recognize novel state-object compositions in images from all possible compositions, where the novel compositions are absent during the training stage. The performance of conventional methods degrades significantly due to the large cardinality of possible compositions. Some rece...