About
219
Publications
21,427
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
2,380
Citations
Introduction
Dongsheng Li is currently a principal research manager with Microsoft Research Asia (MSRA), Shanghai, leading the MSRA Shanghai AI/ML Group. Meanwhile, he is an adjunct professor with School of Computer Science, Fudan University. His research interests focus on machine learning and its applications in brain science and healthcare. He has received several awards in recent years, including the 2018 IBM Corporate Award and the 2023 China Intelligent Computing Technology Innovators Award.
Additional affiliations
April 2015 - February 2020
IBM Research - China
Position
- Researcher
August 2010 - February 2011
September 2012 - March 2015
Education
September 2007 - June 2012
September 2003 - July 2007
Publications
Publications (219)
Sequential recommendation methods can capture dynamic user preferences from user historical interactions to achieve better performance. However, most existing methods only use past information extracted from user historical interactions to train the models, leading to the deviations of user preference modeling. Besides past information, future info...
Language models (LMs) only pretrained on a general and massive corpus usually cannot attain satisfying performance on domain-specific downstream tasks, and hence, applying domain-specific pretraining to LMs is a common and indispensable practice. However, domain-specific pretraining can be costly and time-consuming, hindering LMs' deployment in rea...
Personalized algorithms can inadvertently expose users to discomforting recommendations, potentially triggering negative consequences. The subjectivity of discomfort and the black-box nature of these algorithms make it challenging to effectively identify and filter such content. To address this, we first conducted a formative study to understand us...
Benchmarking the capabilities and limitations of large language models (LLMs) in graph-related tasks is becoming an increasingly popular and crucial area of research. Recent studies have shown that LLMs exhibit a preliminary ability to understand graph structures and node features. However, the potential of LLMs in graph pattern mining remains larg...
The evolution of wireless networks gravitates towards connected intelligence, a concept that envisions seamless interconnectivity among humans, objects, and intelligence in a hyper-connected cyber-physical world. Edge artificial intelligence (Edge AI) is a promising solution to achieve connected intelligence by delivering high-quality, low-latency,...
Studying the evolution of online news communities is essential for improving the effectiveness of news recommender systems. Traditionally, this has been done through empirical research based on static data analysis. While this approach has yielded valuable insights for optimizing recommender system designs, it is limited by the lack of appropriate...
The rapid evolution of the web has led to an exponential growth in content. Recommender systems play a crucial role in human–computer interaction (HCI) by tailoring content based on individual preferences. Despite their importance, challenges persist in balancing recommendation accuracy with user satisfaction, addressing biases while preserving use...
Advancements in non-invasive electroencephalogram (EEG)-based Brain-Computer Interface (BCI) technology have enabled communication through brain activity, offering significant potential for individuals with motor impairments. Existing methods for decoding characters or words from EEG recordings either rely on continuous external stimulation for hig...
Video generation using diffusion-based models is constrained by high computational costs due to the frame-wise iterative diffusion process. This work presents a Diffusion Reuse MOtion (Dr. Mo) network to accelerate latent video generation. Our key discovery is that coarse-grained noises in earlier denoising steps have demonstrated high motion consi...
Due to low success rates and long cycles of traditional drug development, the clinical tendency is to apply omics techniques to reveal patient-level disease characteristics and individualized responses to treatment. However, the heterogeneous form of data and uneven distribution of targets make drug discovery and precision medicine a non-trivial ta...
Prompt learning has become a prevalent strategy for adapting vision-language foundation models (VLMs) such as CLIP to downstream tasks. With the emergence of large language models (LLMs), recent studies have explored the potential of using category-related descriptions to enhance prompt effectiveness. However, conventional descriptions lack explici...
Recent advancements for large-scale pre-training with neural signals such as electroencephalogram (EEG) have shown promising results, significantly boosting the development of brain-computer interfaces (BCIs) and healthcare. However, these pre-trained models often require full fine-tuning on each downstream task to achieve substantial improvements,...
Large Language Models (LLMs) have achieved great success in various reasoning tasks. In this work, we focus on the graph reasoning ability of LLMs. Although theoretical studies proved that LLMs are capable of handling graph reasoning tasks, empirical evaluations reveal numerous failures. To deepen our understanding on this discrepancy, we revisit t...
Video temporal grounding is an emerging topic aiming to identify specific clips within videos. In addition to pre-trained video models, contemporary methods utilize pre-trained vision-language models (VLM) to capture detailed characteristics of diverse scenes and objects from video frames. However, as pre-trained on images, VLM may struggle to dist...
Graph Neural Networks (GNNs) have demonstrated effectiveness in collaborative filtering tasks due to their ability to extract powerful structural features. However, combining the graph features extracted from user-item interactions and auxiliary features extracted from user genres and item properties remains a challenge. Currently available fusion...
Self-correction is emerging as a promising approach to mitigate the issue of hallucination in Large Language Models (LLMs). To facilitate effective self-correction, recent research has proposed mistake detection as its initial step. However, current literature suggests that LLMs often struggle with reliably identifying reasoning mistakes when using...
Recent recommender systems aim to provide not only accurate recommendations but also explanations that help users understand them better. However, most existing explainable recommendations only consider the importance of content in reviews, such as words or aspects, and ignore the ordering relationship among them. This oversight neglects crucial or...
Effective visual brain-machine interfaces (BMI) is based on reliable and stable EEG biomarkers. However, traditional adaptive filter-based approaches may suffer from individual variations in EEG signals, while deep neural network-based approaches may be hindered by the non-stationarity of EEG signals caused by biomarker attenuation and background o...
The computational challenges of Large Language Model (LLM) inference remain a significant barrier to their widespread deployment, especially as prompt lengths continue to increase. Due to the quadratic complexity of the attention computation, it takes 30 minutes for an 8B LLM to process a prompt of 1M tokens (i.e., the pre-filling stage) on a singl...
The rise of powerful large language models (LLMs) has spurred a new trend in building LLM-based autonomous agents for solving complex tasks, especially multi-agent systems. Despite the remarkable progress, we notice that existing works are heavily dependent on human-designed frameworks, which greatly limits the functional scope and scalability of a...
Background
The efficacy of levodopa, the most crucial metric for Parkinson’s disease diagnosis and treatment, is traditionally gauged through the levodopa challenge test, which lacks a predictive model. This study aims to probe the predictive power of T1-weighted MRI, the most accessible modality for levodopa response.
Methods
This retrospective s...
Large Language Models (LLMs) are increasingly applied in various real-world scenarios due to their excellent generalization capabilities and robust generative abilities. However, they exhibit position bias, also known as "lost in the middle", a phenomenon that is especially pronounced in long-context scenarios, which indicates the placement of the...
Person Re-Identification (ReID) systems pose a significant security risk from backdoor attacks, allowing adversaries to evade tracking or impersonate others. Beyond recognizing this issue, we investigate how backdoor attacks can be deployed in real-world scenarios, where a ReID model is typically trained on data collected in the digital domain and...
Task planning is emerging as an important research topic alongside the development of large language models (LLMs). It aims to break down complex user requests into solvable sub-tasks, thereby fulfilling the original requests. In this context, the sub-tasks can be naturally viewed as a graph, where the nodes represent the sub-tasks, and the edges d...
Behaving efficiently and flexibly is crucial for biological and artificial embodied agents. Behavior is generally classified into two types: habitual (fast but inflexible), and goal-directed (flexible but slow). While these two types of behaviors are typically considered to be managed by two distinct systems in the brain, recent studies have reveal...
Spiking neural networks (SNNs) represent a promising approach to developing artificial neural networks that are both energy-efficient and biologically plausible. However, applying SNNs to sequential tasks, such as text classification and time-series forecasting, has been hindered by the challenge of creating an effective and hardware-friendly spike...
Person re-identification (Re-ID) has rapidly advanced due to its widespread real-world applications. It poses a significant risk of exposing private data from its training dataset. This paper aims to quantify this risk by conducting a membership inference (MI) attack. Most existing MI attack methods focus on classification models, while Re-ID follo...
The conventional approach to image recognition has been based on raster graphics, which can suffer from aliasing and information loss when scaled up or down. In this paper, we propose a novel approach that leverages the benefits of vector graphics for object localization and classification. Our method, called YOLaT (You Only Look at Text), takes th...
Prompt learning has become a prevalent strategy for adapting vision-language foundation models to downstream tasks. As large language models (LLMs) have emerged, recent studies have explored the use of category-related descriptions as input to enhance prompt effectiveness. Nevertheless, conventional descriptions fall short of structured information...
Online video streaming has fundamental limitations on the transmission bandwidth and computational capacity and super-resolution is a promising potential solution. However, applying existing video super-resolution methods to online streaming is non-trivial. Existing video codecs and streaming protocols (e.g., WebRTC) dynamically change the video qu...
The emergence of online media has facilitated the dissemination of news, but has also introduced the problem of information overload. To address this issue, providing users with accurate and diverse news recommendations has become increasingly important. News possesses rich and heterogeneous content, and the factors that attract users to news readi...
Prompt learning stands out as one of the most efficient approaches for adapting powerful vision-language foundational models like CLIP to downstream datasets by tuning learnable prompt vectors with very few samples. However, despite its success in achieving remarkable performance on in-domain data, prompt learning still faces the significant challe...
Sequential recommendation aims to predict the next items that users will interact with according to the sequential dependencies within historical user interactions. Recently, self-attention based sequence modeling methods have become the mainstream method due to their competitive accuracy. Despite their effectiveness, these methods still have non-t...
Precision medicine tailored to individual patients has gained significant attention in recent times. Machine learning techniques are now employed to process personalized data from various sources, including images, genetics, and assessments. These techniques have demonstrated good outcomes in many clinical prediction tasks. Notably, the approach of...
This chapter first introduces the history of the recommender system and the revolutionary changes in the field of recommender systems. Then, this chapter introduces the basic principles of recommender systems, including introducing the basic assumptions of recommendation algorithms from the perspective of machine learning, introducing how to define...
This chapter provides a summary of the book and offers insights into future trends in the research and application of recommender systems.
This chapter focuses on some problems and considerations in industry applications of recommender systems, and discusses the details of these actual applications based on the code in the Microsoft Recommenders repository and a cloud-based reference architecture. Readers are encouraged to follow the steps and methods in the text in a hands-on manner...
This chapter introduces the basics of deep learning, including feedforward computation and backpropagation algorithms for deep neural networks, as well as various classic neural network models. As readers learn, they can combine the content of other chapters in this book to understand and design different types of neural network models for recommen...
This chapter introduces four types of classic recommendation algorithms, including content-based recommendation algorithms, classic collaborative filtering algorithms, matrix factorization methods, and factorization machines. Before the emergence of deep learning, these methods were the most mainstream techniques for recommender systems, widely rec...
This chapter introduces the relationship between collaborative filtering and deep learning and then presented various deep learning-based collaborative filtering algorithms. Leveraging cutting-edge methods from deep learning, these algorithms can significantly improve the accuracy, scalability, diversity, and interpretability of recommendation syst...
This chapter introduces the hotspots of recommender system research, the key challenges of recommender system application, and how to achieve responsible recommendation technically. These contents may become the key of recommender system research and application in the future, so they need the continuous attention of researchers and developers.
Click-through rate (CTR) prediction is widely used in academia and industry. Most CTR tasks fall into a feature embedding & feature interaction paradigm, where the accuracy of CTR prediction is mainly improved by designing practical feature interaction structures. However, recent studies have argued that the fixed feature embedding learned only thr...
Despite their prevalence in deep-learning communities, over-parameterized models convey high demands of computational costs for proper training. This work studies the fine-grained, modular-level learning dynamics of over-parameterized models to attain a more efficient and fruitful training strategy. Empirical evidence reveals that when scaling down...
The recommendation ecosystem involves interactions between recommender systems(Computer) and users(Human). Orthogonal to the perspective of recommender systems, we attempt to utilize LLMs from the perspective of users and propose a more human-central recommendation framework named RAH, which consists of Recommender system, Assistant and Human. The...
Sequential recommendation demonstrates the capability to recommend items by modeling the sequential behavior of users. Traditional methods typically treat users as sequences of items, overlooking the collaborative relationships among them. Graph-based methods incorporate collaborative information by utilizing the user-item interaction graph. Howeve...
Recommender systems are important for providing personalized services to users, but the vast amount of collected user data has raised concerns about privacy (e.g., sensitive data), security (e.g., malicious data) and utility (e.g., toxic data). To address these challenges, recommendation unlearning has emerged as a promising approach, which allows...
Seeing is believing, however, the underlying mechanism of how human visual perceptions are intertwined with our cognitions is still a mystery. Thanks to the recent advances in both neuroscience and artificial intelligence, we have been able to record the visually evoked brain activities and mimic the visual perception ability through computational...
The evolution of wireless networks gravitates towards connected intelligence, a concept that envisions seamless interconnectivity among humans, objects, and intelligence in a hyper-connected cyber-physical world. Edge AI emerges as a promising solution to achieve connected intelligence by delivering high-quality, low-latency, and privacy-preserving...
Order execution is a fundamental task in quantitative finance, aiming at finishing acquisition or liquidation for a number of trading orders of the specific assets. Recent advance in model-free reinforcement learning (RL) provides a data-driven solution to the order execution problem. However, the existing works always optimize execution for an ind...
Due to the nature of risk management in learning applicable policies, risk-sensitive reinforcement learning (RSRL) has been realized as an important direction. RSRL is usually achieved by learning risk-sensitive objectives characterized by various risk measures, under the framework of distributional reinforcement learning. However, it remains uncle...
A timely detection of seizures for newborn infants with electroencephalogram (EEG) has been a common yet life-saving practice in the Neonatal Intensive Care Unit (NICU). However, it requires great human efforts for real-time monitoring, which calls for automated solutions to neonatal seizure detection. Moreover, the current automated methods focusi...
While person Re-identification (Re-ID) has progressed rapidly due to its wide real-world applications, it also causes severe risks of leaking personal information from training data. Thus, this paper focuses on quantifying this risk by membership inference (MI) attack. Most of the existing MI attack algorithms focus on classification models, while...
Ensemble methods can deliver surprising performance gains but also bring significantly higher computational costs, e.g., can be up to 2048X in large-scale ensemble tasks. However, we found that the majority of computations in ensemble methods are redundant. For instance, over 77% of samples in CIFAR-100 dataset can be correctly classified with only...
Modeling multi-variate time-series (MVTS) data is a long-standing research subject and has found wide applications. Recently, there is a surge of interest in modeling spatial relations between variables as graphs, i.e., first learning one static graph for each dataset and then exploiting the graph structure via graph neural networks. However, as sp...
Conventional reinforcement learning (RL) needs an environment to collect fresh data, which is impractical when online interactions are costly. Offline RL provides an alternative solution by directly learning from the previously collected dataset. However, it will yield unsatisfactory performance if the quality of the offline datasets is poor. In th...
Pronunciation assessment is a major challenge in the computer-aided pronunciation training system, especially at the word (phoneme)-level. To obtain word (phoneme)-level scores, current methods usually rely on aligning components to obtain acoustic features of each word (phoneme), which limits the performance of assessment to the accuracy of alignm...
Session-based recommendation aims to predict the next item a user may interact with based on sessions’ information. Most existing session-based recommendation models only rely on the information of item-to-item transitions. However, we argue that there exists another type of information in a session that can be mined and made use of, i.e., the info...