Xiangliang Zhang

King Abdullah University of Science and Technology | KAUST · Department of Computer Science

PhD Univ. Paris 11, INRIA

323 Publications
60,477 Reads
5,046 Citations

Publications (323)
Article
Full-text available
Data stream clustering provides insights into the underlying patterns of data flows. This paper focuses on selecting the best representatives from clusters of streaming data. There are two main challenges: how to cluster with the best representatives and how to handle the evolving patterns that are important characteristics of streaming data with d...
Conference Paper
Full-text available
Detecting changes in multidimensional data streams is an important and challenging task. In unsupervised change detection, changes are usually detected by comparing the distribution in a current (test) window with a reference window. It is thus essential to design divergence metrics and density estimators for comparing the data distributions, which...
Article
Full-text available
While early emphasis of Infrastructure as a Service (IaaS) clouds was on providing resource elasticity to end users, providers are increasingly interested in over-committing their resources to maximize the utilization and returns of their capital investments. In principle, over-committing resources hedges that users — on average — only need a small...
Conference Paper
Full-text available
The Affinity Propagation (AP) clustering algorithm proposed by Frey and Dueck (2007) provides an understandable, nearly optimal summary of a data set. However, it suffers from two major shortcomings: i) the number of clusters is only vaguely controlled by the user-defined parameter called self-confidence, and ii) its computational complexity is quadratic. When aiming at a...
Article
Personalized federated learning (PFL) learns a personalized model for each client in a decentralized manner, where each client owns private data that are not shared and data among clients are not independent and identically distributed (non-i.i.d.). However, existing PFL solutions assume that clients have sufficient training samples to jointly induce pe...
Conference Paper
Graph representation learning has attracted tremendous attention due to its remarkable performance in many real-world applications. However, prevailing supervised graph representation learning models for specific tasks often suffer from the label sparsity issue, as data labeling is always time- and resource-consuming. In light of this, few-shot learning...
Conference Paper
Most real-world knowledge graphs (KG) are far from complete and comprehensive. This problem has motivated efforts in predicting the most plausible missing facts to complete a given KG, i.e., knowledge graph completion (KGC). However, existing KGC methods suffer from two main issues, 1) the false negative issue, i.e., the sampled negative training i...
Article
Retrosynthetic planning, which generates a synthetic route for a target product, plays an important role in organic chemistry. A synthetic route is a series of reactions that start from available molecules. The most challenging problem in generating a synthetic route is the large search space of the candidat...
Article
This paper studies learning node representations with graph neural networks (GNNs) for the unsupervised scenario. Specifically, we derive a theoretical analysis and provide an empirical demonstration of the unstable performance of GNNs across different graph datasets when the supervision signals are not appropriately defined. The performance of GNN...
Preprint
Neural logical reasoning (NLR) is a fundamental task in knowledge discovery and artificial intelligence. NLR aims at answering multi-hop queries with logical operations on structured knowledge bases based on distributed representations of queries and answers. While previous neural logical reasoners can give specific entity-level answers, i.e., perf...
Preprint
Full-text available
The related work section is an important component of a scientific paper, which highlights the contribution of the target paper in the context of the reference papers. Authors can save their time and effort by using the automatically generated related work section as a draft to complete the final related work. Most of the existing related work sect...
Preprint
Most real-world knowledge graphs (KG) are far from complete and comprehensive. This problem has motivated efforts in predicting the most plausible missing facts to complete a given KG, i.e., knowledge graph completion (KGC). However, existing KGC methods suffer from two main issues, 1) the false negative issue, i.e., the candidates for sampling neg...
Preprint
Graph representation learning has attracted tremendous attention due to its remarkable performance in many real-world applications. However, prevailing (semi-)supervised graph representation learning models for specific tasks often suffer from the label sparsity issue, as data labeling is always time- and resource-consuming. In light of this, few-shot le...
Article
Nowadays, floods of time-stamped web documents related to a general news query spread throughout the Internet, and timeline summarization aims to concisely summarize the evolution trajectory of events along the timeline. Unlike traditional document summarization, timeline summarization needs to model the time series information of the input events a...
Article
Group recommendation aims to recommend items to a group of users. In this work, we study group recommendation in a particular scenario, namely occasional group recommendation, where groups are formed ad hoc and users may just constitute a group for the first time—that is, the historical group-item interaction records are highly limited. Most state-...
Article
We study a new research problem named semi-supervised few-shot multi-label node classification which has the following characteristics: 1) the extreme imbalance between the number of labeled and unlabeled nodes that are connected on graphs (handled by semi-supervised node learning); 2) the few labeled nodes per label (few-shot learning); and 3) the...
Preprint
Full-text available
Light-Emitting Diodes (LEDs) based underwater optical wireless communications (UOWCs), a technology with low latency and high data rates, have attracted significant importance for underwater robots. However, maintaining a controlled line of sight link between transmitter and receiver is challenging due to the constant movement of the underlying opt...
Article
We study the problem of computing the similarity join in a dynamic context, where the sets are updated dynamically. This, however, is inefficient with the existing methods, because they assume that data collections are static and have to compute the join result from scratch whenever a set is updated. We propose ALJoin, an adaptive filtering approac...
Preprint
Full-text available
Due to the advantage of reducing storage while speeding up query time on big heterogeneous data, cross-modal hashing has been extensively studied for approximate nearest neighbor search of multi-modal data. Most hashing methods assume that training data is class-balanced. However, in practice, real-world data often have a long-tailed distribution. I...
Preprint
Full-text available
Due to the unreliability of Internet workers, it's difficult to complete a crowdsourcing project satisfactorily, especially when the tasks are multiple and the budget is limited. Recently, meta learning has brought new vitality to few-shot learning, making it possible to obtain a classifier with a fair performance using only a few training samples....
Preprint
Full-text available
We raise and define a new crowdsourcing scenario, open set crowdsourcing, where we only know the general theme of an unfamiliar crowdsourcing project, and we don't know its label space, that is, the set of possible labels. This is still a task annotating problem, but the unfamiliarity with the tasks and the label space hampers the modelling of the...
Preprint
Full-text available
Cross-modal hashing (CMH) is one of the most promising methods in cross-modal approximate nearest neighbor search. Most CMH solutions ideally assume the labels of training and testing set are identical. However, the assumption is often violated, causing a zero-shot CMH problem. Recent efforts to address this issue focus on transferring knowledge fr...
Article
Trusting a machine-learning model is a critical factor that will speed the spread of the fourth industrial revolution. Trust can be achieved by understanding how a model is making decisions. For white-box models, it is easy to “see” the model and examine its prediction. For black-box models, the explanation of the decision process is not straightfo...
Article
Full-text available
In the mobile Internet era, recommender systems have become an irreplaceable tool to help users discover useful items, thus alleviating the information overload problem. Recent research on deep neural network (DNN)-based recommender systems has made significant progress in improving prediction accuracy, largely attributed to the widely accessible...
Preprint
Full-text available
This paper provides an overview of the Arabic Sentiment Analysis Challenge organized by King Abdullah University of Science and Technology (KAUST). The task in this challenge is to develop machine learning models to classify a given tweet into one of the three categories Positive, Negative, or Neutral. From our recently released ASAD dataset, we pr...
Article
Full-text available
The tunnel junction (TJ) is a crucial structure for numerous III-nitride devices. A fundamental challenge for TJ design is to minimize the TJ resistance at high current densities. In this work, we propose the asymmetric p-AlGaN/i-InGaN/n-AlGaN TJ structure for the first time. P-AlGaN/i-InGaN/n-AlGaN TJs were simulated with different Al or In compos...
Preprint
Full-text available
Current app ranking and recommendation systems are mainly based on user-generated information, e.g., number of downloads and ratings. However, new apps often have little (or even no) user feedback and thus suffer from the classic cold-start problem. How to quickly identify and then recommend new apps of high quality is a challenging issue. Here, a fundamen...
Chapter
Despite the pervasive existence of multi-label evasion attacks, it is an open yet essential problem to characterize the origin of the adversarial vulnerability of a multi-label learning system and assess its attackability. In this study, we focus on non-targeted evasion attacks against multi-label classifiers. The goal of the threat is to cause mi...
Article
Cross-modal hashing has been intensively studied to efficiently retrieve multi-modal data across modalities. Supervised cross-modal hashing methods leverage the labels of training data to improve the retrieval performance. However, most of these methods still assume that the semantic labels of training data are ideally complete and noise-free. This...
Article
Crowdsensed Data Trading (CDT) is a novel data trading paradigm, where each data consumer can publicize its data demand as some crowdsensing tasks, and some mobile users (i.e., data sellers) can compete for these tasks, collect the corresponding data, and sell the results to the consumers. Existing CDT systems generally depend on a data trading bro...
Preprint
Full-text available
As Artificial Intelligence (AI) is used in more applications, the need to consider and mitigate biases from the learned models has followed. Most works in developing fair learning algorithms focus on the offline setting. However, in many real-world applications data comes in an online fashion and needs to be processed on the fly. Moreover, in pract...
Article
Full-text available
Data prediction and imputation are important parts of marine animal movement trajectory analysis as they can help researchers understand animal movement patterns and address missing data issues. Compared with traditional methods, deep learning methods can usually provide enhanced pattern extraction capabilities, but their applications in marine dat...
Conference Paper
Math word problems (MWPs) have been recently addressed with Seq2Seq models by 'translating' math problems described in natural language to a mathematical expression, following a typical encoder-decoder structure. Although effective in solving classical math problems, these models fail when a subtle variation is applied to the word expression of a m...
Preprint
Full-text available
Math word problem (MWP) solving is the task of transforming a sequence of natural language problem descriptions to executable math equations. An MWP solver not only needs to understand complex scenarios described in the problem texts, but also identify the key mathematical variables and associate text descriptions with math equation logic. Although...
Article
Recently, bearing the message passing paradigm, graph neural networks (GNNs) have greatly advanced the performance of node representation learning on graphs. However, the majority of GNNs are designed only for homogeneous graphs, leading to inferior adaptivity to the more informative heterogeneous graphs with various types of nodes and edges. Al...
Article
Motivation: Alternative splicing creates considerable proteomic diversity and complexity from a relatively limited genome. Proteoforms translated from alternatively spliced isoforms of a gene actually execute the biological functions of this gene and thus reflect the functional knowledge of genes at a finer granularity. Recently, some computation...
Article
Alternative splicing enables a gene to be spliced into different isoforms and protein variants. Identifying the individual functions of isoforms helps decipher the functional diversity of proteins. Although much effort has been devoted to automatic gene function prediction, little has been directed toward isoform function prediction, mainly due to the unavailab...
Preprint
Full-text available
Despite the pervasive existence of multi-label evasion attacks, it is an open yet essential problem to characterize the origin of the adversarial vulnerability of a multi-label learning system and assess its attackability. In this study, we focus on non-targeted evasion attacks against multi-label classifiers. The goal of the threat is to cause mi...
Article
Entity synonym discovery (ESD) from a text corpus is an essential problem in many entity-leveraging applications, e.g., web search and question answering. This paper aims to address three limitations that widely exist in the current ESD solutions: 1) the lack of effective utilization of synonym set information; 2) the feature extraction of entities...
Conference Paper
Full-text available
Self-supervised learning (SSL), which can automatically generate ground-truth samples from raw data, holds vast potential to improve recommender systems. Most existing SSL-based methods perturb the raw data graph with uniform node/edge dropout to generate new data views and then conduct the self-discrimination based contrastive learning over differ...
Preprint
Data intensive research requires the support of appropriate datasets. However, it is often time-consuming to discover usable datasets matching a specific research topic. We formulate the dataset discovery problem on an attributed heterogeneous graph, which is composed of paper-paper citation, paper-dataset citation, and also paper content. We propo...
Preprint
Full-text available
Self-supervised learning (SSL), which can automatically generate ground-truth samples from raw data, holds vast potential to improve recommender systems. Most existing SSL-based methods perturb the raw data graph with uniform node/edge dropout to generate new data views and then conduct the self-discrimination based contrastive learning over differ...
Article
Full-text available
Traditional clustering algorithms focus on a single clustering result; as such, they cannot explore potential diverse patterns of complex real world data. To deal with this problem, approaches that exploit meaningful alternative clusterings in data have been developed in recent years. Existing algorithms, including single view/multi-view multiple c...
Article
Full-text available
Building realistic and reliable models of the subsurface is the primary goal of seismic imaging. Here we construct an ensemble of convolutional neural networks (CNNs) to build velocity models directly from the data. Most other approaches attempt to map full data into 2D labels. We exploit the regularity of seismic acquisition and train CNNs to map...
Article
Full-text available
The global lockdown to mitigate COVID-19 pandemic health risks has altered human interactions with nature. Here, we report immediate impacts of changes in human activities on wildlife and environmental threats during the early lockdown months of 2020, based on 877 qualitative reports and 332 quantitative assessments from 89 different studies. Hundr...
Article
Full-text available
Social media (e.g., Twitter) has been an extremely popular tool for public health surveillance. The novel coronavirus disease 2019 (COVID-19) is the first pandemic experienced by a world connected through the internet. We analyzed 105+ million tweets collected between March 1 and May 15, 2020, and Weibo messages compiled between January 20 and May...
Chapter
Full-text available
As Artificial Intelligence (AI) is used in more applications, the need to consider and mitigate biases from the learned models has followed. Most works in developing fair learning algorithms focus on the offline setting. However, in many real-world applications data comes in an online fashion and needs to be processed on the fly. Moreover, in pract...
Article
A gene can be spliced into different isoforms by alternative splicing, which contributes to the functional diversity of protein species. Computational prediction of gene-disease associations (GDAs) has been studied for decades. However, the process of identifying the isoform-disease associations (IDAs) at a large scale is rarely explored, which can...
Preprint
Full-text available
With the ubiquitous graph-structured data in various applications, models that can learn compact but expressive vector representations of nodes have become highly desirable. Recently, bearing the message passing paradigm, graph neural networks (GNNs) have greatly advanced the performance of node representation learning on graphs. However, a majorit...
Preprint
Full-text available
As a well-established approach, factorization machine (FM) is capable of automatically learning high-order interactions among features to make predictions without the need for manual feature engineering. With the prominent development of deep neural networks (DNNs), there is a recent and ongoing trend of enhancing the expressiveness of FM-based mod...
Preprint
Full-text available
In the mobile Internet era, recommender systems have become an irreplaceable tool to help users discover useful items, thus alleviating the information overload problem. Recent research on deep neural network (DNN)-based recommender systems has made significant progress in improving prediction accuracy, largely attributed to the widely accessible...
Article
Full-text available
Anthropogenic litter density and composition data were obtained by conducting aerial surveys on 44 beaches along the Saudi Arabian Coast of the Red Sea [1]. The aerial surveys were completed with commercial drones of the DJI Phantom suite flown at a 10 m altitude. The stills have a resolution of less than 0.5 cm pixels⁻¹, hence, litter objects of f...
Preprint
Full-text available
In this work, we study group recommendation in a particular scenario, namely Occasional Group Recommendation (OGR). Most existing works have addressed OGR by aggregating group members' personal preferences to learn the group representation. However, the representation learning for a group is complex and goes beyond the fusion of group member representa...
Article
Bipartite graphs widely exist in real-world scenarios and model binary relations like host-website, author-paper, and user-product. In bipartite graphs, a butterfly (i.e., $2\times 2$ bi-clique) is the smallest non-trivial cohesive structure and plays an important role in applications such as anomaly detection. Considerable efforts focus on count...
Article
Graph node embedding aims at learning a vector representation for all nodes given a graph. It is a central problem in many machine learning tasks (e.g., node classification, recommendation, community detection). The key problem in graph node embedding lies in how to define the dependence on neighbors. Existing approaches specify (either explicitly o...