Xiaowei Xu

University of Arkansas at Little Rock | UALR · Department of Information Sciences

Doctor of Philosophy


147
Publications
33,037
Reads
31,370
Citations
Introduction
I am a professor in the Department of Information Science at the University of Arkansas at Little Rock with a passion for teaching and research in Artificial Intelligence (AI) and Machine Learning. My current research interests include Causal AI, transfer learning, learning representations, language models, and computer vision. Currently, AI is good at finding correlations and effects but lacks the ability to infer causes. Causal AI is about endowing AI with causal inference and reasoning.
Additional affiliations
July 2007 - present
University of Arkansas at Little Rock
Position
  • Professor
Description
  • I am a professor with a passion for teaching and research in Artificial Intelligence and Machine Learning.
September 1998 - July 2002
Siemens
Position
  • Senior Researcher
Description
  • I was a team lead working with Ph.D. students in the Department of Neural Computation, a research division of Siemens. My research area spanned Machine Learning and Data Science.
June 1993 - August 1998
Ludwig-Maximilians-University of Munich
Position
  • Research Associate
Education
June 1993 - July 1998
Ludwig-Maximilians-University of Munich
Field of study
  • Computer Science
September 1985 - December 1987
Chinese Academy of Sciences
Field of study
  • Computer Science
September 1979 - July 1983
Nankai University
Field of study
  • Mathematics

Publications (147)
Preprint
Full-text available
In this paper, the authors propose TriBERTa, a supervised entity resolution system that utilizes a pre-trained large language model and a triplet loss function to learn representations for entity matching. The system consists of two steps: first, name entity records are fed into a Sentence Bidirectional Encoder Representations from Transformers (SB...
Article
Causality assessment is vital in patient safety and pharmacovigilance (PSPV) for safety signal detection, adverse reaction management, and regulatory submission. Large language models (LLMs), especially those designed with transformer architecture, are revolutionizing various fields, including PSPV. While attempts to utilize Bidirectional Encoder R...
Article
Full-text available
Causality plays an essential role in multiple scientific disciplines, including the social, behavioral, and biological sciences and portions of statistics and artificial intelligence. Manual-based causality assessment from a large number of free text-based documents is very time-consuming, labor-intensive, and sometimes even impractical. Herein, we...
Article
Online metric learning has been widely exploited for large-scale data classification due to its low computational cost. However, in practical online scenarios where the features are evolving (e.g., some features vanish and some new features are augmented), most metric learning models cannot be successfully applied to these scenarios,...
Article
Full-text available
Background: Transformer-based language models have delivered clear improvements in a wide range of natural language processing (NLP) tasks. However, those models have a significant limitation; specifically, they cannot infer causality, a prerequisite for deployment in pharmacovigilance and health care. Therefore, these transformer-based language...
Article
Recent years have witnessed rapid growth in the number of semistructured documents in real‐world applications. Due to the huge size of real‐world data, managing semistructured documents effectively is a big challenge for researchers. As a fundamental task in the natural language processing field, document classification is a feasible wa...
Article
Full-text available
Motivation: Medical image enhancement is a crucial part to improve the quality of the images. The excellent visual effects and image quality can help doctors make quick diagnoses. Among medical images, Magnetic Resonance Imaging (MRI) images play a vital role in clinical diagnosis. Its imaging principle highlights the human tissue part ignoring the...
Article
Full-text available
A Correction to this paper has been published: https://doi.org/10.1007/s11042-021-10782-7
Preprint
Weakly-supervised learning has attracted growing research attention on medical lesions segmentation due to significant saving in pixel-level annotation cost. However, 1) most existing methods require effective prior and constraints to explore the intrinsic lesions characterization, which only generates incorrect and rough prediction; 2) they neglec...
Chapter
Unsupervised domain adaptation without consuming annotation process for unlabeled target data attracts appealing interests in semantic segmentation. However, 1) existing methods neglect that not all semantic representations across domains are transferable, which cripples domain-wise transfer with untransferable knowledge; 2) they fail to narrow cat...
Article
Full-text available
Product reviews are extremely valuable for online shoppers in providing purchase decisions. Driven by immense profit incentives, fraudsters deliberately fabricate untruthful reviews to distort the reputation of online products. As online reviews become more and more important, group spamming, i.e., a team of fraudsters working collaboratively to at...
Article
Full-text available
To resolve or alleviate transportation problems, it is necessary to manage public transportation efficiently, provide high-quality public transport services, and advocate green travel, all of which rely on accurate traffic data. In order to predict future bus speeds more accurately, this paper proposes a novel dynamic hierarchical s...
Article
Full-text available
Artificial intelligence (AI)-based applications have found widespread applications in many fields of science, technology, and medicine. The use of enhanced computing power of machines in clinical medicine and diagnostics has been under exploration since the 1960s. More recently, with the advent of advances in computing, algorithms enabling machine...
Preprint
Unsupervised domain adaptation without consuming annotation process for unlabeled target data attracts appealing interests in semantic segmentation. However, 1) existing methods neglect that not all semantic representations across domains are transferable, which cripples domain-wise transfer with untransferable knowledge; 2) they fail to narrow cat...
Article
Weakly-supervised learning has attracted growing research attention on medical lesions segmentation due to significant saving in pixel-level annotation cost. However, 1) most existing methods require effective prior and constraints to explore the intrinsic lesions characterization, which only generates incorrect and rough prediction; 2) they neglec...
Preprint
Full-text available
Online metric learning has been widely exploited for large-scale data classification due to its low computational cost. However, in practical online scenarios where the features are evolving (e.g., some features vanish and some new features are augmented), most metric learning models cannot be successfully applied to these scenarios al...
Chapter
Text embedding representing natural language documents in a semantic vector space can be used for document retrieval using nearest neighbor lookup. In order to study the feasibility of neural models specialized for retrieval in a semantically meaningful way, we suggest the use of the Stanford Question Answering Dataset (SQuAD) in an open-domain que...
Preprint
Unsupervised domain adaptation has attracted growing research attention in semantic segmentation. However, 1) most existing models cannot be directly applied to lesion transfer of medical images, due to the diverse appearances of the same lesion across different datasets; 2) equal attention has been paid to all semantic representations instead of n...
Article
Full-text available
Background: Drug labels, or packaging inserts, play a significant role in all operations from production through drug distribution channels to the end consumer. An image of the label, also called the Display Panel or label, could be used to identify illegal, illicit, unapproved and potentially dangerous drugs. Due to the time-consuming process and high la...
Presentation
Protecting the safety of patients and consumers is the main mission of the FDA. Our solution is aimed at developing deep neural network models for post-market drug safety surveillance. Our causal AI model combines deep neural language model with Bayesian network. We applied the model to the FDA Adverse Event Reporting System (FAERS) for modeling th...
Preprint
Full-text available
[Background] Drug labels, or packaging inserts, play a significant role in all operations from production through drug distribution channels to the end consumer. An image of the label, also called the Display Panel or label, could be used to identify illegal, illicit, unapproved and potentially dangerous drugs. Due to the time-consuming process and high la...
Article
Full-text available
Given increasing travel demands and public transport problems, dynamically adjusting timetables or bus scheduling based on accurate real-time passenger flow forecasting is necessary. In order to forecast future passenger flow more accurately, this paper proposes a novel hierarchical hybrid model based on a time series model, deep belief networks (DBNs),...
Preprint
Product reviews are extremely valuable for online shoppers in providing purchase decisions. Driven by immense profit incentives, fraudsters deliberately fabricate untruthful reviews to distort the reputation of online products. As online reviews become more and more important, group spamming, i.e., a team of fraudsters working collaboratively to at...
Preprint
Full-text available
The primary goal of ad-hoc retrieval (document retrieval in the context of question answering) is to find relevant documents that satisfy the information need posed in a natural language query. It requires a good understanding of the query and all the documents in a corpus, which is difficult because the meaning of natural language texts depends on t...
Article
Full-text available
In modern society, route guidance problems can be found everywhere. Reinforcement learning models can normally be used to solve such problems; in particular, Sarsa Learning is suitable for tackling the dynamic route guidance problem. However, handling the large state space of a digital road network is a challenge for Sarsa Learning, which is v...
Article
Traffic congestion is a significant problem in the research field of Intelligent Transportation Systems. In this paper, a Hybrid Temporal Association Rules Mining method is proposed to predict traffic congestion. In the proposed method, the DBSCAN algorithm is applied to find traffic environments, which generate eligible rules for predicting traffic co...
Preprint
Full-text available
Deep language models learning a hierarchical representation proved to be a powerful tool for natural language processing, text mining and information retrieval. However, representations that perform well for retrieval must capture semantic meaning at different levels of abstraction or context-scopes. In this paper, we propose a new method to genera...
Preprint
Full-text available
Text embedding representing natural language documents in a semantic vector space can be used for document retrieval using nearest neighbor lookup. In order to study the feasibility of neural models specialized for retrieval in a semantically meaningful way, we suggest the use of the Stanford Question Answering Dataset (SQuAD) in an open-domain que...
Article
Anomaly detection is one of the fundamental problems within diverse research areas and application domains. In comparison with most sparse representation based anomaly detection methods adopting a relaxation term of sparsity via ℓ1 norm, we propose an unsupervised anomaly detection method optimized via an adaptive greedy model based on ℓ0 norm cons...
Article
Full-text available
Online product reviews are becoming increasingly important due to their guidance function in people’s purchase decisions. Being highly subjective, online reviews are subject to opinion spamming, i.e., fraudsters write fake reviews or give unfair ratings to promote or demote target products. Although there has been much effort in this field, th...
Article
Full-text available
In order to meet real-time public travel demands, bus operators need to adjust timetables in time. Therefore, it is necessary to predict the variations of the short-term passenger flow. With the help of advanced public transportation systems, a large amount of real-time data about passenger flow is collected from the automatic pass...
Article
Full-text available
Online product reviews are increasingly prevalent on e-commerce websites. People often refer to product reviews to evaluate the quality of a product before purchasing. However, a large number of review spammers often work collaboratively to promote or demote target products, which severely harms the review system. Much p...
Article
Lifelong learning intends to learn new consecutive tasks depending on previously accumulated experiences, i.e., a knowledge library. However, the knowledge among different incoming tasks is imbalanced. Therefore, in this paper, we try to mimic an effective "human cognition" strategy by actively sorting the importance of new tasks in the process of...
Article
Predicting specific household characteristics (e.g., age of person, household income, cooking style, etc.) from their everyday electricity consumption (i.e., smart meter data) enables energy providers to develop many intelligent business applications or help consumers reduce their energy consumption. However, most existing works intend to predict...
Article
Full-text available
State-of-the-art online learning approaches are only capable of learning the metric for predefined tasks. In this paper, we consider the lifelong learning problem to mimic "human learning", i.e., endowing the learned metric with a new capability for a new task from new online samples while incorporating previous experiences and knowledge. Therefore, we pro...
Conference Paper
Full-text available
Online social network services now generally have enormous numbers of monthly active users. Each user may have hundreds of different ties to family, friends or acquaintances. Discovering multiple social ties is pivotal in understanding human relationships and recognizing the role played by individuals in very large networks. In this paper, an incremental...
Article
Full-text available
Background: Each lung structure exhales a unique pattern of aerosols, which can be used to detect and monitor lung diseases non-invasively. The challenges are accurately interpreting the exhaled aerosol fingerprints and quantitatively correlating them to the lung diseases. Objective and methods: In this study, we presented a paradigm of an exhal...
Article
Full-text available
The advancement of high-throughput screening technologies facilitates the generation of massive amounts of biological data, a big data phenomenon in biomedical science. Yet, researchers still rely heavily on keyword search and/or literature review to navigate the databases, and analyses are often done at a rather small scale. As a result, the rich infor...
Article
The U.S. EPA ToxCast™ program is screening thousands of environmental chemicals for bioactivity using hundreds of high-throughput in vitro assays to build predictive models of toxicity. We represented chemicals based on bioactivity and chemical structure descriptors, then used supervised machine learning to predict in vivo hepatotoxic effects. A s...
Article
This study presents a hierarchical background modelling and subtraction approach for real-time detection of moving objects. At the first level, a novel pixel-wise background modelling method is proposed for coarse detection. The method can dynamically assign the optimal number of components for each pixel with the borrow-lend strategy. And a flexib...
Article
Full-text available
Given the significant impact on public health and drug development, drug safety has been a focal point and research emphasis across multiple disciplines in addition to scientific investigation, including consumer advocates, drug developers and regulators. Such concern and effort have led to numerous databases with drug safety information available in...
Article
Full-text available
The phenome represents a distinct set of information in the human population. It has been explored particularly in its relationship with the genome to identify correlations for diseases. The phenome has been also explored for drug repositioning with efforts focusing on the search space for the most similar candidate drugs. For a comprehensive analy...
Conference Paper
There is an increasing need for a storage system for petabyte scale graphs. In an attempt along the line, in this paper we develop a graph storage system, called Graph Store, for large graphs on top of the Hadoop Distributed File System (HDFS). Graph Store provides efficient graph storage and processing in a package. This paper also addresses criti...
Article
Full-text available
Background: High Content Screening (HCS) has become an important tool for toxicity assessment, partly due to its advantage of handling multiple measurements simultaneously. This approach has provided insight and contributed to the understanding of systems biology at the cellular level. To fully realize this potential, the simultaneously measured multipl...
Conference Paper
Similarity search is a key function for many applications including databases, pattern recognition and recommendation systems to name a few. In this paper, we first propose" query, a similarity search based on the popular cosine similarity for information retrieval and social network analysis. In contrast to traditional similarity search " query re...
Conference Paper
A substantial percentage of global Internet users now actively use Twitter. In recent times, Twitter has been experiencing explosive growth, attracting celebrities and consequently a growing mass of users. Insights into such a social network aid researchers in understanding the behavioral dynamics of society. Though there have been attempts to stu...
Conference Paper
The big data analytics community has accepted MapReduce as a programming model for processing massive data on distributed systems such as a Hadoop cluster. MapReduce has been evolving to improve its performance. We identified skewed workload among workers in the MapReduce ecosystem. The problem of skewed workload is of serious concern for massive d...
Conference Paper
Full-text available
Big data such as complex networks with millions of vertices and edges are infeasible to process using conventional computation. MapReduce is a programming model that empowers us to analyze big data in a cluster of computers. In this paper we propose a Parallel Structural Clustering Algorithm for big Networks (PSCAN) in MapReduce for the detecti...
Conference Paper
Motivated by the related entity finding problem, in this paper, we introduce a novel approach to query answering called "NMiner." NMiner takes advantage of heuristics to find answers to complex semantic queries. It uses a combination of natural language processing techniques to parse sentences and extract entities, hypertext structure of the document...
Article
Full-text available
Drug repositioning offers an opportunity to revitalize the slowing drug discovery pipeline by finding new uses for currently existing drugs. Our hypothesis is that drugs sharing similar side effect profiles are likely to be effective for the same disease, and thus repositioning opportunities can be identified by finding drug pairs with similar side...
Article
Drug repositioning, exemplified by sildenafil and thalidomide, is a promising way to explore alternative indications for existing drugs. Recent research has shown that bioinformatics-based approaches have the potential to offer systematic insights into the complex relationships among drugs, targets and diseases necessary for successful repositionin...
Article
Full-text available
Large amounts of mammalian protein-protein interaction (PPI) data have been generated and are available for public use. From a systems biology perspective, protein/gene interactions encode the key mechanisms distinguishing disease and health, and such mechanisms can be uncovered through network analysis. An effective network analysis tool should...
Data
Description of data: List of all seed genes using various IDs in all three case studies. The highlighted columns are the input gene ID at atBioNet.
Data
The 14 literature-identified potential SLE biomarkers in case study 2.
Article
Full-text available
The fact that similarity breeds connections, the principle of homophily, has been well-studied in existing sociology literature. Several studies have observed this phenomenon by conducting surveys on human subjects. These studies have concluded that new ties are formed between similar individuals. This phenomenon has been used to explain several so...
Chapter
That similarity breeds connections, the principle of homophily, has been well studied in existing sociology literature. Several studies have observed this phenomenon by conducting surveys on human subjects. These studies have concluded that new ties are formed between similar individuals. This phenomenon has been used to explain several socio-psychologi...
Conference Paper
The fetal magnetocardiogram (fMCG) contains a wealth of information regarding the health of a fetus. The purpose of this study is to classify fMCG data into the following two groups: high-risk and normal. In this presentation the authors first describe how the feature vector containing both time and frequency domain attributes is built from the tim...