Xingquan Zhu

Florida Atlantic University, Boca Raton, Florida, United States


Publications (194) · 128.96 Total Impact

  • Meng Fang · Jie Yin · Xingquan Zhu ·

  • Jia Wu · Zhibin Hong · Shirui Pan · Xingquan Zhu · Zhihua Cai · Chengqi Zhang ·

    Knowledge and Information Systems 09/2015; DOI:10.1007/s10115-015-0872-1 · 1.78 Impact Factor
  • Meng Fang · Jie Yin · Xingquan Zhu · Chengqi Zhang ·
    ABSTRACT: In this paper, we present a novel transfer learning framework for network node classification. Our objective is to accurately predict the labels of nodes in a target network by leveraging information from an auxiliary source network. Such a transfer learning framework is potentially useful for broader areas of network classification, where emerging new networks might not have sufficient labeled information because node labels are either costly to obtain or simply not available, whereas many established networks from related domains are available to benefit the learning. In reality, the source and the target networks may not share common nodes or connections, so the major challenge of cross-network transfer learning is to identify knowledge/patterns transferable between networks and potentially useful to support cross-network learning. In this work, we propose to learn common signature subgraphs between networks, and use them to construct new structure features for the target network. By combining the original node content features and the new structure features, we develop an iterative classification algorithm, TrGraph, that utilizes label dependency to jointly classify nodes in the target network. Experiments on real-world networks demonstrate that TrGraph achieves superior performance compared with state-of-the-art baseline methods, and that transferring generalizable structure information can indeed improve node classification accuracy.
    IEEE Transactions on Knowledge and Data Engineering 09/2015; 27(9):1-1. DOI:10.1109/TKDE.2015.2413789 · 2.07 Impact Factor
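The content-plus-structure representation behind TrGraph can be sketched roughly as follows. This is a pure-Python toy under strong assumptions: a simple triangle count stands in for the learned common signature subgraphs, which this sketch does not actually mine.

```python
def triangle_count(adj, node):
    """Count triangles through `node` in an undirected adjacency dict."""
    nbrs = adj[node]
    return sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])

def node_features(adj, node, content):
    """Concatenate a node's content features with a structure feature,
    mimicking TrGraph's content + transferred-structure representation."""
    return list(content[node]) + [triangle_count(adj, node)]
```

Any off-the-shelf classifier could then be trained on these combined vectors; the iterative label-dependency step of the actual algorithm is omitted here.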
  • Jia Wu · Shirui Pan · Xingquan Zhu · Chengqi Zhang ·

  • Meng Fang · Jie Yin · Xingquan Zhu ·

    Data Mining and Knowledge Discovery 08/2015; DOI:10.1007/s10618-015-0424-z · 1.99 Impact Factor
  • Shirui Pan · Jia Wu · Xingquan Zhu · Guodong Long · Chengqi Zhang ·
    ABSTRACT: Classification of structured data, such as graphs, has drawn wide interest in recent years. Due to the lack of explicit features to represent graphs for training classification models, extensive studies have focused on extracting the most discriminative subgraph features from the training graph dataset to transform graphs into vector data. However, such filter-based methods suffer from two major disadvantages: (1) the subgraph feature selection is separated from the model learning process, so the selected most discriminative subgraphs may not best fit the subsequent learning model, resulting in deteriorated classification results; (2) all these methods rely on users to specify the number of subgraph features K, and suboptimally specified K values often result in significantly reduced classification accuracy. In this paper, we propose a new graph classification paradigm which overcomes the above disadvantages by formulating subgraph feature selection as learning a K-dimensional feature space from an implicit and large subgraph space, with the optimal K value being automatically determined. To achieve the goal, we propose a regularized loss minimization-driven (RLMD) feature selection method for graph classification. RLMD integrates subgraph selection and model learning into a unified framework to find discriminative subgraphs with guaranteed minimum loss w.r.t. the objective function. To automatically determine the optimal number of subgraphs K from the exponentially large subgraph space, an effective elastic net and a subgradient method are proposed to derive the stopping criterion, so that K can be automatically obtained once RLMD converges. The proposed RLMD method enjoys desirable properties, including proven convergence and applicability to various loss functions. Experimental results on real-life graph datasets demonstrate significant performance gain.
    Pattern Recognition 05/2015; 48(11). DOI:10.1016/j.patcog.2015.05.019 · 3.10 Impact Factor
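The way an elastic-net penalty lets the number of selected features K emerge automatically can be illustrated with a single proximal-gradient step. This is a generic sketch of the elastic-net mechanism, not the paper's exact update; the step size and penalty values below are illustrative.

```python
def elastic_net_prox(w, l1, l2, step):
    """One proximal step for step * (l1*|w| + (l2/2)*w^2):
    soft-threshold (drives small weights exactly to zero), then shrink."""
    out = []
    for wi in w:
        mag = max(abs(wi) - step * l1, 0.0)
        out.append((mag if wi >= 0 else -mag) / (1.0 + step * l2))
    return out

def active_count(w, tol=1e-12):
    """Number of nonzero (selected) features -- the K that emerges
    automatically instead of being user-specified."""
    return sum(1 for wi in w if abs(wi) > tol)
```

Because the l1 part zeroes out weak weights exactly, counting the surviving nonzeros after convergence yields K without the user specifying it.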
  • Shirui Pan · Jia Wu · Xingquan Zhu · Chengqi Zhang ·
    ABSTRACT: Many applications involve stream data with structural dependency, graph representations, and continuously increasing volumes. For these applications, it is very common that their class distributions are imbalanced, with minority (or positive) samples being only a small portion of the population, which imposes significant challenges for learning models to accurately identify minority samples. This problem is further complicated by the presence of noise, because noisy samples are similar to minority samples, and any treatment for the class imbalance may falsely focus on the noise and result in deteriorated accuracy. In this paper, we propose a classification model to tackle imbalanced graph streams with noise. Our method, graph ensemble boosting, employs an ensemble-based framework to partition the graph stream into chunks, each containing a number of noisy graphs with imbalanced class distributions. For each individual chunk, we propose a boosting algorithm to combine discriminative subgraph pattern selection and model learning as a unified framework for graph classification. To tackle concept drifting in graph streams, an instance-level weighting mechanism is used to dynamically adjust the instance weight, through which the boosting framework can emphasize difficult graph samples. The classifiers built from different graph chunks form an ensemble for graph stream classification. Experiments on real-life imbalanced graph streams demonstrate clear benefits of our boosting design for handling imbalanced noisy graph streams.
    IEEE Transactions on Cybernetics 04/2015; 45(5):940-954. DOI:10.1109/TCYB.2014.2341031 · 3.47 Impact Factor
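The instance-level weighting that makes boosting emphasize difficult samples can be sketched with a standard AdaBoost-style update. This is the generic reweighting rule, not the paper's exact one; the graph-specific subgraph selection is omitted.

```python
import math

def update_weights(weights, correct):
    """AdaBoost-style reweighting: misclassified (difficult) samples
    gain weight so the next boosting round emphasizes them.
    `correct` is a per-sample list of booleans."""
    total = sum(weights)
    err = sum(w for w, c in zip(weights, correct) if not c) / total
    err = min(max(err, 1e-9), 1.0 - 1e-9)        # numeric guard
    alpha = 0.5 * math.log((1.0 - err) / err)    # classifier vote weight
    new = [w * math.exp(-alpha if c else alpha)
           for w, c in zip(weights, correct)]
    z = sum(new)                                 # renormalize
    return [w / z for w in new], alpha
```

After the update, the one misclassified sample in a four-sample chunk carries half of the total weight, which is exactly how the ensemble comes to focus on hard (e.g., minority) graphs.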
  • Dataset: iSRD
    Hamzah Al Najada · Xingquan Zhu ·

  • Jia Wu · Shirui Pan · Xingquan Zhu · Zhihua Cai · Peng Zhang · Chengqi Zhang ·
    ABSTRACT: Naive Bayes (NB) is a popular machine learning tool for classification, due to its simplicity, high computational efficiency, and good classification accuracy, especially for high-dimensional data such as texts. In reality, the pronounced advantage of NB is often challenged by the strong conditional independence assumption between attributes, which may deteriorate classification performance. Accordingly, numerous efforts have been made to improve NB, using approaches such as structure extension, attribute selection, attribute weighting, instance weighting, and local learning. In this paper, we propose a new Artificial Immune System (AIS) based self-adaptive attribute weighting method for Naive Bayes classification. The proposed method, namely AISWNB, uses immunity theory in Artificial Immune Systems to search for optimal attribute weight values, where self-adjusted weight values alleviate the conditional independence assumption and help calculate the conditional probability accurately. One noticeable advantage of AISWNB is that the unique immune-system-based evolutionary computation process, including initialization, cloning, selection, and mutation, ensures that AISWNB can adjust itself to the data without explicit specification of functional or distributional forms of the underlying model. As a result, AISWNB can obtain good attribute weight values during the learning process. Experiments and comparisons on 36 machine learning benchmark data sets and six image classification data sets demonstrate that AISWNB significantly outperforms its peers in classification accuracy, class probability estimation, and class ranking performance.
    Expert Systems with Applications 02/2015; 42(3):1487–1502. DOI:10.1016/j.eswa.2014.09.019 · 2.24 Impact Factor
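Attribute-weighted naive Bayes scoring, the quantity the AIS search optimizes the weights for, can be sketched directly. This is a minimal sketch of the weighted-NB decision rule only; the immune-system weight search itself is not shown, and the toy probabilities below are illustrative.

```python
import math

def weighted_nb_log_score(prior, cond_probs, weights):
    """log P(c) + sum_i w_i * log P(x_i | c): per-attribute weights w_i
    relax the equal-weight conditional independence assumption."""
    return math.log(prior) + sum(w * math.log(p)
                                 for w, p in zip(weights, cond_probs))

def classify(class_models, weights):
    """Pick the class with the highest weighted log-score.
    class_models: {label: (prior, [P(x_i|c) for each attribute])}."""
    return max(class_models,
               key=lambda c: weighted_nb_log_score(*class_models[c], weights))
```

Note how changing the weights flips the decision: downweighting an attribute effectively removes its (possibly misleading) evidence from the product.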
  • Peng Zhang · Chuan Zhou · Peng Wang · Byron J. Gao · Xingquan Zhu · Li Guo ·
    ABSTRACT: Ensemble learning is a common tool for data stream classification, mainly because of its inherent advantages in handling large volumes of stream data and concept drifting. Previous studies have primarily focused on building accurate ensemble models from stream data. However, a linear scan of a large number of base classifiers in the ensemble during prediction incurs significant costs in response time, preventing ensemble learning from being practical for many real-world time-critical data stream applications, such as Web traffic stream monitoring, spam detection, and intrusion detection. In these applications, data streams usually arrive at gigabyte-per-second speeds, and it is necessary to classify each stream record in a timely manner. To address this problem, we propose a novel Ensemble-tree (E-tree for short) indexing structure to organize all base classifiers in an ensemble for fast prediction. On one hand, E-trees treat ensembles as spatial databases and employ an R-tree-like height-balanced structure to reduce the expected prediction time from linear to sub-linear complexity. On the other hand, E-trees can be automatically updated by continuously integrating new classifiers and discarding outdated ones, adapting well to new trends and patterns underlying data streams. Theoretical analysis and empirical studies on both synthetic and real-world data streams demonstrate the performance of our approach.
    IEEE Transactions on Knowledge and Data Engineering 02/2015; 27(2):461-474. DOI:10.1109/TKDE.2014.2298018 · 2.07 Impact Factor
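The core E-tree intuition, indexing base classifiers spatially so prediction probes only the relevant ones instead of scanning all of them, can be illustrated with a 1-D toy. This is an analogy, not the paper's R-tree-like structure: base classifiers are reduced to non-overlapping interval rules, located by binary search.

```python
import bisect

class RuleIndex:
    """Toy 1-D analogue of the E-tree idea: index base classifiers
    (here, interval rules) so a prediction query touches O(log n)
    of them via binary search rather than a linear scan."""

    def __init__(self, rules):
        # rules: iterable of (low, high, label) non-overlapping intervals
        self.rules = sorted(rules)
        self.lows = [r[0] for r in self.rules]

    def predict(self, x):
        i = bisect.bisect_right(self.lows, x) - 1
        if i >= 0 and self.rules[i][0] <= x <= self.rules[i][1]:
            return self.rules[i][2]
        return None   # no indexed classifier covers this record
```

Inserting a new rule and deleting a stale one in the sorted list mirrors, in spirit, the E-tree's continuous integration of new classifiers and discarding of outdated ones.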
  • Shirui Pan · Jia Wu · Xingquan Zhu ·
    ABSTRACT: Graph classification has drawn great interest in recent years due to the increasing number of applications involving objects with complex structure relationships. To date, all existing graph classification algorithms assume, explicitly or implicitly, that misclassifying instances in different classes incurs an equal amount of cost (or risk), which is often not the case in real-life applications (where misclassifying a certain class of samples, such as diseased patients, is subject to more expensive costs than others). Although cost-sensitive learning has been extensively studied, all methods are based on data with instance-feature representation. Graphs, however, do not have features available for learning, and the feature space of graph data is likely infinite and needs to be carefully explored in order to favor classes with a higher cost. In this paper, we propose CogBoost, a fast cost-sensitive graph classification algorithm, which aims to minimize the misclassification costs (instead of the errors) and achieve fast learning speed for large-scale graph datasets. To minimize the misclassification costs, CogBoost iteratively selects the most discriminative subgraph by considering costs of different classes, and then solves a linear programming problem in each iteration using a Bayes-decision-rule-based optimal loss function. In addition, a cutting plane algorithm is derived to speed up solving the linear programs for fast learning on large graph datasets. Experiments and comparisons on real-world large graph datasets demonstrate the effectiveness and the efficiency of our algorithm.
    IEEE Transactions on Knowledge and Data Engineering 01/2015; 27(11). DOI:10.1109/TKDE.2015.2391115 · 2.07 Impact Factor
  • Shirui Pan · Jia Wu · Xingquan Zhu · Chengqi Zhang · Philip Yu ·

    IEEE Transactions on Knowledge and Data Engineering 01/2015; DOI:10.1109/TKDE.2015.2492567 · 2.07 Impact Factor
  • Boyu Li · Ting Guo · Xingquan Zhu · Zhanshan Li ·
    ABSTRACT: Model-based diagnosis in discrete event systems (DESs) is a major research topic in failure diagnosis, where diagnosability plays an important role in the construction of the diagnosis engine. To improve the solution efficiency for diagnosability, this paper proposes novel techniques to solve the problems of testing and optimizing for diagnosability. We propose a new concept, reverse twin plant, which is generated backwards from the final states of the DESs so there is no need to generate a complete copy of the DES model to determine the diagnosability. Such a design makes our testing algorithm much faster than existing methods. An efficient optimizing algorithm, which makes a non-diagnosable system diagnosable, is also proposed in the paper by expanding the minimal observable space with operation on just a part of the DES model. Examples and theoretical studies demonstrate the performance of the proposed designs.
    Engineering Applications of Artificial Intelligence 11/2014; 38. DOI:10.1016/j.engappai.2014.10.007 · 2.21 Impact Factor
  • Jia Wu · Xingquan Zhu · Chengqi Zhang · P.S. Yu ·
    ABSTRACT: This paper formulates a multi-graph learning task. In our problem setting, a bag contains a number of graphs and a class label. A bag is labeled positive if at least one graph in the bag is positive, and negative otherwise. In addition, the genuine label of each graph in a positive bag is unknown, and all graphs in a negative bag are negative. The aim of multi-graph learning is to build a learning model from a number of labeled training bags to predict previously unseen test bags with maximum accuracy. This problem setting is essentially different from existing multi-instance learning (MIL), where instances in MIL share well-defined feature values, but no features are available to represent graphs in a multi-graph bag. To solve the problem, we propose a Multi-Graph Feature based Learning (gMGFL) algorithm that explores and selects a set of discriminative subgraphs as features to transfer each bag into a single instance, with the bag label being propagated to the transferred instance. As a result, the multi-graph bags form a labeled training instance set, so generic learning algorithms, such as decision trees, can be used to derive learning models for multi-graph classification. Experiments and comparisons on real-world multi-graph tasks demonstrate the algorithm performance.
    IEEE Transactions on Knowledge and Data Engineering 10/2014; 26(10):2382-2396. DOI:10.1109/TKDE.2013.2297923 · 2.07 Impact Factor
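The bag-to-instance transfer at the heart of gMGFL can be sketched compactly. In this toy, graphs are edge sets and each "discriminative subgraph" feature is a single edge, an assumption made purely for brevity; the real algorithm mines multi-edge subgraphs.

```python
def graph_feature_vector(graph_edges, feature_edges):
    """Indicator features: 1 if the candidate subgraph (here a single
    edge, for simplicity) occurs in the graph, else 0."""
    return [1 if e in graph_edges else 0 for e in feature_edges]

def bag_to_instance(bag, feature_edges):
    """OR-aggregate graph features across the bag: the whole bag
    becomes one instance, to which the bag label is propagated."""
    vectors = [graph_feature_vector(g, feature_edges) for g in bag]
    return [max(col) for col in zip(*vectors)]
```

Once every bag is a single feature vector carrying the bag label, any generic learner (e.g., a decision tree) can be trained on the result, as the abstract describes.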
  • Hamzah Al Najada · Xingquan Zhu ·
    ABSTRACT: The Internet plays an essential role in modern information systems. Applications such as e-commerce websites make it easy for people to purchase different types of products online. During such an online shopping process, users often rely on online reviews from previous customers to make the final decision. Because online reviews play an essential role in the sale of online products (or services), some vendors (or customers) provide fake/spam reviews to mislead customers. Any false reviews of the products may result in unfair market competition and financial loss for the customers or vendors. In this research, we aim to distinguish between spam and non-spam reviews by using supervised classification methods. When training a classifier to identify spam vs. non-spam reviews, a challenging issue is that spam reviews are only a very small portion of the online review reports. This naturally leads to a data imbalance issue when training classifiers for spam review detection, where learning methods that do not emphasize minority samples (i.e., spams) may perform poorly in detecting spam reviews (although the overall accuracy of the algorithm might be relatively high). In order to tackle the challenge, we employ a bagging-based approach to build a number of balanced datasets, through which we can train a set of spam classifiers and use their ensemble to detect review spams. Experiments and comparisons demonstrate that our method, iSRD, outperforms baseline methods for review spam detection.
    IEEE IRI 2014, San Francisco, California, USA; 08/2014
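The bagging-based balancing step described above can be sketched as follows. This is a generic undersampling-bagging sketch, not iSRD's exact procedure; the sample sizes and bag count are illustrative.

```python
import random

def balanced_bags(majority, minority, n_bags, seed=0):
    """Build n_bags balanced training sets by undersampling the
    majority (non-spam) class down to the minority (spam) size,
    keeping every minority sample in every bag."""
    rng = random.Random(seed)
    return [rng.sample(majority, len(minority)) + list(minority)
            for _ in range(n_bags)]

def ensemble_vote(predictions):
    """Majority vote over the per-bag classifiers' predictions."""
    return max(set(predictions), key=predictions.count)
```

Each balanced bag trains one classifier; at detection time the per-classifier predictions for a review are combined by majority vote.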
  • Bin Li · Xingquan Zhu · Ruijiang Li · Chengqi Zhang ·
    ABSTRACT: Cross-domain collaborative filtering (CF) aims to share common rating knowledge across multiple related CF domains to boost the CF performance. In this paper, we view CF domains as a 2-D site-time coordinate system, on which multiple related domains, such as similar recommender sites or successive time-slices, can share group-level rating patterns. We propose a unified framework for cross-domain CF over the site-time coordinate system by sharing group-level rating patterns and imposing user/item dependence across domains. A generative model, ratings over site-time (ROST), which can generate and predict ratings for multiple related CF domains, is developed as the basic model for the framework. We further introduce cross-domain user/item dependence into ROST and extend it to two real-world cross-domain CF scenarios: 1) ROST (sites) for alleviating rating sparsity in the target domain, where multiple similar sites are viewed as related CF domains and some items in the target domain depend on their correspondences in the related ones; and 2) ROST (time) for modeling user-interest drift over time, where a series of time-slices are viewed as related CF domains and a user at the current time-slice depends on herself in the previous time-slice. All these ROST models are instances of the proposed unified framework. The experimental results show that ROST (sites) can effectively alleviate the sparsity problem to improve rating prediction performance, and ROST (time) can clearly track and visualize user-interest drift over time.
    IEEE Transactions on Cybernetics 08/2014; 45(5). DOI:10.1109/TCYB.2014.2343982 · 3.47 Impact Factor
  • Jia Wu · Shirui Pan · Xingquan Zhu · Zhihua Cai ·
    ABSTRACT: In this paper, we formulate a novel graph-based learning problem, multi-graph classification (MGC), which aims to learn a classifier from a set of labeled bags each containing a number of graphs inside the bag. A bag is labeled positive if at least one graph in the bag is positive, and negative otherwise. Such a multi-graph representation can be used for many real-world applications, such as webpage classification, where a webpage can be regarded as a bag with texts and images inside the webpage being represented as graphs. This problem is a generalization of multi-instance learning (MIL) but with vital differences, mainly because instances in MIL share a common feature space whereas no feature is available to represent graphs in a multi-graph bag. To solve the problem, we propose a boosting-based multi-graph classification framework (bMGC). Given a set of labeled multi-graph bags, bMGC employs dynamic weight adjustment at both bag- and graph-levels to select one subgraph in each iteration as a weak classifier. In each iteration, bag and graph weights are adjusted such that an incorrectly classified bag will receive a higher weight because its predicted bag label conflicts with the genuine label, whereas an incorrectly classified graph will receive a lower weight value if the graph is in a positive bag (or a higher weight if the graph is in a negative bag). Accordingly, bMGC is able to differentiate graphs in positive and negative bags to derive effective classifiers to form a boosting model for MGC. Experiments and comparisons on real-world multi-graph learning tasks demonstrate the algorithm performance.
    IEEE Transactions on Cybernetics 07/2014; 45(3). DOI:10.1109/TCYB.2014.2327111 · 3.47 Impact Factor
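The asymmetric bag- and graph-level weight adjustment described in the abstract can be written down directly. The multiplicative factor of 2 below is an illustrative choice, not the paper's exact update; only the direction of each adjustment follows the abstract.

```python
def bag_weight_update(w, correctly_classified, factor=2.0):
    """Misclassified bags gain weight: their predicted bag label
    conflicts with the genuine label, so boosting emphasizes them."""
    return w * factor if not correctly_classified else w / factor

def graph_weight_update(w, correctly_classified, in_positive_bag, factor=2.0):
    """Misclassified graphs: lower weight inside positive bags (the
    graph's genuine label is unknown and may simply be negative),
    higher weight inside negative bags (all such graphs are negative)."""
    if correctly_classified:
        return w
    return w / factor if in_positive_bag else w * factor
```

This asymmetry is what lets bMGC separate the truly positive graphs in positive bags from the guaranteed-negative graphs in negative bags across boosting iterations.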
  • Meng Fang · Xingquan Zhu ·
    ABSTRACT: Traditional active learning assumes that the labeler is capable of providing a ground truth label for each queried instance. In reality, a labeler might not have sufficient knowledge to label a queried instance and can only guess the label with his/her best knowledge. As a result, the label provided by such a labeler, who is regarded to have uncertain labeling knowledge, might be incorrect. In this paper, we formulate this problem as a new "uncertain labeling knowledge" based active learning paradigm, and the key is to characterize the knowledge set of each labeler for active learning. By taking each unlabeled instance's information and its likelihood of belonging to the uncertain knowledge set as a whole, we define an objective function to ensure that each queried instance is the most informative one for labeling and that the labeler also has sufficient knowledge to label the instance. To ensure label quality, we propose to use diversity density to characterize a labeler's uncertain knowledge and further employ an error-reduction-based mechanism to either accept or decline a labeler's label on uncertain instances. Experiments demonstrate the effectiveness of the proposed algorithm for real-world active learning tasks with uncertain labeling knowledge.
    Pattern Recognition Letters 07/2014; 43(1):98–108. DOI:10.1016/j.patrec.2013.10.011 · 1.55 Impact Factor
  • Jia Wu · Zhibin Hong · Shirui Pan · Xingquan Zhu · Chengqi Zhang · Zhihua Cai ·

    Proceedings of the 2014 SIAM International Conference on Data Mining, 04/2014: pages 217-225; ISBN: 978-1-61197-344-0
  • Yifan Fu · Bin Li · Xingquan Zhu · Chengqi Zhang ·
    ABSTRACT: Traditional active learning methods require the labeler to provide a class label for each queried instance. The labelers are normally highly skilled domain experts to ensure the correctness of the provided labels, which in turn results in expensive labeling cost. To reduce labeling cost, an alternative solution is to allow nonexpert labelers to carry out the labeling task without explicitly telling the class label of each queried instance. In this paper, we propose a new active learning paradigm, in which a nonexpert labeler is only asked “whether a pair of instances belong to the same class”, namely, a pairwise label homogeneity. Under such circumstances, our active learning goal is twofold: (1) decide which pair of instances should be selected for query, and (2) how to make use of the pairwise homogeneity information to improve the active learner. To achieve the goal, we propose a “Pairwise Query on Max-flow Paths” strategy to query pairwise label homogeneity from a nonexpert labeler, whose query results are further used to dynamically update a Min-cut model (to differentiate instances in different classes). In addition, a “Confidence-based Data Selection” measure is used to evaluate data utility based on the Min-cut model’s prediction results. The selected instances, with inferred class labels, are included into the labeled set to form a closed-loop active learning process. Experimental results and comparisons with state-of-the-art methods demonstrate that our new active learning paradigm can result in good performance with nonexpert labelers.
    IEEE Transactions on Knowledge and Data Engineering 04/2014; 26(4):808-822. DOI:10.1109/TKDE.2013.165 · 2.07 Impact Factor

Publication Stats

3k Citations
128.96 Total Impact Points


  • 2006-2015
    • Florida Atlantic University
      • Department of Computer and Electrical Engineering and Computer Science
      Boca Raton, Florida, United States
  • 2009-2014
    • University of Technology Sydney
      • Faculty of Engineering and Information Technology
      • Centre for Quantum Computation and Intelligent Systems (QCIS)
      Sydney, New South Wales, Australia
  • 2013
    • University of Illinois at Chicago
      Chicago, Illinois, United States
  • 2007-2009
    • Chinese Academy of Sciences
      • Research Center for Cyber Economy and Knowledge Management
Beijing, China
  • 2003-2009
    • University of Vermont
      • Department of Computer Science
      Burlington, Vermont, United States
  • 2002-2003
    • University of North Carolina at Charlotte
      • Department of Computer Science
      Charlotte, North Carolina, United States
  • 2001-2003
    • Purdue University
      • Department of Computer Science
West Lafayette, Indiana, United States
  • 2000-2001
    • Fudan University
      • School of Computer Science
      Shanghai, Shanghai Shi, China