Dacheng Tao

University of Technology Sydney, Sydney, New South Wales, Australia

Publications (439) · 874.19 Total impact

  • Lefei Zhang · Qian Zhang · Liangpei Zhang · Dacheng Tao · Xin Huang · Bo Du
    ABSTRACT: In computer vision and pattern recognition research, the studied objects are often characterized by multiple high-dimensional feature representations; it is therefore essential to encode these multiview features into a unified and discriminative embedding that is optimal for a given task. To address this challenge, this paper proposes an ensemble manifold regularized sparse low-rank approximation (EMR-SLRA) algorithm for multiview feature embedding. The EMR-SLRA algorithm is based on the framework of least-squares component analysis; in particular, the low-dimensional feature representation and the projection matrix are obtained by the low-rank approximation of the concatenated multiview feature matrix. By considering the complementary property among multiple features, EMR-SLRA simultaneously enforces the ensemble manifold regularization on the output feature embedding. To further enhance its robustness against noise, group sparsity is introduced into the objective formulation to impose direct noise reduction on the input multiview feature matrix. Since there is no closed-form solution for EMR-SLRA, this paper provides an efficient EMR-SLRA optimization procedure to obtain the output feature embedding. Experiments on pattern recognition applications confirm the effectiveness of the EMR-SLRA algorithm compared with other multiview feature dimensionality reduction approaches.
    Pattern Recognition 10/2015; 48(10). DOI:10.1016/j.patcog.2014.12.016 · 3.10 Impact Factor
  • Dongjin Song · Wei Liu · Tianyi Zhou · Dacheng Tao · David A Meyer
    ABSTRACT: Conditional random fields (CRFs) are a flexible yet powerful probabilistic approach and have shown advantages for popular applications in various areas, including text analysis, bioinformatics, and computer vision. Traditional CRF models, however, are incapable of selecting relevant features as well as suppressing noise from noisy original features. Moreover, conventional optimization methods often converge slowly in solving the training procedure of CRFs, and will degrade significantly for tasks with a large number of samples and features. In this paper, we propose robust CRFs (RCRFs) to simultaneously select relevant features. An optimal gradient method (OGM) is further designed to train RCRFs efficiently. Specifically, the proposed RCRFs employ the l1 norm of the model parameters to regularize the objective used by traditional CRFs, therefore enabling discovery of the relevant unary features and pairwise features of CRFs. In each iteration of OGM, the gradient direction is determined jointly by the current gradient together with the historical gradients, and the Lipschitz constant is leveraged to specify the proper step size. We show that an OGM can tackle the RCRF model training very efficiently, achieving the optimal convergence rate [Formula: see text] (where k is the number of iterations). This convergence rate is theoretically superior to the convergence rate O(1/k) of previous first-order optimization methods. Extensive experiments performed on three practical image segmentation tasks demonstrate the efficacy of OGM in training our proposed RCRFs.
    IEEE Transactions on Image Processing 10/2015; 24(10):3124-36. DOI:10.1109/TIP.2015.2438553 · 3.63 Impact Factor
  • ABSTRACT: Multiple distortion assessment is a major challenge in image quality assessment (IQA). In this paper, a no-reference IQA model for multiply-distorted images is proposed. The features, which are sensitive to each distortion type even in the presence of other distortions, are first selected from three kinds of natural scene statistics (NSS) features. An improved Bag-of-Words (BoW) model is then applied to encode the selected features. Lastly, a simple yet effective linear combination is used to map the image features to the quality score. The combination weights are obtained through lasso regression. A series of experiments shows that the feature selection strategy and the improved BoW model are effective in improving the accuracy of quality prediction for multiple distortion IQA. Compared with other algorithms, the proposed method delivers the best result for multiple distortion IQA.
    IEEE Signal Processing Letters 10/2015; 22(10):1-1. DOI:10.1109/LSP.2015.2436908 · 1.75 Impact Factor
  • Zhe Chen · Zhibin Hong · Dacheng Tao
    ABSTRACT: In recent years, Correlation Filter-based Trackers (CFTs) have attracted increasing interest in the field of visual object tracking and have achieved extremely compelling results in different competitions and benchmarks. In this paper, our goal is to review the developments of CFTs with extensive experimental results. Eleven trackers are surveyed in our work, based on which a general framework is summarized. Furthermore, we investigate different training schemes for correlation filters and discuss various effective improvements that have been made recently. Comprehensive experiments have been conducted to evaluate the effectiveness and efficiency of the surveyed CFTs, and comparisons have been made with other competing trackers. The experimental results show that state-of-the-art performance, in terms of robustness, speed, and accuracy, can be achieved by several recent CFTs, such as MUSTer and SAMF. We find that further improvements for correlation filter-based tracking can be made by estimating scales, applying part-based tracking strategies, and cooperating with long-term tracking methods.
  • Chunlei Peng · Xinbo Gao · Nannan Wang · Dacheng Tao · Xuelong Li · Jie Li
    ABSTRACT: Face sketch-photo synthesis plays an important role in law enforcement and digital entertainment. Most of the existing methods only use pixel intensities as the feature. Since face images can be described using features from multiple aspects, this paper presents a novel multiple representations-based face sketch-photo-synthesis method that adaptively combines multiple representations to represent an image patch. In particular, it combines multiple features from face images processed using multiple filters and deploys Markov networks to exploit the interacting relationships between the neighboring image patches. The proposed framework could be solved using an alternating optimization strategy and it normally converges in only five outer iterations in the experiments. Our experimental results on the Chinese University of Hong Kong (CUHK) face sketch database, celebrity photos, CUHK Face Sketch FERET Database, IIIT-D Viewed Sketch Database, and forensic sketches demonstrate the effectiveness of our method for face sketch-photo synthesis. In addition, cross-database and database-dependent style-synthesis evaluations demonstrate the generalizability of this novel method and suggest promising solutions for face identification in forensic science.
    IEEE Transactions on Neural Networks and Learning Systems 09/2015; DOI:10.1109/TNNLS.2015.2464681 · 4.29 Impact Factor
  • Cheng Deng · Jie Xu · Kaibing Zhang · Dacheng Tao · Xinbo Gao · Xuelong Li
    ABSTRACT: For the regression-based single-image super-resolution (SR) problem, the key is to establish a mapping relation between high-resolution (HR) and low-resolution (LR) image patches for obtaining a visually pleasing image. Most existing approaches solve it by dividing the model into several single-output regression problems, which ignores the fact that a pixel within an HR patch affects other spatially adjacent pixels during the training process, and thus tends to generate serious ringing artifacts in the resultant HR image as well as increase the computational burden. To alleviate these problems, we propose to use a structured output regression machine (SORM) to simultaneously model the inherent spatial relations between the HR and LR patches, which helps to preserve sharp edges. In addition, to further improve the quality of reconstructed HR images, a nonlocal (NL) self-similarity prior in natural images is formulated as a regularization term to enhance the SORM-based SR results. To offer a computationally efficient SORM method, we use a relatively small set of nonsupport vector samples to establish the accurate regression model and an accelerating algorithm for NL self-similarity calculation. Extensive SR experiments on various images indicate that the proposed method can achieve more promising performance than the other state-of-the-art SR methods in terms of both visual quality and computational cost.
    IEEE Transactions on Neural Networks and Learning Systems 09/2015; DOI:10.1109/TNNLS.2015.2468069 · 4.29 Impact Factor
  • Wenrui Hu · Dacheng Tao · Wensheng Zhang · Yuan Xie · Yehui Yang
    ABSTRACT: In this paper, we propose a new low-rank tensor model based on the circulant algebra, namely, the twist tensor nuclear norm, or t-TNN for short. The twist tensor denotes a 3-way tensor representation that laterally stores 2D data slices in order. On one hand, t-TNN convexly relaxes the tensor multi-rank of the twist tensor in the Fourier domain, which allows an efficient computation using the FFT. On the other hand, t-TNN is equal to the nuclear norm of the block circulant matricization of the twist tensor in the original domain, which extends the traditional matrix nuclear norm in a block circulant way. We test the t-TNN model on a video completion application that aims to fill missing values, and the experimental results validate its effectiveness, especially when dealing with video recorded by a non-stationary panning camera. The block circulant matricization of the twist tensor can be transformed into a circulant block representation with nuclear norm invariance. This representation, after transformation, exploits the horizontal translation relationship between the frames in a video, and endows the t-TNN model with a more powerful ability to reconstruct panning videos than the existing state-of-the-art low-rank models.
  • Changxing Ding · Dacheng Tao
    ABSTRACT: Face images appearing in multimedia applications, e.g., social networks and digital entertainment, usually exhibit dramatic pose, illumination, and expression variations, resulting in considerable performance degradation for traditional face recognition algorithms. This paper proposes a comprehensive deep learning framework to jointly learn face representation using multimodal information. The proposed deep learning structure is composed of a set of elaborately designed convolutional neural networks (CNNs) and a three-layer stacked auto-encoder (SAE). The set of CNNs extracts complementary facial features from multimodal data. Then, the extracted features are concatenated to form a high-dimensional feature vector, whose dimension is compressed by the SAE. All the CNNs are trained using a subset of 9,000 subjects from the publicly available CASIA-WebFace database, which ensures the reproducibility of this work. Using the proposed single CNN architecture and limited training data, a 98.43% verification rate is achieved on the LFW database. Benefiting from the complementary information contained in multimodal data, our small ensemble system achieves a recognition rate higher than 99.0% on LFW using a publicly available training set.
    IEEE Transactions on Multimedia 09/2015; DOI:10.1109/TMM.2015.2477042 · 2.30 Impact Factor
  • ABSTRACT: Local binary patterns (LBP) achieve great success in texture analysis; however, they are not robust to noise. The two reasons for this weakness of LBP schemes are that (1) they encode the texture spatial structure based only on local information, which is sensitive to noise, and (2) they use exact values as the quantization thresholds, which makes the extracted features sensitive to small changes in the input image. In this paper, we propose a noise-robust adaptive hybrid pattern (AHP) for noisy texture analysis. In our scheme, two solutions, from the perspectives of the texture description model and the quantization algorithm, have been developed to reduce the feature's noise sensitivity. First, a hybrid texture description model is proposed. In this model, the global texture spatial structure, which is depicted by a global description model, is encoded with the primitive microfeature for texture description. Second, we develop an adaptive quantization algorithm in which equal probability quantization is utilized to achieve the maximum partition entropy. Higher noise tolerance can be obtained with minimal information loss in the quantization process. The experimental results of texture classification on two texture databases with three different types of noise show that our approach leads to significant improvement in noisy texture analysis. Furthermore, our scheme achieves state-of-the-art performance in noisy face recognition.
    Pattern Recognition 08/2015; 48(8). DOI:10.1016/j.patcog.2015.01.001 · 3.10 Impact Factor
  • Lin Zhao · Xinbo Gao · Dacheng Tao · Xuelong Li
    ABSTRACT: We present a new method for tracking human pose by employing max-margin Markov models. Representing a human body by part-based models such as pictorial structure, the problem of pose tracking can be modeled by a discrete Markov random field. Because max-margin Markov networks provide an efficient way to handle structured data together with strong generalization guarantees, it is natural to learn the model parameters using the max-margin technique. Since tracking human pose needs to couple limbs in adjacent frames, the model will introduce loops and will be intractable for learning and inference. Previous work has resorted to pose estimation methods, which discard temporal information by parsing frames individually. Alternatively, approximate inference strategies have been used, which can overfit to statistics of a particular dataset. Thus, the performance and generalization of these methods are limited. In this paper, we approximate the full model by introducing an ensemble of two tree-structured sub-models, Markov networks for spatial parsing and Markov chains for temporal parsing. Both models can be trained jointly using the max-margin technique, and an iterative parsing process is proposed to achieve the ensemble inference. We apply our model to three challenging datasets that contain highly varied and articulated poses. Comprehensive experimental results demonstrate the superior performance of our method over state-of-the-art approaches.
    IEEE Transactions on Image Processing 08/2015; DOI:10.1109/TIP.2015.2473662 · 3.63 Impact Factor
  • Tongliang Liu · Dacheng Tao
  • Jie Gui · Tongliang Liu · Dacheng Tao · Zhenan Sun · Tieniu Tan
    ABSTRACT: Classifier design is a fundamental problem in pattern recognition. A variety of pattern classification methods such as the nearest neighbor (NN) classifier, support vector machine (SVM), and sparse representation-based classification (SRC) have been proposed in the literature. These typical and widely used classifiers were originally developed from different theory or application motivations and they are conventionally treated as independent and specific solutions for pattern classification. This paper proposes a novel pattern classification framework, namely, representative vector machines (or RVMs for short). The basic idea of RVMs is to assign the class label of a test example according to its nearest representative vector. The contributions of RVMs are twofold. On one hand, the proposed RVMs establish a unified framework of classical classifiers because NN, SVM, and SRC can be interpreted as the special cases of RVMs with different definitions of representative vectors. Thus, the underlying relationship among a number of classical classifiers is revealed for better understanding of pattern classification. On the other hand, novel and advanced classifiers are inspired in the framework of RVMs. For example, a robust pattern classification method called discriminant vector machine (DVM) is motivated from RVMs. Given a test example, DVM first finds its k-NNs and then performs classification based on the robust M-estimator and manifold regularization. Extensive experimental evaluations on a variety of visual recognition tasks such as face recognition (Yale and face recognition grand challenge databases), object categorization (Caltech-101 dataset), and action recognition (Action Similarity LAbeliNg) demonstrate the advantages of DVM over other classifiers.
    IEEE Transactions on Cybernetics 08/2015; DOI:10.1109/TCYB.2015.2457234
  • Xianhua Zeng · Wei Bian · Wei Liu · Jialie Shen · Dacheng Tao
    ABSTRACT: Image denoising is a fundamental problem in computer vision and image processing that holds considerable practical importance for real-world applications. The traditional patch-based and sparse coding-driven image denoising methods convert two-dimensional image patches into one-dimensional vectors for further processing. Thus, these methods inevitably break down the inherent two-dimensional geometric structure of natural images. To overcome this limitation of previous image denoising methods, we propose a two-dimensional image denoising model, namely, the Dictionary Pair Learning (DPL) model, and we design a corresponding algorithm called the Dictionary Pair Learning on the Grassmann-manifold (DPLG) algorithm. The DPLG algorithm first learns an initial dictionary pair (i.e., the left and right dictionaries) by employing a subspace partition technique on the Grassmann manifold, wherein the refined dictionary pair is obtained through a sub-dictionary pair merging. The DPLG obtains a sparse representation by encoding each image patch only with the selected sub-dictionary pair. The non-zero elements of the sparse representation are further smoothed by the graph Laplacian operator to remove noise. Consequently, the DPLG algorithm not only preserves the inherent two-dimensional geometric structure of natural images but also performs manifold smoothing in the two-dimensional sparse coding space. Experimental evaluations on the benchmark images and Berkeley segmentation datasets demonstrate that the DPLG algorithm improves the SSIM values, which reflect perceptual visual quality, of denoised images. Moreover, the DPLG produces PSNR values competitive with those of popular image denoising algorithms.
    IEEE Transactions on Image Processing 08/2015; 24(11). DOI:10.1109/TIP.2015.2468172 · 3.63 Impact Factor
  • Tongliang Liu · Dacheng Tao
    ABSTRACT: Extracting low-rank and sparse structures from matrices has been extensively studied in machine learning, compressed sensing, and conventional signal processing, and has been widely applied to recommendation systems, image reconstruction, visual analytics, and brain signal processing. Manhattan nonnegative matrix factorization (MahNMF) is an extension of the conventional NMF, which models the heavy-tailed Laplacian noise by minimizing the Manhattan distance between a nonnegative matrix X and the product of two nonnegative low-rank factor matrices. Fast algorithms have been developed to restore the low-rank and sparse structures of X in the MahNMF. In this paper, we study the statistical performance of the MahNMF within the framework of statistical learning theory. We decompose the expected reconstruction error of the MahNMF into the estimation error and the approximation error. The estimation error is bounded by the generalization error bounds of the MahNMF, while the approximation error is analyzed using the asymptotic results of the minimum distortion of vector quantization. The generalization error bound is valuable for determining the size of the training sample needed to guarantee a desirable upper bound for the defect between the expected and empirical reconstruction errors. Statistical performance analysis shows how the reduced dimensionality affects the estimation and approximation errors. Our framework can also be used for analyzing the performance of the NMF.
    IEEE Transactions on Neural Networks and Learning Systems 08/2015; DOI:10.1109/TNNLS.2015.2458986 · 4.29 Impact Factor
  • Xun Yang · Meng Wang · Dacheng Tao
    ABSTRACT: Object tracking is a fundamental problem in computer vision. Although much progress has been made, object tracking is still a challenging problem as it entails learning an effective model to account for appearance change caused by intrinsic and extrinsic factors. To improve the reliability and effectiveness, this paper presents an approach that explores the combination of graph-based ranking and multiple feature representations for tracking. We construct multiple graph matrices with various types of visual features, and integrate the multiple graphs into a regularization framework to learn a ranking vector. In particular, the approach has exploited temporal consistency by adding a regularization term to constrain the difference between two weight vectors at adjacent frames. An effective iterative optimization scheme is also proposed in this paper. Experimental results on a variety of challenging video sequences show that the proposed algorithm performs favorably against the state-of-the-art visual tracking methods.
    Neurocomputing 07/2015; 159(1). DOI:10.1016/j.neucom.2015.02.046 · 2.08 Impact Factor
  • Xiaoyan Li · Hongjie He · Ruxin Wang · Dacheng Tao
    ABSTRACT: Single image super-resolution (SR) aims to construct a high-resolution (HR) version from a single low-resolution (LR) image. The SR reconstruction is challenging because of the missing details in the given LR image. Thus, it is critical to explore and exploit effective prior knowledge for boosting the reconstruction performance. In this paper, we propose a novel SR method by exploiting both the directional group sparsity of the image gradients and the directional features in similarity weight estimation. The proposed SR approach is based on two observations: 1) most of the sharp edges are oriented in a limited number of directions; 2) an image pixel can be estimated by the weighted averaging of its neighbors. In consideration of these observations, we apply the curvelet transform to extract directional features, which are then used for region selection and weight estimation. A combined total variation (CTV) regularizer is presented, which assumes that the gradients in natural images have a straightforward group sparsity structure. In addition, a directional non-local means (D-NLM) regularization term takes pixel values and directional information into account to suppress unwanted artifacts. By assembling the designed regularization terms, we solve the SR problem of an energy function with minimal reconstruction error by applying a framework of templates for first-order conic solvers (TFOCS). The thorough quantitative and qualitative results in terms of PSNR, SSIM, IFC, and preference matrix demonstrate that the proposed approach achieves higher quality SR reconstruction than state-of-the-art algorithms.
    IEEE Transactions on Image Processing 05/2015; 24(9). DOI:10.1109/TIP.2015.2432713 · 3.63 Impact Factor
  • Xiao Liu · Mingli Song · Dacheng Tao · Jiajun Bu · Chun Chen
    ABSTRACT: Recent advances in object detection have led to the development of segmentation-by-detection approaches that integrate top-down geometric priors for multi-class object segmentation. A key yet under-addressed issue in utilizing top-down cues for the problem of multi-class object segmentation by detection is efficiently generating robust and accurate geometric priors. In this paper, we propose a random geometric prior forest scheme to obtain object-adaptive geometric priors efficiently and robustly. In the scheme, a testing object first searches for training neighbors with similar geometries by using the random geometric prior forest, and then the geometry of the testing object is reconstructed by linearly combining the geometries of its neighbors. Our scheme enjoys several favorable properties when compared with conventional methods. First, it is robust and very fast because its inference does not suffer from bad initializations, poor local minima, or complex optimization. Second, the figure/ground geometries of training samples are utilized in a multi-task manner. Third, our scheme is object-adaptive but does not require the labeling of parts or poselets, and thus it is quite easy to implement. To demonstrate the effectiveness of the proposed scheme, we integrate the obtained top-down geometric priors with conventional bottom-up color cues within the graph cut framework. The proposed random geometric prior forest achieves the best segmentation results of all of the methods tested on VOC2010/2012 and is 90 times faster than the current state-of-the-art method.
    IEEE Transactions on Image Processing 05/2015; 24(10). DOI:10.1109/TIP.2015.2432711 · 3.63 Impact Factor
  • ABSTRACT: Dictionary learning is a method of acquiring a collection of atoms for subsequent signal representation. Due to its excellent representation ability, dictionary learning has been widely applied in multimedia and computer vision. However, conventional dictionary learning algorithms fail to deal with multi-modal datasets. In this paper, we propose an online multi-modal robust non-negative dictionary learning (OMRNDL) algorithm to overcome this deficiency. Notably, OMRNDL casts visual tracking as a dictionary learning problem under the particle filter framework and captures the intrinsic knowledge about the target from multiple visual modalities, e.g., pixel intensity and texture information. To this end, OMRNDL adaptively learns an individual dictionary, i.e., template, for each modality from available frames, and then represents new particles over all the learned dictionaries by minimizing the fitting loss of data based on M-estimation. The resultant representation coefficient can be viewed as the common semantic representation of particles across multiple modalities, and can be utilized to track the target. OMRNDL incrementally learns the dictionary and the coefficient of each particle by using multiplicative update rules that guarantee their respective non-negativity constraints. Experimental results on a popular challenging video benchmark validate the effectiveness of OMRNDL for visual tracking both quantitatively and qualitatively. © 2015 Zhang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
    PLoS ONE 05/2015; 10(5):e0124685. DOI:10.1371/journal.pone.0124685 · 3.23 Impact Factor
  • Bin Wang · Xinbo Gao · Jie Li · Xuelong Li · Dacheng Tao
    ABSTRACT: A novel level set method (LSM) constrained by shape priors is proposed for selective image segmentation. First, the shape priors are aligned using image moments to remove spatially related information. Second, the aligned shape priors are projected into the subspace obtained by locality preserving projection to measure the similarity between the shapes. Finally, a new energy functional is built by combining data-driven and shape-driven energy terms to implement a selective image segmentation method. We assess the proposed method and some representative LSMs on synthetic, medical, and natural images; the results suggest that the proposed method is superior to the purely data-driven LSMs and the representative LSMs with shape priors.
    Neurocomputing 05/2015; DOI:10.1016/j.neucom.2014.07.086 · 2.08 Impact Factor

Publication Stats

9k Citations
874.19 Total Impact Points


  • 2010–2015
    • University of Technology Sydney 
      • Centre for Quantum Computation and Intelligent Systems (QCIS)
      Sydney, New South Wales, Australia
  • 2010–2014
    • Chinese Academy of Sciences
      • State Key Laboratory of Transient Optics and Photonics
      • Xi'an Institute of Optics and Precision Mechanics
      Beijing, China
  • 2012
    • Xihua University
      Chengdu, Sichuan, China
  • 2009–2012
    • Xidian University
      • School of Life Sciences and Technology
      Xi'an, Shaanxi, China
  • 2008–2012
    • Nanyang Technological University
      • School of Computer Engineering
      Singapore
    • Zhejiang University
      Hangzhou, Zhejiang, China
  • 2011
    • State Key Laboratory of Transient Optics and Photonics
      Xi'an, Shaanxi, China
  • 2007–2010
    • The University of Hong Kong
      • Department of Computer Science
      Hong Kong, Hong Kong
  • 2007–2009
    • The Hong Kong Polytechnic University
      • Department of Computing
      Hong Kong, Hong Kong
  • 2005–2009
    • Birkbeck, University of London
      • Department of Computer Science and Information Systems
      London, England, United Kingdom
  • 2006–2008
    • University of London
      London, England, United Kingdom
  • 2004–2005
    • The Chinese University of Hong Kong
      • Department of Information Engineering
      Hong Kong, Hong Kong