Dacheng Tao

University of Technology Sydney, Sydney, New South Wales, Australia

Publications (449) · 910.94 Total impact

  • Qingshan Liu · Jiankang Deng · Dacheng Tao ·
    ABSTRACT: Localizing facial landmarks is a fundamental step in facial image analysis. However, the problem remains challenging due to the large variability in expression, illumination, and pose, and the existence of occlusions in real-world face images. In this paper, we present a dual sparse constrained cascade regression model for robust face alignment. Instead of using the least-squares method during the training process of the regressors, a sparse constraint is introduced to select robust features and compress the size of the model. Moreover, a sparse shape constraint is incorporated between each cascade regression, and the explicit shape constraints are able to suppress the ambiguity in local features. To improve the model's adaptation to large pose variation, face pose is estimated by five fiducial landmarks located by a deep convolutional neural network, which is used to adaptively design the cascade regression model. To the best of our knowledge, this is the first attempt to fuse an explicit shape constraint (sparse shape constraint) and implicit context information (sparse feature selection) for robust face alignment in the framework of cascade regression. Extensive experiments on nine challenging wild data sets demonstrate the advantages of the proposed method over the state-of-the-art methods.
    IEEE Transactions on Image Processing 11/2015; DOI:10.1109/TIP.2015.2502485 · 3.63 Impact Factor
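    A minimal sketch of the core idea of replacing the least-squares regressor in one cascade stage with an l1-constrained (sparse) regressor, solved here with plain ISTA in NumPy; the feature and landmark dimensions, the regularisation weight, and the function name are illustrative assumptions, not the paper's actual settings.
```python
import numpy as np

def sparse_stage_regressor(Phi, dS, lam=0.1, iters=200):
    """Fit W minimising ||Phi @ W - dS||_F^2 + lam * ||W||_1 with plain ISTA.

    Phi : (n_samples, n_features)    shape-indexed local features
    dS  : (n_samples, 2*n_landmarks) target shape increments for this stage
    """
    step = 1.0 / (np.linalg.norm(Phi, 2) ** 2)          # 1 / Lipschitz constant
    W = np.zeros((Phi.shape[1], dS.shape[1]))
    for _ in range(iters):
        W -= step * (Phi.T @ (Phi @ W - dS))            # gradient step
        W = np.sign(W) * np.maximum(np.abs(W) - step * lam, 0.0)  # soft threshold
    return W

# Toy cascade stage: 500 training faces, 128-dim features, 5 landmarks (x, y)
rng = np.random.default_rng(0)
Phi, dS = rng.standard_normal((500, 128)), rng.standard_normal((500, 10))
W = sparse_stage_regressor(Phi, dS)
print("fraction of zero weights:", np.mean(W == 0))     # sparsity compresses the model
```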
  • Bo Du · Zengmao Wang · Lefei Zhang · Liangpei Zhang · Wei Liu · Jialie Shen · Dacheng Tao ·
    ABSTRACT: How can we find a general way to choose the most suitable samples for training a classifier, even with very limited prior information? Active learning, which can be regarded as an iterative optimization procedure, plays a key role in constructing a refined training set to improve the classification performance in a variety of applications, such as text analysis, image recognition, and social network modeling. Although combining the representativeness and informativeness of samples has been proven promising for active sampling, state-of-the-art methods only perform well under certain data structures. Can we then find a way to fuse the two active sampling criteria without any assumption on the data? This paper proposes a general active learning framework that effectively fuses the two criteria. Inspired by a two-sample discrepancy problem, triple measures are elaborately designed to guarantee that the query samples not only possess the representativeness of the unlabeled data but also reveal the diversity of the labeled data. Any appropriate similarity measure can be employed to construct the triple measures. Meanwhile, an uncertainty measure is leveraged to generate the informativeness criterion, which can be carried out in different ways. Rooted in this framework, a practical active learning algorithm is proposed, which exploits a radial basis function together with the estimated probabilities to construct the triple measures and a modified best-versus-second-best strategy to construct the uncertainty measure, respectively. Experimental results on benchmark datasets demonstrate that our algorithm consistently achieves superior performance over the state-of-the-art active learning algorithms.
    IEEE Transactions on Cybernetics 11/2015; DOI:10.1109/TCYB.2015.2496974 · 3.47 Impact Factor
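    The sketch below illustrates, under simplifying assumptions, how an RBF-kernel similarity can supply the representativeness and diversity measures while a modified best-versus-second-best (BvSB) margin supplies the uncertainty criterion; the fusion weight alpha, the kernel width, and the function names are placeholders rather than the paper's formulation.
```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def query_index(X_unlab, X_lab, proba, alpha=0.5, gamma=1.0):
    """Pick one query sample by fusing representativeness/diversity with BvSB uncertainty.

    proba : (n_unlab, n_classes) class probabilities from the current classifier.
    """
    rep = rbf_kernel(X_unlab, X_unlab, gamma).mean(axis=1)        # close to the unlabeled pool
    div = 1.0 - rbf_kernel(X_unlab, X_lab, gamma).max(axis=1)     # far from labeled samples
    p = np.sort(proba, axis=1)
    uncertainty = 1.0 - (p[:, -1] - p[:, -2])                     # small BvSB margin = uncertain
    return int(np.argmax(alpha * rep * div + (1.0 - alpha) * uncertainty))
```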
  • Xiaoshuang Shi · Zhenhua Guo · Feiping Nie · Lin Yang · Jane You · Dacheng Tao ·
    ABSTRACT: Principal component analysis (PCA) is widely applied in various areas, one typical application being face recognition. Many versions of PCA have been developed for face recognition. However, most of these approaches are sensitive to grossly corrupted entries in the 2D matrix representing a face image. In this paper, we try to reduce the influence of gross corruptions such as variations in lighting, facial expression, and occlusion, in order to improve the robustness of PCA. To achieve this goal, we present a simple but effective unsupervised preprocessing method, two-dimensional whitening reconstruction (TWR), which includes two stages: 1) a whitening process on a 2D face image matrix rather than a concatenated 1D vector; 2) 2D face image matrix reconstruction. TWR reduces the pixel redundancy within the image while maintaining important intrinsic features. In this way, negative effects introduced by gross variations are greatly reduced. Furthermore, a face image preprocessed with TWR can be approximated by a Gaussian signal, on which PCA is more effective. Experiments on benchmark face databases demonstrate that the proposed method can significantly improve the robustness of PCA methods in classification and clustering, especially for faces with severe illumination changes.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 11/2015; DOI:10.1109/TPAMI.2015.2501810 · 5.78 Impact Factor
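    One plausible reading of the whitening stage is a ZCA-style whitening applied directly to the 2D image matrix, sketched below; the exact TWR formulation (including its reconstruction step) may differ, and the epsilon and the row-wise convention are assumptions.
```python
import numpy as np

def whiten_2d(X, eps=1e-3):
    """ZCA-style whitening applied directly to the rows of a 2D face image matrix.

    X : (h, w) grayscale face image; the output keeps the image shape, so it can
    be fed to ordinary PCA afterwards.
    """
    Xc = X - X.mean(axis=1, keepdims=True)       # centre each row
    C = Xc @ Xc.T / Xc.shape[1]                  # (h, h) row covariance
    vals, vecs = np.linalg.eigh(C)
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return W @ Xc                                # whitened, still (h, w)
```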
  • Chang Xu · Dacheng Tao · Chao Xu ·
    ABSTRACT: It is practical to assume that an individual view is unlikely to be sufficient for effective multi-view learning. Therefore, integration of multi-view information is both valuable and necessary. In this paper, we propose the Multi-view Intact Space Learning (MISL) algorithm, which integrates the encoded complementary information in multiple views to discover a latent intact representation of the data. Even though each view on its own is insufficient, we show theoretically that by combining multiple views we can obtain abundant information for latent intact space learning. Employing the Cauchy loss (a technique used in statistical learning) as the error measurement strengthens robustness to outliers. We propose a new definition of multi-view stability and then derive the generalization error bound based on multi-view stability and Rademacher complexity, and show that the complementarity between multiple views is beneficial for both stability and generalization. MISL is efficiently optimized using a novel Iteratively Reweighted Residuals (IRR) technique, whose convergence is theoretically analyzed. Experiments on synthetic data and real-world datasets demonstrate that MISL is an effective and promising algorithm for practical applications.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 11/2015; 37(12):1-1. DOI:10.1109/TPAMI.2015.2417578 · 5.78 Impact Factor
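    To illustrate the Iteratively Reweighted Residuals idea on its own, the sketch below applies Cauchy-loss reweighting to plain linear regression; MISL applies the same reweighting inside its latent intact-space objective, so this is only a toy analogue with assumed parameter names.
```python
import numpy as np

def irr_cauchy_fit(X, y, c=1.0, iters=20):
    """Robust linear fit under the Cauchy loss via iteratively reweighted residuals."""
    n, d = X.shape
    w = np.ones(n)
    for _ in range(iters):
        Xw = X * w[:, None]                                  # apply current weights
        beta = np.linalg.solve(Xw.T @ X + 1e-8 * np.eye(d), Xw.T @ y)
        r = y - X @ beta
        w = 1.0 / (1.0 + (r / c) ** 2)                       # Cauchy weights: outliers shrink
    return beta
```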
  • Yong Luo · Yonggang Wen · Dacheng Tao · Jie Gui · Chao Xu ·
    ABSTRACT: The features used in many image analysis-based applications are frequently of very high dimension. Feature extraction offers several advantages in high-dimensional cases, and many recent studies have used multi-task feature extraction approaches, which often outperform single-task feature extraction approaches. However, most of these methods are limited in that they only consider data represented by a single type of feature, even though images are usually represented by features from multiple modalities. We therefore propose a novel large margin multi-modal multi-task feature extraction (LM3FE) framework for handling multi-modal features for image classification. In particular, LM3FE simultaneously learns the feature extraction matrix for each modality and the modality combination coefficients. In this way, LM3FE not only handles correlated and noisy features, but also utilizes the complementarity of different modalities to further help reduce feature redundancy in each modality. The large margin principle employed also helps to extract strongly predictive features so that they are more suitable for prediction (e.g., classification). An alternating algorithm is developed for the optimization problem, and each sub-problem can be efficiently solved. Experiments on two challenging real-world image datasets demonstrate the effectiveness and superiority of the proposed method.
    IEEE Transactions on Image Processing 11/2015; DOI:10.1109/TIP.2015.2495116 · 3.63 Impact Factor
  • Dacheng Tao · Maoying Qiao · Wei Bian · Yi Da Xu ·
    ABSTRACT: Labeling of sequential data is a prevalent meta-problem for a wide range of real-world applications. While the first-order Hidden Markov Model (HMM) provides a fundamental approach for unsupervised sequential labeling, the basic model does not show satisfying performance when it is directly applied to real-world problems, such as part-of-speech tagging (PoS tagging) and optical character recognition (OCR). Aiming at improving performance, important extensions of HMM have been proposed in the literature. One of the common key features in these extensions is the incorporation of proper prior information. In this paper, we propose a new extension of HMM, termed diversified Hidden Markov Models (dHMM), which utilizes a diversity-encouraging prior over the state-transition probabilities and thus facilitates more dynamic sequential labelings. Specifically, the diversity is modeled by a continuous determinantal point process prior, which we apply to both unsupervised and supervised scenarios. Learning and inference algorithms for dHMM are derived. Empirical evaluations on benchmark datasets for unsupervised PoS tagging and supervised OCR confirm the effectiveness of dHMM, which achieves performance competitive with the state of the art.
    IEEE Transactions on Knowledge and Data Engineering 11/2015; 27(11):1-1. DOI:10.1109/TKDE.2015.2433262 · 2.07 Impact Factor
  • Changxing Ding · Dacheng Tao ·
    ABSTRACT: Face images appearing in multimedia applications, e.g., social networks and digital entertainment, usually exhibit dramatic pose, illumination, and expression variations, resulting in considerable performance degradation for traditional face recognition algorithms. This paper proposes a comprehensive deep learning framework to jointly learn face representation using multimodal information. The proposed deep learning structure is composed of a set of elaborately designed convolutional neural networks (CNNs) and a three-layer stacked auto-encoder (SAE). The set of CNNs extracts complementary facial features from multimodal data. The extracted features are then concatenated to form a high-dimensional feature vector, whose dimension is compressed by the SAE. All the CNNs are trained using a subset of 9,000 subjects from the publicly available CASIA-WebFace database, which ensures the reproducibility of this work. Using the proposed single CNN architecture and limited training data, a 98.43% verification rate is achieved on the LFW database. Benefiting from the complementary information contained in multimodal data, our small ensemble system achieves a recognition rate higher than 99.0% on LFW using a publicly available training set.
    IEEE Transactions on Multimedia 11/2015; DOI:10.1109/TMM.2015.2477042 · 2.30 Impact Factor
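    A minimal PyTorch sketch of the compression stage only: a three-layer stacked auto-encoder reduces a concatenated feature vector to a compact code. The layer sizes, activations, and the absence of layer-wise pre-training are illustrative assumptions, and the CNN feature extractors are not shown.
```python
import torch
import torch.nn as nn

class StackedAE(nn.Module):
    """Three-layer stacked auto-encoder compressing a concatenated feature vector."""
    def __init__(self, in_dim=4096, dims=(2048, 1024, 512)):
        super().__init__()
        enc, dec, prev = [], [], in_dim
        for d in dims:                                   # encoder: 4096 -> 2048 -> 1024 -> 512
            enc += [nn.Linear(prev, d), nn.Sigmoid()]
            prev = d
        for d in reversed((in_dim,) + dims[:-1]):        # decoder mirrors the encoder
            dec += [nn.Linear(prev, d), nn.Sigmoid()]
            prev = d
        self.encoder, self.decoder = nn.Sequential(*enc), nn.Sequential(*dec)

    def forward(self, x):
        code = self.encoder(x)                           # compact face representation
        return self.decoder(code), code

features = torch.rand(8, 4096)                           # concatenated CNN features in [0, 1]
recon, code = StackedAE()(features)
loss = nn.functional.mse_loss(recon, features)           # layer-wise pre-training omitted
```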
  • Lefei Zhang · Qian Zhang · Liangpei Zhang · Dacheng Tao · Xin Huang · Bo Du ·
    ABSTRACT: In computer vision and pattern recognition research, the studied objects are often characterized by multiple feature representations with high dimensionality, so it is essential to encode these multiview features into a unified and discriminative embedding that is optimal for a given task. To address this challenge, this paper proposes an ensemble manifold regularized sparse low-rank approximation (EMR-SLRA) algorithm for multiview feature embedding. The EMR-SLRA algorithm is based on the framework of least-squares component analysis; in particular, the low-dimensional feature representation and the projection matrix are obtained by the low-rank approximation of the concatenated multiview feature matrix. By considering the complementary property among multiple features, EMR-SLRA simultaneously enforces the ensemble manifold regularization on the output feature embedding. In order to further enhance its robustness against noise, group sparsity is introduced into the objective formulation to impose direct noise reduction on the input multiview feature matrix. Since there is no closed-form solution for EMR-SLRA, this paper provides an efficient optimization procedure to obtain the output feature embedding. Experiments on pattern recognition applications confirm the effectiveness of the EMR-SLRA algorithm compared with other multiview feature dimensionality reduction approaches.
    Pattern Recognition 10/2015; 48(10). DOI:10.1016/j.patcog.2014.12.016 · 3.10 Impact Factor
  • Chang Xu · Dacheng Tao · Chao Xu ·
    ABSTRACT: One underlying assumption of conventional multi-view learning algorithms is that all examples can be successfully observed on all views. However, due to various failures or faults in collecting and pre-processing the data on different views, we are more likely to be faced with an incomplete-view setting, where an example could be missing its representation on one view (i.e., missing view) or could be only partially observed on that view (i.e., missing variables). The low-rank assumption is effective for recovering randomly missing feature variables, but it breaks down when the missing variables are concentrated and it cannot handle missing views at all. This paper suggests that the key to handling the incomplete-view problem is to exploit the connections between multiple views, enabling the incomplete views to be restored with the help of the complete views. We propose an effective algorithm to accomplish multi-view learning with incomplete views by assuming that different views are generated from a shared subspace. To handle the large-scale problem and obtain fast convergence, we investigate a successive overrelaxation (SOR) method to solve the objective function. Convergence of the optimization technique is theoretically analyzed. Experimental results on toy data and real-world datasets suggest that studying the incomplete-view problem in multi-view learning is significant and that the proposed algorithm can effectively handle incomplete views in different applications.
    IEEE Transactions on Image Processing 10/2015; DOI:10.1109/TIP.2015.2490539 · 3.63 Impact Factor
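    The sketch below illustrates the shared-subspace assumption with a plain alternating-least-squares solver: both views are modelled as H @ W_v, the latent matrix H is fitted from the complete view plus the observed part of the incomplete view, and the missing part is then filled as H @ W_2. The paper's actual objective and its SOR-based optimisation are more involved; all names, dimensions, and the regularisation here are assumptions.
```python
import numpy as np

def restore_missing_view(X1, X2_obs, mask, k=5, iters=30, lam=1e-2):
    """Fill a partially observed view assuming both views share a latent matrix H.

    X1     : (n, d1) fully observed view
    X2_obs : (n, d2) incomplete view, arbitrary values where unobserved
    mask   : (n, d2) boolean, True where X2 is observed
    """
    n, d1 = X1.shape
    d2 = X2_obs.shape[1]
    rng = np.random.default_rng(0)
    H = rng.standard_normal((n, k))
    W1, W2, I = np.zeros((k, d1)), np.zeros((k, d2)), lam * np.eye(k)
    for _ in range(iters):
        W1 = np.linalg.solve(H.T @ H + I, H.T @ X1)
        for j in range(d2):                               # fit each column on observed rows only
            Hj = H[mask[:, j]]
            W2[:, j] = np.linalg.solve(Hj.T @ Hj + I, Hj.T @ X2_obs[mask[:, j], j])
        for i in range(n):                                # refresh H from all observed entries
            Wi = np.concatenate([W1, W2[:, mask[i]]], axis=1)
            xi = np.concatenate([X1[i], X2_obs[i, mask[i]]])
            H[i] = np.linalg.solve(Wi @ Wi.T + I, Wi @ xi)
    return H @ W2                                         # restored second view
```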
  • Article: Where2Stand

    ACM Transactions on Intelligent Systems and Technology 10/2015; 7(1):1-22. DOI:10.1145/2770879 · 1.25 Impact Factor
  • Yongsheng Dong · Dacheng Tao · Xuelong Li ·
    ABSTRACT: Effective representation of image texture is important for image-classification tasks. Statistical modelling in wavelet domains has been widely used for image texture representation. However, due to the intraclass complexity and interclass diversity of textures, it is hard to use a predefined probability distribution function to adaptively fit all wavelet subband coefficients of different textures. In this article, we propose a novel modelling approach, the Heterogeneous and Incrementally Generated Histogram (HIGH), to indirectly model the wavelet coefficients using four local features in wavelet subbands. By concatenating all the HIGHs in all wavelet subbands of a texture, we can construct a nonnegative multiresolution vector (NMV) to represent a texture image. Considering the NMV's high dimensionality and nonnegativity, we further propose a Hessian regularized discriminative nonnegative matrix factorization to compute a low-dimensional basis of the linear subspace of NMVs. Finally, we present a texture classification approach by projecting NMVs onto the low-dimensional basis. Experimental results show that our proposed texture classification method outperforms seven representative approaches.
    ACM Transactions on Intelligent Systems and Technology 10/2015; 7(1):1-21. DOI:10.1145/2738050 · 1.25 Impact Factor
  • Yanan Lu · Fengying Xie · Tongliang Liu · Zhiguo Jiang · Dacheng Tao ·
    ABSTRACT: Multiple distortion assessment is a big challenge in image quality assessment (IQA). In this paper, a no-reference IQA model for multiply-distorted images is proposed. The features, which are sensitive to each distortion type even in the presence of other distortions, are first selected from three kinds of natural scene statistics (NSS) features. An improved Bag-of-Words (BoW) model is then applied to encode the selected features. Lastly, a simple yet effective linear combination is used to map the image features to the quality score. The combination weights are obtained through lasso regression. A series of experiments shows that the feature selection strategy and the improved BoW model are effective in improving the accuracy of quality prediction for multiple distortion IQA. Compared with other algorithms, the proposed method delivers the best result for multiple distortion IQA.
    IEEE Signal Processing Letters 10/2015; 22(10):1-1. DOI:10.1109/LSP.2015.2436908 · 1.75 Impact Factor
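    A toy sketch of the final mapping step, in which lasso regression learns sparse combination weights from encoded features to quality scores; the data here are synthetic stand-ins and scikit-learn's Lasso is used purely for illustration.
```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic stand-ins for encoded (BoW) image features and subjective quality scores.
rng = np.random.default_rng(0)
F = rng.standard_normal((200, 50))                      # 200 training images, 50-dim codes
mos = F[:, :5].sum(axis=1) + 0.1 * rng.standard_normal(200)

reg = Lasso(alpha=0.05).fit(F, mos)                     # sparse linear combination weights
print(reg.predict(F[:3]))                               # predicted quality for new images
```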
  • Dongjin Song · Wei Liu · Tianyi Zhou · Dacheng Tao · David A Meyer ·
    ABSTRACT: Conditional random fields (CRFs) are a flexible yet powerful probabilistic approach and have shown advantages for popular applications in various areas, including text analysis, bioinformatics, and computer vision. Traditional CRF models, however, are incapable of selecting relevant features and suppressing noise from noisy original features. Moreover, conventional optimization methods often converge slowly when solving the training procedure of CRFs, and degrade significantly for tasks with a large number of samples and features. In this paper, we propose robust CRFs (RCRFs) to simultaneously select relevant features and suppress noise. An optimal gradient method (OGM) is further designed to train RCRFs efficiently. Specifically, the proposed RCRFs employ the l1 norm of the model parameters to regularize the objective used by traditional CRFs, thereby enabling discovery of the relevant unary and pairwise features of CRFs. In each iteration of OGM, the gradient direction is determined jointly by the current gradient and the historical gradients, and the Lipschitz constant is leveraged to specify the proper step size. We show that OGM can tackle RCRF model training very efficiently, achieving the optimal convergence rate O(1/k^2) (where k is the number of iterations). This convergence rate is theoretically superior to the O(1/k) rate of previous first-order optimization methods. Extensive experiments performed on three practical image segmentation tasks demonstrate the efficacy of OGM in training our proposed RCRFs.
    IEEE Transactions on Image Processing 10/2015; 24(10):3124-36. DOI:10.1109/TIP.2015.2438553 · 3.63 Impact Factor
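    The momentum-plus-soft-thresholding mechanism behind the O(1/k^2) rate can be sketched with a Nesterov-accelerated proximal gradient (FISTA-style) loop. Below, l1-regularised logistic regression stands in for the CRF negative log-likelihood, so this is an analogue of OGM rather than the paper's exact algorithm.
```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def accelerated_l1_logistic(X, y, lam=0.01, iters=300):
    """Nesterov-accelerated proximal gradient for an l1-regularised smooth loss.

    X : (n, d) features, y : (n,) binary labels in {0, 1}.
    """
    n, d = X.shape
    L = 0.25 * np.linalg.norm(X, 2) ** 2 / n             # Lipschitz constant of the gradient
    w, z, t = np.zeros(d), np.zeros(d), 1.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ z))
        w_next = soft_threshold(z - X.T @ (p - y) / (n * L), lam / L)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = w_next + ((t - 1.0) / t_next) * (w_next - w) # momentum from historical iterates
        w, t = w_next, t_next
    return w
```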
  • Zhe Chen · Zhibin Hong · Dacheng Tao ·
    ABSTRACT: In recent years, Correlation Filter-based Trackers (CFTs) have attracted increasing interest in the field of visual object tracking and have achieved extremely compelling results in different competitions and benchmarks. In this paper, our goal is to review the development of CFTs with extensive experimental results. Eleven trackers are surveyed in our work, from which a general framework is summarized. Furthermore, we investigate different training schemes for correlation filters and discuss various effective improvements that have been made recently. Comprehensive experiments have been conducted to evaluate the effectiveness and efficiency of the surveyed CFTs, and comparisons have been made with other competing trackers. The experimental results show that state-of-the-art performance, in terms of robustness, speed, and accuracy, can be achieved by several recent CFTs, such as MUSTer and SAMF. We find that further improvements to correlation filter-based tracking can be made by estimating scale, applying part-based tracking strategies, and cooperating with long-term tracking methods.
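    For readers unfamiliar with the correlation-filter formulation the survey builds on, a minimal MOSSE-style filter (closed-form training in the Fourier domain, detection by locating the response peak) is sketched below; it omits the scale estimation, part-based, and long-term extensions the survey discusses, and the parameter names are illustrative.
```python
import numpy as np

def train_mosse_filter(patches, target_response, lam=1e-2):
    """Closed-form correlation filter learned in the Fourier domain (MOSSE-style).

    patches         : list of (h, w) grayscale training patches of the object
    target_response : (h, w) desired output, e.g. a Gaussian peaked at the object centre
    """
    G = np.fft.fft2(target_response)
    A = np.zeros_like(G)
    B = np.zeros_like(G)
    for p in patches:
        F = np.fft.fft2(p)
        A += G * np.conj(F)
        B += F * np.conj(F)
    return A / (B + lam)                                 # conjugate filter in the Fourier domain

def detect(H_conj, patch):
    response = np.real(np.fft.ifft2(H_conj * np.fft.fft2(patch)))
    return np.unravel_index(np.argmax(response), response.shape)  # response peak = new position
```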
  • Chunlei Peng · Xinbo Gao · Nannan Wang · Dacheng Tao · Xuelong Li · Jie Li ·
    ABSTRACT: Face sketch-photo synthesis plays an important role in law enforcement and digital entertainment. Most of the existing methods only use pixel intensities as the feature. Since face images can be described using features from multiple aspects, this paper presents a novel multiple-representations-based face sketch-photo synthesis method that adaptively combines multiple representations to represent an image patch. In particular, it combines multiple features from face images processed using multiple filters and deploys Markov networks to exploit the interacting relationships between neighboring image patches. The proposed framework is solved using an alternating optimization strategy and normally converges in only five outer iterations in the experiments. Our experimental results on the Chinese University of Hong Kong (CUHK) face sketch database, celebrity photos, the CUHK Face Sketch FERET Database, the IIIT-D Viewed Sketch Database, and forensic sketches demonstrate the effectiveness of our method for face sketch-photo synthesis. In addition, cross-database and database-dependent style-synthesis evaluations demonstrate the generalizability of this novel method and suggest promising solutions for face identification in forensic science.
    IEEE Transactions on Neural Networks and Learning Systems 09/2015; DOI:10.1109/TNNLS.2015.2464681 · 4.29 Impact Factor
  • Cheng Deng · Jie Xu · Kaibing Zhang · Dacheng Tao · Xinbo Gao · Xuelong Li ·
    ABSTRACT: For the regression-based single-image super-resolution (SR) problem, the key is to establish a mapping relation between high-resolution (HR) and low-resolution (LR) image patches to obtain a visually pleasing image. Most existing approaches typically solve it by dividing the model into several single-output regression problems, which ignores the fact that a pixel within an HR patch affects other spatially adjacent pixels during the training process, and thus tends to generate serious ringing artifacts in the resulting HR image as well as increase the computational burden. To alleviate these problems, we propose to use a structured output regression machine (SORM) to simultaneously model the inherent spatial relations between HR and LR patches, which helps preserve sharp edges. In addition, to further improve the quality of reconstructed HR images, a nonlocal (NL) self-similarity prior in natural images is introduced as a regularization term to further enhance the SORM-based SR results. To make SORM computationally efficient, we use a relatively small set of non-support-vector samples to establish an accurate regression model and an accelerated algorithm for NL self-similarity calculation. Extensive SR experiments on various images indicate that the proposed method achieves more promising performance than other state-of-the-art SR methods in terms of both visual quality and computational cost.
    IEEE Transactions on Neural Networks and Learning Systems 09/2015; DOI:10.1109/TNNLS.2015.2468069 · 4.29 Impact Factor
  • Wenrui Hu · Dacheng Tao · Wensheng Zhang · Yuan Xie · Yehui Yang ·
    ABSTRACT: In this paper, we propose a new low-rank tensor model based on the circulant algebra, namely the twist tensor nuclear norm, or t-TNN for short. The twist tensor is a 3-way tensor representation that stores 2D data slices laterally, in order. On one hand, t-TNN convexly relaxes the tensor multi-rank of the twist tensor in the Fourier domain, which allows an efficient computation using the FFT. On the other hand, t-TNN equals the nuclear norm of the block circulant matricization of the twist tensor in the original domain, which extends the traditional matrix nuclear norm in a block circulant way. We test the t-TNN model on a video completion application that aims to fill in missing values, and the experimental results validate its effectiveness, especially when dealing with video recorded by a non-stationary panning camera. The block circulant matricization of the twist tensor can be transformed into a circulant block representation with nuclear norm invariance. This representation exploits the horizontal translation relationship between the frames of a video and endows the t-TNN model with a more powerful ability to reconstruct panning videos than existing state-of-the-art low-rank models.
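    A small sketch of how the twist-tensor nuclear norm can be evaluated: FFT along the third mode, then the sum of nuclear norms of the frontal slices; the 1/n3 scaling convention and the frame-stacking layout below are assumptions, as conventions vary in the literature.
```python
import numpy as np

def twist(video):
    """Store the frames of an (h, w, n_frames) video laterally: (h, n_frames, w)."""
    return np.transpose(video, (0, 2, 1))

def tensor_nuclear_norm(T):
    """FFT along the third mode, then sum of nuclear norms of the frontal slices."""
    Tf = np.fft.fft(T, axis=2)
    n3 = T.shape[2]
    return sum(np.linalg.norm(Tf[:, :, k], 'nuc') for k in range(n3)) / n3

video = np.random.rand(48, 64, 30)          # toy 48x64 video with 30 frames
print(tensor_nuclear_norm(twist(video)))    # t-TNN value of the twist tensor
```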
  • Ziqi Zhu · Xinge You · C.L. Philip Chen · Dacheng Tao · Weihua Ou · Xiubao Jiang ·
    ABSTRACT: Local binary patterns (LBP) have achieved great success in texture analysis; however, they are not robust to noise. There are two reasons for this disadvantage of LBP schemes: (1) they encode the texture spatial structure based only on local information, which is sensitive to noise, and (2) they use exact values as the quantization thresholds, which makes the extracted features sensitive to small changes in the input image. In this paper, we propose a noise-robust adaptive hybrid pattern (AHP) for noisy texture analysis. In our scheme, two solutions, from the perspectives of the texture description model and the quantization algorithm, are developed to reduce the feature's noise sensitivity. First, a hybrid texture description model is proposed. In this model, the global texture spatial structure, which is depicted by a global description model, is encoded with the primitive micro-feature for texture description. Second, we develop an adaptive quantization algorithm in which equal probability quantization is utilized to achieve the maximum partition entropy. Higher noise tolerance is obtained with minimal information loss in the quantization process. Experimental results of texture classification on two texture databases with three different types of noise show that our approach leads to significant improvements in noisy texture analysis. Furthermore, our scheme achieves state-of-the-art performance in noisy face recognition.
    Pattern Recognition 08/2015; 48(8). DOI:10.1016/j.patcog.2015.01.001 · 3.10 Impact Factor
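    The equal-probability quantization step can be sketched with quantile-based thresholds, which make each bin roughly equally likely and hence maximise partition entropy; the number of levels and the toy data below are illustrative, not the paper's settings.
```python
import numpy as np

def equal_probability_quantize(values, n_levels=4):
    """Quantile thresholds make each bin roughly equally likely (maximum partition entropy)."""
    thresholds = np.quantile(values, np.linspace(0.0, 1.0, n_levels + 1)[1:-1])
    return np.digitize(values, thresholds)               # integer codes in [0, n_levels - 1]

rng = np.random.default_rng(0)
diffs = rng.normal(0.0, 25.0, size=10000)                # toy centre-neighbour grey differences
codes = equal_probability_quantize(diffs)
print(np.bincount(codes) / codes.size)                   # approximately uniform occupancy
```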
  • Lin Zhao · Xinbo Gao · Dacheng Tao · Xuelong Li ·
    ABSTRACT: We present a new method for tracking human pose by employing max-margin Markov models. Representing a human body by part-based models such as the pictorial structure, the problem of pose tracking can be modeled by a discrete Markov random field. Since max-margin Markov networks provide an efficient way to deal with structured data and offer strong generalization guarantees, it is natural to learn the model parameters using the max-margin technique. Because tracking human pose requires coupling limbs across adjacent frames, the model introduces loops and becomes intractable for learning and inference. Previous work has resorted to pose estimation methods, which discard temporal information by parsing frames individually. Alternatively, approximate inference strategies have been used, which can overfit to the statistics of a particular dataset. Thus, the performance and generalization of these methods are limited. In this paper, we approximate the full model by introducing an ensemble of two tree-structured sub-models: Markov networks for spatial parsing and Markov chains for temporal parsing. Both models can be trained jointly using the max-margin technique, and an iterative parsing process is proposed to achieve ensemble inference. We apply our model on three challenging datasets which contain highly varied and articulated poses. Comprehensive experimental results demonstrate the superior performance of our method over state-of-the-art approaches.
    IEEE Transactions on Image Processing 08/2015; 24(12). DOI:10.1109/TIP.2015.2473662 · 3.63 Impact Factor
  • Tongliang Liu · Dacheng Tao ·

Publication Stats

10k Citations
910.94 Total Impact Points


  • 2010-2015
    • University of Technology Sydney 
      • Centre for Quantum Computation and Intelligent Systems (QCIS)
      Sydney, New South Wales, Australia
  • 2011-2014
    • Chinese Academy of Sciences
      • State Key Laboratory of Transient Optics and Photonics
      • Xi'an Institute of Optics and Precision Mechanics
      Beijing, China
    • State Key Laboratory of Transient Optics and Photonics
      Xi'an, Shaanxi, China
  • 2012
    • Xihua University
      Chengdu, Sichuan, China
  • 2008-2012
    • Nanyang Technological University
      • School of Computer Engineering
      Singapore
    • Xidian University
      • School of Life Sciences and Technology
      Xi'an, Shaanxi, China
    • Zhejiang University
      Hangzhou, Zhejiang, China
  • 2007-2010
    • The University of Hong Kong
      • Department of Computer Science
      Hong Kong, Hong Kong
  • 2007-2009
    • The Hong Kong Polytechnic University
      • Department of Computing
      Hong Kong, Hong Kong
  • 2005-2009
    • Birkbeck, University of London
      • Department of Computer Science and Information Systems
      London, England, United Kingdom
  • 2006-2008
    • University of London
      London, England, United Kingdom
  • 2004-2005
    • The Chinese University of Hong Kong
      • Department of Information Engineering
      Hong Kong, Hong Kong