Dacheng Tao

State Key Laboratory of Transient Optics and Photonics, Xi'an, Shaanxi, China


Publications (480) · 943.78 Total Impact

  • Full-text · Dataset · Feb 2016
  • Richang Hong · Luming Zhang · Dacheng Tao
    ABSTRACT: Photo enhancement refers to the process of increasing the aesthetic appeal of a photo, such as changing the photo aspect ratio and spatial recomposition. It is a widely used technique in the printing industry, graphic design, and cinematography. In this paper, we propose a unified and socially-aware photo enhancement framework which can leverage the experience of photographers with various aesthetic topics (e.g., portrait and landscape). We focus on photos from the image hosting site Flickr, which has 87 million users and to which more than 3.5 million photos are uploaded daily. First, a tag-wise regularized topic model is proposed to describe the aesthetic topic of each Flickr user, and coherent and interpretable topics are discovered by leveraging both the visual features and tags of photos. Next, a graph is constructed to describe the similarities in aesthetic topics between users. Noticeably, densely connected users have similar aesthetic topics, which are categorized into different communities by a dense subgraph mining algorithm. Lastly, a probabilistic model is exploited to enhance the aesthetic attractiveness of a test photo by leveraging the photographic experiences of Flickr users from the corresponding communities of that photo. Paired-comparison-based user studies show that our method performs competitively on photo retargeting and recomposition. Moreover, our approach accurately detects aesthetic communities in a photo set crawled from nearly 100,000 Flickr users.
    No preview · Article · Jan 2016 · IEEE Transactions on Image Processing
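The abstract above does not specify its dense subgraph mining algorithm; a standard greedy approximation for this kind of community discovery is Charikar-style peeling for the densest subgraph. A minimal sketch on a toy user-similarity graph (the graph, node names, and the peeling heuristic are illustrative assumptions, not the paper's exact method):

```python
# Greedy peeling approximation for the densest subgraph: repeatedly
# remove the minimum-degree node and remember the densest intermediate
# subgraph. A plausible stand-in for dense-community mining.

def densest_subgraph(adj):
    """adj: dict node -> set of neighbour nodes (undirected)."""
    nodes = set(adj)
    nbrs = {u: set(adj[u]) for u in adj}          # mutable copy
    edges = sum(len(v) for v in nbrs.values()) // 2

    def density(n_edges, n_nodes):
        return n_edges / n_nodes if n_nodes else 0.0

    best, best_density = set(nodes), density(edges, len(nodes))
    while nodes:
        u = min(nodes, key=lambda v: len(nbrs[v]))  # peel min-degree node
        edges -= len(nbrs[u])
        for v in nbrs[u]:
            nbrs[v].discard(u)
        nodes.discard(u)
        d = density(edges, len(nodes))
        if d > best_density:
            best, best_density = set(nodes), d
    return best

# Toy user graph: a 4-clique {a,b,c,d} of similar users plus a pendant e.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c", "e"},
    "e": {"d"},
}
community = densest_subgraph(graph)   # the clique survives the peeling
```

The peeling loop is linear in edges per pass and gives a 2-approximation of the densest subgraph, which is why it is a common substitute for exact dense subgraph search.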
  • Tongliang Liu · Dacheng Tao · Dong Xu
    ABSTRACT: The $k$-dimensional coding schemes refer to a collection of methods that attempt to represent data using a set of representative $k$-dimensional vectors, and include non-negative matrix factorization, dictionary learning, sparse coding, $k$-means clustering and vector quantization as special cases. Previous generalization bounds for the reconstruction error of the $k$-dimensional coding schemes are mainly dimensionality independent. A major advantage of these bounds is that they can be used to analyze the generalization error when data is mapped into an infinite- or high-dimensional feature space. However, many applications use finite-dimensional data features. Can we obtain dimensionality-dependent generalization bounds for $k$-dimensional coding schemes that are tighter than dimensionality-independent bounds when data is in a finite-dimensional feature space? The answer is positive. In this paper, we address this problem and derive a dimensionality-dependent generalization bound for $k$-dimensional coding schemes by bounding the covering number of the loss function class induced by the reconstruction error. The bound is of order $\mathcal{O}\left(\left(mk\ln(mkn)/n\right)^{\lambda_n}\right)$, where $m$ is the dimension of features, $k$ is the number of columns in the linear implementation of coding schemes, $n$ is the sample size, $\lambda_n>0.5$ when $n$ is finite and $\lambda_n=0.5$ when $n$ is infinite. We show that our bound can be tighter than previous results, because it avoids inducing the worst-case upper bound on $k$ of the loss function and converges faster. The proposed generalization bound is also applied to some specific coding schemes to demonstrate that the dimensionality-dependent bound is an indispensable complement to these dimensionality-independent generalization bounds.
    No preview · Article · Jan 2016
  • Source
    Shaoli Huang · Zhe Xu · Dacheng Tao · Ya Zhang
    ABSTRACT: In the context of fine-grained visual categorization, the ability to interpret models as human-understandable visual manuals is sometimes as important as achieving high classification accuracy. In this paper, we propose a novel Part-Stacked CNN architecture that explicitly explains the fine-grained recognition process by modeling subtle differences from object parts. Based on manually-labeled strong part annotations, the proposed architecture consists of a fully convolutional network to locate multiple object parts and a two-stream classification network that encodes object-level and part-level cues simultaneously. By adopting a set of sharing strategies between the computation of multiple object parts, the proposed architecture is very efficient, running at 20 frames/sec during inference. Experimental results on the CUB-200-2011 dataset reveal the effectiveness of the proposed architecture, from both the perspective of classification accuracy and model interpretability.
    Full-text · Article · Dec 2015
  • Zhe Xu · Zhibin Hong · Ya Zhang · Junjie Wu · Ah Chung Tsoi · Dacheng Tao
    ABSTRACT: In this paper, we present Multinomial Latent Logistic Regression (MLLR), a new learning paradigm that introduces latent variables to logistic regression. By inheriting the advantages of logistic regression, MLLR is efficiently optimized using second-order derivatives and provides effective probabilistic analysis on output predictions. MLLR is particularly effective in weakly-supervised settings where the latent variable has an exponential number of possible values. The effectiveness of MLLR is demonstrated on four different image understanding applications, including a new challenging architectural style classification task. Furthermore, we show that MLLR can be generalized to general structured output prediction, and in doing so we provide a thorough investigation of the connections and differences between MLLR and existing related algorithms, including latent structural SVMs and hidden CRFs.
    No preview · Article · Dec 2015 · IEEE Transactions on Image Processing
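One way to picture a latent-variable logistic model of the kind described above is to score each class by maximizing a linear score over latent values (e.g., a window placement) and pass the resulting scores through a softmax. A toy sketch, where the weights, feature map, and max-over-latents form are illustrative assumptions rather than the paper's exact formulation:

```python
import math

# Toy latent-variable logistic model: each class score is the best
# linear score over latent values h, and class probabilities come
# from a softmax over those scores. All numbers are illustrative.

def class_probs(weights, features_per_latent):
    """weights: {class: [w...]}, features_per_latent: {h: [f...]}"""
    scores = {}
    for y, w in weights.items():
        scores[y] = max(
            sum(wi * fi for wi, fi in zip(w, f))
            for f in features_per_latent.values()
        )
    z = sum(math.exp(s) for s in scores.values())
    return {y: math.exp(s) / z for y, s in scores.items()}

feats = {"h0": [1.0, 0.0], "h1": [0.0, 1.0]}   # two latent placements
w = {"cat": [2.0, 0.0], "dog": [0.0, 1.0]}
p = class_probs(w, feats)                       # a proper distribution
```

The probabilistic output is what distinguishes this family from margin-based latent models such as latent structural SVMs, which return only a score.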
  • Xianglong Liu · Cheng Deng · Bo Lang · Dacheng Tao · Xuelong Li
    ABSTRACT: Recent years have witnessed the success of binary hashing techniques in approximate nearest neighbor search. In practice, multiple hash tables are usually built to cover more desired results in the hit buckets of each table. However, little work has studied a unified approach to constructing multiple informative hash tables using any type of hashing algorithm. Meanwhile, multiple-table search also lacks a generic query-adaptive and fine-grained ranking scheme that can alleviate the binary quantization loss suffered by standard hashing techniques. To solve these problems, in this paper we first regard table construction as a selection problem over a set of candidate hash functions. With a graph representation of the function set, we propose an efficient solution that sequentially applies normalized dominant set to finding the most informative and independent hash functions for each table. To further reduce the redundancy between tables, we explore reciprocal hash tables in a boosting manner, where the hash function graph is updated with high weights emphasized on the misclassified neighbor pairs of previous hash tables. To refine the ranking of the retrieved buckets within a certain Hamming radius from the query, we propose a query-adaptive bitwise weighting scheme that enables fine-grained bucket ranking in each hash table, exploiting the discriminative power of its hash functions and their complement for nearest neighbor search. Moreover, we integrate this scheme into multiple-table search using a fast yet reciprocal table lookup algorithm within the adaptive weighted Hamming radius. Both the construction method and the query-adaptive search method are general and compatible with different types of hashing algorithms using different feature spaces and/or parameter settings. Our extensive experiments on several large-scale benchmarks demonstrate that the proposed techniques can significantly outperform both naive construction methods and state-of-the-art hashing algorithms.
    No preview · Article · Dec 2015 · IEEE Transactions on Image Processing
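The bitwise-weighting idea above can be made concrete with a tiny sketch: buckets within a plain Hamming radius of the query are re-ranked by a weighted Hamming distance, so ties at the same integer radius break apart according to per-bit reliability. The codes, weights, and radius below are illustrative, not values from the paper:

```python
# Query-adaptive bucket ranking sketch: filter buckets by plain
# Hamming radius, then rank by a per-bit weighted Hamming distance.

def weighted_hamming(query, code, weights):
    """query, code: integer bit codes; weights: per-bit reliabilities."""
    diff = query ^ code
    return sum(w for i, w in enumerate(weights) if (diff >> i) & 1)

def rank_buckets(query, buckets, weights, radius):
    """Keep buckets within `radius`, rank by weighted distance."""
    within = [b for b in buckets if bin(query ^ b).count("1") <= radius]
    return sorted(within, key=lambda b: weighted_hamming(query, b, weights))

q = 0b1010
buckets = [0b1010, 0b1000, 0b0010, 0b0101]
w = [0.9, 0.1, 0.5, 0.2]        # bit 1 unreliable, bit 0 reliable
order = rank_buckets(q, buckets, w, radius=1)
```

With integer Hamming distance the two radius-1 buckets would tie; the weights decide which mismatch matters less.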
  • ABSTRACT: Nowadays, many consumer videos are captured by portable devices such as the iPhone. Unlike constrained videos produced by professionals, e.g., for broadcast, summarizing multiple handheld videos of the same scene is a challenging task. This is because: 1) these videos have dramatic semantic and style variations, making it difficult to extract representative key frames; 2) the handheld videos exhibit different degrees of shakiness, but existing summarization techniques cannot alleviate this problem adaptively; and 3) it is difficult to develop a quality model that evaluates a video summary, due to the subjectiveness of video quality assessment. To solve these problems, we propose perceptual multiattribute optimization, which jointly refines multiple perceptual attributes (i.e., video aesthetics, coherence, and stability) in a multivideo summarization process. In particular, a weakly supervised learning framework is designed to discover the semantically important regions in each frame. Then, a few key frames are selected based on their contributions to covering the multivideo semantics. Thereafter, a probabilistic model is proposed to dynamically fit the key frames into an aesthetically pleasing video summary, wherein its frames are stabilized adaptively. Experiments on consumer videos taken from sceneries throughout the world demonstrate the descriptiveness, aesthetics, coherence, and stability of the generated summary.
    No preview · Article · Dec 2015 · Cybernetics, IEEE Transactions on
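Selecting key frames "based on their contributions to cover the multivideo semantics" has the flavor of a maximum-coverage problem, for which a greedy pass is the textbook approximation. A sketch under that assumption (the frame/concept labels and the plain coverage objective are toy stand-ins, not the paper's model):

```python
# Greedy maximum-coverage key-frame selection: repeatedly pick the
# frame covering the most not-yet-covered semantic concepts.

def select_keyframes(frames, k):
    """frames: {frame_id: set(concepts)}; returns up to k frame ids."""
    covered, chosen = set(), []
    for _ in range(k):
        best = max(frames, key=lambda f: len(frames[f] - covered),
                   default=None)
        if best is None or not (frames[best] - covered):
            break                      # nothing new left to cover
        chosen.append(best)
        covered |= frames[best]
    return chosen

frames = {
    "f1": {"sky", "sea"},
    "f2": {"sea", "boat"},
    "f3": {"sky", "sea", "boat"},
    "f4": {"sand"},
}
summary = select_keyframes(frames, k=2)
```

Greedy coverage is a (1 - 1/e)-approximation to the optimum, which is why it is the usual choice when exhaustive frame subsets are infeasible.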
  • Kun Zeng · Jun Yu · Ruxin Wang · Cuihua Li · Dacheng Tao
    ABSTRACT: Sparse coding has been widely applied to learning-based single image super-resolution (SR) and has obtained promising performance by jointly learning effective representations for low-resolution (LR) and high-resolution (HR) image patch pairs. However, the resulting HR images often suffer from ringing, jaggy, and blurring artifacts due to the strong yet ad hoc assumptions that the LR image patch representation is equal to, is linear with, lies on a manifold similar to, or has the same support set as the corresponding HR image patch representation. Motivated by the success of deep learning, we develop a data-driven model, the coupled deep autoencoder (CDA), for single image SR. CDA is based on a new deep architecture and has high representational capability. CDA simultaneously learns the intrinsic representations of LR and HR image patches and a big-data-driven function that precisely maps these LR representations to their corresponding HR representations. Extensive experimentation demonstrates the superior effectiveness and efficiency of CDA for single image SR compared to other state-of-the-art methods on the Set5 and Set14 datasets.
    No preview · Article · Dec 2015 · Cybernetics, IEEE Transactions on

  • No preview · Article · Dec 2015
  • Jiankang Deng · Qingshan Liu · Jing Yang · Dacheng Tao
    ABSTRACT: Automatic face alignment is a fundamental step in facial image analysis. However, this problem continues to be challenging due to the large variability of expression, illumination, occlusion, pose, and detection drift in real-world face images. In this paper, we present a multi-view, multi-scale and multi-component cascade shape regression (M3CSR) model for robust face alignment. Firstly, the face view is estimated according to the deformable facial parts for learning a view-specific CSR, which can decrease the shape variance, alleviate the drift of face detection and accelerate shape convergence. Secondly, multi-scale HoG features are used as the shape-index features to incorporate local structure information implicitly, and a multi-scale optimization strategy is adopted to avoid becoming trapped in local optima. Finally, a component-based shape refinement process is developed to further improve the performance of face alignment. Extensive experiments on the IBUG dataset and the 300-W challenge dataset demonstrate the superiority of the proposed method over the state-of-the-art methods.
    No preview · Article · Dec 2015 · Image and Vision Computing
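The core loop shared by cascade shape regression methods like the one above is simple: each stage maps shape-indexed features to a shape increment and adds it to the current estimate. A 1-D toy sketch (the "shape", feature map, and halving regressors are illustrative assumptions; real CSR uses learned regressors over landmark coordinates):

```python
# Cascade regression skeleton: shape_{t+1} = shape_t + R_t(phi(image, shape_t)).

def cascade_regression(image, init_shape, stages, features):
    """stages: list of regressors r(feat) -> delta; features: phi(image, shape)."""
    shape = init_shape
    for regress in stages:
        shape = shape + regress(features(image, shape))
    return shape

# Toy setup: the "image" is a target coordinate, the feature is the
# residual to that target, and each stage regresses half the residual.
features = lambda img, s: img - s
stages = [lambda f: 0.5 * f] * 3
final = cascade_regression(image=10.0, init_shape=0.0,
                           stages=stages, features=features)
```

Each stage shrinks the residual, which is why the abstract emphasizes initialization (view estimation) so that the cascade starts near the basin of the correct shape.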
  • Qingshan Liu · Jiankang Deng · Dacheng Tao
    ABSTRACT: Localizing facial landmarks is a fundamental step in facial image analysis. However, the problem continues to be challenging due to the large variability in expression, illumination, and pose, and the existence of occlusions in real-world face images. In this paper, we present a dual sparse constrained cascade regression model for robust face alignment. Instead of using the least-squares method during the training process of the regressors, a sparse constraint is introduced to select robust features and compress the size of the model. Moreover, a sparse shape constraint is incorporated between cascade regression stages, and this explicit shape constraint is able to suppress ambiguity in the local features. To improve the model's adaptation to large pose variation, face pose is estimated from five fiducial landmarks located by a deep convolutional neural network, which is used to adaptively design the cascade regression model. To the best of our knowledge, this is the first attempt to fuse an explicit shape constraint (sparse shape constraint) and implicit context information (sparse feature selection) for robust face alignment in the framework of cascade regression. Extensive experiments on nine challenging wild data sets demonstrate the advantages of the proposed method over the state-of-the-art methods.
    No preview · Article · Nov 2015 · IEEE Transactions on Image Processing
  • ABSTRACT: How can we find a general way to choose the most suitable samples for training a classifier, even with very limited prior information? Active learning, which can be regarded as an iterative optimization procedure, plays a key role in constructing a refined training set to improve classification performance in a variety of applications, such as text analysis, image recognition, and social network modeling. Although combining the representativeness and informativeness of samples has been proven promising for active sampling, state-of-the-art methods perform well only under certain data structures. Can we then find a way to fuse the two active sampling criteria without any assumption on the data? This paper proposes a general active learning framework that effectively fuses the two criteria. Inspired by a two-sample discrepancy problem, triple measures are elaborately designed to guarantee that the query samples not only possess the representativeness of the unlabeled data but also reveal the diversity of the labeled data. Any appropriate similarity measure can be employed to construct the triple measures. Meanwhile, an uncertainty measure is leveraged to generate the informativeness criterion, which can be carried out in different ways. Rooted in this framework, a practical active learning algorithm is proposed, which exploits a radial basis function together with the estimated probabilities to construct the triple measures, and a modified best-versus-second-best strategy to construct the uncertainty measure. Experimental results on benchmark datasets demonstrate that our algorithm consistently achieves superior performance over state-of-the-art active learning algorithms.
    No preview · Article · Nov 2015 · Cybernetics, IEEE Transactions on
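The two ingredients named in the abstract above, a radial basis function for the similarity-based measures and a best-versus-second-best (BvSB) margin for uncertainty, can be sketched directly. The bandwidth, toy probability vectors, and the exact BvSB scaling are illustrative assumptions, not the paper's modified strategy:

```python
import math

# RBF similarity (for representativeness/diversity measures) and a
# plain best-versus-second-best margin (for informativeness).

def rbf(x, y, gamma=1.0):
    """Gaussian similarity between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def bvsb_uncertainty(probs):
    """Smaller best-minus-second-best margin => more uncertain sample."""
    top = sorted(probs, reverse=True)
    return 1.0 - (top[0] - top[1])

# A sample the classifier is torn about scores as more uncertain:
u_ambiguous = bvsb_uncertainty([0.48, 0.46, 0.06])
u_confident = bvsb_uncertainty([0.90, 0.05, 0.05])
```

In an active-learning loop, each unlabeled sample would then be scored by combining its uncertainty with RBF-based similarity to the unlabeled pool (representativeness) and dissimilarity to the labeled set (diversity).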
  • ABSTRACT: Principal component analysis (PCA) is widely applied in various areas; one typical application is face recognition, for which many versions of PCA have been developed. However, most of these approaches are sensitive to grossly corrupted entries in the 2D matrix representing a face image. In this paper, we aim to reduce the influence of gross corruptions, such as variations in lighting, facial expressions, and occlusions, to improve the robustness of PCA. To achieve this goal, we present a simple but effective unsupervised preprocessing method, two-dimensional whitening reconstruction (TWR), which includes two stages: 1) a whitening process on a 2D face image matrix rather than a concatenated 1D vector; and 2) 2D face image matrix reconstruction. TWR reduces the pixel redundancy of the internal image while maintaining important intrinsic features. In this way, negative effects introduced by gross-like variations are greatly reduced. Furthermore, a face image preprocessed by TWR can be approximated by a Gaussian signal, on which PCA is more effective. Experiments on benchmark face databases demonstrate that the proposed method can significantly improve the robustness of PCA methods in classification and clustering, especially for faces with severe illumination changes.
    No preview · Article · Nov 2015 · IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Chang Xu · Dacheng Tao · Chao Xu
    ABSTRACT: It is practical to assume that an individual view is unlikely to be sufficient for effective multi-view learning. Therefore, integration of multi-view information is both valuable and necessary. In this paper, we propose the Multi-view Intact Space Learning (MISL) algorithm, which integrates the encoded complementary information in multiple views to discover a latent intact representation of the data. Even though each view on its own is insufficient, we show theoretically that by combining multiple views we can obtain abundant information for latent intact space learning. Employing the Cauchy loss (a technique used in statistical learning) as the error measurement strengthens robustness to outliers. We propose a new definition of multi-view stability and then derive the generalization error bound based on multi-view stability and Rademacher complexity, and show that the complementarity between multiple views is beneficial for the stability and generalization. MISL is efficiently optimized using a novel Iteratively Reweight Residuals (IRR) technique, whose convergence is theoretically analyzed. Experiments on synthetic data and real-world datasets demonstrate that MISL is an effective and promising algorithm for practical applications.
    No preview · Article · Nov 2015 · IEEE Transactions on Pattern Analysis and Machine Intelligence
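The robustness mechanism above, a Cauchy loss minimized by iterative reweighting, is easiest to see in the simplest possible case: estimating a robust location from scalars, where each residual gets weight 1/(1 + r²/c²). This is only a sketch of the reweighting idea; the constant c, the data, and the scalar setting are toy choices, not the paper's IRR procedure:

```python
# Iteratively reweighted estimate under a Cauchy loss: outliers get
# weight w_i = 1 / (1 + (r_i/c)^2) and barely move the estimate.

def cauchy_weights(residuals, c=1.0):
    return [1.0 / (1.0 + (r / c) ** 2) for r in residuals]

def robust_mean(xs, c=1.0, iters=50):
    mu = sum(xs) / len(xs)                 # start from the plain mean
    for _ in range(iters):
        w = cauchy_weights([x - mu for x in xs], c)
        mu = sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)
    return mu

data = [1.0, 1.1, 0.9, 1.05, 100.0]        # one gross outlier
mu_robust = robust_mean(data)              # stays near 1
mu_plain = sum(data) / len(data)           # dragged toward the outlier
```

The same reweighting pattern generalizes to matrix-valued residuals, which is the spirit of alternating IRR-style optimization.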
  • Yong Luo · Yonggang Wen · Dacheng Tao · Jie Gui · Chao Xu
    ABSTRACT: The features used in many image analysis-based applications are frequently of very high dimension. Feature extraction offers several advantages in high-dimensional cases, and many recent studies have used multi-task feature extraction approaches, which often outperform single-task feature extraction approaches. However, most of these methods are limited in that they only consider data represented by a single type of feature, even though features usually represent images from multiple modalities. We therefore propose a novel large margin multi-modal multi-task feature extraction (LM3FE) framework for handling multi-modal features for image classification. In particular, LM3FE simultaneously learns the feature extraction matrix for each modality and the modality combination coefficients. In this way, LM3FE not only handles correlated and noisy features, but also utilizes the complementarity of different modalities to further help reduce feature redundancy in each modality. The large margin principle employed also helps to extract strongly predictive features so that they are more suitable for prediction (e.g., classification). An alternating algorithm is developed for problem optimization and each sub-problem can be efficiently solved. Experiments on two challenging real-world image datasets demonstrate the effectiveness and superiority of the proposed method.
    No preview · Article · Nov 2015 · IEEE Transactions on Image Processing
  • Dacheng Tao · Maoying Qiao · Wei Bian · Yi Da Xu
    ABSTRACT: Labeling of sequential data is a prevalent meta-problem for a wide range of real world applications. While the first-order Hidden Markov Model (HMM) provides a fundamental approach for unsupervised sequential labeling, the basic model does not show satisfactory performance when it is directly applied to real world problems, such as part-of-speech tagging (PoS tagging) and optical character recognition (OCR). Aiming at improving performance, important extensions of HMM have been proposed in the literature. One of the common key features in these extensions is the incorporation of proper prior information. In this paper, we propose a new extension of HMM, termed diversified Hidden Markov Models (dHMM), which utilizes a diversity-encouraging prior over the state-transition probabilities and thus facilitates more dynamic sequential labelings. Specifically, the diversity is modeled by a continuous determinantal point process prior, which we apply to both unsupervised and supervised scenarios. Learning and inference algorithms for dHMM are derived. Empirical evaluations on benchmark datasets for unsupervised PoS tagging and supervised OCR confirm the effectiveness of dHMM, with performance competitive with the state-of-the-art.
    No preview · Article · Nov 2015 · IEEE Transactions on Knowledge and Data Engineering
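The baseline that dHMM extends is first-order HMM decoding, where the most likely label sequence is found by the Viterbi algorithm over start, transition, and emission probabilities (the transition rows being exactly where dHMM places its diversity prior). A self-contained sketch on a classic toy weather/activity model (the model parameters are textbook toy values, not from the paper):

```python
# Viterbi decoding for a first-order HMM: dynamic programming over
# the best path probability ending in each state.

def viterbi(obs, states, start, trans, emit):
    """Most likely hidden-state path for an observation sequence."""
    # path[s] = (best path ending in state s, its probability)
    path = {s: ([s], start[s] * emit[s][obs[0]]) for s in states}
    for o in obs[1:]:
        new = {}
        for s in states:
            prev, p = max(
                ((path[t][0], path[t][1] * trans[t][s]) for t in states),
                key=lambda cand: cand[1],
            )
            new[s] = (prev + [s], p * emit[s][o])
        path = new
    seq, _ = max(path.values(), key=lambda cand: cand[1])
    return seq

states = ("Rainy", "Sunny")
start = {"Rainy": 0.6, "Sunny": 0.4}
trans = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
         "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
        "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
decoded = viterbi(["walk", "shop", "clean"], states, start, trans, emit)
```

A diversity-encouraging prior on the rows of `trans` would push decoded paths toward more varied state usage than this maximum-likelihood decoding produces.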
  • Source
    Changxing Ding · Dacheng Tao
    ABSTRACT: Face images appearing in multimedia applications, e.g., social networks and digital entertainment, usually exhibit dramatic pose, illumination, and expression variations, resulting in considerable performance degradation for traditional face recognition algorithms. This paper proposes a comprehensive deep learning framework to jointly learn face representation using multimodal information. The proposed deep learning structure is composed of a set of elaborately designed convolutional neural networks (CNNs) and a three-layer stacked auto-encoder (SAE). The set of CNNs extracts complementary facial features from multimodal data. Then, the extracted features are concatenated to form a high-dimensional feature vector, whose dimension is compressed by the SAE. All the CNNs are trained using a subset of 9,000 subjects from the publicly available CASIA-WebFace database, which ensures the reproducibility of this work. Using the proposed single CNN architecture and limited training data, a 98.43% verification rate is achieved on the LFW database. Benefiting from the complementary information contained in multimodal data, our small ensemble system achieves a recognition rate higher than 99.0% on LFW using a publicly available training set.
    Full-text · Article · Nov 2015 · IEEE Transactions on Multimedia
  • Lefei Zhang · Qian Zhang · Liangpei Zhang · Dacheng Tao · Xin Huang · Bo Du
    ABSTRACT: In computer vision and pattern recognition research, the studied objects are often characterized by multiple feature representations of high dimensionality, so it is essential to encode these multiview features into a unified and discriminative embedding that is optimal for a given task. To address this challenge, this paper proposes an ensemble manifold regularized sparse low-rank approximation (EMR-SLRA) algorithm for multiview feature embedding. The EMR-SLRA algorithm is based on the framework of least-squares component analysis; in particular, the low dimensional feature representation and the projection matrix are obtained by the low-rank approximation of the concatenated multiview feature matrix. By considering the complementary property among multiple features, EMR-SLRA simultaneously enforces the ensemble manifold regularization on the output feature embedding. In order to further enhance its robustness against noise, group sparsity is introduced into the objective formulation to impose direct noise reduction on the input multiview feature matrix. Since there is no closed-form solution for EMR-SLRA, this paper provides an efficient EMR-SLRA optimization procedure to obtain the output feature embedding. Experiments on pattern recognition applications confirm the effectiveness of the EMR-SLRA algorithm compared with other multiview feature dimensionality reduction approaches.
    No preview · Article · Oct 2015 · Pattern Recognition
  • Chang Xu · Dacheng Tao · Chao Xu
    ABSTRACT: One underlying assumption of conventional multi-view learning algorithms is that all examples can be successfully observed on all the views. However, due to various failures or faults in collecting and pre-processing the data on different views, we are more likely to be faced with an incomplete-view setting, where an example could be missing its representation on one view (i.e., missing view) or could be only partially observed on that view (i.e., missing variables). The low-rank assumption is effective for recovering randomly missing variables of features, but it is defeated by concentrated missing variables and has no effect on missing views. This paper suggests that the key to handling the incomplete-view problem is to exploit the connections between multiple views, enabling the incomplete views to be restored with the help of the complete views. We propose an effective algorithm to accomplish multi-view learning with incomplete views by assuming that different views are generated from a shared subspace. To handle the large scale problem and obtain fast convergence, we investigate a successive overrelaxation (SOR) method to solve the objective function. Convergence of the optimization technique is theoretically analyzed. Experimental results on toy data and real-world datasets suggest that studying the incomplete-view problem in multi-view learning is significant and that the proposed algorithm can effectively handle the incomplete views in different applications.
    No preview · Article · Oct 2015 · IEEE Transactions on Image Processing
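As a concrete picture of the successive overrelaxation (SOR) technique mentioned above, here it is in its classical form, solving a small symmetric positive-definite linear system. The matrix, right-hand side, relaxation factor, and iteration count are toy choices; the paper applies SOR to its own objective, not to this system:

```python
# Classical SOR for Ax = b: a Gauss-Seidel sweep with an
# over-relaxation factor omega in (0, 2) to accelerate convergence.

def sor(A, b, omega=1.25, iters=200):
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        for i in range(n):
            # sigma uses already-updated entries x[j] for j < i
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] += omega * ((b[i] - sigma) / A[i][i] - x[i])
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = sor(A, b)                  # exact solution is x = [1/11, 7/11]
```

For symmetric positive-definite systems SOR converges for any omega in (0, 2), and a well-chosen omega can be markedly faster than plain Gauss-Seidel (omega = 1), which is the appeal for large-scale problems.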
  • Dacheng Tao · Chaoqun Hong · Jun Yu · Jian Wan · Meng Wang
    ABSTRACT: Video-based human pose recovery is usually conducted by retrieving relevant poses using image features. In the retrieval process, the mapping between 2D images and 3D poses is assumed to be linear in most traditional methods. However, their relationship is inherently non-linear, which limits the recovery performance of these methods. In this paper, we propose a novel pose recovery method using non-linear mapping with a multi-layered deep neural network. It is based on feature extraction with multimodal fusion and back-propagation deep learning. In multimodal fusion, we construct a hypergraph Laplacian with low-rank representation. In this way, we obtain a unified feature description by standard eigen-decomposition of the hypergraph Laplacian matrix. In back-propagation deep learning, we learn a non-linear mapping from 2D images to 3D poses with parameter fine-tuning. Experimental results on three datasets show that the recovery error is reduced by 20% to 25%, which demonstrates the effectiveness of the proposed method.
    No preview · Article · Oct 2015 · IEEE Transactions on Image Processing

Publication Stats

11k Citations
943.78 Total Impact Points

Institutions

  • 2011-2015
    • State Key Laboratory of Transient Optics and Photonics
      Xi'an, Shaanxi, China
  • 2010-2015
    • University of Technology Sydney 
      • • Centre for Quantum Computation and Intelligent Systems (QCIS)
      • • Faculty of Engineering and Information Technology
      Sydney, New South Wales, Australia
  • 2011-2014
    • Chinese Academy of Sciences
      • • State Key Laboratory of Transient Optics and Photonics
      • • Xi'an Institute of Optics and Precision Mechanics
      Beijing, China
  • 2012
    • Xihua University
      Chengdu, Sichuan, China
  • 2008-2012
    • Xidian University
      • School of Life Sciences and Technology
      Xi'an, Shaanxi, China
    • Nanyang Technological University
      • School of Computer Engineering
      Singapore
    • Zhejiang University
      Hangzhou, Zhejiang, China
  • 2007-2010
    • The University of Hong Kong
      • Department of Computer Science
      Hong Kong, Hong Kong
  • 2007-2009
    • The Hong Kong Polytechnic University
      • Department of Computing
      Hong Kong, Hong Kong
  • 2005-2009
    • Birkbeck, University of London
      • Department of Computer Science and Information Systems
      London, England, United Kingdom
  • 2006-2008
    • University of London
      London, England, United Kingdom
  • 2004-2005
    • The Chinese University of Hong Kong
      • Department of Information Engineering
      Hong Kong, Hong Kong