Dacheng Tao

University of Technology Sydney, Sydney, New South Wales, Australia

Publications (418) · 704.16 Total impact

  • ABSTRACT: Local binary patterns (LBP) achieve great success in texture analysis; however, they are not robust to noise. There are two reasons for this weakness of LBP schemes: (1) they encode the texture spatial structure based only on local information, which is sensitive to noise, and (2) they use exact values as the quantization thresholds, which makes the extracted features sensitive to small changes in the input image. In this paper, we propose a noise-robust adaptive hybrid pattern (AHP) for noisy texture analysis. In our scheme, two solutions, one from the perspective of the texture description model and one from that of the quantization algorithm, have been developed to reduce the feature's noise sensitivity. First, a hybrid texture description model is proposed. In this model, the global texture spatial structure, which is depicted by a global description model, is encoded with the primitive microfeature for texture description. Second, we develop an adaptive quantization algorithm in which equal probability quantization is utilized to achieve the maximum partition entropy. Higher noise tolerance can be obtained with the minimum loss of information in the quantization process. The experimental results of texture classification on two texture databases with three different types of noise show that our approach leads to significant improvement in noisy texture analysis. Furthermore, our scheme achieves state-of-the-art performance in noisy face recognition.
    Pattern Recognition 08/2015; DOI:10.1016/j.patcog.2015.01.001 · 2.58 Impact Factor
  • Yong Luo, Tongliang Liu, Dacheng Tao, Chao Xu
    ABSTRACT: There is growing interest in multi-label image classification due to its critical role in web-based image analytics applications, such as large-scale image retrieval and browsing. Matrix completion (MC) has recently been introduced as a method for transductive (semi-supervised) multi-label classification, and has several distinct advantages, including robustness to missing data and background noise in both feature and label space. However, it is limited by only considering data represented by a single-view feature, which cannot precisely characterize images containing several semantic concepts. To utilize multiple features taken from different views, we have to concatenate the different features as a long vector. But this concatenation is prone to overfitting and often leads to very high time complexity in MC-based image classification. Therefore, we propose to combine the MC outputs of different views using learned weights, and present the multi-view matrix completion (MVMC) framework for transductive multi-label image classification. To learn the view combination weights effectively, we apply a cross-validation strategy on the labeled set. In particular, MVMC splits the labeled set into two parts, and predicts the labels of one part using the known labels of the other part. The predicted labels are then used to learn the view combination coefficients. In the learning process, we adopt the average precision (AP) loss, which is particularly suitable for multi-label image classification, since ranking-based criteria are critical for evaluating a multi-label classification system. A least squares loss formulation is also presented for the sake of efficiency, and the robustness of the algorithm based on the AP loss compared with the other losses is investigated. Experimental evaluation on two real-world datasets (PASCAL VOC'07 and MIR Flickr) demonstrates the effectiveness of MVMC for transductive (semi-supervised) multi-label image classification, and shows that MVMC can exploit complementary properties of different features and output consistent labels for improved multi-label image classification.
    IEEE Transactions on Image Processing 04/2015; DOI:10.1109/TIP.2015.2421309 · 3.11 Impact Factor
  • ABSTRACT: An auroral substorm is an important geophysical phenomenon that reflects the interaction between the solar wind and the Earth's magnetosphere. Detecting substorms is of practical significance in order to prevent disruption to communication and global positioning systems. However, existing detection methods can be inaccurate or require time-consuming manual analysis and are therefore impractical for large-scale data sets. In this paper, we propose an automatic auroral substorm detection method based on a shape-constrained sparse and low-rank decomposition (SCSLD) framework. Our method automatically detects real substorm onsets in large-scale aurora sequences, which overcomes the limitations of manual detection. To reduce noise interference inherent in current SLD methods, we introduce a shape constraint to force the noise to be assigned to the low-rank part (stationary background), thus ensuring the accuracy of the sparse part (moving object) and improving the performance. Experiments conducted on aurora sequences in solar cycle 23 (1996-2008) show that the proposed SCSLD method achieves good performance for motion analysis of aurora sequences. Moreover, the obtained results are highly consistent with manual analysis, suggesting that the proposed automatic method is useful and effective in practice.
    IEEE transactions on neural networks and learning systems 03/2015; DOI:10.1109/TNNLS.2015.2411613 · 4.37 Impact Factor
  • Lin Zhao, Xinbo Gao, Dacheng Tao, Xuelong Li
    ABSTRACT: We investigate the tracking of 2-D human poses in a video stream to determine the spatial configuration of body parts in each frame. This is not a trivial task because people may wear different kinds of clothing and may move very quickly and unpredictably. Pose estimation is typically applied, but it ignores the temporal context and cannot provide smooth, reliable tracking results. Therefore, we develop a tracking and estimation integrated model (TEIM) to fully exploit temporal information by integrating pose estimation with visual tracking. However, joint parsing of multiple articulated parts over time is difficult, because a full model with edges capturing all pairwise relationships within and between frames is loopy and intractable. Previous models usually resorted to approximate inference, which cannot guarantee good results and incurs a large computational cost. We overcome these problems by exploring the idea of divide and conquer, which decomposes the full model into two much simpler tractable submodels. In addition, a novel two-step iteration strategy is proposed to efficiently solve the joint parsing problem. Algorithmically, we design TEIM so that: 1) it enables pose estimation and visual tracking to compensate for each other to achieve desirable tracking results; 2) it is able to deal with the problem of tracking loss; and 3) it only needs past information and is capable of tracking online. Experiments are conducted on two public data sets in the wild with ground truth layout annotations, and the experimental results indicate the effectiveness of the proposed TEIM framework.
    IEEE transactions on neural networks and learning systems 03/2015; DOI:10.1109/TNNLS.2015.2411287 · 4.37 Impact Factor
  • ABSTRACT: Studies in neuroscience and biological vision have shown that the human retina has strong computational power, and its information representation supports vision tasks on both ventral and dorsal pathways. In this paper, a new local image descriptor, termed Distinctive Efficient Robust Features, or DERF, is derived by modeling the response and distribution properties of the parvocellular-projecting ganglion cells (P-GCs) in the primate retina. DERF features exponential scale distribution, exponential grid structure, and circularly symmetric function Difference of Gaussian (DoG) used as a convolution kernel, all of which are consistent with the characteristics of the ganglion cell array found in neurophysiology, anatomy, and biophysics. In addition, a new explanation for local descriptor design is presented from the perspective of wavelet tight frames. DoG is naturally a wavelet, and the structure of the grid points array in our descriptor is closely related to the spatial sampling of wavelets. The DoG wavelet itself forms a frame, and when we modulate the parameters of our descriptor to make the frame tighter, the performance of the DERF descriptor improves accordingly. This is verified by designing a tight frame DoG (TF-DoG) which leads to much better performance. Extensive experiments conducted in the image matching task on the Multiview Stereo Correspondence Data set demonstrate that DERF outperforms state-of-the-art methods for both hand-crafted and learned descriptors, while remaining robust and being much faster to compute.
    IEEE Transactions on Image Processing 03/2015; 24(8). DOI:10.1109/TIP.2015.2409739 · 3.11 Impact Factor
  • Lin Zhao, Xinbo Gao, Dacheng Tao, Xuelong Li
    ABSTRACT: Articulated human pose estimation in unconstrained conditions is a great challenge. We propose a deep structure that represents a human body at different granularities, from coarse to fine, to better detect parts and describe spatial constraints between different parts. Typical approaches to this problem utilize only a single-level structure, which makes it difficult to capture various body appearances and hard to model high-order part dependencies. In this paper, we build a three-layer Markov network to model the body structure, which separates the whole body into poselets (combined parts) and then into parts representing joints. Parts at different levels are connected through a parent-child relationship to represent high-order spatial relationships. Unlike other multi-layer models, our approach explores a more reasonable granularity for part detection and carefully designs part connections to model body configurations more effectively. Moreover, each part in our model contains different types so as to capture a wide range of pose modes. Our model is also a tree structure, which can be trained jointly and favors exact inference. Extensive experimental results on two challenging datasets show that our model improves on or is on par with state-of-the-art approaches.
    Signal Processing 03/2015; 108:36–45. DOI:10.1016/j.sigpro.2014.07.031 · 2.24 Impact Factor
  • ABSTRACT: Various sparse-representation-based methods have been proposed to solve tracking problems, and most of them employ least squares (LS) criteria to learn the sparse representation. In many tracking scenarios, traditional LS-based methods may not perform well owing to the presence of heavy-tailed noise. In this paper, we present a tracking approach using an approximate least absolute deviation (LAD)-based multitask multiview sparse learning method to enjoy the robustness of LAD and take advantage of multiple types of visual features, such as intensity, color, and texture. The proposed method is integrated in a particle filter framework, where learning the sparse representation for each view of a single particle is regarded as an individual task. The underlying relationship between tasks across different views and different particles is jointly exploited in a unified robust multitask formulation based on LAD. In addition, to capture the frequently emerging outlier tasks, we decompose the representation matrix into two collaborative components that enable a more robust and accurate approximation. We show that the proposed formulation can be effectively approximated by Nesterov's smoothing method and efficiently solved using the accelerated proximal gradient method. The presented tracker is implemented using four types of features and is tested on numerous synthetic sequences and real-world video sequences, including the CVPR2013 tracking benchmark and the ALOV++ data set. Both the qualitative and quantitative results demonstrate the superior performance of the proposed approach compared with several state-of-the-art trackers.
    IEEE transactions on neural networks and learning systems 02/2015; DOI:10.1109/TNNLS.2015.2399233 · 4.37 Impact Factor
  • Changxing Ding, Dacheng Tao
    ABSTRACT: The capacity to recognize faces under varied poses is a fundamental human ability that presents a unique challenge for computer vision systems. Compared to frontal face recognition, which has been intensively studied and has gradually matured in the past few decades, pose-invariant face recognition (PIFR) remains a largely unsolved problem. However, PIFR is crucial to realizing the full potential of face recognition for real-world applications, since face recognition is intrinsically a passive biometric technology for recognizing uncooperative subjects. In this paper, we discuss the inherent difficulties in PIFR and present a comprehensive review of established techniques. Existing PIFR methods can be grouped into four categories, i.e., pose-robust feature extraction approaches, multi-view subspace learning approaches, face synthesis approaches, and hybrid approaches. The motivations, strategies, pros/cons, and performance of representative approaches are described and compared. Moreover, promising directions for future research are discussed.
  • Yuan Gao, Miaojing Shi, Dacheng Tao, Chao Xu
    ABSTRACT: The bag-of-visual-words (BoW) model is effective for representing images and videos in many computer vision problems, and achieves promising performance in image retrieval. Nevertheless, the level of retrieval efficiency in a large-scale database is not acceptable for practical usage. Considering that the relevant images in the database of a given query are more likely to be distinctive than ambiguous, this paper defines “database saliency” as the distinctiveness score calculated for every image to measure its overall “saliency” in the database. By taking advantage of database saliency, we propose a saliency-inspired fast image retrieval scheme, S-sim, which significantly improves efficiency while retaining state-of-the-art accuracy in image retrieval. There are two stages in S-sim. First, the bottom-up saliency mechanism computes the database saliency value of each image by hierarchically decomposing a posterior probability into local patches and visual words; the concurrent information of visual words is then propagated bottom-up to estimate the distinctiveness. Second, the top-down saliency mechanism discriminatively expands the query via a very low-dimensional linear SVM trained on the top-ranked images after the initial search; images are then ranked by their distances to the decision boundary as well as their database saliency values. We comprehensively evaluate S-sim on common retrieval benchmarks, e.g., the Oxford and Paris datasets. Thorough experiments suggest that, because of the offline database saliency computation and online low-dimensional SVM, our approach significantly speeds up online retrieval and outperforms the state-of-the-art BoW-based image retrieval schemes.
    IEEE Transactions on Multimedia 02/2015; 17(3):359-369. DOI:10.1109/TMM.2015.2389616 · 1.78 Impact Factor
  • ABSTRACT: Canonical correlation analysis (CCA) has proven an effective tool for two-view dimension reduction due to its profound theoretical foundation and success in practical applications. With respect to multi-view learning, however, it is limited by its capability of only handling data represented by two-view features, while in many real-world applications there are frequently many more views. Although the ad hoc way of simultaneously exploring all possible pairs of features can numerically deal with multi-view data, it ignores the high-order statistics (correlation information) which can only be discovered by simultaneously exploring all features. Therefore, in this work, we develop tensor CCA (TCCA), which straightforwardly yet naturally generalizes CCA to handle data with an arbitrary number of views by analyzing the covariance tensor of the different views. TCCA aims to directly maximize the canonical correlation of multiple (more than two) views. Crucially, we prove that the multi-view canonical correlation maximization problem is equivalent to finding the best rank-1 approximation of the data covariance tensor, which can be solved efficiently using the well-known alternating least squares (ALS) algorithm. As a consequence, the high-order correlation information contained in the different views is explored and thus a more reliable common subspace shared by all features can be obtained. In addition, a non-linear extension of TCCA is presented. Experiments on various challenging tasks, including large-scale biometric structure prediction, internet advertisement classification and web image annotation, demonstrate the effectiveness of the proposed method.
  • Ya Li, Xinmei Tian, Mingli Song, Dacheng Tao
    ABSTRACT: With the explosive growth of the use of imagery, visual recognition plays an important role in many applications and attracts increasing research attention. Given several related tasks, single-task learning learns each task separately and ignores the relationships among these tasks. Unlike single-task learning, multi-task learning can exploit more information by learning all tasks jointly and using the relationships among them. In this paper, we propose a novel multi-task learning model based on the proximal support vector machine. The proximal support vector machine uses the large-margin idea, as the standard support vector machine does, but with looser constraints and much lower computational cost. Our multi-task proximal support vector machine inherits the merits of the proximal support vector machine and achieves better performance compared with other popular multi-task learning models. Experiments are conducted on several multi-task learning datasets, including two classification datasets and one regression dataset. All results demonstrate the effectiveness and efficiency of our proposed multi-task proximal support vector machine.
    Pattern Recognition 02/2015; DOI:10.1016/j.patcog.2015.01.014 · 2.58 Impact Factor
  • ABSTRACT: With much attention from both academia and industrial communities, visual search reranking has recently been proposed to refine image search results obtained from text-based image search engines. Most traditional reranking methods cannot capture both the relevance and the diversity of the search results at the same time, or they ignore the hierarchical topic structure of the search results and treat each topic equally and independently. However, in real applications, images returned for certain queries are naturally organized hierarchically rather than in a simple parallel relation. In this paper, a new reranking method, topic-aware reranking (TARerank), is proposed. TARerank describes the hierarchical topic structure of search results in one model, and seamlessly captures both relevance and diversity of the image search results simultaneously. Through a structured learning framework, relevance and diversity are modeled in TARerank by a set of carefully designed features, and then the model is learned from human-labeled training samples. The learned model is expected to predict reranking results with high relevance and diversity for testing queries. To verify the effectiveness of the proposed method, we collect an image search dataset and conduct comparison experiments on it. The experimental results demonstrate that the proposed TARerank outperforms the existing relevance-based and diversified reranking methods.
  • ABSTRACT: Graph Laplacian has been widely exploited in traditional graph-based semisupervised learning (SSL) algorithms to regulate the labels of examples that vary smoothly on the graph. Although it achieves a promising performance in both transductive and inductive learning, it is not effective for handling ambiguous examples (shown in Fig. 1). This paper introduces deformed graph Laplacian (DGL) and presents label prediction via DGL (LPDGL) for SSL. The local smoothness term used in LPDGL, which regularizes examples and their neighbors locally, is able to improve classification accuracy by properly dealing with ambiguous examples. Theoretical studies reveal that LPDGL obtains the globally optimal decision function, and the free parameters are easy to tune. The generalization bound is derived based on the robustness analysis. Experiments on a variety of real-world data sets demonstrate that LPDGL achieves top-level performance on both transductive and inductive settings by comparing it with popular SSL algorithms, such as harmonic functions, AnchorGraph regularization, linear neighborhood propagation, Laplacian regularized least square, and Laplacian support vector machine.
    IEEE transactions on neural networks and learning systems 01/2015; DOI:10.1109/TNNLS.2014.2376936 · 4.37 Impact Factor
  • Changxing Ding, Chang Xu, Dacheng Tao
    ABSTRACT: Face images captured in unconstrained environments usually contain significant pose variation, which dramatically degrades the performance of algorithms designed to recognize frontal faces. This paper proposes a novel face identification framework capable of handling the full range of pose variations within 90° of yaw. The proposed framework first transforms the original pose-invariant face recognition problem into a partial frontal face recognition problem. A robust patch-based face representation scheme is then developed to represent the synthesized partial frontal faces. For each patch, a transformation dictionary is learnt under the proposed multitask learning scheme. The transformation dictionary transforms the features of different poses into a discriminative subspace. Lastly, face matching is performed at the patch level rather than at the holistic level. Extensive and systematic experimentation on the FERET, CMU-PIE, and Multi-PIE databases shows that the proposed method consistently outperforms single-task based baselines as well as state-of-the-art methods for the pose problem. We further extend the proposed algorithm to the unconstrained face verification problem and achieve top-level performance on the challenging LFW dataset.
    IEEE Transactions on Image Processing 01/2015; 24(3). DOI:10.1109/TIP.2015.2390959 · 3.11 Impact Factor
  • ABSTRACT: Objective: Glaucoma is an irreversible chronic eye disease that leads to vision loss. As it can be slowed down through treatment, detecting the disease in time is important. However, many patients are unaware of the disease because it progresses slowly without easily noticeable symptoms. Currently, there is no effective method for low cost population-based glaucoma detection or screening. Recent studies have shown that automated optic nerve head assessment from 2D retinal fundus images is promising for low cost glaucoma screening. In this paper, we propose a method for cup to disc ratio (CDR) assessment using 2D retinal fundus images. Methods: In the proposed method, the optic disc is first segmented and reconstructed using a novel sparse dissimilarity-constrained coding (SDC) approach which considers both the dissimilarity constraint and the sparsity constraint from a set of reference discs with known CDRs. Subsequently, the reconstruction coefficients from the SDC are used to compute the CDR for the testing disc. Results: The proposed method has been tested for CDR assessment in a database of 650 images with CDRs previously measured manually by trained professionals. Experimental results show an average CDR error of 0.064 and correlation coefficient of 0.67 compared with the manual CDRs, better than the state-of-the-art methods. Our proposed method has also been tested for glaucoma screening. The method achieves areas under curve of 0.83 and 0.88 on datasets of 650 and 1676 images, respectively, outperforming other methods. Conclusion: The proposed method achieves good accuracy for glaucoma detection. Significance: The method has a great potential to be used for large-scale population-based glaucoma screening.
    IEEE transactions on bio-medical engineering 01/2015; 62(5). DOI:10.1109/TBME.2015.2389234 · 2.23 Impact Factor
  • ABSTRACT: Example learning-based super-resolution (SR) algorithms show promise for restoring a high-resolution (HR) image from a single low-resolution (LR) input. The most popular approaches, however, are either time- or space-intensive, which limits their practical applications in many resource-limited settings. In this paper we propose a novel computationally efficient single image SR method that learns multiple linear mappings (MLM) to directly transform LR feature subspaces into HR subspaces. Specifically, we first partition the large non-linear feature space of LR images into a cluster of linear subspaces. Multiple LR subdictionaries are then learned, followed by inferring the corresponding HR subdictionaries based on the assumption that the LR-HR features share the same representation coefficients. We establish MLM from the input LR features to the desired HR outputs in order to achieve fast yet stable SR recovery. Furthermore, in order to suppress displeasing artifacts generated by the MLM-based method, we apply a fast non-local means (NLM) algorithm to construct a simple yet effective similarity-based regularization term for SR enhancement. Experimental results indicate that our approach is both quantitatively and qualitatively superior to other application-oriented SR methods, while maintaining relatively low time and space complexity.
    IEEE Transactions on Image Processing 01/2015; 24(3). DOI:10.1109/TIP.2015.2389629 · 3.11 Impact Factor
  • ABSTRACT: Whereas transform coding algorithms have proved efficient and practical for grey-level and color image compression, they cannot directly deal with hyperspectral images (HSI) by simultaneously considering both the spatial and spectral domains of the data cube. The aim of this paper is to present an HSI compression and reconstruction method based on the multi-dimensional or tensor data processing approach. By representing the observed hyperspectral image cube as a third-order tensor, we introduce a tensor decomposition technique to approximately decompose the original tensor data into a core tensor multiplied by a factor matrix along each mode. Thus, the HSI is compressed to the core tensor and can be reconstructed by the multi-linear projection via the factor matrices. Experimental results on particular applications of hyperspectral remote sensing images, such as unmixing and detection, suggest that the data reconstructed by the proposed approach significantly preserves the HSI's data quality in several aspects.
    Neurocomputing 01/2015; 147:358–363. DOI:10.1016/j.neucom.2014.06.052 · 2.01 Impact Factor
  • ABSTRACT: In this paper, we introduce an efficient tensor to vector projection algorithm for human gait feature representation and recognition. The proposed approach is based on the multi-dimensional or tensor signal processing technology, which finds a low-dimensional tensor subspace of original input gait sequence tensors while most of the data variation has been well captured. In order to further enhance the class separability and avoid the potential overfitting, we adopt a discriminative locality preserving projection with sparse regularization to transform the refined tensor data to the final vector feature representation for subsequent recognition. Numerous experiments are carried out to evaluate the effectiveness of the proposed sparse and discriminative tensor to vector projection algorithm, and the proposed method achieves good performance for human gait recognition using the sequences from the University of South Florida (USF) HumanID Database.
    Signal Processing 01/2015; 106:245–252. DOI:10.1016/j.sigpro.2014.08.005 · 2.24 Impact Factor
  • Pattern Recognition 01/2015; DOI:10.1016/j.patcog.2014.12.016 · 2.58 Impact Factor
  • ABSTRACT: Distance metric learning (DML) is successful in discovering intrinsic relations in data. However, most algorithms are computationally demanding when the problem size becomes large. In this paper, we propose a discriminative metric learning algorithm, develop a distributed scheme that learns metrics on moderate-sized subsets of data, and aggregate the results into a global solution. The technique leverages the power of parallel computation. The aggregated DML (ADML) algorithm scales well with the data size and can be controlled by the partition. We theoretically analyze and provide bounds for the error induced by the distributed treatment. We have conducted experimental evaluation of ADML, both on specially designed tests and on practical image annotation tasks. Those tests have shown that ADML achieves state-of-the-art performance at only a fraction of the cost incurred by most existing methods.
    IEEE transactions on neural networks and learning systems 12/2014; DOI:10.1109/TNNLS.2014.2377211 · 4.37 Impact Factor

Publication Stats

8k Citations
704.16 Total Impact Points

Institutions

  • 2010–2015
    • University of Technology Sydney 
      • Centre for Quantum Computation and Intelligent Systems (QCIS)
      • Faculty of Engineering and Information Technology
      Sydney, New South Wales, Australia
  • 2008–2012
    • Zhejiang University
      • College of Computer Science and Technology
      Hangzhou, Zhejiang, China
    • Tianjin University
      • Department of Electronic Information Engineering
      Tianjin, China
  • 1970–2012
    • Nanyang Technological University
      • School of Computer Engineering
      Singapore
  • 2011
    • Wuhan University
      • State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing
      Wuhan, Hubei, China
    • National University of Defense Technology
      • National Key Laboratory of Parallel and Distributed Processing
      Changsha, Hunan, China
    • State Key Laboratory of Transient Optics and Photonics
      Xi'an, Shaanxi, China
    • Xiamen University
      • Department of Computer Science
      Xiamen, Fujian, China
  • 2010–2011
    • Chinese Academy of Sciences
      • Xi'an Institute of Optics and Precision Mechanics
      Beijing, China
  • 2009–2011
    • Xidian University
      • School of Life Sciences and Technology
      Xi'an, Shaanxi, China
  • 2007–2010
    • The University of Hong Kong
      • Department of Computer Science
      Hong Kong, Hong Kong
  • 2007–2009
    • The Hong Kong Polytechnic University
      • Department of Computing
      Hong Kong, Hong Kong
  • 2005–2009
    • Birkbeck, University of London
      • Department of Computer Science and Information Systems
      London, England, United Kingdom
  • 2006–2007
    • University of London
      London, England, United Kingdom
    • The University of Sheffield
      • Department of Electronic and Electrical Engineering
      Sheffield, England, United Kingdom
  • 2001–2006
    • Tongji Hospital
      Wuhan, Hubei, China
  • 2004–2005
    • The Chinese University of Hong Kong
      • Department of Information Engineering
      Hong Kong, Hong Kong