About
137 Publications
45,143 Reads
5,907 Citations
Publications (137)
Cross-modal retrieval for chest X-ray images and diagnostic reports is the automated process of fetching reports or related images from an extensive medical records database using specific queries. Current methods for cross-modal retrieval of chest X-ray images and diagnostic reports often struggle with the fine-grained semantic representation alig...
With the development of medical technology, ultrasonography has become an important diagnostic method in doctors' clinical work. However, whereas static medical images such as CT and MRI have a larger body of prior research, ultrasonography is a dynamic, video-like medical image that is captured and generated by a real-...
Wild desert grasslands are characterized by diverse habitats, uneven plant distribution, similarities among plant classes, and the presence of plant shadows. However, the existing models for detecting plant species in desert grasslands exhibit low precision, require a large number of parameters, and incur high computational cost, rendering them unsui...
Knowledge distillation (KD) is a prevalent model compression technique in deep learning, aiming to leverage knowledge from a large teacher model to enhance the training of a smaller student model. It has found success in deploying compact deep models in intelligent applications like intelligent transportation, smart health, and distributed intellig...
In recent years, various distillation methods for semantic segmentation have been proposed. However, these methods typically train the student model to imitate the intermediate features or logits of the teacher model directly, thereby overlooking the high-discrepancy regions learned by both models, particularly the differences in instance edges. In...
Sequential recommendation seeks to understand user preferences based on their past actions and predict future interactions with items. Recently, several techniques for sequential recommendation have emerged, primarily leveraging graph convolutional networks (GCNs) for their ability to model relationships effectively. However, real-world scenarios o...
In the assembly of consumer electronic products, target detection methods offer details on the target’s location and category, but precise positioning with the robotic arm’s end-effector demands pixel-level edge contour data of the target. As a result, we’ve developed U2Net-MGP, a lightweight and efficient visual perception model. This model effect...
Sequential recommendation attempts to predict the next item that a user will interact with based on their historical behavior sequence. Recently, given the relationship learning ability of graph convolutional networks (GCNs), a number of GCN-based sequential recommendation models have emerged. However, in real-world applications, sparse i...
Point cloud completion is prevalent because current point cloud acquisition equipment often yields insufficient results, in which a large number of points fail to represent a relatively complete shape. Existing point cloud completion algorithms, mostly encoder‐decoder structures with grid transforms (also presented as folding operations), can hardly...
Knowledge distillation is a simple yet effective technique for deep model compression, which aims to transfer the knowledge learned by a large teacher model to a small student model. To mimic how the teacher teaches the student, existing knowledge distillation methods mainly adopt a unidirectional knowledge transfer, where the knowledge extracted...
Graph neural networks (GNNs) have garnered significant attention for their ability to effectively process graph-related data. Most existing methods assume that the input graph is noise-free; however, this assumption is frequently violated in real-world scenarios, resulting in impaired graph representations. To address this issue, we start from the...
Feature representation learning is a basic task that plays an important role in artificial intelligence, data mining, and robotics [...]
Deep dictionary learning (DDL) aims to learn dictionaries at different levels and the deepest level representations. However, existing DDL algorithms impose an $l_{1}$-norm constraint on the deepest level representations, ignoring the constraints on different level representations. Meanwhile, they fail to discover effectively the essential discri...
Jianping Gou, Xin He, Lan Du, [...], Zhang Yi
Deep dictionary learning (DDL) shows good performance in visual classification tasks. However, almost all existing DDL methods ignore the locality relationships between the input data representations and the learned dictionary atoms, and learn sub-optimal representations in the feature coding stage, which are less conducive to classification. To th...
The application of Auto-Encoder (AE) to multi-view representation learning has gained traction due to advancements in deep learning. While some current AE-based multi-view representation learning algorithms incorporate the geometric structure of the input data into their feature representation learning process, their use of a shallow structured gra...
Recently, deep dictionary learning (DDL) has aroused attention due to its abilities of learning multiple different dictionaries and extracting multi-level abstract feature representations for samples. It has been applied to many intelligent recognition tasks, such as vehicle detection, traffic sign recognition and driver monitoring. Nevertheless, t...
Knowledge distillation (KD) is a powerful and widely applicable technique for the compression of deep learning models. The main idea of knowledge distillation is to transfer knowledge from a large teacher model to a small student model, where the attention mechanism has been intensively explored in regard to its great flexibility for managing diffe...
Knowledge distillation (KD), as an efficient and effective model compression technique, has received considerable attention in deep learning. The key to its success is about transferring knowledge from a large teacher network to a small student network. However, most existing KD methods consider only one type of knowledge learned from either instan...
Learning graph embeddings for high-dimensional data is an important technology for dimensionality reduction. The learning process is expected to preserve the discriminative and geometric information of high-dimensional data in a new low-dimensional subspace via either manual or automatic graph construction. Although both manual and automatic graph...
In recent years, deep dictionary learning (DDL) has attracted a great amount of attention due to its effectiveness for representation learning and visual recognition. However, most existing methods focus on unsupervised deep dictionary learning, failing to further explore the category information. To make full use of the category information of diff...
Cross-modal hashing methods have attracted considerable attention due to their low memory usage and high query speed in large-scale cross-modal retrieval. During the encoding process, there still remain two crucial bottlenecks: how to equip hash codes with cross-modal semantic information, and how to rapidly obtain hash codes. In this paper, we pr...
Collaborative representation-based classification (CRC), as a typical kind of linear representation-based classification, has attracted more attention due to the effective and efficient pattern classification performance. However, the existing class-specific representations are not competitively learned from collaborative representation for achievi...
K-nearest neighbor rule (KNN) has been regarded as one of the top 10 methods in the field of data mining. Due to its simplicity and effectiveness, it has been widely studied and applied to various classification tasks. In this article, we develop a novel representation coefficient-based k-nearest centroid neighbor method (RCKNCN), which aims to fur...
Recently, model compression has been widely used for the deployment of cumbersome deep models on resource-limited edge devices in the performance-demanding industrial Internet of Things (IoT) scenarios. As a simple yet effective model compression technique, knowledge distillation (KD) aims to transfer the knowledge (e.g., sample relationships as th...
Cross-modal hashing is an effective cross-modal retrieval approach because of its low storage and high efficiency. However, most existing methods mainly utilize pre-trained networks to extract modality-specific features, while ignoring the position information and lacking information interaction between different modalities. To address those problems, i...
As a simple yet effective model compression method, knowledge distillation (or KD) is used to learn a small lightweight student network by transferring valuable knowledge from a pretrained cumbersome teacher network. However, existing KD methods usually consider the feature knowledge either in different layers or individual samples, failing to expl...
Existing methods for human pose estimation usually use a large intermediate tensor, leading to a high computational load, which is detrimental to resource-limited devices. To solve this problem, we propose a low computational cost pose estimation network, MobilePoseNet, which includes encoder, decoder, and parallel nonmaximum suppression operation....
How to represent and classify a testing sample for the representation-based classification (RBC) plays an important role in the field of pattern recognition. As a typical kind of representation-based classification with promising performance, collaborative representation-based classification (CRC) adopts all the training samples to collaborativ...
Deep neural networks have achieved great success in a variety of applications, such as self-driving cars and intelligent robotics. Meanwhile, knowledge distillation has received increasing attention as an effective model compression technique for training very efficient deep models. The performance of the student network obtained through knowledg...
In recent years, deep neural networks have been successful in both industry and academia, especially for computer vision tasks. The great success of deep learning is mainly due to its scalability to encode large-scale data and to maneuver billions of model parameters. However, it is a challenge to deploy these cumbersome deep models on devices with...
Dictionary-based classification has been promising in knowledge discovery from image data, due to its good performance and interpretable theoretical system. Dictionary learning effectively supports both small- and large-scale datasets, while its robustness and performance depend on the atoms of the dictionary most of the time. Empirically, using a...
Representation‐based classification (RBC) has been attracting a great deal of attention in pattern recognition. As a typical extension to RBC, collaborative representation‐based classification (CRC) has demonstrated its superior performance in various image classification tasks. Ideally, we expect that the learned class‐specific representations for...
Knowledge distillation (KD), as an efficient and effective model compression technique, has been receiving considerable attention in deep learning. The key to its success is to transfer knowledge from a large teacher network to a small student one. However, most of the existing knowledge distillation methods consider only one type of knowledge lear...
Component analysis (CA) is a powerful technique for learning discriminative representations in various computer vision tasks. Typical CA methods are essentially based on the covariance matrix of training data. However, the covariance matrix has obvious disadvantages, such as failing to model complex relationships among features and singularity in small s...
Graph embedding plays an important role in dimensionality reduction for processing high-dimensional data. The key to graph embedding is the graph construction, which determines the performance of dimensionality reduction. Inspired by this fact, in this article we propose a novel graph embedding method named the double graphs...
Canonical correlation analysis (CCA) is a popular and powerful technique for two-view dimension reduction and feature extraction. However, CCA cannot directly handle data with more than two views and makes the strict assumption that all the samples from two different views are paired. In practice, multiple view data are often semi-paired. To addr...
Image classification is a fundamental component in modern computer vision systems, where sparse representation-based classification has drawn a lot of attention due to its robustness. However, on the optimization of sparse learning systems, regularization and data augmentation are both powerful, but currently isolated. We believe that regularizatio...
Data augmentation has been utilized to improve the accuracy and robustness of face recognition algorithms. However, most of the previous studies focused on using the augmentation techniques to enlarge the feature set, while the diversity produced by the virtual samples lacked sufficient attention. In sparse dictionary learning-based face recognitio...
Image classification is a hot technique applied in many multimedia systems, where both l1 and l2 regularizations have shown potential for robust sparse representation-based image classification. However, previous studies showed that l1 or l2 alone cannot ensure a robust result. The robustness of a classifier depends on the nature of the dataset mos...
Graph embedding has attracted much research interest in dimensionality reduction. In this study, based on collaborative representation and graph embedding, the authors propose a new linear dimensionality reduction method called collaborative representation‐based locality preserving projection (CRLPP). In the CRLPP, they assume that the simila...
Collaborative representation-based classification (CRC) is one of the famous representation-based classification methods in pattern recognition. However, a testing sample in most of the CRC variants is collaboratively reconstructed by a linear combination of all the training samples from all the classes, the training samples from the class that the...
In this paper, a novel semi-supervised manifold alignment approach via multiple graph embeddings (MA-MGE) is proposed. Different from the traditional manifold alignment algorithms that use a single graph embedding to learn the latent manifold structure of each data set, our approach utilizes multiple graph embeddings to learn a joint latent manifol...
Cross-modal retrieval has attracted considerable attention in the past years. Recently, collective matrix factorization was proposed to learn the common representations for cross-modal retrieval, based on the assumption that the pairwise data from different modalities should have the same common semantic representations. However, this unified common repre...
In recent years, deep neural networks have been very successful in both industry and academia, especially for the applications of visual recognition and natural language processing. The great success of deep learning is mainly owed to its scalability to both large-scale data samples and billions of model parameters. However, it al...
Cross-modal retrieval aims to search for semantically similar instances from other modalities given a query from one modality. Recently, generative adversarial networks (GANs) have been proposed to model the joint distribution over the data from different modalities and to learn the common representations for cross-modal retrieval. However,...
Cross-modal retrieval aims to search for semantically similar instances from other modalities given a query from one modality. However, the differences in distributions and representations between modalities mean that the similarity of different modalities cannot be measured directly. To address this problem, in this paper, we pr...
Representation-based classification (RBC) has attracted much attention in pattern recognition. As a linear representative RBC method, collaborative representation-based classification (CRC) is very promising for classification. Although many extensions of CRC have been developed recently, the discriminative and competitive representations of differ...
Collaborative representation-based classification (CRC) is a well-known representation-based classification method in pattern recognition. Recently, many variants of CRC have been designed for many classification tasks with good classification performance. However, most of them ignore the inter-class pattern discrimination among the class-specific...
Currently, CNN-based object detectors, such as RetinaNet, Faster-RCNN, and the CornerNet series, can achieve good performance but share some common drawbacks, such as large computational cost, high model complexity, and slow detection speed. In this paper, a new lightweight object detector is proposed, which adopts a density-based approach to merge the real...
Recently, collaborative representation-based classification (CRC) and its many variations have been widely applied for various classification tasks in pattern recognition. To further enhance the pattern discrimination of CRC, in this article we propose a novel extension of CRC, entitled discriminative, competitive, and collaborative representation-...
Two-dimensional partial least squares (2DPLS) is an effective two-view data analysis technique. However, conventional 2DPLS only takes into account the column information of two-dimensional images. In this paper, we simultaneously consider the column-wise and row-wise information of two-dimensional face images. We first propose a row-based two-dime...
Canonical correlation analysis (CCA) is a widely used linear unsupervised subspace learning method. However, standard CCA works with vectorized representation of image matrix, which loses the spatial structure information of image data. In addition, a real-world observation often simultaneously belongs to multiple distinct classes with different de...
Graph embedding in dimensionality reduction has attracted much attention in the high-dimensional data analysis. Graph construction in graph embedding plays an important role in the quality of dimensionality reduction. However, the discrimination information and the geometrical distributions of data samples are not fully exploited for discovering th...
Collaborative representation-based classification has shown promising results on cognitive vision tasks like face recognition. It solves a linear problem with \(l_1\) or \(l_2\) norm regularization to obtain a stable sparse representation. Previous studies showed that the collaboration representation assisted the output of optimum sparsity constrai...
Representation-based classification (RBC) methods have recently become promising pattern recognition technologies for object recognition. The representation coefficients of RBC, used as the linear reconstruction measure (LRM), can be well used for classifying objects. In this article, we propose two enhanced linear reconstruction measure-based classific...
In this paper, we propose a composite nonlinear multiset canonical correlation projections (CNMCPs) framework where orthogonal constraints are imposed in each set. This makes CNMCP capable of learning uncorrelated low‐dimensional features with minimum redundancy in Hilbert space. With the CNMCP framework, we further present a particular algorithm c...
Visual Question Answering (VQA) is an increasingly popular research area in machine learning. Most of the existing VQA tasks only focus on static images, and only a few models are based on videos. The primary purpose of this project is to develop an innovative model that performs Affective Question Answering on Video (AQAV), a multi-tasking archite...
Multiview nonnegative matrix factorization has shown many promising applications in computer vision and pattern recognition. However, most existing works focus on view consistency and ignore discrimination. In this paper, we introduce a novel discriminative multiview nonnegative matrix factorization (DMultiNMF) algorithm to learn discriminative and consistent represen...
The probabilistic collaborative representation-based classification (PCRC), as a novel extension of collaborative representation-based classification (CRC), is a promising method in pattern recognition. In this article, we adopt the coarse-to-fine representation to propose two-phase probabilistic collaborative representation-based classification (T...
K-nearest neighbor classification method (KNN), as one of the top 10 algorithms in data mining, is a very simple and yet effective nonparametric technique for pattern recognition. However, due to the selective sensitiveness of the neighborhood size k, the simple majority vote, and the conventional metric measure, the KNN-based classification perfor...
As an important task of artificial intelligence, natural language conversation has attracted wide attention of researchers in natural language processing. Existing works in this field mainly focus on consistency of neural response generation whilst ignoring the effect of emotion state on the response generation. In this paper, we propose an Emotion...
Sparse Representation-based Classifier (SRC) and Dictionary Learning (DL) have had a significant impact on the classification performance of image recognition in recent times. In video semantic analysis, the locality structure of video semantic data, which contains more discriminative information, is essential for classification. However, thi...
K-nearest neighbor rule (KNN) is one of the most widely used methods in pattern recognition. However, the KNN-based classification performance is severely affected by the sensitivity of the neighborhood size k and the simple majority voting in the regions of k-neighborhoods, especially in the case of the small sample size with the existing outliers...
Empirical studies on ensemble learning that combines multiple classifiers have shown that it is an effective technique to improve the accuracy and stability of a single classifier. In this paper, we propose a novel method of dynamically building diversified sparse ensembles. We first apply a technique known as the canonical correlation to model the re...
Graph embedding is a very useful dimensionality reduction technique in pattern recognition. In this article, we develop a novel discriminative dimensionality reduction technique entitled sparsity and geometry preserving graph embedding (SGPGE). SGPGE can not only capture the sparse reconstructive relationships among training samples, but also disco...
As a representative representation-based classification method, collaborative representation-based classification (CRC) has drawn much attention in pattern recognition and machine learning recently. Moreover, collaborative representation-based face recognition has been extensively studied because of the effective classification perfor...
Sparse representation-based classification (SRC) and collaborative representation-based classification (CRC) have shown promising classification results. Both methods are distance-based classifiers, and they represent a test sample with coefficients solved by different sparsity regularizations. The reason why the representation coefficient vector c...
In recent years, sparse representation has attracted a blooming interest in the areas of pattern recognition, image processing, and computer vision. In video semantic analysis, the diversity of scene for the same semantic content in video always exists. Using dictionary learning in sparse representation can capture the latent relationship among the...
Collaborative representation (CR) is one of the well-known representation methods and has been widely used in computer vision and pattern recognition. The collaborative representation-based classification (CRC) and its extension called the probabilistic collaborative representation-based classification (PCRC) have obtained promising performance in...
The concept of Visual Question Answering (VQA) has recently attracted the attention of many researchers in the field of machine learning. Different attention models have been proposed in VQA for the purpose of addressing the need to focus on local regions of an image. This paper proposes the concept of Mood-Aware Visual Question Answering (MAVQA) u...
This study proposes a novel dimensionality reduction (DR) method for multi-view datasets. The principal component analysis (PCA) idea of minimising least squares reconstruction errors is extended to consider both data distribution and penalty weights, called a dictionary, to recover outlier-free global structures from missing and noisy data points. In...
The K-nearest neighbour classifier is a very effective and simple non-parametric technique in pattern classification; however, it only considers the distance closeness, but not the geometrical placement of the k neighbors. Also, its classification performance is highly influenced by the neighborhood size k and existing outliers. In this paper, we prop...
K-nearest neighbor (KNN) rule is a well-known non-parametric classifier that is widely used in pattern recognition. However, the sensitivity of the neighborhood size k always seriously degrades the KNN-based classification performance, especially in the case of the small sample size with the existing outliers. To overcome this issue, in this articl...
Collaborative representation-based classification (CRC) is a distance based method, and it obtains the original contributions from all samples to solve the sparse representation coefficient. We find out that it helps to enhance the discrimination in classification by integrating other distance based features and/or adding signal preprocessing to th...
This paper presents an angle and density-based data preprocessing method. It can be used to simultaneously identify outliers and boundary points (called uniformly boundary points). Detecting boundary points is often more interesting than detecting normal points, since they represent valid, interesting, and potentially valuable patterns. An efficien...
Ensemble regression method shows better performance than single regression since ensemble regression method can combine several single regression methods together to improve accuracy and stability of a single regressor. In this paper, we propose a novel kernel ensemble regression method by minimizing total least square loss in multiple Reproducing...
One of the biggest problems in deep learning is its difficulty to retain consistent robustness when transferring the model trained on one dataset to another dataset. To conquer the problem, deep transfer learning was implemented to execute various vision tasks by using a pre-trained deep model in a diverse dataset. However, the robustness was often...
In this article we propose several two-phase representation-based classification (RBC) methods that are inspired by the idea of the two-phase test sample sparse representation (TPTSR) method with L2-norm. We first introduce two simple extensions of TPTSR using L1-norm alone and the combination of L1-norm and L2-norm, respectively. We then propose t...
Questions (2)
I am glad to join the group and will discuss some questions about intelligence with other group members. Thanks!