Ke Cheng’s research while affiliated with Chinese Academy of Sciences and other places


Publications (5)


MENet: A Memory-Based Network with Dual-Branch for Efficient Event Stream Processing
  • Chapter

November 2022 · 23 Reads · 1 Citation

Lecture Notes in Computer Science

Ke Cheng · [...] · Hanqing Lu

Event cameras are bio-inspired sensors that asynchronously capture per-pixel brightness changes and trigger a stream of events instead of frame-based images. Each event stream is generally split into multiple sliding windows for subsequent processing. However, most existing event-based methods ignore the motion continuity between adjacent spatiotemporal windows, which results in the loss of dynamic information and additional computational cost. To efficiently extract strong features from event streams containing dynamic information, this paper proposes a novel dual-branch memory-based network, MENet. It contains a base branch with a full-sized event point-wise processing structure to extract base features, and an incremental branch equipped with a lightweight network to capture the temporal dynamics between two adjacent spatiotemporal windows. To enhance the features, especially in the incremental branch, a point-wise memory bank is designed, which sketches the representative information of the event feature space. Compared with the base branch, the incremental branch reduces the computational complexity by up to 5× and improves the speed by 19×. Experiments show that MENet significantly reduces computational complexity compared with previous methods while achieving state-of-the-art performance on gesture recognition and object recognition.

Keywords: Event-based model · Dual-branch structure · Memory bank
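The abstract above describes splitting each event stream into overlapping spatiotemporal windows before feature extraction. A minimal sketch of such a windowing step, assuming a simple fixed-length/fixed-stride scheme over event timestamps (the function name and parameters are illustrative, not from the paper):

```python
import numpy as np

def sliding_windows(timestamps, win_len, stride):
    """Split an event stream into overlapping temporal windows.
    timestamps: sorted 1-D array of event times.
    Returns a list of index arrays, one per window covering [t0, t0 + win_len).
    The windowing scheme here is an illustrative assumption."""
    t_end = timestamps[-1]
    windows = []
    start = timestamps[0]
    while start < t_end:
        mask = (timestamps >= start) & (timestamps < start + win_len)
        windows.append(np.nonzero(mask)[0])  # event indices in this window
        start += stride
    return windows
```

With `stride < win_len`, consecutive windows overlap, which is precisely the redundancy the incremental branch is said to exploit instead of recomputing full features per window.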


PKD: General Distillation Framework for Object Detectors via Pearson Correlation Coefficient

July 2022 · 29 Reads

Knowledge distillation (KD) is a widely used technique for training compact models in object detection. However, how to distill between heterogeneous detectors remains understudied. In this paper, we empirically find that better FPN features from a heterogeneous teacher detector can help the student even though their detection heads and label assignments differ. However, directly aligning the feature maps to distill detectors suffers from two problems. First, the difference in feature magnitude between the teacher and the student can impose overly strict constraints on the student. Second, FPN stages and channels with large feature magnitudes in the teacher model can dominate the gradient of the distillation loss, overwhelming the effects of other features in KD and introducing noise. To address these issues, we propose imitating features via the Pearson correlation coefficient, which focuses on the relational information from the teacher and relaxes constraints on the magnitude of the features. Our method consistently outperforms existing detection KD methods and works for both homogeneous and heterogeneous student-teacher pairs. Furthermore, it converges faster. With a powerful MaskRCNN-Swin detector as the teacher, ResNet-50-based RetinaNet and FCOS achieve 41.5% and 43.9% mAP on COCO2017, which are 4.1% and 4.8% higher than their baselines, respectively.
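As a rough illustration of the core idea, a Pearson-correlation-based feature imitation loss can be obtained by standardizing each channel of the student and teacher features (zero mean, unit variance) before taking an MSE, since the MSE of standardized features equals 2·(1 − Pearson r) per channel. This is a hedged sketch, not the authors' implementation; shapes and names are assumptions:

```python
import numpy as np

def pearson_feature_loss(feat_s, feat_t, eps=1e-8):
    """Distillation loss in the spirit of PKD: imitate teacher features via
    Pearson correlation, invariant to per-channel feature scale and mean.
    feat_s, feat_t: (channels, spatial) arrays for one FPN stage (assumed)."""
    def standardize(x):
        mu = x.mean(axis=-1, keepdims=True)
        sigma = x.std(axis=-1, keepdims=True) + eps
        return (x - mu) / sigma
    s, t = standardize(feat_s), standardize(feat_t)
    # MSE of standardized features; minimized when channel-wise correlation is 1
    return np.mean((s - t) ** 2)
```

Because of the standardization, a teacher feature that is merely a scaled and shifted copy of the student's incurs (near) zero loss, which is exactly the magnitude-invariance the abstract motivates.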


Towards Fully Sparse Training: Information Restoration with Spatial Similarity

June 2022 · 20 Reads · 1 Citation

Proceedings of the AAAI Conference on Artificial Intelligence

The 2:4 structured sparsity pattern introduced by the NVIDIA Ampere architecture, which requires every four consecutive values to contain at least two zeros, doubles math throughput for matrix multiplications. Recent works mainly focus on inference speedup via 2:4 sparsity, while training acceleration has been largely overlooked, even though backpropagation consumes around 70% of the training time. However, unlike inference, training speedup with structured pruning is nontrivial due to the need to maintain the fidelity of gradients and to reduce the overhead of performing 2:4 pruning online. For the first time, this article proposes fully sparse training (FST), where "fully" indicates that ALL matrix multiplications in forward/backward propagation are structurally pruned while maintaining accuracy. To this end, we begin with a saliency analysis, investigating the sensitivity of different sparse objects to structured pruning. Based on the observation of spatial similarity among activations, we propose pruning activations with fixed 2:4 masks. Moreover, an Information Restoration block is proposed to retrieve the lost information, which can be implemented by an efficient gradient-shift operation. Evaluation of accuracy and efficiency shows that we can achieve 2× training acceleration with negligible accuracy degradation on challenging large-scale classification and detection tasks.
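The 2:4 pattern itself is simple to state in code: in every group of four consecutive values, keep the two with the largest magnitude and zero the other two. A magnitude-based sketch (illustrative only — the paper's contribution concerns fixed activation masks and information restoration, and real deployments run this on GPU sparse tensor cores):

```python
import numpy as np

def prune_2_4(x):
    """Apply Ampere-style 2:4 structured sparsity: per group of four
    consecutive values, zero the two smallest-magnitude entries.
    Assumes x.size is a multiple of 4; a CPU reference sketch."""
    flat = x.reshape(-1, 4)
    # indices of the two smallest-magnitude entries in each group of four
    drop = np.argsort(np.abs(flat), axis=1)[:, :2]
    out = flat.copy()
    np.put_along_axis(out, drop, 0.0, axis=1)
    return out.reshape(x.shape)
```

The resulting tensor satisfies the "at least two zeros per four values" constraint that sparse tensor cores exploit to double matrix-multiply throughput.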


Decoupling GCN with DropGraph Module for Skeleton-Based Action Recognition

November 2020 · 138 Reads · 290 Citations

Lecture Notes in Computer Science

In skeleton-based action recognition, graph convolutional networks (GCNs) have achieved remarkable success. Nevertheless, how to efficiently model the spatial-temporal skeleton graph without introducing extra computational burden is a challenging problem for industrial deployment. In this paper, we rethink the spatial aggregation in existing GCN-based skeleton action recognition methods and find that they are limited by a coupled aggregation mechanism. Inspired by the decoupled aggregation mechanism in CNNs, we propose decoupling GCN to boost graph modeling ability with no extra computation, no extra latency, no extra GPU memory cost, and less than 10% extra parameters. Another prevalent problem of GCNs is overfitting. Although dropout is a widely used regularization technique, it is not effective for GCNs because activation units are correlated between neighboring nodes. We propose DropGraph to discard features in correlated nodes, which is particularly effective on GCNs. Moreover, we introduce an attention-guided drop mechanism to enhance the regularization effect. All our contributions introduce zero extra computational burden at deployment. We conduct experiments on three datasets (NTU-RGBD, NTU-RGBD-120, and Northwestern-UCLA) and exceed the state-of-the-art performance with less computational cost.
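The coupled-versus-decoupled distinction above can be illustrated simply: coupled aggregation applies one shared adjacency matrix to all feature channels, while decoupled aggregation splits the channels into groups and gives each group its own adjacency. A toy NumPy sketch (names, shapes, and the grouping scheme are illustrative assumptions, not the paper's code):

```python
import numpy as np

def decoupled_aggregate(x, adjacencies):
    """Decoupled spatial aggregation: split channels into len(adjacencies)
    groups and aggregate each group with its own adjacency matrix, instead
    of one adjacency shared by all channels (coupled aggregation).
    x: (num_joints, channels); each adjacency: (num_joints, num_joints)."""
    groups = np.split(x, len(adjacencies), axis=1)  # per-group channel slices
    return np.concatenate([A @ g for A, g in zip(adjacencies, groups)], axis=1)
```

Since the per-group matrix multiplies replace one full-channel multiply of the same total size, the extra cost at inference is essentially zero, matching the abstract's "no extra computation" claim; only the adjacency parameters multiply.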


Citations (3)


... According to previous work [55] and the Ampere architecture equipped with sparse tensor cores [45, 46, 47], there is currently hardware support for matrix multiplication with 50% fine-grained sparsity [55]. Therefore, SSAM with 50% sparse perturbation has great potential to achieve true training acceleration via sparse back-propagation. ...

Reference:

Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach
Towards Fully Sparse Training: Information Restoration with Spatial Similarity
  • Citing Article
  • June 2022

Proceedings of the AAAI Conference on Artificial Intelligence

... Similarly, the pose refinement graph convolutional network (PR-GCN) [32] achieves a good balance between accuracy and parameter count by gradually fusing motion and spatial information. In GCNs, the vertex connectivity of the skeleton graph carries important information, so many researchers study topology-based modeling methods. Such work can be divided into two categories: 1) whether the topology can be dynamically adjusted during inference, separating static methods [10] from dynamic methods [15]; and 2) whether the topology is shared across channels [32][33][34], separating topology-shared from topology-unshared methods. Recently, transformer-based networks have shown great potential on skeleton-based action recognition tasks. ...

Decoupling GCN with DropGraph Module for Skeleton-Based Action Recognition
  • Citing Chapter
  • November 2020

Lecture Notes in Computer Science

... CrossGLG (Yan et al. 2024) uses LLM descriptions of actions and cross-attention to guide a skeleton encoder during training, but only uses the skeleton encoder at inference. Closest to our work is STAR (Chen et al. 2024), which aligns a Shift-GCN (Cheng et al. 2020) skeleton encoder with a pre-trained transformer-based text encoder (Radford et al. 2021) for zero-shot skeleton action recognition. STAR differs from our work in that it does not incorporate the RGB modality or investigate strategies to enhance the representations of VLMs and LVLMs. ...

Skeleton-Based Action Recognition With Shift Graph Convolutional Network
  • Citing Conference Paper
  • June 2020