Chu-Song Chen

Academia Sinica · Institute of Information Science

About

178 Publications
41,179 Reads
5,069 Citations

Publications (178)
Preprint
Full-text available
Prompt Tuning has become a popular Parameter-Efficient Fine-Tuning method owing to its remarkable performance with few updated parameters on various large-scale pretrained Language Models (PLMs). Traditionally, each prompt has been considered indivisible and updated independently, causing the number of parameters to increase proportionally as prompt length g...
Chapter
Deepfake technology has raised concerns about the authenticity of digital content, necessitating the development of effective detection methods. However, the widespread availability of deepfakes has given rise to a new challenge in the form of adversarial attacks. Adversaries can manipulate deepfake videos with small, imperceptible perturbations th...
Article
Full-text available
We aimed to develop machine learning (ML)-based algorithms to assist physicians in ultrasound-guided localization of cricoid cartilage (CC) and thyroid cartilage (TC) in cricothyroidotomy. Adult female volunteers were prospectively recruited from two hospitals between September and December, 2020. Ultrasonographic images were collected via a modifi...
Article
Full-text available
Adversarial attacks have become one of the most serious security issues in widely used deep neural networks. Even though real-world datasets usually have large intra-variations or multiple modes, most adversarial defense methods, such as adversarial training, which is currently one of the most effective defense methods, mainly focus on the single-m...
Article
Neural scene representation and rendering methods have shown promise in learning the implicit form of scene structure without supervision. However, the implicit representation learned in most existing methods is non-expandable and cannot be inferred online for novel scenes, which makes it difficult to apply the learned representation across diffe...
Preprint
Full-text available
Dense depth and pose estimation is a vital prerequisite for various video applications. Traditional solutions suffer from the limited robustness of sparse feature tracking and from insufficient camera baselines in videos. Therefore, recent methods use learning-based optical flow and depth priors to estimate dense depth. However, previous works require heavy...
Preprint
In practical visual search, the gallery set may grow incrementally as new data are added to the database. However, existing methods rely on a model trained on the entire dataset, ignoring the continual updating of the model. Besides, as the model updates, the new model must re-extract features for the entire gallery set to maintain compatible featu...
Preprint
Full-text available
Geometry-aware modules are widely applied in recent deep learning architectures for scene representation and rendering. However, these modules require intrinsic camera information that might not be obtained accurately. In this paper, we propose a Spatial Transformation Routing (STR) mechanism to model the spatial properties without applying any geo...
Article
The goal of supervised hashing is to construct hash mappings from collections of images and semantic annotations such that semantically relevant images are embedded nearby in the learned binary hash representations. Existing deep supervised hashing approaches that employ classification frameworks with a classification training objective for learnin...
Preprint
Full-text available
This paper introduces an approach for multi-human 3D pose estimation and tracking from calibrated multiple views. The main challenge lies in finding the cross-view and temporal correspondences correctly even when several human pose estimations are noisy. Compared to previous solutions that construct 3D poses from multiple views, our approach takes a...
Article
In this article, we propose SemanticHash, a simple and effective deep neural network model, to leverage semantic word embeddings (e.g., BERT) in hash code learning. Both images and class labels are compressed into $K$-bit binary vectors by using the visual (or the semantic) hash functions, which are jointly learned and aligned to optimize the s...
Conference Paper
Deep convolutional neural networks achieve high accuracy but are computationally inefficient. To improve the inference speed, two directions have been explored in the past: lightweight model design and network weight pruning. Lightweight models have been proposed to improve the speed with good enough accuracy. It is, however, not obvious whether we can further sp...
Preprint
Continual lifelong learning is essential to many applications. In this paper, we propose a simple but effective approach to continual deep learning. Our approach leverages the principles of deep model compression with weight pruning, critical weights selection, and progressive networks expansion. By enforcing their integration in an iterative manne...
Preprint
It is hard to detect on-road objects under various lighting conditions. To improve the quality of the classifier, three techniques are used. We define subclasses to separate daytime and nighttime samples. Then we skip similar samples in the training set to prevent overfitting. With the help of outside training samples, the detection accuracy is...
Article
Learning effective representations that exhibit semantic content is crucial to image retrieval applications. Recent advances in deep learning have made significant improvements in performance on a number of visual recognition tasks. Studies have also revealed that visual features extracted from a deep network learned on a large-scale image data set...
Chapter
Learning-based hashing has been widely employed for large-scale similarity retrieval due to its efficient computation and compressed storage. In this paper, we propose ResHash, a deep representation hash code learning approach to learning compact and discriminative binary codes. In ResHash, we assume that each semantic label has its own representat...
Conference Paper
Full-text available
Simultaneously running multiple modules is a key requirement for a smart multimedia system for facial applications including face recognition, facial expression understanding, and gender identification. To effectively integrate them, a continual learning approach to learn new tasks without forgetting is introduced. Unlike previous methods growing m...
Article
Image aesthetics has been a popular topic in recent years. Users can obtain an aesthetic score for any image by using computational approaches based on photography rules. However, most approaches treat the task as an offline process because those methods usually have high computational complexity. On the other hand, mobile devices (...
Preprint
This paper aims at recognizing partially observed human actions in videos. Action videos acquired in uncontrolled environments often contain corrupt frames, which make actions partially observed. Furthermore, these frames can last for arbitrary lengths of time and appear irregularly. They are inconsistent with training data and degrade the performa...
Preprint
Full-text available
Many face recognition systems boost performance using deep learning models, but only a few studies address the mechanisms for handling online registration. Although we can obtain discriminative facial features through state-of-the-art deep model training, deciding the best threshold for practical use remains a challenge. We deve...
Article
This letter presents a theory of scanning a signal with a sliding window, where the window's mapping function is built upon a convolutional neural network (CNN). When using a CNN as the sliding window, we show that the resultant feature maps are equivalent to the maps obtained by applying another CNN (called EQ-ScanNet) to the whole signal. The EQ-...
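The equivalence described above can be checked numerically in its simplest form. The sketch below assumes a toy two-layer 1D CNN with stride-1 "valid" convolutions and a ReLU; the kernels, biases, and helper names are hypothetical, and this is not EQ-ScanNet itself. Scanning the signal with the CNN as a sliding window yields, at every offset, exactly a slice of the feature map obtained by running the same CNN once over the whole signal. Strided or pooled layers break this simple slicing, and the sketch does not cover that case.

```python
from typing import List

Signal = List[float]


def conv1d_valid(x: Signal, kernel: Signal, bias: float = 0.0) -> Signal:
    """Stride-1 'valid' 1D cross-correlation (no padding)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]


def relu(x: Signal) -> Signal:
    return [max(0.0, v) for v in x]


def toy_cnn(x: Signal, k1: Signal, k2: Signal) -> Signal:
    """Hypothetical two-layer CNN: conv -> ReLU -> conv, all stride 1."""
    return conv1d_valid(relu(conv1d_valid(x, k1, 0.1)), k2, -0.05)


def scan_with_window(signal: Signal, width: int, k1: Signal, k2: Signal) -> List[Signal]:
    """Apply the CNN independently to every window of the given width (stride 1)."""
    return [toy_cnn(signal[i:i + width], k1, k2)
            for i in range(len(signal) - width + 1)]


signal = [0.5, -1.0, 2.0, 0.3, 1.2, -0.7, 0.9, 1.5, -0.2, 0.4]
k1, k2 = [0.2, -0.5, 0.3], [1.0, 0.5]

full_map = toy_cnn(signal, k1, k2)             # one pass over the whole signal
windows = scan_with_window(signal, 6, k1, k2)  # one pass per window

# Each window's output equals a slice of the whole-signal feature map,
# so the overlapping computation repeated per window is redundant.
for i, out in enumerate(windows):
    assert all(abs(a - b) < 1e-12 for a, b in zip(out, full_map[i:i + len(out)]))
```

Running the CNN once over the whole signal and slicing therefore gives the same maps as scanning, without recomputing the overlapping regions.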
Conference Paper
Full-text available
We propose a novel method to merge convolutional neural-nets for the inference stage. Given two well-trained networks that may have different architectures that handle different tasks, our method aligns the layers of the original networks and merges them into a unified model by sharing the representative codes of weights. The shared weights are fur...
Article
Binary descriptors have been widely used for efficient image matching and retrieval. However, most existing binary descriptors are designed with hand-crafted sampling patterns or learned with label annotations provided by datasets. In this paper, we propose a new unsupervised deep learning approach, called DeepBit, to learn compact binary descriptor f...
Article
This paper presents a simple yet effective supervised deep hash approach that constructs binary hash codes from labeled data for large-scale image search. We assume that the semantic labels are governed by several latent attributes with each attribute on or off, and classification relies on these attributes. Based on this assumption, our approach,...
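On the retrieval side, once a latent layer has been trained so that each unit behaves like an on/off attribute, image codes are obtained by binarizing the unit activations, and search reduces to ranking by Hamming distance. The sketch below assumes sigmoid activations thresholded at 0.5 and uses made-up activation values; `to_binary_code`, `hamming`, and `rank_gallery` are illustrative helpers, not code from the paper.

```python
from typing import List, Sequence


def to_binary_code(activations: Sequence[float], threshold: float = 0.5) -> int:
    """Pack latent-attribute activations into an integer (first unit = most significant bit)."""
    code = 0
    for a in activations:
        code = (code << 1) | (1 if a >= threshold else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two codes."""
    return bin(a ^ b).count("1")


def rank_gallery(query: int, gallery: List[int]) -> List[int]:
    """Gallery indices sorted by Hamming distance to the query (ties broken by index)."""
    return sorted(range(len(gallery)), key=lambda i: (hamming(query, gallery[i]), i))


# Hypothetical sigmoid outputs of a 4-unit latent layer.
query = to_binary_code([0.91, 0.12, 0.77, 0.40])    # 0b1010
gallery = [
    to_binary_code([0.88, 0.05, 0.69, 0.20]),       # 0b1010, distance 0
    to_binary_code([0.10, 0.93, 0.81, 0.66]),       # 0b0111, distance 3
    to_binary_code([0.95, 0.30, 0.15, 0.35]),       # 0b1000, distance 1
]
print(rank_gallery(query, gallery))  # [0, 2, 1]
```

Packing the bits into an integer keeps the distance computation to one XOR and a popcount per gallery item, which is what makes binary codes attractive for large-scale search.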
Article
This article tackles the problem of joint estimation of human age and facial expression. This is an important yet challenging problem because expressions can alter face appearances in a similar manner to human aging. Different from previous approaches that deal with the two tasks independently, our approach trains a convolutional neural network (CN...
Article
This article addresses the problem of recognizing partially observed human actions. Videos of actions acquired in the real world often contain corrupt frames caused by various factors. These frames may appear irregularly, and make the actions only partially observed. They change the appearance of actions and degrade the performance of pretrained re...
Conference Paper
Learning feature representations for image retrieval is essential to multimedia search and mining applications. Recently, deep convolutional networks (CNNs) have gained much attention due to their impressive performance on object detection and image classification, and the feature representations learned from a large-scale generic dataset (e.g., Im...
Article
This paper presents an algorithm for ego-positioning by using a low-cost monocular camera for systems based on the Internet-of-Vehicles. To reduce the computational and memory requirements, as well as the communication load, we tackle the model compression task as a weighted k-cover problem for better preserving the critical structures. For real-wo...
Conference Paper
Full-text available
Clothing retrieval and clothing style recognition are important and practical problems. They have drawn a lot of attention in recent years. However, the clothing photos collected in existing datasets are mostly of front- or near-front view. There are no datasets designed to study the influences of different viewing angles on clothing retrieval perf...
Conference Paper
Full-text available
In this paper, we propose a new unsupervised deep learning approach called DeepBit to learn compact binary descriptor for efficient visual object matching. Unlike most existing binary descriptors which were designed with random projections or linear hash functions, we develop a deep neural network to learn binary descriptors in an unsupervised ma...
Article
This paper presents an effective approach for detecting abandoned luggage in surveillance videos. We combine short- and long-term background models to extract foreground objects, where each pixel in an input image is classified as a 2-bit code. Subsequently, we introduce a framework to identify static foreground regions based on the temporal transi...
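The short-/long-term idea can be sketched with two running-average background models that adapt at different rates. Each pixel gets a 2-bit code (long-term foreground bit, short-term foreground bit); an object that has stopped moving is soon absorbed by the fast model but not by the slow one, so it shows up as code `10` (binary). The running-average models, learning rates, threshold, and the tiny 1D "image" are assumptions for illustration only; the paper's background models and its temporal-transition framework are not reproduced here.

```python
from typing import List

Frame = List[float]


class RunningAverageBackground:
    """Per-pixel running-average background model with learning rate `alpha` (illustrative)."""

    def __init__(self, first_frame: Frame, alpha: float) -> None:
        self.model = [float(p) for p in first_frame]
        self.alpha = alpha

    def is_foreground(self, frame: Frame, threshold: float) -> List[bool]:
        return [abs(p - m) > threshold for p, m in zip(frame, self.model)]

    def update(self, frame: Frame) -> None:
        self.model = [(1 - self.alpha) * m + self.alpha * p
                      for m, p in zip(self.model, frame)]


def two_bit_codes(frames: List[Frame], fast_alpha: float = 0.5,
                  slow_alpha: float = 0.001, threshold: float = 30.0) -> List[int]:
    """Return the 2-bit code of every pixel in the last frame.

    Bit 1 = foreground w.r.t. the slow (long-term) model,
    bit 0 = foreground w.r.t. the fast (short-term) model.
    0b10 marks a candidate static-foreground pixel; 0b11 is moving foreground.
    """
    long_term = RunningAverageBackground(frames[0], slow_alpha)
    short_term = RunningAverageBackground(frames[0], fast_alpha)
    codes: List[int] = [0] * len(frames[0])
    for frame in frames[1:]:
        lf = long_term.is_foreground(frame, threshold)
        sf = short_term.is_foreground(frame, threshold)
        codes = [(int(l) << 1) | int(s) for l, s in zip(lf, sf)]
        long_term.update(frame)
        short_term.update(frame)
    return codes


# A 5-pixel background of intensity 10; a bright object (200) is dropped
# at pixel 2 in frame 1 and then stays put.
frames = [[10.0] * 5] + [[10.0, 10.0, 200.0, 10.0, 10.0] for _ in range(10)]
print(two_bit_codes(frames))  # [0, 0, 2, 0, 0]
```

Right after the object appears, both models flag it (code 3); a few frames later only the long-term model still does, which is the cue a static-foreground detector can build on.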
Article
Full-text available
This paper presents a supervised deep hashing approach that constructs binary hash codes from labeled data for large-scale image search. We assume that semantic labels are governed by a set of latent attributes in which each attribute can be on or off, and classification relies on these attributes. Based on this assumption, our approach, dubbed sup...
Conference Paper
Full-text available
This paper deals with the problem of clothing retrieval in a recommendation system. We develop a hierarchical deep search framework to tackle this problem. We use a pre-trained network model that has learned rich mid-level visual representations in module 1. Then, in module 2, we add a latent layer to the network and have neurons in this layer to l...
Article
Augmented reality (AR) displays have become increasingly popular because they are highly intuitive for humans and high-quality head-mounted displays have developed rapidly. To achieve such displays with augmented information, highly accurate image registration or ego-positioning is required, but little attention has been paid to outdoor en...
Conference Paper
This paper aims at improving the recognition of 3D push-hand gestures, which let users trigger a target-selection command with their hands in the air. Although general 3D push-gesture recognizers have been developed and widely used for this purpose, a severe weakness of current push recognizers is that they are unstable under askew-pushing problems that ha...
Conference Paper
Full-text available
Golgi outposts (GOPs) that transport proteins in both the anterograde and retrograde directions play an important role in determining the dendritic morphology in developing neurons. To obtain their heterogeneous motion patterns, we present a data association based framework that first detects the GOPs and then links the detection responses. In the...
Article
This study presents a cost-sensitive ordinal hyperplanes ranking algorithm for human age estimation based on face images. The proposed approach exploits relative-order information among the age labels for rank prediction. In our approach, the age rank is obtained by aggregating a series of binary classification results, where cost sensitivities amo...
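The aggregation step can be illustrated as follows, assuming K ordered age thresholds and one binary classifier per threshold that answers "is the age greater than this threshold?". Predicting the rank as one plus the number of positive answers yields a single rank even when the individual answers are not perfectly monotone. The threshold values and decision scores are invented, and the cost-sensitive training of each hyperplane is not shown.

```python
from typing import List, Sequence


def binary_labels(age: int, thresholds: Sequence[int]) -> List[int]:
    """Training labels for the K binary subproblems: +1 if age > t_k, else -1."""
    return [1 if age > t else -1 for t in thresholds]


def aggregate_rank(decision_values: Sequence[float]) -> int:
    """Rank = 1 + number of subproblems answered 'older than t_k' (decision value > 0)."""
    return 1 + sum(1 for d in decision_values if d > 0)


thresholds = [10, 20, 30, 40, 50]
print(binary_labels(34, thresholds))  # [1, 1, 1, -1, -1]

# Hypothetical classifier outputs for one face; the second answer is
# inconsistent with the third, but counting still gives a single rank.
scores = [2.1, -0.2, 0.6, -0.3, -1.5]
print(aggregate_rank(scores))  # 3, i.e. age in (20, 30] under these thresholds
```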
Conference Paper
Background subtraction is a crucial component in visual surveillance and has been studied for years. However, an efficient algorithm that can tolerate environmental changes such as dynamic backgrounds and sudden illumination changes is still needed. In this paper, we design an innovative framework called the spatiotemporal background ex...
Conference Paper
Full-text available
We present an abandoned object detection system in this paper. A finite-state-machine model is introduced to extract stationary foregrounds in a scene for visual surveillance, where the state value of each pixel is inferred via the cooperation of short-term and long-term background models constructed in the proposed approach. To identify the left-l...
Article
Sparse representation has been widely used in machine learning, signal processing and communications. K-SVD, which generalizes k-means clustering, is one of the most famous algorithms for sparse representation and dictionary learning. K-SVD is an iterative method that alternates between encoding the data sparsely by using the current dictionary and...
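To show the alternation concretely, here is a minimal pure-Python sketch of K-SVD restricted to sparsity level 1 (each signal uses exactly one atom), the setting in which it most directly generalizes k-means. With one atom per signal, the residual used to update atom k, restricted to the signals that use it, is simply those signals, and its best rank-1 approximation is found here by power iteration instead of a full SVD. This restriction and all helper names are assumptions for illustration; general K-SVD uses a pursuit algorithm such as OMP to compute multi-atom codes.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> Vector:
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n > 0 else list(v)


def sparse_code_1(signals: List[Vector], atoms: List[Vector]) -> List[Tuple[int, float]]:
    """Sparse coding with one atom per signal: pick the atom with the largest |correlation|."""
    codes = []
    for y in signals:
        k = max(range(len(atoms)), key=lambda j: abs(dot(atoms[j], y)))
        codes.append((k, dot(atoms[k], y)))  # optimal coefficient for a unit-norm atom
    return codes


def leading_left_singular_vector(columns: List[Vector], start: Vector, iters: int = 100) -> Vector:
    """Power iteration on E E^T, where the matrix E has `columns` as its columns."""
    u = normalize(start)
    for _ in range(iters):
        w = [0.0] * len(u)
        for e in columns:
            c = dot(e, u)
            w = [wi + c * ei for wi, ei in zip(w, e)]
        if dot(w, w) == 0.0:
            break
        u = normalize(w)
    return u


def ksvd_sparsity1(signals: List[Vector], atoms: List[Vector], n_iter: int = 10):
    atoms = [normalize(d) for d in atoms]
    for _ in range(n_iter):
        codes = sparse_code_1(signals, atoms)          # 1) sparse coding stage
        for k in range(len(atoms)):                     # 2) dictionary update stage
            users = [signals[i] for i, (j, _) in enumerate(codes) if j == k]
            if users:  # an unused atom is left unchanged in this sketch
                atoms[k] = leading_left_singular_vector(users, atoms[k])
    return atoms, sparse_code_1(signals, atoms)


def reconstruction_error(signals: List[Vector], atoms: List[Vector], codes) -> float:
    return sum(sum((yi - c * di) ** 2 for yi, di in zip(y, atoms[k]))
               for y, (k, c) in zip(signals, codes))


# Signals generated from two directions a = (1, 0, 0) and b = (0, 0.6, 0.8).
signals = [[2.0, 0.0, 0.0], [0.0, 1.8, 2.4], [-1.0, 0.0, 0.0], [0.0, 0.3, 0.4]]
atoms, codes = ksvd_sparsity1(signals, [[1.0, 0.2, 0.0], [0.0, 0.5, 1.0]])
print(reconstruction_error(signals, atoms, codes))  # ~0: both directions recovered
```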
Conference Paper
In this paper, we propose a framework that estimates the discrete intensity rank of a facial expression based on a single image. For most people, judging whether an expression is more intense than others is easier than determining its real-valued intensity degree, and hence the relative order of two expressions is more distinguishable than the exac...
Conference Paper
Nowadays, ever-expanding camera networks make it difficult to find a suspect in lengthy video records. This paper proposes a target-driven video summarization framework that provides a two-step Filtered Summarized Video (FSV) for tracing suspects. Before the target is identified, users can find the target efficiently using the first-step FSV of a...
Conference Paper
Face identification is the problem of determining whether two face images depict the same person or not. This is difficult due to variations in scale, pose, lighting, background, expression, hairstyle, and glasses. Thus, a powerful feature descriptor with local-deformation tolerance ability and discriminating capability is essential to fulfill all...
Article
We introduce the intrinsic illumination subspace and its application to lighting-insensitive face recognition in this paper. The intrinsic illumination subspace is constructed from the illumination images of intrinsic images, which are a midlevel description of appearance images and can be useful for many visual inferences. This subspace forms a convex...
Article
Full-text available
To track targets across networked cameras with disjoint views, one of the major problems is to learn the spatio-temporal relationship and the appearance relationship, where the appearance relationship is usually modeled as a brightness transfer function. Traditional methods learn the relationships by using either hand-labeled correspondence or b...
Conference Paper
Full-text available
In this paper, we propose an ordinal hyperplane ranking algorithm called OHRank, which estimates human ages via facial images. The design of the algorithm is based on the relative order information among the age labels in a database. Each ordinal hyperplane separates all the facial images into two groups according to the relative order, and a cost-...
Conference Paper
Full-text available
In this study, we introduce a new cosegmentation approach, MOMI-cosegmentation, to segment multiple objects that repeatedly appear among multiple images. The proposed approach tackles a more general problem than conventional cosegmentation methods: each of the shared objects may even appear more than once in an image. The key idea of MOMI-cose...
Conference Paper
Full-text available
Turning Rust into Gold is inspired by a Chinese antique, the Mao-Kung Ting (cauldron), treasured by the National Palace Museum in Taiwan. Its five-hundred-character inscription cast inside and its weathered appearance make the Mao-Kung Ting unique. Motivated by revealing the great nature of the artifact and interpreting it into a meaningful narrati...
Conference Paper
Full-text available
In our daily life, it is much easier to tell which of two people is older than to tell how old a person is. When inferring a person's age, we may compare his or her face with many people whose ages are known, resulting in a series of comparative results, and then we conjecture the age based on the comparisons. This process involves nume...
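The comparison-based reasoning above can be turned into a toy estimator: given answers to "is this face older than a reference person of known age?", keep the candidate ages that contradict the fewest answers and report their mean. This consistency-counting rule is chosen purely for illustration and is not the estimator proposed in the paper; the comparison data are invented.

```python
from typing import Iterable, List, Tuple

Comparison = Tuple[int, bool]  # (reference age, "query looks older than the reference?")


def violations(age: int, comparisons: List[Comparison]) -> int:
    """Number of comparison outcomes that a candidate age contradicts."""
    return sum(1 for ref, older in comparisons if (age > ref) != older)


def estimate_age(comparisons: List[Comparison],
                 candidates: Iterable[int] = range(0, 101)) -> float:
    """Mean of the candidate ages with the fewest contradicted comparisons."""
    cands = list(candidates)
    best = min(violations(a, comparisons) for a in cands)
    consistent = [a for a in cands if violations(a, comparisons) == best]
    return sum(consistent) / len(consistent)


comparisons = [(20, True), (30, True), (40, False), (35, False)]
print(estimate_age(comparisons))  # 33.0: ages 31..35 agree with every comparison
```

Minimizing the number of contradicted answers, rather than requiring all of them to hold, keeps the estimate defined when some comparisons are noisy.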
Article
We propose a novel motion segmentation algorithm based on a mixture of Dirichlet process (MDP) models. In contrast to previous approaches, we consider motion segmentation and its model selection with regard to the number of motion models as an inseparable problem. Our algorithm can simultaneously infer the number of motion models, estimate the cluster...
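Dirichlet-process mixtures can infer the number of components because their prior always leaves room for a component not yet seen. In the standard Chinese-restaurant-process view of a Dirichlet process with concentration α, a new item (here, a trajectory) joins an existing motion model with probability proportional to that model's current size, or starts a new model with probability proportional to α. The sketch below computes only these generic prior probabilities; the likelihood terms and the inference procedure used in the paper are not shown, and `crp_assignment_probs` is a hypothetical helper.

```python
from typing import List


def crp_assignment_probs(cluster_sizes: List[int], alpha: float) -> List[float]:
    """Prior probabilities of joining each existing cluster, plus (last entry) a new one."""
    n = sum(cluster_sizes)
    denom = n + alpha
    return [size / denom for size in cluster_sizes] + [alpha / denom]


# Two motion models currently explain 3 and 1 trajectories; alpha = 1.0.
print(crp_assignment_probs([3, 1], alpha=1.0))  # [0.6, 0.2, 0.2]
```

A larger α makes opening a new motion model more likely a priori, so α acts as a soft control on how many models the segmentation tends to use.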