Chien-Yao Wang

Academia Sinica · Institute of Information Science

About

Publications: 47
Reads: 59,407
Citations: 2,444
Citations since 2017: 2,441 (37 research items)
[Chart: citations since 2017, by year, 2017 to 2023; axis range 0 to 1,400]

Publications (47)
Preprint
We propose a post-processor, called NeighborTrack, that leverages neighbor information of the tracking target to validate and improve single-object tracking (SOT) results. It requires no additional data or retraining. Instead, it uses the confidence score predicted by the backbone SOT network to automatically derive neighbor information and then us...
Preprint
Full-text available
Designing a high-efficiency and high-quality expressive network architecture has always been the most important research topic in the field of deep learning. Most of today's network design strategies focus on how to integrate features extracted from different layers, and how to design computing units to effectively extract these features, thereby e...
Preprint
Full-text available
The paper presents a new method, SearchTrack, for multiple object tracking and segmentation (MOTS). To address the association problem between detected objects, SearchTrack proposes object-customized search and motion-aware features. By maintaining a Kalman filter for each object, we encode the predicted motion into the motion-aware feature, which...
Preprint
Full-text available
YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy 56.8% AP among all known real-time object detectors with 30 FPS or higher on GPU V100. YOLOv7-E6 object detector (56 FPS V100, 55.9% AP) outperforms both transformer-based detector SWIN-L Cascade-Mask R-CNN (9.2 FPS...
Article
Purpose: Retinopathy screening via digital imaging is promising for early detection and timely treatment, and tracking retinopathic abnormality over time can help to reveal the risk of disease progression. We developed an innovative physician-oriented artificial intelligence-facilitating diagnosis aid system for retinal diseases for screening multi...
Preprint
Full-text available
People "understand" the world via vision, hearing, touch, and past experience. Human experience can be learned through normal learning (we call it explicit knowledge) or subconsciously (we call it implicit knowledge). These experiences, whether learned through normal learning or subconsciously, are encoded and stored in the brain. Using the...
Preprint
Full-text available
We show that the YOLOv4 object detection neural network, based on the CSP approach, scales both up and down and is applicable to small and large networks while maintaining optimal speed and accuracy. We propose a network scaling approach that modifies not only the depth, width, and resolution, but also the structure of the network. YOLOv4-large model achiev...
Article
Music information retrieval is of great interest in audio signal processing. However, relatively little attention has been paid to the playing techniques of musical instruments. This work proposes an automatic system for classifying guitar playing techniques (GPTs). Automatic classification for GPTs is challenging because some playing techniques di...
Preprint
Full-text available
There are a huge number of features which are said to improve Convolutional Neural Network (CNN) accuracy. Practical testing of combinations of such features on large datasets, and theoretical justification of the result, is required. Some features operate on certain models exclusively and for certain problems exclusively, or only for small-scale d...
Article
Full-text available
This paper proposes two novel deep convolutional neural networks (CNNs), called sparse coding convolutional neural network (SC-CNN) and multi-channel SC-CNN (MSC-CNN), to address the problem of sound event recognition and retrieval. Unlike the general framework of a CNN, in which the feature learning process is performed hierarchically, the propose...
Preprint
Full-text available
State-of-the-art (SoTA) models have improved the accuracy of object detection by a large margin via a feature pyramid (FP). An FP is a top-down aggregation that collects semantically strong features to improve scale invariance in both two-stage and one-stage detectors. However, this top-down pathway cannot preserve accurate object positions due to the...
Preprint
Full-text available
Neural networks have enabled state-of-the-art approaches to achieve incredible results on computer vision tasks such as object detection. However, such success greatly relies on costly computation resources, which hinders people with cheap devices from appreciating the advanced technology. In this paper, we propose Cross Stage Partial Network (CSPN...
Article
Automatic sound event recognition (SER) has recently attracted renewed interest. Although a practical SER system has many useful applications in everyday life, SER is challenging owing to the variations among sounds and noises in the real-world environment. This work presents a novel feature extraction and classification method to solve the problem o...
Conference Paper
Full-text available
This paper proposes a novel deep convolutional neural network (CNN), called sparse coding convolutional neural network (SC-CNN), to address the problem of sound event recognition and retrieval. Unlike the general framework of a CNN, in which the feature learning process is performed hierarchically, the proposed framework models the whole memorizin...
Article
Full-text available
This paper proposes a speaker recognition system using acoustic features that are based on spectral-temporal receptive fields (STRFs). The STRF is derived from physiological models of the mammalian auditory system in the spectral-temporal domain. With the STRF, a signal is expressed by rate (in Hz) and scale (in cycles/octaves). The rate and scale...
Conference Paper
Full-text available
This research proposes a novel Bayesian sparse representation (BSR) method along with extracting facial parameters of SIFT to create sparse dictionaries, which are invariant to rotation, scale, and shift. By using K-means and information theory, a new dictionary called extended dictionary is developed. Compared with conventional orthogonal matching...
Conference Paper
This paper concerns the development of locality-preserving methods for object recognition. The major purpose is consideration of both descriptor-level locality and image-level locality throughout the recognition process. Two dual-layer locality-preserving methods are developed, in which locality-constrained linear coding (LLC) is used to represent...
Conference Paper
Full-text available
In this paper, we review the main process of visual lip reading and lip motion password (hereinafter, lip-password) verification, a useful and flexible method for many applications, especially in the security field, since it performs a double check that verifies both the speaker and his/her password. The reviewed content in...
Conference Paper
This paper proposes a modified Bayesian Sensing Hidden Markov Model (BS-HMM) to address the problem of hand gesture recognition with few labeled data. In this work, BS-HMM is investigated because of its success in large-vocabulary continuous speech recognition. We introduced error modeling into BS-HMM basis vector to handle t...
Conference Paper
This paper proposes a system to address the problem of visual speech recognition. The proposed system is based on visual lip movement recognition using video content analysis techniques. Using spatiotemporal feature descriptors, we extract features from video containing visual lip information. A preprocessing step is employed by removing th...
Conference Paper
In this paper, we propose a visual-based vehicle classification system that involves a visual feature representation step and a classification step. In the feature representation step, we present center enhanced spatial pyramid matching (CE-SPM) to extract features from images. In this work, we define an additional region in the center of each im...
Article
This brief presents an efficient very-large-scale integration architecture design for convolutive blind source separation (CBSS). The CBSS separation network derived from the information maximization (Infomax) approach is adopted. The proposed CBSS chip design consists mainly of Infomax filtering modules and scaling factor computation modules. In a...
