Siyi Shuai’s scientific contributions


Publications (2)


Fig. 1: Results of human action obtained using real-time pose estimation algorithm. Different frames show different posture and joints position (a) boxing action and joints movement, (b) pull up action and joints movement
Fig. 2: Coordinate points of human joints
Fig. 3: Outline of the proposed human action classification approach.
Fig. 4: Matrix obtained from all extracted coordinates of all frames
Fig. 5: Proposed convolution neural network model.


Action Classification Based on 2D Coordinates Obtained by Real-time Pose Estimation
  • Conference Paper
  • Full-text available

February 2019 · 3,887 Reads · 3 Citations

Siyi Shuai · … · Junichi Miyao

Human action classification is a significant problem in the computer vision field. To retrieve essential information from a large number of videos, understanding their content is very important. In this study, we propose an approach that classifies human actions based on the coordinate information of body parts. The key coordinate points extracted from each frame by a real-time pose estimation algorithm are accumulated into a matrix. These accumulated coordinates are then fed into a convolutional neural network (CNN) to classify human actions, which is the main contribution of this study. The approach is designed to ignore the background and consider only the movement information of the joints of the extracted poses. The CNN consists of three convolutional layers, a pooling layer, and a linear layer, which extract the features most relevant to classifying human actions. We use two benchmark datasets to validate the performance of our proposed approach. On six different types of actions, our approach achieves 100% classification accuracy on the KTH dataset, which is higher than competing approaches.
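The accumulation step described in the abstract — stacking per-frame 2D joint coordinates into a single matrix before feeding a CNN — can be sketched as follows. The joint count, frame data, and function name are illustrative assumptions (OpenPose-style pose estimators commonly output 18 keypoints per person), not details from the paper:

```python
# Sketch: stack per-frame 2D joint coordinates into one matrix, as the
# abstract describes. Joint count and coordinate values are illustrative.

NUM_JOINTS = 18  # assumption: OpenPose-style keypoint count

def frames_to_matrix(frames):
    """Each frame is a list of (x, y) joint coordinates.

    Returns a matrix of shape (num_frames, 2 * NUM_JOINTS), with x and y
    flattened per joint; a matrix like this is what the CNN classifies.
    """
    matrix = []
    for joints in frames:
        row = []
        for (x, y) in joints:
            row.extend([x, y])
        matrix.append(row)
    return matrix

# Hypothetical 3-frame clip with constant joint positions, for illustration.
clip = [[(0.5, 0.5)] * NUM_JOINTS for _ in range(3)]
m = frames_to_matrix(clip)
print(len(m), len(m[0]))  # 3 rows (frames), 36 columns (x/y per joint)
```

Because each row corresponds to one frame, the resulting matrix has a two-dimensional time-by-joint structure that a CNN can convolve over, much like an image.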


Fig. 2: Outline of the proposed method. 
Fig. 5: Examples of the bag-of-visual-words feature vectors (histograms) for 6 scenes. 
Fig. 6: Scenes classification performances by Linear SVM with different C parameters. 
Scene Classification based on Histogram of Detected Objects

February 2018 · 372 Reads

Video content analysis has been a hot topic for computer vision researchers. As video on the Internet continues to grow, hundreds of hours of video are uploaded to YouTube every minute, so studying video-related algorithms is necessary to help us handle these videos better. In this paper, we propose a method to classify the scenes of video clips based on their context. To extract context information, objects in each frame are detected using the YOLO9000 object detection algorithm [1], and the detection results of all frames in a video clip are accumulated into a histogram, similar to the bag-of-words method, one of the most popular methods for text classification. A support vector machine (SVM) is then used to classify the scene of each video clip; for multi-class classification, the one-against-rest approach is used. The proposed algorithm is applied to a dataset of video clips that contains parts of the Hollywood2 dataset [2], parts of the YouTube-8M dataset [3], and video clips from other movies and videos. The dataset contains 7000 video clips across 6 scene categories.
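The histogram-accumulation step above can be sketched in a few lines. The detected labels and vocabulary below are invented for illustration; in the actual pipeline they would come from YOLO9000's per-frame detections:

```python
from collections import Counter

def clip_histogram(per_frame_detections, vocabulary):
    """Accumulate detected object labels over all frames of a clip into a
    bag-of-visual-words histogram (one bin per vocabulary entry), which
    would then be fed to an SVM for scene classification."""
    counts = Counter()
    for detections in per_frame_detections:
        counts.update(detections)
    total = sum(counts[w] for w in vocabulary) or 1
    # Normalize so clips of different lengths remain comparable.
    return [counts[w] / total for w in vocabulary]

# Hypothetical detections over 3 frames of a single clip.
frames = [["person", "cup"], ["person", "oven"], ["cup", "oven"]]
vocab = ["person", "cup", "oven", "car"]
hist = clip_histogram(frames, vocab)
print(hist)  # equal mass on person/cup/oven, zero on car
```

Each clip becomes one fixed-length feature vector regardless of its duration, which is what lets a standard SVM (with one-against-rest for multi-class) operate on variable-length video.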

Citations (1)


... A typical structure of a CNN is shown in Fig. 2. Usually, a CNN consists of several types of layers that perform specialized information processing, e.g., convolution, activation, pooling, normalization, and so on. While CNNs are well known for their extremely high recognition performance in image classification problems, several studies have shown that CNNs also achieve high accuracy on data that are not images but have a two-dimensional shape [12,13,14]. As shown in [13], a CNN functioned well in classifying time-series matrices consisting of OpenPose's outputs. ...

Reference:

Improving Accuracy and Real-Time Performance of Recognition Methods for Surgical Procedure Recognition
Action Classification Based on 2D Coordinates Obtained by Real-time Pose Estimation