Sicheng Zhao

Harbin Institute of Technology, Harbin, Heilongjiang, China

Publications (8) · 8.03 Total impact

  • Sicheng Zhao · Hongxun Yao · Yanhao Zhang · Yasi Wang · Shaohui Liu
    ABSTRACT: Content-based 3D object retrieval has wide applications in various domains, ranging from virtual reality to computer-aided design and entertainment. With the rapid development of digitizing technologies, different views of 3D objects are captured, which calls for effective and efficient view-based 3D object retrieval (V3DOR) techniques. As each object is represented by a set of multiple views, V3DOR becomes a group matching problem. Most state-of-the-art V3DOR methods use a single feature to describe a 3D object, which is often insufficient. In this paper, we propose a feature fusion method via multi-modal graph learning for view-based 3D object retrieval. Firstly, different visual features, including 2D Zernike moments, 2D Fourier descriptors and 2D Krawtchouk moments, are extracted to describe each view of a 3D object. Then the Hausdorff distance is computed to measure the similarity between two 3D objects with multiple views. Finally, we construct multiple graphs based on the different features and learn the optimal weight of each graph automatically for the feature fusion task. Extensive experiments are conducted on the ETH-80 dataset and the National Taiwan University 3D model dataset. The results demonstrate the superior performance of the proposed method compared to state-of-the-art approaches.
    Signal Processing 07/2015; 112. DOI:10.1016/j.sigpro.2014.09.038 · 2.24 Impact Factor
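    A minimal sketch of the view-set matching step described in the entry above: the symmetric Hausdorff distance between two objects, each represented as a set of per-view feature vectors. The Euclidean view-to-view distance, the feature dimensionality and the function names are illustrative assumptions, not details taken from the paper.

      import numpy as np

      def hausdorff_distance(views_a, views_b):
          """Symmetric Hausdorff distance between two sets of per-view
          feature vectors (one row per view)."""
          # Pairwise Euclidean distances between every view of A and every view of B.
          pairwise = np.linalg.norm(views_a[:, None, :] - views_b[None, :, :], axis=-1)
          # Directed distances: worst-case best match in each direction.
          d_ab = pairwise.min(axis=1).max()
          d_ba = pairwise.min(axis=0).max()
          return max(d_ab, d_ba)

      # Two toy objects described by 4 and 5 views of 8-D features (e.g. moment descriptors).
      obj_a = np.random.rand(4, 8)
      obj_b = np.random.rand(5, 8)
      print(hausdorff_distance(obj_a, obj_b))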
  • IET Computer Vision 06/2015; DOI:10.1049/iet-cvi.2014.0276 · 0.76 Impact Factor
  • ABSTRACT: 3D depth data, especially dynamic 3D depth data, offer several advantages over traditional intensity videos for expressing objects' actions, such as being useful in low light levels, resolving the silhouette ambiguity of actions, and being invariant to color and texture. With the wide popularity of somatosensory equipment (Kinect, for example), more and more dynamic 3D depth data are shared on the Internet, which creates an urgent need to retrieve these data efficiently and effectively. In this paper, we propose a generalized strategy for dynamic 3D depth data matching and apply it to the action retrieval task. Firstly, an improved 3D shape context descriptor (3DSCD) is proposed to extract features from each static depth frame. Then we employ dynamic time warping (DTW) to measure the temporal similarity between two dynamic 3D depth sequences. Experimental results on our collected dataset of 170 dynamic 3D depth video clips show that the proposed 3DSCD has rich descriptive power on depth data and that the method combining 3DSCD and DTW achieves high matching accuracy. Finally, to address the matching efficiency problem, we use the bag-of-words (BoW) model to quantize the 3DSCD of each static depth frame into visual words, so the original feature matching problem is simplified into matching between two histograms. The results demonstrate the matching efficiency of the proposed method, while still maintaining high matching accuracy.
    Neurocomputing 03/2015; 151. DOI:10.1016/j.neucom.2014.03.092 · 2.01 Impact Factor
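    A rough sketch of the temporal matching step in the entry above: classic dynamic time warping between two sequences of per-frame descriptors. The descriptor dimensionality and sequence lengths are toy assumptions standing in for the 3DSCD features.

      import numpy as np

      def dtw_distance(seq_a, seq_b):
          """Dynamic time warping distance between two sequences of per-frame
          descriptors (one row per frame)."""
          n, m = len(seq_a), len(seq_b)
          cost = np.full((n + 1, m + 1), np.inf)
          cost[0, 0] = 0.0
          for i in range(1, n + 1):
              for j in range(1, m + 1):
                  d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])   # local frame distance
                  cost[i, j] = d + min(cost[i - 1, j],              # insertion
                                       cost[i, j - 1],              # deletion
                                       cost[i - 1, j - 1])          # match
          return cost[n, m]

      # Two toy depth sequences of different lengths, each frame a 16-D descriptor.
      seq_a = np.random.rand(20, 16)
      seq_b = np.random.rand(25, 16)
      print(dtw_distance(seq_a, seq_b))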
  • ABSTRACT: This paper addresses the problem of dense crowd event recognition in surveillance video. Previous particle flow-based methods efficiently capture the convolutional motion in the crowded scene. However, group-level description has rarely been studied, due to the loss of group structure and large intra-class variability. To address these issues, we present a novel crowd behavior representation called bag of trajectory graphs (BoTG). Firstly, we design a group-level representation beyond particle flow. Based on the observation that crowds are composed of atomic subgroups corresponding to informative behavior patterns, particle trajectories that simulate the motion of individuals are clustered to form groups. Secondly, we connect the nodes in each group as a trajectory graph and propose three informative features to encode the graphs, namely graph structure, group attribute, and dynamic motion, which characterize the structure of, the motion within, and the motion among the trajectory graphs. Finally, each crowd event clip can be further described by BoTG as the occurrences of behavior patterns, which provides critical clues for categorizing specific crowd events. We conduct extensive experiments on public datasets for abnormality detection and event recognition. The results demonstrate the effectiveness of BoTG in characterizing group behaviors in dense crowds.
    Signal Image and Video Processing 12/2014; 8(S1):173-181. DOI:10.1007/s11760-014-0669-9 · 1.02 Impact Factor
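    The final encoding step in the entry above can be illustrated with a standard bag-of-words pipeline: cluster per-graph descriptors into a codebook of behavior patterns, then describe a clip by the normalized occurrence histogram of its codewords. The descriptor contents, dimensions and codebook size below are hypothetical placeholders.

      import numpy as np
      from sklearn.cluster import KMeans

      rng = np.random.default_rng(0)
      # Hypothetical per-graph descriptors (graph structure + group attribute + dynamic motion).
      train_descriptors = rng.random((500, 24))   # pooled from training clips
      clip_descriptors = rng.random((12, 24))     # trajectory graphs of one test clip

      # Learn a small codebook of behavior patterns by k-means clustering.
      codebook = KMeans(n_clusters=32, n_init=10, random_state=0).fit(train_descriptors)

      # Encode the clip as a normalized histogram of codeword occurrences.
      words = codebook.predict(clip_descriptors)
      hist = np.bincount(words, minlength=32).astype(float)
      hist /= hist.sum()
      print(hist)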
  • Sicheng Zhao · Hongxun Yao · Xiaoshuai Sun
    ABSTRACT: Most previous works on video classification and recommendation were based only on video content, without considering the affective analysis of viewers. In this paper, we present a novel method to classify and recommend videos based on affective analysis, mainly on facial expression recognition of viewers, by fusing spatio-temporal features. For spatial features, we integrate Haar-like features into compositional ones according to the features' correlation and train a mid-level classifier. This process is then embedded into an improved AdaBoost learning algorithm to obtain the spatial features. For temporal feature fusion, we adopt hidden dynamic conditional random fields (HDCRFs), which extend HCRFs by introducing a time-dimension variable. The spatial features are embedded into the HDCRFs to recognize facial expressions. Experiments on the Cohn-Kanade database show that the proposed method achieves promising performance. Viewers' changing facial expressions are then collected frame by frame from the camera while they are watching videos. Finally, we draw affective curves that describe how viewers' affective states change over time. Through the curves, we segment each video into affective sections, classify videos into categories, and list recommendation scores. Experimental results on our collected database show that most subjects are satisfied with the classification and recommendation results.
    Neurocomputing 11/2013; 119:101-110. DOI:10.1016/j.neucom.2012.04.042 · 2.01 Impact Factor
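    A compact illustration of the boosting stage mentioned in the entry above: discrete AdaBoost over threshold stumps, which stand in for Haar-like feature responses. This is a generic textbook AdaBoost loop on toy data, not the paper's improved algorithm or its compositional feature construction.

      import numpy as np

      def adaboost_train(X, y, n_rounds=5):
          """Minimal discrete AdaBoost with threshold stumps standing in for
          Haar-like feature responses; y must take values in {-1, +1}."""
          n, d = X.shape
          w = np.full(n, 1.0 / n)                     # sample weights
          ensemble = []
          for _ in range(n_rounds):
              best = None
              # Pick the (feature, threshold, polarity) stump with the lowest weighted error.
              for j in range(d):
                  for thr in np.unique(X[:, j]):
                      for pol in (1, -1):
                          pred = pol * np.where(X[:, j] > thr, 1, -1)
                          err = w[pred != y].sum()
                          if best is None or err < best[0]:
                              best = (err, j, thr, pol)
              err, j, thr, pol = best
              alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-10))   # stump weight
              pred = pol * np.where(X[:, j] > thr, 1, -1)
              w *= np.exp(-alpha * y * pred)          # emphasize misclassified samples
              w /= w.sum()
              ensemble.append((alpha, j, thr, pol))
          return ensemble

      def adaboost_predict(ensemble, X):
          score = sum(a * p * np.where(X[:, j] > t, 1, -1) for a, j, t, p in ensemble)
          return np.sign(score)

      # Toy two-feature problem standing in for Haar-like feature responses.
      rng = np.random.default_rng(1)
      X = rng.random((200, 2))
      y = np.where(X[:, 0] + X[:, 1] > 1.0, 1, -1)
      ensemble = adaboost_train(X, y)
      print("training accuracy:", (adaboost_predict(ensemble, X) == y).mean())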
  • Sendong Zhao · Ding Wang · Sicheng Zhao · Wu Yang · Chunguang Ma
    ABSTRACT: A new Man-in-the-Middle (MitM) attack called SSLStrip poses a serious threat to the security of the Secure Sockets Layer (SSL) protocol. Although several schemes have been proposed to resist this attack, no practical countermeasure exists to date. To withstand the SSLStrip attack, in this paper we propose a scheme named Cookie-Proxy, comprising a secure cookie protocol and a new topology structure. The topology structure is composed of a proxy pattern and a reverse proxy pattern. Experimental results and a formal security proof using SVO logic show that our scheme is effective in preventing the SSLStrip attack. Moreover, our scheme incurs little extra time and communication cost compared with previous secure cookie protocols.
    Proceedings of the 14th International Conference on Information and Communications Security; 10/2012
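    For context only, the snippet below shows two standard browser-side mitigations that SSLStrip defenses typically build on: marking session cookies Secure/HttpOnly and sending an HSTS header so the browser refuses the HTTP downgrade. This is a generic illustration, not the paper's Cookie-Proxy protocol; the cookie name and token are hypothetical.

      from http.cookies import SimpleCookie

      # Build a session cookie that browsers will only send over HTTPS.
      cookie = SimpleCookie()
      cookie["session_id"] = "d41d8cd98f00b204"      # hypothetical session token
      cookie["session_id"]["secure"] = True          # never sent over plain HTTP
      cookie["session_id"]["httponly"] = True        # hidden from page scripts
      cookie["session_id"]["path"] = "/"

      # HSTS tells the browser to upgrade all future requests to HTTPS,
      # closing the window in which SSLStrip can keep a victim on plain HTTP.
      headers = [
          ("Strict-Transport-Security", "max-age=31536000; includeSubDomains"),
          ("Set-Cookie", cookie["session_id"].OutputString()),
      ]
      for name, value in headers:
          print(f"{name}: {value}")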
  • Sicheng Zhao · Hongxun Yao · Xiaoshuai Sun
    ABSTRACT: In this paper, we propose a novel affective video classification method based on facial expression recognition, learning a spatio-temporal feature fusion of actors' and viewers' facial expressions. For spatial features, we integrate Haar-like features into compositional ones according to the features' correlation and train a mid-level classifier during this process. This process is then embedded into an improved AdaBoost learning algorithm to obtain the spatial features. For temporal feature fusion, we adopt hidden dynamic conditional random fields (HDCRFs), which extend HCRFs by introducing a time-dimension variable. Finally, the spatial features are embedded into the HDCRFs to recognize facial expressions. Experiments on the well-known Cohn-Kanade database show that the proposed method achieves promising recognition performance. Affective classification experiments on our own videos show that most subjects are satisfied with the classification results.
    2011 Sixth International Conference on Image and Graphics (ICIG); 09/2011
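    The entry above rests on Haar-like features, so here is a minimal sketch of how a basic two-rectangle Haar-like response is usually computed in constant time from an integral image. This shows the standard building block only, not the paper's compositional feature construction; the patch size and rectangle placement are arbitrary toy values.

      import numpy as np

      def integral_image(img):
          """Summed-area table with a leading row and column of zeros."""
          return np.pad(img, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)

      def box_sum(ii, r0, c0, r1, c1):
          """Sum of img[r0:r1, c0:c1] in O(1) using the integral image."""
          return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

      def haar_two_rect_vertical(ii, r, c, h, w):
          """Two-rectangle Haar-like response: left half minus right half."""
          half = w // 2
          left = box_sum(ii, r, c, r + h, c + half)
          right = box_sum(ii, r, c + half, r + h, c + 2 * half)
          return left - right

      img = np.random.rand(24, 24)        # toy grayscale face patch
      ii = integral_image(img)
      print(haar_two_rect_vertical(ii, 4, 4, 8, 12))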
  • ABSTRACT: Most previous works on video indexing and recommendation were based only on the content of the video itself, without considering the affective analysis of viewers, which is an efficient and important way to reflect viewers' attitudes, feelings and evaluations of videos. In this paper, we propose a novel method to index and recommend videos based on affective analysis, mainly on facial expression recognition of viewers. We first build a facial expression recognition classifier by embedding the process of building compositional Haar-like features into hidden conditional random fields (HCRFs). Then we extract viewers' facial expressions frame by frame, from camera footage collected while viewers are watching videos, to capture the viewers' affective states. Finally, we draw the affective curve, which describes how these affective states change over time. Through the curve, we segment each video into affective sections, give the indexing result for the videos, and list recommendation points from the viewers' aspect. Experiments on our database collected from the web show that the proposed method achieves promising performance.
    Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28 - December 1, 2011; 01/2011
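    A small sketch of the affective-curve idea in the entry above: map per-frame expression labels to valence scores, smooth them into a curve, and start a new section whenever the smoothed valence drifts far from its value at the start of the current section. The label set, valence values and threshold are all illustrative assumptions, not values from the paper.

      import numpy as np

      # Hypothetical mapping from recognized expressions to a valence score.
      VALENCE = {"happy": 1.0, "surprise": 0.5, "neutral": 0.0, "sad": -0.5, "angry": -1.0}

      def affective_curve(frame_labels, window=5):
          """Smooth per-frame valence scores into an affective curve."""
          scores = np.array([VALENCE[label] for label in frame_labels])
          kernel = np.ones(window) / window
          return np.convolve(scores, kernel, mode="same")

      def segment_sections(curve, threshold=0.25):
          """Open a new section whenever the curve drifts more than `threshold`
          away from its value at the start of the current section."""
          starts = [0]
          for i in range(1, len(curve)):
              if abs(curve[i] - curve[starts[-1]]) > threshold:
                  starts.append(i)
          return list(zip(starts, starts[1:] + [len(curve)]))

      labels = ["neutral"] * 10 + ["happy"] * 15 + ["sad"] * 10
      curve = affective_curve(labels)
      print(segment_sections(curve))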