Sicheng Zhao

Harbin Institute of Technology, Harbin, Heilongjiang, China

Publications (17) · 16.98 Total impact

  • Yasi Wang · Hongxun Yao · Sicheng Zhao
    ABSTRACT: The auto-encoder, a three-layered neural network formerly known as auto-association, constitutes the building block of deep learning and has been demonstrated to achieve good performance in various domains. In this paper, we investigate the dimensionality reduction ability of the auto-encoder and examine whether it has properties that accumulate when stacked and thus contribute to the success of deep learning.
    Article · Nov 2015 · Neurocomputing
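    A minimal NumPy sketch (not the authors' code) of the three-layer auto-encoder the abstract describes: the input is encoded to a low-dimensional hidden layer and decoded back, and the hidden activations serve as the reduced representation. All sizes and data below are illustrative.
    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    X = rng.random((200, 64))                 # toy data: 200 samples, 64-dim
    n_in, n_hid = 64, 8                       # compress 64 -> 8 dimensions

    W1 = rng.normal(0, 0.1, (n_in, n_hid)); b1 = np.zeros(n_hid)
    W2 = rng.normal(0, 0.1, (n_hid, n_in)); b2 = np.zeros(n_in)

    lr = 1.0
    for _ in range(500):
        H = sigmoid(X @ W1 + b1)              # encoder: hidden code
        Y = sigmoid(H @ W2 + b2)              # decoder: reconstruction
        dY = (Y - X) * Y * (1 - Y)            # grad of 0.5*||Y - X||^2 at output
        dH = (dY @ W2.T) * H * (1 - H)
        W2 -= lr * (H.T @ dY) / len(X); b2 -= lr * dY.mean(axis=0)
        W1 -= lr * (X.T @ dH) / len(X); b1 -= lr * dH.mean(axis=0)

    codes = sigmoid(X @ W1 + b1)              # the low-dimensional representation
    print(codes.shape)                        # (200, 8)
    ```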
  • Yinghao Huang · Hongxun Yao · Sicheng Zhao · Yanhao Zhang
    ABSTRACT: Recent years have witnessed great progress in image deblurring. However, the deblurring of face images, an important application case, has not been well studied. Most existing face deblurring methods rely on exemplar set construction and candidate matching, which not only cost considerable computation time but are also vulnerable to complex or exaggerated face variations. To address these problems, we propose a novel face deblurring method that integrates the classical L0 deblurring approach with face landmark detection. A carefully tailored landmark detector is used to detect the main face contours, and the detected contours then serve as salient edges to guide the blind image deconvolution. Extensive experimental results demonstrate that the proposed method better handles various complex face poses, shapes and expressions while greatly reducing computation time, compared with existing state-of-the-art approaches.
    Article · Oct 2015 · Multimedia Tools and Applications
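    A sketch of the guidance step only, assuming a landmark detector has already returned contour points: the landmarks are rasterised into a salient-edge mask of the kind that would steer the L0 kernel estimation. The deconvolution itself is not reproduced, and the coordinates below are made up.
    ```python
    import numpy as np

    h, w = 120, 100
    mask = np.zeros((h, w), dtype=np.uint8)

    # Stand-in jaw-line landmarks as (row, col); a real detector supplies these.
    landmarks = [(30, 20), (60, 25), (90, 40), (100, 60), (90, 80), (60, 85)]

    for (r0, c0), (r1, c1) in zip(landmarks, landmarks[1:]):
        n = max(abs(r1 - r0), abs(c1 - c0)) + 1   # rasterise each contour segment
        rr = np.linspace(r0, r1, n).round().astype(int)
        cc = np.linspace(c0, c1, n).round().astype(int)
        mask[rr, cc] = 1

    print(mask.sum(), "salient-edge pixels to guide kernel estimation")
    ```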
  • Sendong Zhao · Ting Liu · Sicheng Zhao · Yiheng Chen · Jian-Yun Nie
    ABSTRACT: Causality is an important type of relation that is crucial in numerous tasks, such as predicting future events, scenario generation, question answering, textual entailment and discourse comprehension. Causality extraction is therefore a fundamental task in text mining. Many efforts have been dedicated to extracting causality from texts using patterns, constraints and machine learning techniques. This paper presents a new Restricted Hidden Naive Bayes model to extract causality from texts. Besides commonly used features, such as contextual, syntactic and position features, we also utilize a new category feature of causal connectives, obtained from the tree-kernel similarity of sentences containing connectives. Previous studies have usually assumed the features to be independent, which is not the case in reality. The advantage of our model lies in its ability to cope with partial interactions among features, especially the interaction between the connective category and the syntactic structure of sentences, so as to avoid the over-fitting problem of the Hidden Naive Bayes model. Evaluation on a public dataset shows that our method outperforms all the baselines.
    Article · Oct 2015 · Neurocomputing
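    The paper's Restricted Hidden Naive Bayes, which models partial feature interactions, is not reproduced here; as a runnable point of orientation only, a plain Naive Bayes classifier over bag-of-words contextual features (toy data, scikit-learn) looks like this:
    ```python
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    sentences = [
        "the storm caused widespread flooding",      # causal
        "he stayed home because of the strike",      # causal
        "the report and the summary were filed",     # non-causal
        "she walked to the station yesterday",       # non-causal
    ]
    labels = [1, 1, 0, 0]                            # 1 = causal relation present

    clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
    clf.fit(sentences, labels)
    print(clf.predict(["the fire caused heavy damage"]))
    ```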
  • Sicheng Zhao · Hongxun Yao · Yanhao Zhang · Yasi Wang · Shaohui Liu
    ABSTRACT: Content-based 3D object retrieval has wide applications in various domains, ranging from virtual reality to computer-aided design and entertainment. With the rapid development of digitizing technologies, different views of 3D objects are captured, which calls for effective and efficient view-based 3D object retrieval (V3DOR) techniques. As each object is represented by a set of multiple views, V3DOR becomes a group matching problem. Most state-of-the-art V3DOR methods use a single feature to describe a 3D object, which is often insufficient. In this paper, we propose a feature fusion method via multi-modal graph learning for view-based 3D object retrieval. Firstly, different visual features, including 2D Zernike moments, the 2D Fourier descriptor and 2D Krawtchouk moments, are extracted to describe each view of a 3D object. Then the Hausdorff distance is computed to measure the similarity between two 3D objects with multiple views. Finally, we construct multiple graphs based on the different features and automatically learn the optimized weight of each graph for the feature fusion task. Extensive experiments are conducted on the ETH-80 dataset and the National Taiwan University 3D model dataset. The results demonstrate the superior performance of the proposed method compared to state-of-the-art approaches.
    Article · Jul 2015 · Signal Processing
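    A minimal sketch of the view-set matching step: each object is a set of per-view descriptors, and object similarity is the symmetrised Hausdorff distance between the two sets. Random vectors stand in for the Zernike/Fourier/Krawtchouk features.
    ```python
    import numpy as np
    from scipy.spatial.distance import directed_hausdorff

    rng = np.random.default_rng(0)
    views_a = rng.random((12, 36))   # 12 views of object A, 36-dim descriptors
    views_b = rng.random((10, 36))   # 10 views of object B

    # Symmetrise the directed distances to compare the two view sets.
    d = max(directed_hausdorff(views_a, views_b)[0],
            directed_hausdorff(views_b, views_a)[0])
    print(f"Hausdorff distance between view sets: {d:.3f}")
    ```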
  • ABSTRACT: Along with the rapid development of digital information technology, video surveillance systems have been widely deployed in public places, such as squares, shopping malls and banks, to monitor crowds for anomalous events. Meanwhile, the analysis of the exponentially growing crowd activity data poses great challenges to researchers worldwide. In this paper, we develop a novel unsupervised crowd activity discovery algorithm that automatically explores latent action patterns among crowd activities and partitions them into meaningful clusters. Inspired by the computational model of the human vision system, we present a spatio-temporal saliency-based representation to simulate the visual attention mechanism and encode human-focused components in an activity stream. Combined with feature pooling, this yields a more compact and robust activity representation. Based on the affinity matrix of activities, N-cut is performed to generate clusters with meaningful activity patterns. We carry out experiments on our HIT-BJUT dataset and the UMN dataset. The experimental results demonstrate that the proposed unsupervised discovery method is fast and capable of automatically mining meaningful activities from large-scale and unbalanced video data with mixed crowd activities.
    Article · Jul 2015 · Neurocomputing
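    A sketch of the clustering stage: given an affinity matrix over activity clips, normalised cuts can be approximated with scikit-learn's spectral clustering, used here in place of the authors' N-cut implementation; the features are random stand-ins.
    ```python
    import numpy as np
    from sklearn.cluster import SpectralClustering
    from sklearn.metrics.pairwise import rbf_kernel

    rng = np.random.default_rng(0)
    descriptors = rng.random((60, 128))      # 60 activity clips, 128-dim features
    affinity = rbf_kernel(descriptors, gamma=0.5)

    labels = SpectralClustering(n_clusters=4, affinity="precomputed",
                                random_state=0).fit_predict(affinity)
    print(labels[:10])                       # cluster id of the first 10 clips
    ```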
  • ABSTRACT: In this study, the authors propose a collaborative composition model that automatically recommends suitable positions and poses in the scene for photographs taken by amateurs. By analysing aesthetic-aware features, the strategy jointly takes attention and geometry composition into account to learn the aesthetic manifestation knowledge of professional photographers. Firstly, the aesthetic composition representation exploits the strength of visual saliency to explicitly encode the spatial correlation of professional photos. Secondly, ℓ2-regularised least squares is adopted to constrain the representation coefficients, which provides a fast solution for selecting aesthetic candidates collaboratively. In addition, a novel confidence measure scheme is designed based on reconstruction errors, and the reference photos are updated adaptively according to the composition rules. Both qualitative and quantitative evaluations show that the model performs well for portrait photographing recommendation.
    Article · Jun 2015 · IET Computer Vision
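    A sketch of the ℓ2-regularised least-squares step: a query photo's composition feature is represented as a linear combination of reference (professional) features via the closed-form ridge solution, and the reconstruction residual can feed the confidence measure. Data are random stand-ins for the aesthetic features.
    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    D = rng.random((50, 40))   # 50 reference photos, 40-dim composition features
    y = rng.random(40)         # feature of the query photo
    lam = 0.1                  # regularisation weight

    # alpha = argmin ||y - D.T @ alpha||^2 + lam * ||alpha||^2  (closed form)
    alpha = np.linalg.solve(D @ D.T + lam * np.eye(len(D)), D @ y)
    residual = np.linalg.norm(y - D.T @ alpha)   # reconstruction error -> confidence
    print(alpha.shape, f"residual = {residual:.3f}")
    ```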
  • ABSTRACT: 3D depth data, especially dynamic 3D depth data, offer several advantages over traditional intensity videos for expressing objects' actions, such as usability in low light levels, resolution of the silhouette ambiguity of actions, and invariance to color and texture. With the wide popularity of somatosensory equipment (Kinect, for example), more and more dynamic 3D depth data are shared on the Internet, which creates an urgent need to retrieve these data efficiently and effectively. In this paper, we propose a generalized strategy for dynamic 3D depth data matching and apply this strategy to the action retrieval task. Firstly, an improved 3D shape context descriptor (3DSCD) is proposed to extract features from each static depth frame. Then we employ dynamic time warping (DTW) to measure the temporal similarity between two dynamic 3D depth sequences. Experimental results on our collected dataset of 170 dynamic 3D depth video clips show that the proposed 3DSCD has rich descriptive power on depth data and that the method combining 3DSCD and DTW achieves high matching accuracy. Finally, to address the matching efficiency problem, we utilize the bag-of-words (BoW) model to quantize the 3DSCD of each static depth frame into visual word packages, so the original feature matching problem is simplified into a two-histogram matching problem. The results demonstrate the matching efficiency of the proposed method, while still maintaining high matching accuracy.
    Article · Mar 2015 · Neurocomputing
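    A minimal dynamic time warping sketch for matching two depth sequences, with random per-frame vectors standing in for the paper's 3DSCD descriptors.
    ```python
    import numpy as np

    def dtw(seq_a, seq_b):
        """DTW distance between two sequences of frame descriptors."""
        na, nb = len(seq_a), len(seq_b)
        cost = np.full((na + 1, nb + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, na + 1):
            for j in range(1, nb + 1):
                d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
                cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                     cost[i, j - 1],      # deletion
                                     cost[i - 1, j - 1])  # match
        return cost[na, nb]

    rng = np.random.default_rng(0)
    clip_a = rng.random((30, 64))   # 30 frames, 64-dim stand-in descriptors
    clip_b = rng.random((25, 64))
    print(f"DTW distance: {dtw(clip_a, clip_b):.3f}")
    ```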
  • ABSTRACT: This paper addresses the problem of dense crowd event recognition in surveillance video. Previous particle flow-based methods efficiently capture the motion in crowded scenes, but the group-level description has rarely been studied due to the huge loss of group structure and the intra-class variability. To address these issues, we present a novel crowd behavior representation called bag of trajectory graphs (BoTG). Firstly, we design a group-level representation beyond particle flow: from the observation that crowds are composed of atomic subgroups corresponding to informative behavior patterns, particle trajectories that simulate the motion of individuals are clustered to form groups. Secondly, we connect the nodes in each group as a trajectory graph and propose three informative features to encode the graphs, namely graph structure, group attribute and dynamic motion, which characterize the structure of, the motion within, and the motion among the trajectory graphs. Finally, each clip of a crowd event can be described by BoTG as the occurrences of behavior patterns, which provides critical clues for categorizing specific crowd events. We conduct extensive experiments on public datasets for abnormality detection and event recognition. The results demonstrate the effectiveness of BoTG in characterizing group behaviors in dense crowds.
    Article · Dec 2014 · Signal Image and Video Processing
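    A sketch of the bag-of-trajectory-graphs encoding, assuming per-graph feature vectors are already extracted: graphs are quantised against a learned codebook of behavior patterns, and each clip becomes a histogram of pattern occurrences.
    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    graph_feats = rng.random((500, 20))   # features of many trajectory graphs
    codebook = KMeans(n_clusters=16, n_init=10, random_state=0).fit(graph_feats)

    clip_graphs = rng.random((12, 20))    # graphs extracted from one clip
    words = codebook.predict(clip_graphs)
    botg, _ = np.histogram(words, bins=np.arange(17))
    print(botg)                           # the clip's BoTG descriptor
    ```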
  • Sicheng Zhao · Hongxun Yao · You Yang · Yanhao Zhang
    ABSTRACT: Images can convey rich emotions to viewers. Recent research on image emotion analysis has mainly focused on affective image classification, trying to find features that classify emotions better. We concentrate instead on affective image retrieval and investigate the performance of different features on different kinds of images in a multi-graph learning framework. Firstly, we extract commonly used features at different levels for each image: generic features and features derived from elements-of-art as low-level features; attributes and interpretable principles-of-art-based features as mid-level features; and semantic concepts described by adjective-noun pairs and facial expressions as high-level features. Secondly, we construct a single graph for each kind of feature to test its retrieval performance. Finally, we combine the multiple graphs in a regularization framework to learn the optimized weight of each graph and efficiently exploit the complementarity of different features. Extensive experiments are conducted on five datasets and the results demonstrate the effectiveness of the proposed method.
    Article · Nov 2014
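    A sketch of retrieval on fused graphs: feature-specific affinity graphs are combined with weights (learned in the paper, fixed here for brevity) and manifold ranking propagates the query's relevance over the fused graph. Features are random stand-ins.
    ```python
    import numpy as np
    from sklearn.metrics.pairwise import rbf_kernel

    rng = np.random.default_rng(0)
    feats = [rng.random((80, d)) for d in (32, 64, 128)]   # three feature types
    weights = np.array([0.5, 0.3, 0.2])                    # stand-in graph weights

    W = sum(w * rbf_kernel(f) for w, f in zip(weights, feats))
    np.fill_diagonal(W, 0.0)
    d_inv = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
    S = d_inv @ W @ d_inv                                  # normalised affinity

    y = np.zeros(80)
    y[0] = 1.0                                             # image 0 is the query
    alpha = 0.9
    scores = np.linalg.solve(np.eye(80) - alpha * S, y)    # f = (I - aS)^(-1) y
    print(np.argsort(-scores)[:6])                         # query + top-5 results
    ```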
  • ABSTRACT: Emotions can be evoked in humans by images. Most previous works on image emotion analysis have mainly used elements-of-art-based low-level visual features, which are vulnerable and not invariant to different arrangements of elements. In this paper, we investigate the concept of principles-of-art and its influence on image emotions. Principles-of-art-based emotion features (PAEF) are extracted to classify and score image emotions, in order to understand the relationship between artistic principles and emotions. PAEF are the unified combination of representation features derived from different principles, including balance, emphasis, harmony, variety, gradation and movement. Experiments on the International Affective Picture System (IAPS), a set of artistic photographs and a set of peer-rated abstract paintings demonstrate the superiority of PAEF for affective image classification and regression (about 5% improvement in classification accuracy and a 0.2 decrease in mean squared error), compared to the state-of-the-art approaches. We then utilize PAEF to analyze the emotions of master paintings, with promising results.
    Article · Nov 2014
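    A sketch of the final prediction stage, assuming PAEF vectors are already extracted: an SVM classifies discrete emotion categories and support vector regression scores continuous emotion values. Data are random stand-ins.
    ```python
    import numpy as np
    from sklearn.svm import SVC, SVR

    rng = np.random.default_rng(0)
    X = rng.random((120, 45))         # PAEF-like feature vectors
    y_cls = rng.integers(0, 8, 120)   # eight discrete emotion categories
    y_val = rng.random(120)           # continuous emotion (e.g. valence) scores

    clf = SVC(kernel="rbf").fit(X, y_cls)     # affective classification
    reg = SVR(kernel="rbf").fit(X, y_val)     # affective regression
    print(clf.predict(X[:3]), reg.predict(X[:3]))
    ```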
  • Sicheng Zhao · Hongxun Yao · Fanglin Wang · Xiaolei Jiang · Wei Zhang
    ABSTRACT: Playing appropriate music while viewing images can make the images vivid and bring people into their intrinsic world. In this paper, we propose to musicalize images based on their emotions. Most previous works on image emotion analysis mainly used elements-of-art-based low-level visual features, which are vulnerable to the arrangements of elements. Here we propose to extract visual features inspired by the concept of principles-of-art to recognize image emotions. To enrich the descriptive power, a dimensional perspective is introduced to emotion modeling. Experiments on the IAPS dataset demonstrate the superiority of the proposed method over state-of-the-art methods for emotion regression. Music from the MST dataset whose emotions approximate the recognized image emotions is then selected to musicalize the images. User study results show the effectiveness and popularity of the image musicalization method.
    Article · Sep 2014
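    A sketch of the matching step implied by the dimensional emotion model: the image's regressed emotion is a point in valence-arousal space, and the music track with the nearest emotion coordinates is selected. Coordinates are illustrative, not MST data.
    ```python
    import numpy as np

    music_va = np.array([[0.8, 0.6],    # track 0: happy, energetic
                         [-0.7, 0.3],   # track 1: tense
                         [-0.4, -0.5],  # track 2: sad, calm
                         [0.5, -0.3]])  # track 3: content, relaxed
    image_va = np.array([0.6, -0.2])    # emotion regressed from the image

    best = np.argmin(np.linalg.norm(music_va - image_va, axis=1))
    print(f"play track {best}")         # track 3 is nearest
    ```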
  • Fanglin Wang · Shuhan Qi · Ge Gao · Sicheng Zhao · Xiangyu Wang
    ABSTRACT: Recent years have seen the rapid development of social networks. For companies, microblog platforms are increasingly important as a source for disseminating brand information and monitoring brand development. Compared with the text information frequently used in traditional media, microblog platforms provide information about brands in more forms, such as images and other related content. According to statistics, microblogs posted on social networks contain an ever-growing percentage of images, so recognizing logos in images from social networks is of high value. To address this problem, we propose a novel learning-based logo detection method assisted by social network information. A new dense histogram-type feature is proposed to classify logo and non-logo image patches. To increase detection precision, social network content is analyzed and employed as a filter to reduce the number of candidate detection windows. Evaluation on large-scale data collected from the Sina Weibo platform demonstrates that the proposed method is effective.
    Article · Jun 2014 · Multimedia Systems
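    A sketch of the detection loop, with a plain intensity histogram standing in for the paper's dense histogram-type feature and an untrained stand-in classifier; the social-network filtering step is omitted.
    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    image = rng.random((120, 160))                  # stand-in grayscale image

    def patch_feature(patch):
        """Plain intensity histogram as a stand-in dense histogram feature."""
        hist, _ = np.histogram(patch, bins=32, range=(0, 1), density=True)
        return hist

    # Stand-in classifier; a real one is trained on labelled logo patches.
    train, labels = rng.random((40, 32)), rng.integers(0, 2, 40)
    clf = LogisticRegression(max_iter=200).fit(train, labels)

    win, step = 40, 20
    scores = []
    for top in range(0, image.shape[0] - win + 1, step):
        for left in range(0, image.shape[1] - win + 1, step):
            f = patch_feature(image[top:top + win, left:left + win])
            scores.append((clf.predict_proba([f])[0, 1], left, top))

    score, left, top = max(scores)
    print(f"highest-scoring window: ({left}, {top}), logo score {score:.2f}")
    ```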
  • Sicheng Zhao · Hongxun Yao · Xiaoshuai Sun
    ABSTRACT: Most previous works on video classification and recommendation were based only on video content, without considering the affective analysis of viewers. In this paper, we present a novel method to classify and recommend videos based on affective analysis, mainly on facial expression recognition of viewers, by fusing spatio-temporal features. For spatial features, we integrate Haar-like features into compositional ones according to the features' correlation and train a mid-level classifier; this process is embedded into an improved AdaBoost learning algorithm to obtain the spatial features. For temporal feature fusion, we adopt hidden dynamic conditional random fields (HDCRFs), based on HCRFs with an added time-dimension variable. The spatial features are embedded into HDCRFs to recognize facial expressions. Experiments on the Cohn-Kanade database show that the proposed method has promising performance. Viewers' changing facial expressions are then collected frame by frame from the camera while they watch videos. Finally, we draw affective curves that depict the course of affective changes. Using these curves, we segment each video into affective sections, classify videos into categories, and list recommendation scores. Experimental results on our collected database show that most subjects are satisfied with the classification and recommendation results.
    Article · Nov 2013 · Neurocomputing
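    One piece of the pipeline as a runnable sketch: boosting over weak classifiers on Haar-like-style scalar responses. scikit-learn's AdaBoost with decision stumps replaces the authors' improved variant, the HDCRF temporal model is not reproduced, and the data are random.
    ```python
    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier

    rng = np.random.default_rng(0)
    X = rng.random((300, 60))            # per-frame Haar-like feature responses
    y = rng.integers(0, 2, 300)          # toy labels: expression present or not

    # Default AdaBoost uses depth-1 decision stumps as weak learners,
    # loosely analogous to thresholded Haar-like responses.
    ada = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
    print(f"training accuracy: {ada.score(X, y):.2f}")
    ```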
  • ABSTRACT: The explosion of multimedia content has resulted in a great demand for video presentation. While most previous works focused on presenting certain types of videos or summarizing videos by event detection, we propose a novel method to present general videos of different genres based on affective content analysis. We first extract rich audio-visual affective features and select the discriminative ones. Then we map the selected features to the corresponding affective states in an improved categorical emotion space using hidden conditional random fields (HCRFs). Finally, we draw affective curves that depict the types and intensities of emotions. With the curves and related affective visualization techniques, we select the most affective shots and concatenate them to construct an affective video presentation of flexible, adjustable type and length. Experiments on a representative video database from the web demonstrate the effectiveness of the proposed method.
    Chapter · Jan 2013
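    A sketch of the presentation step, assuming the affective curve has already been computed per shot: the most affective shots are picked under a length budget and kept in temporal order. Numbers are toy values.
    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    intensity = rng.random(40)            # affective curve sampled per shot
    shot_len = rng.uniform(2.0, 6.0, 40)  # shot durations in seconds
    budget = 30.0                         # requested presentation length

    chosen, total = [], 0.0
    for s in np.argsort(-intensity):      # most affective shots first
        if total + shot_len[s] <= budget:
            chosen.append(int(s))
            total += shot_len[s]

    picked = sorted(chosen)               # restore temporal order
    print(picked, f"-> {total:.1f}s presentation")
    ```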
  • Sendong Zhao · Ding Wang · Sicheng Zhao · Wu Yang · Chunguang Ma
    ABSTRACT: A Man-in-the-Middle (MitM) attack called SSLStrip poses a serious threat to the security of the Secure Sockets Layer (SSL) protocol. Although some researchers have presented schemes to resist this attack, no practical countermeasure has existed until now. To withstand the SSLStrip attack, in this paper we propose a scheme named Cookie-Proxy, comprising a secure cookie protocol and a new topology structure composed of a proxy pattern and a reverse proxy pattern. Experimental results and a formal security proof using SVO logic show that our scheme is effective in preventing the SSLStrip attack. Moreover, our scheme incurs little extra time and communication cost compared with previous secure cookie protocols.
    Conference Paper · Oct 2012
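    Not the paper's Cookie-Proxy protocol; a generic sketch of one ingredient such schemes build on, an HMAC-authenticated cookie that a verifying proxy can use to detect tampering on a stripped (downgraded) connection. Key and cookie contents are illustrative.
    ```python
    import hashlib
    import hmac

    SECRET = b"server-side-secret-key"   # illustrative; never hard-code in practice

    def sign_cookie(value: str) -> str:
        """Append an HMAC-SHA256 tag so a verifier can detect tampering."""
        tag = hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()
        return f"{value}|{tag}"

    def verify_cookie(cookie: str) -> bool:
        value, _, tag = cookie.rpartition("|")
        expected = hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()
        return hmac.compare_digest(tag, expected)

    good = sign_cookie("session=42")
    forged = "session=99|" + good.split("|", 1)[1]     # attacker rewrites the value
    print(verify_cookie(good), verify_cookie(forged))  # True False
    ```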
  • Sicheng Zhao · Hongxun Yao · Xiaoshuai Sun
    ABSTRACT: In this paper, we propose a novel affective video classification method based on facial expression recognition, learning a spatio-temporal feature fusion of actors' and viewers' facial expressions. For spatial features, we integrate Haar-like features into compositional ones according to the features' correlation and train a mid-level classifier in the process; this process is embedded into an improved AdaBoost learning algorithm to obtain the spatial features. For temporal feature fusion, we adopt hidden dynamic conditional random fields (HDCRFs), based on HCRFs with an added time-dimension variable. Finally, the spatial features are embedded into HDCRFs to recognize facial expressions. Experiments on the well-known Cohn-Kanade database show that the proposed method has promising recognition performance, and affective classification experiments on our own videos show that most subjects are satisfied with the classification results.
    Conference Paper · Sep 2011
  • ABSTRACT: Most previous works on video indexing and recommendation were based only on the content of the video itself, without considering the affective analysis of viewers, which is an efficient and important way to reflect viewers' attitudes, feelings and evaluations of videos. In this paper, we propose a novel method to index and recommend videos based on affective analysis, mainly on facial expression recognition of viewers. We first build a facial expression recognition classifier by embedding the process of building compositional Haar-like features into hidden conditional random fields (HCRFs). Then we extract viewers' facial expressions frame by frame from camera footage captured while they watch videos, to obtain the viewers' affections. Finally, we draw an affective curve that depicts the course of affective changes. Using the curve, we segment each video into affective sections, produce the indexing result for the videos, and list recommendation points from the viewers' perspective. Experiments on our database collected from the web show that the proposed method has promising performance.
    Conference Paper · Jan 2011
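    A sketch of the curve-based indexing step: the per-frame dominant expression forms an affective curve, which is segmented wherever the expression changes to yield affective sections usable as index entries. Data are toy values.
    ```python
    import numpy as np

    # Dominant recognised expression per frame (0=neutral, 1=happy, 2=surprise).
    expr = np.repeat([0, 1, 0, 2, 1], [40, 25, 30, 20, 35])

    changes = np.flatnonzero(np.diff(expr)) + 1            # frames where it changes
    bounds = np.concatenate(([0], changes, [len(expr)]))
    for a, b in zip(bounds[:-1], bounds[1:]):
        print(f"frames {a:3d}-{b - 1:3d}: expression {expr[a]}")
    ```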