Richang Hong

National University of Singapore, Singapore, Singapore

Publications (11) · 4.66 Total impact

  • Article: Interactive Video Indexing With Statistical Active Learning.
    IEEE Transactions on Multimedia. 01/2012; 14:17-27.
  • Article: Joint Learning of Labels and Distance Metric.
    IEEE Transactions on Systems, Man, and Cybernetics, Part B. 01/2010; 40:973-978.
  • Conference Proceeding: iComics: automatic conversion of movie into comics.
    Proceedings of the 18th International Conference on Multimedia 2010, Firenze, Italy, October 25-29, 2010; 01/2010
  • Conference Proceeding: Dynamic captioning: video accessibility enhancement for hearing impairment.
    Proceedings of the 18th International Conference on Multimedia 2010, Firenze, Italy, October 25-29, 2010; 01/2010
  • Chapter: Learning Cooking Techniques from YouTube
    ABSTRACT: Cooking is a human activity with a sophisticated process. Underlying the multitude of culinary recipes, there exists a set of fundamental and general cooking techniques, such as cutting, braising, slicing, and sautéing. These skills are hard to learn through cooking recipes, which only provide textual instructions about certain dishes. Although visual instructions such as videos are more direct and intuitive for users to learn these skills, they mainly focus on certain dishes rather than general cooking techniques. In this paper, we explore how to leverage YouTube video collections as a source to automatically mine videos of basic cooking techniques. The proposed approach first collects a group of videos by searching YouTube, and then leverages the trajectory bag-of-words model to represent human motion. Furthermore, the approach clusters the candidate shots into motion-similar groups, and selects the most representative cluster and shots of the cooking technique to present to the user. Testing on 22 cooking techniques shows the feasibility of our proposed framework.
    Keywords: cooking techniques, video mining, YouTube
    12/2009: pages 713-718;
  • Article: Joint learning of labels and distance metric.
    ABSTRACT: Machine learning algorithms frequently suffer from insufficient training data and the use of an inappropriate distance metric. In this paper, we propose a joint learning of labels and distance metric (JLLDM) approach, which is able to address the two difficulties simultaneously. In comparison with the existing semi-supervised learning and distance metric learning methods that focus only on label prediction or distance metric construction, the JLLDM algorithm optimizes the labels of unlabeled samples and a Mahalanobis distance metric in a unified scheme. The advantages of JLLDM are multifold: 1) the problem of training data insufficiency can be tackled; 2) a good distance metric can be constructed with only very few training samples; and 3) no radius parameter is needed since the algorithm automatically determines the scale of the metric. Extensive experiments are conducted to compare the JLLDM approach with different semi-supervised learning and distance metric learning methods, and empirical results demonstrate its effectiveness.
    IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics: a publication of the IEEE Systems, Man, and Cybernetics Society 12/2009; 40(3):973-8. · 3.01 Impact Factor
  • Article: Unified Video Annotation via Multigraph Learning
    ABSTRACT: Learning-based video annotation is a promising approach to facilitating video retrieval, and it can avoid the intensive labor costs of pure manual annotation. However, it frequently encounters several difficulties, such as insufficiency of training data and the curse of dimensionality. In this paper, we propose a method named optimized multigraph-based semi-supervised learning (OMG-SSL), which aims to simultaneously tackle these difficulties in a unified scheme. We show that various crucial factors in video annotation, including multiple modalities, multiple distance functions, and temporal consistency, all correspond to different relationships among video units, and hence they can be represented by different graphs. Therefore, these factors can be simultaneously dealt with by learning with multiple graphs, namely, the proposed OMG-SSL approach. Different from the existing graph-based semi-supervised learning methods that only utilize one graph, OMG-SSL integrates multiple graphs into a regularization framework in order to fully exploit their complementarity. We show that this scheme is equivalent to first fusing multiple graphs and then conducting semi-supervised learning on the fused graph. Through an optimization approach, it is able to assign suitable weights to the graphs. Furthermore, we show that the proposed method can be implemented through a computationally efficient iterative process. Extensive experiments on the TREC video retrieval evaluation (TRECVID) benchmark have demonstrated the effectiveness and efficiency of our proposed approach.
    IEEE Transactions on Circuits and Systems for Video Technology 06/2009; · 1.65 Impact Factor
  • Article: Beyond Distance Measurement: Constructing Neighborhood Similarity for Video Annotation.
    IEEE Transactions on Multimedia. 01/2009; 11:465-476.
  • Article: Semi-supervised kernel density estimation for video annotation.
    Computer Vision and Image Understanding. 01/2009; 113:384-396.
  • Conference Proceeding: Salience Preserving Multi-Focus Image Fusion.
    Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, ICME 2007, July 2-5, 2007, Beijing, China; 01/2007
  • Conference Proceeding: Lazy Learning Based Efficient Video Annotation.
    ABSTRACT: Eager learning methods, such as SVM, are widely applied to video annotation tasks for their strong performance. However, their computational costs are usually prohibitive on large datasets, especially when annotating a large lexicon of semantic concepts. This paper proposes a video annotation scheme based on lazy learning, and shows that this scheme is much more computationally efficient and flexible. The scheme builds on a recently proposed improved Parzen window method. After the pairwise relationships in the dataset are built, the annotation can be finished rapidly for each concept. Experiments show that the proposed method is much more efficient than SVM while retaining comparable performance.
    Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, ICME 2007, July 2-5, 2007, Beijing, China; 01/2007
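
The lazy-learning idea in the last abstract above (build pairwise relationships once, then annotate each concept cheaply) can be illustrated with a minimal sketch. This is a plain Parzen-window scorer with a Gaussian kernel, not the improved Parzen window method the paper refers to; the function names, the bandwidth value, the toy feature vectors, and the concept labels are all assumptions made for illustration.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian similarity between two feature vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def pairwise_kernels(test_points, train_points, bandwidth=1.0):
    """Compute the test-by-train kernel matrix once; it is shared by all concepts."""
    return [[gaussian_kernel(t, s, bandwidth) for s in train_points]
            for t in test_points]

def annotate(kernels, train_labels):
    """Score one concept: share of kernel mass coming from positive samples."""
    scores = []
    for row in kernels:
        total = sum(row)
        positive = sum(k for k, y in zip(row, train_labels) if y == 1)
        scores.append(positive / total if total > 0 else 0.0)
    return scores

# Hypothetical toy data: four labeled shots, two unlabeled shots, two concepts.
train = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
test = [(0.05, 0.1), (1.0, 0.9)]
labels_by_concept = {"indoor": [1, 1, 0, 0], "outdoor": [0, 0, 1, 1]}

K = pairwise_kernels(test, train, bandwidth=0.3)   # expensive step, done once
scores = {c: annotate(K, y) for c, y in labels_by_concept.items()}  # cheap per concept
```

Because the kernel matrix does not depend on the concept, adding another concept only costs one more pass over the stored matrix, which is the efficiency argument the abstract makes against retraining an eager learner per concept.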