Conference Paper

A Hybrid Approach to Improving Semantic Extraction of News Video

Carnegie Mellon University, USA
DOI: 10.1109/ICSC.2007.68 Conference: International Conference on Semantic Computing (ICSC 2007)
Source: DBLP

ABSTRACT In this paper we describe a hybrid approach to improving semantic extraction from news video. Experiments show the value of careful parameter tuning, exploiting multiple feature sets and multilingual linguistic resources, applying text retrieval approaches for image features, and establishing synergy between multiple concepts through undirected graphical models. No single approach provides a consistently better result for every concept detection, which suggests that extracting video semantics should exploit multiple resources and techniques rather than a single approach.
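The "synergy between multiple concepts through undirected graphical models" mentioned in the abstract can be illustrated with a minimal mean-field-style refinement of per-concept detection scores using pairwise affinities. This is an illustrative sketch, not the authors' actual model: the concept names, affinity weights, and update rule below are assumptions chosen for the example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def refine_scores(scores, pairwise, unary_w=4.0, n_iters=10):
    """Iteratively refine raw concept-detector scores using pairwise
    affinities (a mean-field-style update on an undirected model)."""
    probs = dict(scores)
    for _ in range(n_iters):
        new = {}
        for concept, raw_score in scores.items():
            # Unary evidence from the raw detector, centred at 0.5.
            field = unary_w * (raw_score - 0.5)
            # Messages from related concepts, weighted by affinity.
            for other, p in probs.items():
                if other != concept:
                    field += pairwise.get((concept, other), 0.0) * (p - 0.5)
            new[concept] = sigmoid(field)
        probs = new
    return probs

# Hypothetical news-video concepts: studio shots co-occur with anchors,
# and both are negatively correlated with outdoor scenes.
raw = {"anchor": 0.8, "studio": 0.55, "outdoor": 0.3}
affinity = {("studio", "anchor"): 3.0, ("anchor", "studio"): 3.0,
            ("outdoor", "studio"): -2.0, ("studio", "outdoor"): -2.0,
            ("outdoor", "anchor"): -2.0, ("anchor", "outdoor"): -2.0}
refined = refine_scores(raw, affinity)
```

The effect is the one the abstract describes: a borderline "studio" score is pulled up by a confident, positively correlated "anchor" detection, while the anti-correlated "outdoor" score is pushed down.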

  • ABSTRACT: With the development of content-based multimedia services, personalization has become increasingly important. Semantic knowledge extracted from multimedia streams is needed in order to automatically match user preferences with the meaning of multimedia content. Text-based classification techniques can be applied to closed captions captured from news programs to determine the subject of each news item. Latent Semantic Indexing (LSI)-based systems are widely used for classification tasks; however, the technique has drawbacks that may impose limitations, particularly when considering multiple collections. In this paper, we compare an LSI implementation with a Genetic Algorithm (GA)-based system designed for the same objective. Classification is based on high-level semantic information extracted from the news video streams. We show that the GA alternative achieves better results when used to automatically classify segments of news video programs.
  • ABSTRACT: Automatic video annotation aims to bridge the semantic gap and facilitate concept-based video retrieval by detecting high-level concepts in video data. Recently, utilizing context information has emerged as an important direction in this domain. In this paper, we present a novel video annotation refinement approach that utilizes extrinsic semantic context extracted from video subtitles and intrinsic context among candidate annotation concepts. The extrinsic semantic context is formed by identifying a set of key terms in the video subtitles. The semantic similarity between those key terms and the candidate annotation concepts is then exploited to refine initial annotation results, whereas most existing approaches use textual information only heuristically. Similarity measures including Google distance and WordNet distance have been investigated for this refinement, which differs from approaches that derive semantic relationships among concepts from a given training dataset. Visualness is also utilized to discriminate individual terms for further refinement. In addition, the Random Walk with Restarts (RWR) technique is employed to perform a final refinement of the annotation results by exploring the inter-relationships among annotation concepts. Comprehensive experiments on the TRECVID 2005 dataset demonstrate the effectiveness of the proposed annotation approach and investigate the impact of various factors.
    Multimedia Tools and Applications 12/2013; DOI:10.1007/s11042-012-1060-x · 1.06 Impact Factor
  • ABSTRACT: The rapidly increasing quantity of publicly available videos has driven research into automatic tools for indexing, rating, searching and retrieval. Textual semantic representations, such as tagging, labelling and annotation, are often important factors in indexing any video, because they represent semantics in a user-friendly way appropriate for search and retrieval. Ideally, this annotation should be inspired by the way humans perceive and describe videos. The difference between low-level visual content and the corresponding human perception is referred to as the 'semantic gap'. Tackling this gap is even harder in the case of unconstrained videos, mainly due to the lack of any prior information about the analyzed video on the one hand, and the huge amount of generic knowledge required on the other. This paper introduces a framework for the automatic semantic annotation of unconstrained videos. The proposed framework utilizes two non-domain-specific layers: low-level visual similarity matching, and an annotation analysis that employs commonsense knowledge bases. A commonsense ontology is created by incorporating multiple structured semantic relationships. Experiments and black-box tests are carried out on standard video databases for action recognition and video information retrieval. White-box tests examine the performance of the individual intermediate layers of the framework, and the evaluation of the results and the statistical analysis show that integrating visual similarity matching with commonsense semantic relationships provides an effective approach to automated video annotation.
    Multimedia Tools and Applications 03/2013; 72(2). DOI:10.1007/s11042-013-1363-6 · 1.06 Impact Factor
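The LSI-based classification compared in the first cited work can be sketched as a truncated SVD of a term-document matrix, with new documents folded into the latent space and classified by cosine similarity. The terms, documents, and query below are invented for illustration.

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = documents.
# Documents d0, d1 are sports stories; d2, d3 are election stories.
terms = ["game", "score", "team", "vote", "election", "party"]
A = np.array([[1, 0, 0, 0],
              [1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 0],
              [0, 0, 1, 0]], dtype=float)

# Truncated SVD: keep k latent "topic" dimensions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T          # documents in latent space

def fold_in(term_counts):
    """Project a new document (term-count vector) into the latent space."""
    return np.linalg.inv(np.diag(s[:k])) @ U[:, :k].T @ term_counts

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Classify a new caption mentioning "game" and "team" by nearest training doc.
q = fold_in(np.array([1.0, 0, 1, 0, 0, 0]))
sims = [cosine(q, d) for d in doc_vecs]
best = int(np.argmax(sims))
```

Because the query shares its latent "topic" with the sports documents, its nearest neighbour is one of them; in a full system the predicted class would be the label of that neighbour (or of the top-ranked neighbours).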
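The Random Walk with Restarts refinement mentioned in the annotation paper above follows the standard iteration r ← (1 - c)·P·r + c·e over a concept affinity graph, where P is the column-normalized affinity matrix and e the restart distribution built from initial detection scores. The concepts and affinity weights here are invented for illustration.

```python
import numpy as np

def random_walk_with_restarts(W, restart, c=0.3, tol=1e-10, max_iter=1000):
    """Steady-state scores of a walk on affinity matrix W that, with
    probability c, restarts from the initial annotation distribution."""
    P = W / W.sum(axis=0, keepdims=True)   # column-stochastic transitions
    r = restart.copy()
    for _ in range(max_iter):
        r_next = (1 - c) * (P @ r) + c * restart
        if np.abs(r_next - r).sum() < tol:
            break
        r = r_next
    return r

concepts = ["car", "road", "building", "sky"]
# Symmetric concept affinities: "car" and "road" co-occur strongly.
W = np.array([[1.0, 0.9, 0.1, 0.1],
              [0.9, 1.0, 0.2, 0.3],
              [0.1, 0.2, 1.0, 0.4],
              [0.1, 0.3, 0.4, 1.0]])
# Initial detector scores, normalized into a restart distribution.
e = np.array([0.7, 0.1, 0.1, 0.1])
r = random_walk_with_restarts(W, e / e.sum())
```

The walk propagates the confident "car" detection to its strongly connected neighbour "road", boosting it above weakly connected concepts such as "building" — the inter-concept refinement effect the paper exploits.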
