Conference Paper

A Hybrid Approach to Improving Semantic Extraction of News Video

Carnegie Mellon University, USA
DOI: 10.1109/ICSC.2007.68 Conference: International Conference on Semantic Computing (ICSC 2007)
Source: DBLP


In this paper we describe a hybrid approach to improving semantic extraction from news video. Experiments show the value of careful parameter tuning, exploiting multiple feature sets and multilingual linguistic resources, applying text-retrieval approaches to image features, and establishing synergy between multiple concepts through undirected graphical models. No single approach yields consistently better results across all concepts, which suggests that extracting video semantics should exploit multiple resources and techniques rather than any single approach.
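The "synergy between multiple concepts through undirected graphical models" idea can be illustrated with a toy sketch: independent per-concept detector scores are adjusted using pairwise co-occurrence statistics between concepts, so that a strongly detected concept boosts concepts it usually co-occurs with and suppresses those it rarely does. This is not the paper's implementation; the concept names, scores, and lift values below are hypothetical, and the simple additive update is only a stand-in for proper inference in a graphical model.

```python
# Illustrative sketch only (not the paper's method): fuse independent
# concept-detector scores with pairwise co-occurrence "lift" statistics,
# in the spirit of exploiting concept synergy via pairwise potentials.
import math

def fuse_scores(scores, cooccurrence, weight=0.5):
    """Adjust each concept's detector score using log co-occurrence
    evidence from the other concepts (a crude pairwise-potential proxy).

    scores        : dict concept -> detector score in [0, 1]
    cooccurrence  : dict (concept, concept) -> lift (>1 = co-occur often,
                    <1 = rarely co-occur, missing = independent, lift 1)
    """
    fused = {}
    for c, s in scores.items():
        support = 0.0
        for other, s_other in scores.items():
            if other == c:
                continue
            lift = cooccurrence.get((c, other), 1.0)
            # log(lift) > 0 boosts c, log(lift) < 0 penalizes it,
            # scaled by how confident the other detector is.
            support += s_other * math.log(lift)
        fused[c] = s + weight * support
    return fused

# Hypothetical detector outputs for one news-video shot.
scores = {"anchor": 0.9, "studio": 0.6, "outdoor": 0.2}
# Hypothetical co-occurrence lifts: anchor and studio co-occur often;
# anchor and outdoor rarely do.
cooccur = {("studio", "anchor"): 3.0, ("anchor", "studio"): 3.0,
           ("outdoor", "anchor"): 0.4, ("anchor", "outdoor"): 0.4}

fused = fuse_scores(scores, cooccur)
```

Here the confident "anchor" detection raises the "studio" score and lowers "outdoor", which is the qualitative behavior one would expect from joint inference over related concepts.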

Cited in:
  • Source
    • "Some approaches have used ontologies to detect visual concepts. For example, in [4], an ontology was built by learning relationships between concepts from their co-occurrence statistics. Another direction was to use association mining techniques to infer the presence of a high-level concept from the simultaneous presence of other concepts, in an attempt to improve the accuracy of semantic concept detection [5]."
    ABSTRACT: The rapidly increasing volume of video collections, available on the web or via broadcasting, has motivated research into intelligent tools for searching, rating, indexing and retrieval. Establishing a semantic representation of visual data, mainly in textual form, is one of the important tasks. The time needed to build and maintain ontologies and knowledge, especially for wide domains, and the effort of integrating several approaches emphasize the need for a unified, generic commonsense knowledgebase for visual applications. In this paper, we propose a novel commonsense knowledgebase that forms the link between the visual world and its semantic textual representation. We refer to it as "VisualNet". VisualNet is produced by our fully automated engine, which constructs a new unified structure consolidating the knowledge of two commonsense knowledgebases, namely WordNet and ConceptNet. This knowledge is extracted by analysing the contents of WordNet and ConceptNet and retaining only the knowledge useful for visual-domain applications. Moreover, the automatic engine allows the knowledgebase to be developed, updated and maintained automatically, staying synchronized with any future enhancement of WordNet or ConceptNet. Statistical properties of the proposed knowledgebase, together with an evaluation of the results of a sample application, show the coherency and effectiveness of the proposed knowledgebase and its automatic engine.
    IEEE International Conference on Intelligent Computing and Intelligent Systems (ICIS 2009); 12/2009
  • Source
    • "Many approaches have used ontologies for event detection in various forms. In [16], an ontology was built by learning relationships between concepts from their co-occurrence statistics. Other approaches have included visual knowledge directly in a multimedia domain-specific ontology, in the form of low-level visual descriptors for concept instances, to perform semantic annotation [1]."
    ABSTRACT: In this paper, we introduce a novel framework for automatic semantic video annotation. As this framework detects possible events occurring in video clips, it forms the annotating base of a video search engine. To achieve this purpose, the system has to be able to operate on uncontrolled wide-domain videos; thus, all layers have to be based on generic features. The aim is to help bridge the "semantic gap", the difference between low-level visual features and human perception, by finding videos with similar visual events and then analysing their free-text annotations to find the best description for the new video using commonsense knowledgebases. Experiments were performed on wide-domain video clips from the TRECVID 2005 BBC rush standard database. Results from these experiments show promising integration between the two layers in finding expressive annotations for the input video. These results were evaluated based on retrieval performance.
    IEEE International Conference on Signal and Image Processing Applications (ICSIPA 2009); 12/2009
  • Source
    • "In the case of news programmes, a scene can be seen as a piece of news, represented by semantically related shots. Several works (Xu et al., 2006; Li et al., 2004; Snoek and Worring, 2005; Hauptmann et al., 2007) use audiovisual information from the video stream to extract semantic data. This high-level information is then used to automatically relate shots with the same semantic meaning in a specific domain."
    ABSTRACT: Personalisation tasks require semantic information, extracted from multimedia streams, in order to automatically match user preferences with the meaning of multimedia content. Text-based classification techniques may be applied to closed captions captured from news programmes to determine the subject of each piece of news. Latent Semantic Indexing (LSI)-based systems are widely used for information retrieval and may be adapted to classification tasks; however, some drawbacks of the technique may impose limitations, mainly when considering multiple collections. In this paper, we compare an LSI implementation with a Genetic Algorithm (GA)-based system designed with the same objective. We show that the GA alternative achieves better results when used to automatically classify pieces of news from video programmes.
    International Journal of Advanced Media and Communication 01/2009; 3(4):383-403. DOI:10.1504/IJAMC.2009.028709