A Rule-Based Video Annotation System

Dept. of Electron. Eng., Univ. of London, UK
IEEE Transactions on Circuits and Systems for Video Technology (Impact Factor: 2.62). 06/2004; 14(5):622-633. DOI: 10.1109/TCSVT.2004.826764
Source: IEEE Xplore


A generic system for automatic annotation of videos is introduced. The proposed approach is based on the premise that the rules needed to infer a set of high-level concepts from low-level descriptors cannot be defined a priori. Rather, knowledge embedded in the database and interaction with an expert user are exploited to enable system learning. Underpinning the system at the implementation level is preannotated data that dynamically creates signification links between a set of low-level features extracted directly from the video dataset and high-level semantic concepts defined in the lexicon. The lexicon may consist of words, icons, or any set of symbols that convey meaning to the user. Thus, the lexicon is contingent on the user, application, time, and the entire context of the annotation process. The main system modules use fuzzy logic and rule mining techniques to approximate human-like reasoning. A rule-knowledge base is created from a small sample selected by the expert user during the learning phase. Using this rule-knowledge base, the system automatically assigns keywords from the lexicon to nonannotated video clips in the database. Using common low-level video representations, the system performance was assessed on a database containing hundreds of broadcast videos. The experimental evaluation showed high and robust annotation accuracy. The system architecture offers straightforward expansion to relevance feedback and autonomous learning capabilities.
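The pipeline the abstract outlines (fuzzify low-level descriptors, mine rules from a small expert-annotated sample, then assign lexicon keywords to unannotated clips) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the triangular membership functions, the confidence and support thresholds, the max-aggregation of rule strengths, and the descriptor names `motion` and `green_ratio` are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical linguistic terms over descriptor values normalised to [0, 1].
# Each term is a triangle given as (left, peak, right).
TERMS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}


def triangular(x, left, peak, right):
    """Triangular membership function; returns a degree in [0, 1]."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzify(descriptors):
    """Map {feature: value} to {(feature, term): degree}, dropping zero degrees."""
    items = {}
    for feature, value in descriptors.items():
        for term, (left, peak, right) in TERMS.items():
            degree = triangular(value, left, peak, right)
            if degree > 0.0:
                items[(feature, term)] = degree
    return items


def mine_rules(samples, min_support=0.1, min_confidence=0.6):
    """Mine 'IF feature is term THEN keyword' rules from annotated samples.

    samples is a list of (descriptors, keywords) pairs. Fuzzy support and
    confidence are computed from summed membership degrees (an assumption).
    """
    item_mass = defaultdict(float)
    joint_mass = defaultdict(float)
    for descriptors, keywords in samples:
        for item, degree in fuzzify(descriptors).items():
            item_mass[item] += degree
            for keyword in keywords:
                joint_mass[(item, keyword)] += degree

    rules = {}
    for (item, keyword), mass in joint_mass.items():
        support = mass / len(samples)
        confidence = mass / item_mass[item]
        if support >= min_support and confidence >= min_confidence:
            rules[(item, keyword)] = confidence
    return rules


def annotate(descriptors, rules, threshold=0.5):
    """Score each keyword by its strongest firing rule; keep those above threshold."""
    memberships = fuzzify(descriptors)
    scores = defaultdict(float)
    for (item, keyword), confidence in rules.items():
        strength = memberships.get(item, 0.0) * confidence
        scores[keyword] = max(scores[keyword], strength)
    return {kw: score for kw, score in scores.items() if score >= threshold}


# Toy "expert-annotated" sample; the values are made up for illustration.
samples = [
    ({"motion": 0.9, "green_ratio": 0.8}, {"football"}),
    ({"motion": 0.85, "green_ratio": 0.9}, {"football"}),
    ({"motion": 0.1, "green_ratio": 0.1}, {"news"}),
    ({"motion": 0.15, "green_ratio": 0.2}, {"news"}),
]
rules = mine_rules(samples)
labels = annotate({"motion": 0.8, "green_ratio": 0.7}, rules)
print(sorted(labels))  # prints ['football']
```

In this sketch the lexicon is simply the set of keywords that appear in the annotated sample, so changing the sample changes both the rules and the vocabulary, which mirrors the abstract's point that the lexicon depends on the user and context.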

Available from: Ebroul Izquierdo, Mar 25, 2015
  • Source
    • "The fuzzy system has been used to model the human decision making process [29]. Dorado et al. [12] used a fuzzy rule-based system to approximate human-like reasoning in video annotation problems, and Yamashita [30] proposed an effective support system for students making career choices using fuzzy reasoning and fuzzy modeling. "
    ABSTRACT: Information summarization and retrieval are significant research topics associated with recent advancements in sensor devices, data compression and storage techniques, and high-speed internet. As a result of these advances, it is possible for people to collect huge life-logs. Video is one of the most important life information sources. This paper describes a method of summarizing video life-logs in an office environment with a multi-camera system. Previously, multi-camera systems have been used to track moving objects or to cover a wide area. This paper focuses on capturing diverse views of each office event using a multi-camera system with several cameras observing the same area. The summarization process includes camera view selection and event sequence summarization. View selection produces a single event sequence from multiple event sequences by selecting an optimal view at each time, for which domain knowledge based on the elements of the office environment and rules from questionnaire surveys have been used. Summarization creates a summary sequence from whole sequences by using a fuzzy rule-based system to approximate human decision making. The user-entered degrees of interest in objects, persons, and events are used for a personalized summarization. We confirmed experimentally that the proposed method provides promising results.
    Article · Dec 2011 · Information Systems
  • Source
    • "Others, like in [13], use association mining techniques to indicate the existence of one high-level concept from the simultaneous existence of other concepts, trying to enhance accuracy of semantic concepts detection. In [14], decision trees were used to infer high-level concepts from low-level video visual features. Also in [3] a rule learning process, depending on both low-level and middle-level features, builds a decision tree to mark rare events and concepts. "
    ABSTRACT: In this paper, we introduce a novel framework for automatic Semantic Video Annotation. As this framework detects possible events occurring in video clips, it forms the annotating base of a video search engine. To achieve this purpose, the system has to be able to operate on uncontrolled wide-domain videos. Thus, all layers have to be based on generic features. The aim is to help bridge the “semantic gap”, which is the difference between the low-level visual features and the human's perception, by finding videos with similar visual events, then analyzing their free text annotation to find the best description for this new video using commonsense knowledge bases. Experiments were performed on wide-domain video clips from the TRECVID 2005 BBC rush standard database. Results from these experiments show promising integrity between those two layers in order to find expressive annotations for the input video. These results were evaluated based on retrieval performance.
    Conference Paper · Dec 2009
  • Source
    • "• Rule-Based Annotation: Another type of solutions, namely rule-based methods, is to discover the associated concepts hidden in the sequential images. Dorado et al. [8] combined fuzzy logic and rule mining techniques to effectively assign keywords to the video clips. Nevertheless, annotating video by using association rules directly may lead to high errors since the rules may not be sufficient. "
    ABSTRACT: To support effective multimedia information retrieval, video annotation has become an important topic in video content analysis. Existing video annotation methods focus on either the analysis of low-level features or simple semantic concepts, and they cannot reduce the gap between low-level features and high-level concepts. In this paper, we propose an innovative method for semantic video annotation through integrated mining of visual features, speech features, and frequent semantic patterns existing in the video. The proposed method consists of two main phases: 1) Construction of four kinds of predictive annotation models, namely speech-association, visual-association, visual-sequential, and statistical models, from annotated videos. 2) Fusion of these models for annotating unannotated videos automatically. The main advantage of the proposed method is that all visual features, speech features, and semantic patterns are considered simultaneously. Moreover, the utilization of high-level rules can effectively complement the insufficiency of statistics-based methods in dealing with complex and broad keyword identification in video annotation. Through empirical evaluation on NIST TRECVID video datasets, the proposed approach is shown to enhance the performance of annotation substantially in terms of precision, recall, and F-measure.
    Article · Mar 2008 · IEEE Transactions on Multimedia