-
TRECVID 2011 Workshop, Gaithersburg, MD, USA; 12/2011
-
ABSTRACT: In this work the contribution of automatically-extracted (thus, imperfect) video structural semantics towards improving interactive video retrieval is examined. First, the automatic extraction of video structural semantics, i.e. the decomposition of the video into scenes that correspond to the different sub-stories or high-level events, is performed. Then, these are introduced to the interactive video retrieval paradigm. Finally, their potential contribution is experimentally evaluated. To this end, different members of a family of scene segmentation algorithms are applied to an extensive professional video collection coming from the TRECVID benchmarking activity; subsequently, a large number of user interactions with a retrieval system that exploits these structural semantics is simulated. The experimental results document the contribution of state-of-the-art automatically-extracted video structural semantics to efficient and effective interactive video retrieval.
Semantic Computing (ICSC), 2011 Fifth IEEE International Conference on; 10/2011
-
ABSTRACT: In this letter, mixture subclass discriminant analysis (MSDA), which alleviates two shortcomings of subclass discriminant analysis (SDA), is proposed. In particular, it is shown that for data with Gaussian homoscedastic subclass structure a) SDA is not guaranteed to provide the discriminant subspace that minimizes the Bayes error, and b) the sample covariance matrix cannot be used as the minimization metric of the discriminant analysis stability criterion (DSC). Based on this analysis, MSDA modifies the objective function of SDA and utilizes a novel partitioning procedure to aid discrimination of data with Gaussian homoscedastic subclass structure. Experimental results confirm the improved classification performance of MSDA.
IEEE Signal Processing Letters 06/2011; · 1.39 Impact Factor
-
ABSTRACT: In this paper, a probabilistic approach to combining spatial context with visual and co-occurrence information for semantic image analysis is presented. Overall, the examined image is segmented and subsequently an initial classification of the resulting image regions to semantic concepts is performed based solely on visual information. Then, a Genetic Algorithm (GA) is introduced for deciding on the optimal semantic image interpretation, realizing image analysis as a global optimization problem. The fundamental novelty of this work is that the GA incorporates in its evolutionary procedure a set of Bayesian Networks (BNs), which probabilistically learn the impact of the available spatial, visual and co-occurrence information on the final outcome for every possible pair of semantic concepts. Experimental results on two publicly available datasets demonstrate the efficiency of the proposed approach.
Image Processing (ICIP), 2010 17th IEEE International Conference on; 10/2010
-
ABSTRACT: This paper proposes the use of feature tracks for the detection of concepts in video, particularly dynamic concepts. Feature tracks are defined as sets of local interest points found in different frames of a video shot that exhibit spatio-temporal and visual continuity, defining a trajectory in the 2D+Time space. The extraction of feature tracks and the selection and representation of an appropriate subset of them allow the generation of a Bag-of-Spatiotemporal-Words model for the shot, which facilitates capturing the dynamics of video content. The experimental evaluation of the proposed approach highlights how the selection of such feature tracks for the definition of the Bag-of-Spatiotemporal-Words model enhances the results of traditional keyframe-based concept detection techniques.
Image Processing (ICIP), 2010 17th IEEE International Conference on; 10/2010
-
ABSTRACT: In this paper, a joint content-event model for indexing multimedia data is proposed. The event part of the model follows a number of formal principles to represent several aspects of real-life events, whereas the content part is used to describe the decomposition of any type of multimedia data to content segments. In contrast to other event models for multimedia indexing, the proposed model treats events as first-class entities and provides a referencing mechanism to link real-life event elements with content segments at multiple granularity levels. This referencing mechanism has been defined with the objective of facilitating the automatic enrichment of event elements with information extracted by automatic analysis of content segments, enabling event-centric multimedia indexing in large-scale multimedia collections.
Semantic Computing (ICSC), 2010 IEEE Fourth International Conference on; 10/2010
-
ABSTRACT: This work examines the possibility of exploiting, for the purpose of video segmentation to scenes, semantic information coming from the analysis of the visual modality. This information, in contrast to the low-level visual features typically used in previous approaches, is obtained by application of trained visual concept detectors such as those developed and evaluated as part of the TRECVID High-Level Feature Extraction Task. A large number of non-binary detectors is used for defining a high dimensional semantic space. In this space, each shot is represented by the vector of detector confidence scores, and the similarity of two shots is evaluated by defining an appropriate shot semantic similarity measure. Evaluation of the proposed approach is performed on two test datasets, using baseline concept detectors trained on a dataset completely different from the test ones. The results show that the use of such semantic information, which we term "visual soft semantics", contributes to improved video decomposition to scenes.
Semantic Computing (ICSC), 2010 IEEE Fourth International Conference on; 10/2010
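The abstract above represents each shot as a vector of concept-detector confidence scores and compares shots in that semantic space. The following is a minimal sketch of that idea, assuming cosine similarity as a stand-in; the abstract does not specify the actual shot semantic similarity measure, and the function name `shot_similarity` and the example scores are hypothetical.

```python
from math import sqrt


def shot_similarity(scores_a, scores_b):
    """Cosine similarity between two shots' detector confidence vectors.

    Assumption: cosine similarity is used only for illustration; it is
    not claimed to be the measure defined in the paper.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both shots need scores from the same detectors")
    dot = sum(x * y for x, y in zip(scores_a, scores_b))
    norm_a = sqrt(sum(x * x for x in scores_a))
    norm_b = sqrt(sum(y * y for y in scores_b))
    if norm_a == 0 or norm_b == 0:
        # A shot with all-zero confidences carries no semantic evidence.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical confidences from three detectors (e.g. "outdoor", "face", "crowd").
shot_1 = [0.9, 0.1, 0.7]
shot_2 = [0.8, 0.2, 0.6]
shot_3 = [0.1, 0.9, 0.0]
print(shot_similarity(shot_1, shot_2) > shot_similarity(shot_1, shot_3))
```

Because the detectors are non-binary, every detector contributes to the comparison in proportion to its confidence rather than through a hard present/absent decision.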
-
ABSTRACT: This paper presents the video retrieval engine VERGE, which combines indexing, analysis and retrieval techniques in various modalities (i.e. textual, visual and concept search). The functionalities of the search engine are demonstrated through the supported user interaction modes.
Content-Based Multimedia Indexing (CBMI), 2010 International Workshop on; 07/2010
-
ABSTRACT: This paper builds upon previous work on local interest point detection and description to propose the extraction and representation of novel Local Invariant Feature Tracks (LIFT). These features compactly capture not only the spatial attributes of 2D local regions, as in SIFT and related techniques, but also their long-term trajectories in time. This and other desirable properties of LIFT allow the generation of Bags-of-Spatiotemporal-Words models that facilitate capturing the dynamics of video content, which is necessary for detecting high-level video features that by definition have a strong temporal dimension. Preliminary experimental evaluation and comparison of the proposed approach reveals promising results.
Image Analysis for Multimedia Interactive Services (WIAMIS), 2010 11th International Workshop on; 05/2010
-
ABSTRACT: This work deals with the problem of automatic temporal segmentation of a video into elementary semantic units known as scenes. Its novelty lies in the use of high-level audio information in the form of audio events for the improvement of scene segmentation performance. More specifically, the proposed technique is built upon a recently proposed audio-visual scene segmentation approach that involves the construction of multiple scene transition graphs (STGs) that separately exploit information coming from different modalities. In the extension of the latter approach presented in this work, audio event detection results are introduced to the definition of an audio-based scene transition graph, while a visual-based scene transition graph is also defined independently. The results of these two types of STGs are subsequently combined. The application of the proposed technique to broadcast videos demonstrates the usefulness of audio events for scene segmentation.
Image Analysis for Multimedia Interactive Services (WIAMIS), 2010 11th International Workshop on; 05/2010
-
ABSTRACT: In this paper, a graphical modeling-based approach to semantic video analysis is presented for jointly realizing modality fusion and temporal context exploitation. Overall, the examined video sequence is initially segmented into shots and for every resulting shot appropriate color, motion and audio features are extracted. Then, Hidden Markov Models (HMMs) are employed for performing an initial association of each shot with the semantic classes that are of interest separately for every modality. Subsequently, an integrated Bayesian Network (BN) is introduced for simultaneously performing information fusion and temporal contextual knowledge exploitation, contrary to the usual practice of performing each task separately. The final outcome of the overall video analysis approach is the association of a semantic class with every shot. Experimental results as well as comparative evaluation from the application of the proposed approach in the domain of news broadcast video are presented.
Image Processing (ICIP), 2009 16th IEEE International Conference on; 12/2009
-
ABSTRACT: In this paper, an approach to semantic video analysis that is based on the statistical processing and representation of the motion signal is presented. Overall, the examined video is temporally segmented into shots and for every resulting shot appropriate motion features are extracted; using these, hidden Markov models (HMMs) are employed for performing the association of each shot with one of the semantic classes that are of interest. The novel contributions of this paper lie in the areas of motion information processing and representation. Regarding the motion information processing, the kurtosis of the optical flow motion estimates is calculated for identifying which motion values originate from true motion rather than measurement noise. Additionally, unlike the majority of the approaches of the relevant literature that are mainly limited to global- or camera-level motion representations, a new representation for providing local-level motion information to HMMs is also presented. It focuses only on the pixels where true motion is observed. For the selected pixels, energy distribution-related information, as well as a complementary set of features that highlight particular spatial attributes of the motion signal, are extracted. Experimental results, as well as comparative evaluation, from the application of the proposed approach in the domains of Tennis, News and Volleyball broadcast video, and Human Action video demonstrate the efficiency of the proposed method.
IEEE Transactions on Circuits and Systems for Video Technology 11/2009; · 1.65 Impact Factor
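The abstract above uses the kurtosis of optical flow estimates to separate true motion from measurement noise. The following is a minimal sketch of how a kurtosis statistic could be computed for the motion values observed at one pixel over time. The decision rule in `looks_like_true_motion` and its threshold are assumptions for illustration only (motivated by the fact that Gaussian noise has excess kurtosis near zero), not the criterion used in the paper.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2^2 - 3 (0 for a Gaussian)."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one motion value")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        # A constant signal has no spread; treat it as "no evidence".
        return 0.0
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0


def looks_like_true_motion(values, threshold=1.0):
    """Hypothetical rule: flag strongly non-Gaussian motion values."""
    return excess_kurtosis(values) > threshold


# Hypothetical optical flow magnitudes at one pixel across ten frames.
noisy_pixel = [-1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0]
moving_pixel = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.0]
print(looks_like_true_motion(noisy_pixel), looks_like_true_motion(moving_pixel))
```

A heavy-tailed sample, such as a mostly static pixel with one large displacement, yields a large positive excess kurtosis, which is what the hypothetical rule keys on.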
-
ABSTRACT: In this paper, two approaches to utilizing contextual information in semantic image analysis are presented and comparatively evaluated. Both approaches make use of spatial context in the form of fuzzy directional relations. The first one is based on a Genetic Algorithm (GA), which is employed in order to decide upon the optimal semantic image interpretation by treating semantic image analysis as a global optimization problem. On the other hand, the second method follows a Binary Integer Programming (BIP) technique for estimating the optimal solution. Both spatial context techniques are evaluated with several different combinations of classifiers and low-level features, in order to demonstrate the improvements attained using spatial context in a number of different image analysis schemes.
Image Analysis for Multimedia Interactive Services, 2009. WIAMIS '09. 10th Workshop on; 06/2009
-
ABSTRACT: Binary relevance (BR) learns a single binary model for each different label of multi-label data. It has linear complexity with respect to the number of labels, but does not take into account label correlations and may fail to accurately predict label combinations and rank labels according to their relevance to a new instance. Stacking the models of BR in order to learn a model that associates their output to the true value of each label is a way to alleviate this problem. In this paper we propose the pruning of the models participating in the stacking process, by explicitly measuring the degree of label correlation using the phi coefficient. Exploratory analysis of phi shows that the correlations detected are meaningful and useful. Empirical evaluation of the pruning approach shows that it leads to substantial reduction of the computational cost of stacking and occasional improvements in predictive performance.
Proceedings of the ECML/PKDD 2009 Workshop on Learning from Multi-Label Data (MLD 2009), Bled, Slovenia; 01/2009
-
ABSTRACT: In this paper, a motion-based approach for detecting high-level semantic events in video sequences is presented. Its main characteristic is its generic nature, i.e. it can be directly applied to any possible domain of concern without the need for domain-specific algorithmic modifications or adaptations. For realizing event detection, the examined video sequence is initially segmented into shots and for every resulting shot appropriate motion features are extracted. Then, Hidden Markov Models (HMMs) are employed for performing the association of each shot with one of the high-level semantic events that are of interest in any given domain. Regarding the motion feature extraction procedure, a new representation for providing local-level motion information to HMMs is presented, while motion characteristics from previous frames are also exploited. Experimental results as well as comparative evaluation from the application of the proposed approach in the domain of news broadcast video are presented.
Image Processing, 2008. ICIP 2008. 15th IEEE International Conference on; 11/2008
-
ABSTRACT: Shot segmentation provides the basis for almost all high-level video content analysis approaches, validating it as one of the major prerequisites for efficient video semantic analysis, indexing and retrieval. The successful detection of both gradual and abrupt transitions is necessary to this end. In this paper a new gradual transition detection algorithm is proposed that is based on novel criteria, such as color coherence change, that exhibit less sensitivity to local or global motion than previously proposed ones. These criteria, each of which could serve as a standalone gradual transition detection approach, are then combined using a machine learning technique, to result in a meta-segmentation scheme. Besides significantly improved performance, an advantage of the proposed scheme is that there is no need for threshold selection, as opposed to what would be the case if any of the proposed features were used by themselves and as is typically the case in the relevant literature. Performance evaluation and comparison with four other popular algorithms reveals the effectiveness of the proposed technique.
Image Processing, 2008. ICIP 2008. 15th IEEE International Conference on; 11/2008
-
ABSTRACT: Shot boundaries provide the basis for almost all high-level video content analysis approaches, validating their detection as one of the major prerequisites for efficient video indexing and retrieval in large video databases. The successful detection of both gradual and abrupt transitions is necessary to this end. In this paper a new gradual transition detection algorithm is proposed, based on novel features exhibiting less sensitivity to local or global motion than previously proposed ones. These features, each of which could serve as a stand-alone transition detection approach, are then combined using a machine learning technique, to result in a meta-segmentation scheme. Besides significantly improved performance, an advantage of the proposed scheme is that there is no need for threshold selection, as opposed to what would be the case if any of the proposed features were used by themselves and as is typically the case in the relevant literature. Comparison of the proposed approach with four popular algorithms of the literature reveals its significantly improved performance.
Content-Based Multimedia Indexing, 2008. CBMI 2008. International Workshop on; 07/2008
-
ABSTRACT: In this paper, a complete architecture for knowledge-assisted cross-media analysis of News-related multimedia content is presented, along with its constituent components. The proposed analysis architecture employs state-of-the-art methods for the analysis of each individual modality (visual, audio, text) separately, and proposes a fusion technique based on the particular characteristics of News-related content for the combination of the individual modality analysis results. Experimental results on news broadcast video illustrate the usefulness of the proposed techniques in the automatic generation of semantic video annotations.
Content-Based Multimedia Indexing, 2008. CBMI 2008. International Workshop on; 07/2008
-
ABSTRACT: In this paper, four individual approaches to region classification for knowledge-assisted semantic image analysis are presented and comparatively evaluated. All of the examined approaches realize knowledge-assisted analysis via implicit knowledge acquisition, i.e. are based on machine learning techniques such as support vector machines (SVMs), self organizing maps (SOMs), genetic algorithms (GAs) and particle swarm optimization (PSO). Under all examined approaches, each image is initially segmented and suitable low-level descriptors are extracted for every resulting segment. Then, each of the aforementioned classifiers is applied to associate every region with a predefined high-level semantic concept. An appropriate evaluation framework has been employed for the comparative evaluation of the above algorithms under varying experimental conditions.
Image Analysis for Multimedia Interactive Services, 2008. WIAMIS '08. Ninth International Workshop on; 06/2008
-
Q Zhang,
M Corvaglia,
S Aksoy,
U Naci,
N Adami,
N Aginako,
A Alatan,
L A Alexandre,
P Almeida,
Y Avrithis, [......],
P Mylonas,
S Nikolopoulos,
T Piatrik,
A M G Pinheiro,
B Reljin,
E Spyrou,
G Tolias,
S Vrochidis,
G Yakın,
G Zajic
ABSTRACT: In this paper, we give an overview of the four tasks submitted to TRECVID 2007 by COST292. In the shot boundary (SB) detection task, four SB detectors have been developed and the results are merged using two merging algorithms. The framework developed for the high-level feature extraction task comprises four systems. The first system transforms a set of low-level descriptors into the semantic space using Latent Semantic Analysis and utilises neural networks for feature detection. The second system uses a Bayesian classifier trained with a "bag of subregions". The third system uses a multi-modal classifier based on SVMs and several descriptors. The fourth system uses two image classifiers based on ant colony optimisation and particle swarm optimisation respectively. The system submitted to the search task is an interactive retrieval application combining retrieval functionalities in various modalities with a user interface supporting automatic and interactive search over all queries submitted. Finally, the rushes task submission is based on a video summarisation and browsing system comprising two different interest curve algorithms and three features.
03/2008;