-
ABSTRACT: Segmentation algorithms traditionally employ low-level features to divide images into different regions that show a certain degree of homogeneity. However, low-level features, spatial or temporal, are not always reliable when processing real-world video sequences, because of issues such as illumination changes or complex backgrounds. Furthermore, real-world objects can be composed of different regions with heterogeneous features. Although the inclusion of motion can mitigate some of these effects, many problems are still present. This paper proposes the use of spatio-temporal mid-level features that are related, on the one hand, to geometric properties of real objects and, on the other, to well-known motion patterns. Specifically, the proposed algorithm uses a mid-level module that controls the subsequent segmentation using these kinds of features. Experiments and evaluations show that the inclusion of mid-level features can help to obtain perceptually more meaningful segmentations, resulting in regions that are closer to semantic concepts.
Image Processing, 2008. ICIP 2008. 15th IEEE International Conference on; 11/2008
-
ABSTRACT: In this paper we present two fusion methods for the task of high-level feature detection in multimedia content. Successful approaches to high-level feature detection typically apply machine learning techniques through ensemble architectures to achieve strong performance. However, these approaches, whilst successful, are computationally expensive and, depending on the task, require significant computational resources. We propose two fusion methods that aim to combine the output of an initial basic machine learning approach with a lower-quality information source in order to gain diversity in the classified results whilst only requiring modest computing resources.
Content-Based Multimedia Indexing, 2007. CBMI '07. International Workshop on; 07/2007
-
P. Wilkins, T. Adamek,
D. Byrne,
G. J. F. Jones,
H. Lee,
G. Keenan,
K. McGuinness,
N. E. O'Connor,
A. F. Smeaton,
A. Amin,
Z. Obrenovic,
R. Benmokhtar,
E. Galmar,
B. Huet,
S. Essid,
R. Landais,
G F. Vallet
ABSTRACT: In this paper we describe K-Space participation in TRECVid 2007. K-Space participated in two tasks, high-level feature extraction and interactive search. We present our approaches for each of these activities and provide a brief analysis of our results.
Our high-level feature submission utilized multi-modal low-level features which included visual, audio and temporal elements. Specific concept detectors (such as Face detectors) developed by K-Space partners were also used. We experimented with different machine learning approaches including logistic regression and support vector machines (SVM). Finally we also experimented with both early and late fusion for feature combination.
This year we also participated in interactive search, submitting 6 runs. We developed two interfaces which both utilized the same retrieval functionality. Our objective was to measure the effect of context, which was supported to different degrees in each interface, on user performance. The first of the two systems was a ‘shot’ based interface, where the results from a query were presented as a ranked list of shots. The second interface was ‘broadcast’ based, where results were presented as a ranked list of broadcasts. Both systems made use of the outputs of our high-level feature submission as well as low-level visual features.
TRECVid Workshop; 01/2007
-
ABSTRACT: In this paper, we discuss the criteria that should be satisfied by a descriptor for nonrigid shapes with a single closed contour. We then propose a shape representation method that fulfills these criteria. In the proposed approach, contour convexities and concavities at different scale levels are represented using a two-dimensional (2-D) matrix. The representation can be visualized as a 2-D surface, where "hills" and "valleys" represent contour convexities and concavities, respectively. The optimal matching of two shape representations is achieved using dynamic programming and a dissimilarity measure is defined based on this matching. The proposed algorithm is very efficient and invariant to several kinds of transformations including some articulations and modest occlusions. The retrieval performance of the approach is illustrated using the MPEG-7 shape database, which is one of the most complete shape databases currently available. Our experiments indicate that the proposed representation is well suited for object indexing and retrieval in large databases. Furthermore, the representation can be used as a starting point to obtain more compact descriptors.
IEEE Transactions on Circuits and Systems for Video Technology; 06/2004 · 1.65 Impact Factor
-
P. Wilkins,
D. Byrne,
G Jones,
H Lee,
G. Keenan,
K. McGuinness,
N. O'Connor,
N. O'Hare,
A.F. Smeaton, T. Adamek, [......],
A. Cobet,
T. Sikora,
P. Praks,
D. Hannah,
M. Halvey,
F. Hopfgartner,
R. Villa,
P. Punitha,
A. Goyal,
J.M. Jose
ABSTRACT: In this paper we describe K-Space’s participation in TRECVid 2008 in the interactive search task. For 2008 the K-Space group performed one of the largest interactive video information retrieval experiments conducted in a laboratory setting. We had three institutions participating in a multi-site multi-system experiment. In total 36 users participated, 12 each from Dublin City University (DCU, Ireland), University of Glasgow (GU, Scotland) and Centrum Wiskunde and Informatica (CWI, the Netherlands). Three user interfaces were developed, two from DCU which were also used in 2007 as well as an interface from GU. All interfaces leveraged the same search service. Using a Latin squares arrangement, each user conducted 12 topics, leading in total to 6 runs per site, 18 in total. We officially submitted for evaluation 3 of these runs to NIST with an additional expert run using a 4th system. Our submitted runs performed around the median. In this paper we will present an overview of the search system utilized, the experimental setup and a preliminary analysis of our results.
-
ABSTRACT: In this paper we present an overview of a software platform that has been developed within the aceMedia project, termed the aceToolbox, that provides global and local low-level feature extraction from audio-visual content. The toolbox is based on the MPEG-7 experimental Model (XM), with extensions to provide descriptor extraction from arbitrarily shaped image segments, thereby supporting local descriptors reflecting real image content. We describe the architecture of the toolbox and provide an overview of the descriptors supported to date. We also briefly describe the segmentation algorithm provided. We then demonstrate the usefulness of the toolbox in the context of two different content processing scenarios: similarity-based retrieval in large collections and scene-level classification of still images.
Integration of Knowledge, Semantics and Digital Media Technology, 2005. EWIMT 2005. The 2nd European Workshop on the (Ref. No. 2005/11099);
-
ABSTRACT: This paper presents a new method for segmentation of images into large regions that reflect the real world objects present in a scene. It explores the feasibility of utilizing spatial configuration of regions and their geometric properties (the so-called syntactic visual features by C. Ferran Bennstrom and JR Casas (2004)) for improving the correspondence of segmentation results produced by the well-known recursive shortest spanning tree (RSST) algorithm by O.J. Morris et al. (1986) to semantic objects present in the scene. The main contribution of this paper is a novel framework for integration of evidence from multiple sources with the region merging process based on the Dempster-Shafer (DS) theory by P. Smets (1988) that allows integration of sources providing evidence with different accuracy and reliability. Extensive experiments indicate that the proposed solution limits formation of regions spanning more than one semantic object.
Image Processing, 2007. ICIP 2007. IEEE International Conference on;
-
L. Goldmann, T. Adamek,
P. Vajda,
M. Karaman,
R. Mörzinger,
E. Galmar,
T. Sikora,
N. O'Connor,
T. Ha-Minh,
T. Ebrahimi,
P. Schallauer,
B. Huet
ABSTRACT: Spatial region (image) segmentation is a fundamental step for many computer vision applications. Although many methods have been proposed, less work has been done in developing suitable evaluation methodologies for comparing different approaches. The main problem of general-purpose segmentation evaluation is the dilemma between objectivity and generality. Recently, figure-ground segmentation evaluation has been proposed to solve this problem by defining an unambiguous ground truth using the most salient foreground object. Although the annotation of a single foreground object is less complex than the annotation of all regions within an image, it is still quite time consuming, especially for videos. A novel framework incorporating background subtraction for automatic ground truth generation and different foreground evaluation measures is proposed, which allows the performance of image segmentation approaches to be evaluated effectively and efficiently. The experiments show that the objective measures are comparable to the subjective assessment and that there is only a slight difference between manually annotated and automatically generated ground truth.