Article

Video retrieval within a browsing framework using keyframes


... The results of the sparse SL0, sparse Homotopy, sparse PALM, HER (Li et al., 2015), Heesch et al. (2004), and SCFV (Araujo et al., 2015) methods on the CC_WEB_VIDEO dataset have been shown. In this experiment, the best performance is 84.82%. ...
... The P(1) rate for the proposed method via Intensity-HDWT with the center or random queries is tested. In comparing the performances, the P(1) values of the proposed method via Intensity-HDWT, sparse SL0, sparse DALM, sparse Homotopy, sparse PALM, HER, Heesch et al. (2004), and SCFV methods fall in [... 95.67], respectively. The average is calculated by taking the mean of the P(1) values over the video categories. ...
Article
Full-text available
Video retrieval has recently attracted a lot of research attention due to the exponential growth of video datasets and the internet. Content-based video retrieval (CBVR) systems are very useful for a wide range of applications with several types of data such as visual, audio and metadata. In this paper, we only use the visual information from the video. Shot boundary detection, key frame extraction, and video retrieval are three important parts of CBVR systems. In this paper, we have modified and proposed new methods for these three important parts of our CBVR system. Meanwhile, the local and global color, texture, and motion features of the video are extracted as features of key frames. To evaluate the applicability of the proposed technique against various methods, the P(1) metric and the CC_WEB_VIDEO dataset are used. The experimental results show that the proposed method provides better performance and less processing time compared to the other methods.
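The evaluation above reports P(1) averaged over the video categories of CC_WEB_VIDEO. As a minimal sketch, assuming P(1) means precision at rank 1 (the fraction of queries whose top-ranked result is relevant) and using hypothetical per-query data, the category-averaged figure could be computed as follows:

```python
from collections import defaultdict

def average_p_at_1(results):
    """results: list of (category, top_result_is_relevant) pairs, one per query.
    Returns (overall P(1), per-category P(1)), where the overall figure is the
    mean of the per-category values, as in the evaluation described above."""
    per_category = defaultdict(list)
    for category, relevant in results:
        per_category[category].append(1.0 if relevant else 0.0)
    # P(1) per category: fraction of that category's queries whose
    # top-ranked video is relevant.
    p1 = {c: sum(v) / len(v) for c, v in per_category.items()}
    return sum(p1.values()) / len(p1), p1

# Hypothetical toy data: (video category, is the rank-1 result relevant?)
demo = [("news", True), ("news", True), ("sports", True), ("sports", False)]
print(average_p_at_1(demo))  # (0.75, {'news': 1.0, 'sports': 0.5})
```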
... Details of and evaluative studies on each of the three techniques have been presented elsewhere (e.g. [5], [7], [4]). This paper will place greater emphasis on interface design. ...
... The network structure was used extensively for TRECVID 2003 and proved instrumental for the success of the interactive runs. In one interactive run we restricted interaction to browsing only, and although the performance remained below that of other runs that employed some form of query-by-example, it was comparable to a large number of other interactive runs submitted by the other participants, and was significantly above the performance of our automated search run that used a fixed set of weights without any further user interaction (for details see [4]). ...
Conference Paper
This paper describes interfaces for a suite of three recently developed techniques to facilitate content-based access to large image and video repositories. Two of these techniques involve content-based retrieval while the third technique is centered around a new browsing structure and forms a useful complement to the traditional query-by-example paradigm. Each technique is associated with its own user interface and allows for a different set of user interactions. The user can move between interfaces whilst executing a particular search and thus may combine the particular strengths of the different techniques. We illustrate each of the techniques using topics from the TRECVID 2003 contest.
... Inter-frame distances are often calculated using simple vector-distance measures to compare corresponding histograms [31]. Localised histograms, used in conjunction with additional features such as edge detection, perform well when applied in the TRECVID environment [1,8]. ...
... The twin-comparison algorithm first proposed by Zhang et al. [31] is the basis of several proposed approaches for detection of gradual transitions [8,25,30]. Here, a low threshold is applied to detect groups of frames that belong to a possible gradual transition. The accumulative inter-frame distance is calculated for these frames. ...
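The two snippets above describe histogram-based inter-frame distances and the twin-comparison idea: a low threshold flags frames that may belong to a gradual transition, and the accumulated inter-frame distance over those frames is then checked against a higher threshold. A minimal sketch under those assumptions (the L1 histogram distance and the threshold values are illustrative, not those of the cited work):

```python
import numpy as np

def hist_distance(h1, h2):
    # L1 distance between normalised frame histograms (illustrative choice).
    return np.abs(np.asarray(h1) - np.asarray(h2)).sum()

def twin_comparison(histograms, t_low=0.1, t_high=0.6):
    """histograms: one normalised colour histogram per frame.
    Returns (start_frame, end_frame, kind) tuples, kind in {'cut', 'gradual'}."""
    transitions, start, accumulated = [], None, 0.0
    for i in range(1, len(histograms)):
        d = hist_distance(histograms[i - 1], histograms[i])
        if d >= t_high:                       # single large jump: hard cut
            transitions.append((i - 1, i, "cut"))
            start, accumulated = None, 0.0
        elif d >= t_low:                      # candidate gradual transition
            if start is None:
                start = i - 1
            accumulated += d
        else:
            # Candidate ends: accept it as gradual if the accumulated
            # distance over the candidate frames exceeds the high threshold.
            if start is not None and accumulated >= t_high:
                transitions.append((start, i - 1, "gradual"))
            start, accumulated = None, 0.0
    return transitions
```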
Conference Paper
Full-text available
Segmenting digital video into its constituent basic semantic entities, or shots, is an important step for effective management and retrieval of video data. Recent automated techniques for detecting transitions between shots are highly effective on abrupt transitions. However, automated detection of gradual transitions, and the precise determination of the corresponding start and end frames, remains problematic. In this paper, we present a gradual transition detection approach based on average frame similarity and adaptive thresholds. We report good detection results on the TREC video track collections - particularly for dissolves and fades - and very high accuracy in identifying transition boundaries. Our technique is a valuable new tool for transition detection.
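The abstract above bases gradual transition detection on average frame similarity and adaptive thresholds but does not spell out the decision rule. A minimal, purely illustrative sketch, assuming each frame is compared against the average histogram of the preceding window and flagged when its distance is an outlier relative to that window:

```python
import numpy as np

def gradual_candidates(histograms, window=10, k=2.0):
    """histograms: (n_frames, n_bins) array of normalised frame histograms.
    Flags frames whose distance to the average histogram of the preceding
    window exceeds an adaptive threshold (window mean + k * window std)."""
    H = np.asarray(histograms, dtype=float)
    flagged = []
    for i in range(window, len(H)):
        win = H[i - window:i]
        avg = win.mean(axis=0)
        d = np.abs(H[i] - avg).sum()
        win_d = np.abs(win - avg).sum(axis=1)      # window frames vs. their own average
        if d > win_d.mean() + k * win_d.std():     # adaptive, locally derived threshold
            flagged.append(i)
    return flagged
```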
... The second stage consists of describing the shot contents at various levels. The most intuitive methods for shot segmentation use an inter-frame difference metric, computing the pixel or histogram difference between two consecutive frames [5], [6], [7], applying transformations to the frame data [8], [9], or using motion-based methods [10], [11], [12]. In the second stage, most approaches use low-level criteria such as color, texture and motion to describe the video contents so as to extract the representative frame. ...
Article
Full-text available
In this paper, we propose a video summarization algorithm by multiple extractions of key frames in each shot. This algorithm is based on k-partition algorithms. We choose the ones based on k-medoid clustering methods so as to find the best representative object for each partition. In order to find the number of partitions (i.e. the number of representative frames of each shot), we introduce a quantity based on the distance between frames and on the size of the video shot. This algorithm, which is applicable to all types of descriptors, consists of extracting key frames by similarity clustering according to the given index (histogram features, motion features, texture features, or a combination of these features). In our proposal, the distance between frames is calculated using a fast full-search block matching algorithm based on the frequency domain. The proposed approach is computationally tractable and robust with respect to sudden changes in mean intensity within a shot. Additionally, this approach produces different key frames even in the presence of large motion. The experimental results show that our algorithm extracts multiple representative frames in each video shot without visual redundancy, and thus it is an effective tool for video indexing and retrieval.
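The abstract above selects key frames per shot by k-medoid clustering, with the number of partitions driven by inter-frame distances and shot length. A minimal sketch, assuming a precomputed pairwise distance matrix for one shot (the paper's frequency-domain block-matching distance is replaced by whatever metric the caller supplies) and a hypothetical heuristic for choosing k:

```python
import numpy as np

def choose_k(dist, max_k=5):
    # Heuristic stand-in for the paper's quantity: more key frames for longer
    # shots with larger average inter-frame distance.
    n = dist.shape[0]
    return int(np.clip(dist.mean() * n, 1, max_k))

def k_medoid_keyframes(dist, k, iters=20, seed=0):
    """dist: (n, n) pairwise distance matrix between the frames of one shot.
    Returns the indices of k medoid frames, used as the shot's key frames."""
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    medoids = rng.choice(n, size=min(k, n), replace=False)
    for _ in range(iters):
        # Assign every frame to its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for j in range(len(medoids)):
            members = np.where(labels == j)[0]
            if len(members) == 0:
                continue
            # New medoid: the member with the smallest total distance to the rest.
            costs = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return sorted(medoids.tolist())
```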
... features and the size of the collection, these results are quite respectable and demonstrate that browsing in general and the proposed structure in particular have a potential for CBIR that should not be left unexploited. A summary of the results is given in Table 2 and more details can be found in [10]. ...
Conference Paper
This paper describes a novel interaction technique to support content-based image search in large image collections. The idea is to represent each image as a vertex in a directed graph. Given a set of image features, an arc is established between two images if there exists at least one combination of features for which one image is retrieved as the nearest neighbour of the other. Each arc is weighted by the proportion of feature combinations for which the nearest neighbour relationship holds. By thus integrating the retrieval results over all possible feature combinations, the resulting network helps expose the semantic richness of images and thus provides an elegant solution to the problem of feature weighting in content-based image retrieval. We give details of the method used for network generation and describe the ways a user can interact with the structure. We also provide an analysis of the network’s topology and provide quantitative evidence for the usefulness of the technique.
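The construction described above (an arc from image A to image B whenever B is A's nearest neighbour under some weighting of the features, weighted by the proportion of weightings for which this holds) can be sketched as follows. This is only illustrative: it assumes per-feature distance matrices and samples weight combinations on a coarse simplex grid rather than using the original enumeration:

```python
import itertools
import numpy as np

def nnk_network(feature_dists, steps=5):
    """feature_dists: list of (n, n) distance matrices, one per feature.
    Returns dict: arcs[(a, b)] = fraction of sampled weight combinations
    for which b is a's nearest neighbour under the weighted distance."""
    n = feature_dists[0].shape[0]
    m = len(feature_dists)
    # Sample weight vectors on a coarse grid of the simplex (weights sum to 1).
    grid = [w for w in itertools.product(np.linspace(0, 1, steps), repeat=m)
            if abs(sum(w) - 1.0) < 1e-9]
    arcs = {}
    for w in grid:
        combined = sum(wi * d for wi, d in zip(w, feature_dists))
        combined = combined + np.diag([np.inf] * n)   # exclude self-matches
        nearest = np.argmin(combined, axis=1)         # nearest neighbour per image
        for a, b in enumerate(nearest):
            arcs[(a, int(b))] = arcs.get((a, int(b)), 0) + 1
    # Arc weight: proportion of weightings for which the relationship held.
    return {edge: count / len(grid) for edge, count in arcs.items()}
```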
... Because the set of neighbours in the graph is visually rather heterogeneous, users can quickly navigate to different parts of the collection. We have previously shown [9] that the resulting networks exhibit so-called small-world properties, a combination of low average distance between vertices even for large collections and a high degree of local clustering, and have employed the structures very successfully in the search task of TRECVID [8]. In this paper, we continue our topological analysis by looking specifically at how semantically related images are distributed across the network. ...
Conference Paper
Given a collection of images and a set of image features, we can build what we have previously termed NNk networks by representing images as vertices of the network and by establishing arcs between any two images if and only if one is most similar to the other for some weighted combination of features. An earlier analysis of its structural properties revealed that the networks exhibit small-world properties, that is, a small distance between any two vertices and a high degree of local structure. This paper extends our analysis. In order to provide a theoretical explanation of these remarkable properties, we investigate explicitly how images belonging to the same semantic class are distributed across the network. Images of the same class correspond to subgraphs of the network. We propose and motivate three topological properties which we expect these subgraphs to possess and which can be thought of as measures of their compactness. Measurements of these properties on two collections indicate that these subgraphs do indeed tend to be highly compact.
... As the vocabulary used for automatically annotating images is inherently limited, we use NNk image networks to enable unlimited exploration of the image collection based on inter-image visual similarity. NNk networks have proven to be a powerful browsing methodology for large collections of diverse images [3]. The idea is to connect an image to all those images in the collection to which it is most similar under some instantiation of a parametrised distance metric (where parameters correspond to feature weights). ...
Conference Paper
This paper outlines the technical details of a prototype system for searching and browsing over a million images from the World Wide Web using their visual contents. The system relies on two modalities for accessing images — automated image annotation and NNk image network browsing. The user supplies the initial query in the form of one or more keywords and is then able to locate the desired images more precisely using a browsing interface.
... Such local information supports a greedy search process whereby users decide on the most favourable direction by selecting the image that comes closest to their target. That this method seems to work remarkably well in NNk networks [38] might be explained by the fact that at each step users can select from a varied set of neighbours. The process is similar to that of genetic algorithms. ...
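The greedy browse described in this snippet can be sketched as follows; the distance function stands in for the user's visual judgement of closeness to the target, and all names are illustrative:

```python
def greedy_browse(graph, start, target, distance, max_steps=50):
    """graph: dict mapping image id -> list of neighbour ids (the network arcs).
    distance(a, b): stand-in for the user's judgement of visual closeness.
    Greedily move to the neighbour closest to the target until no
    neighbour improves on the current image."""
    current, path = start, [start]
    for _ in range(max_steps):
        if current == target:
            break
        neighbours = graph.get(current, [])
        if not neighbours:
            break
        best = min(neighbours, key=lambda n: distance(n, target))
        if distance(best, target) >= distance(current, target):
            break  # local optimum: no neighbour is closer to the target
        current = best
        path.append(current)
    return path
```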
Article
The problem of content-based image retrieval (CBIR) has traditionally been investigated within a framework that emphasises the explicit formulation of a query: users initiate an automated search for relevant images by submitting an image or drawing a sketch that exemplifies their information need. Often, relevance feedback is incorporated as a post-retrieval step for optimising the way evidence from different visual features is combined. While this sustained methodological focus has helped CBIR to mature, it has also brought out its limitations more clearly: there is often little support for exploratory search, and scaling to very large collections is problematic. Moreover, the assumption that users are always able to formulate an appropriate query is questionable. An effective, albeit much less studied, method of accessing image collections based on visual content is that of browsing. The aim of this survey paper is to provide a structured overview of the different models that have been explored over the last one to two decades, to highlight the particular challenges of the browsing approach and to focus attention on a few interesting issues that warrant more intense research.
... See [2] for more details on computing these texture features. For the color features we compute an HSV (Hue Saturation Value) 3D histogram [3] such that there are 8 bins for hue and 5 each for value and saturation. The lowest value bin is not partitioned into hues since they are not easy for people to distinguish. ...
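The colour feature above is a 3D HSV histogram with 8 hue bins and 5 bins each for saturation and value, where the darkest value bin is not subdivided by hue. A minimal sketch assuming OpenCV-style HSV ranges (H in [0, 180), S and V in [0, 256)); the exact binning of the cited system may differ:

```python
import numpy as np

def hsv_histogram(hsv_image, hue_bins=8, sat_bins=5, val_bins=5):
    """hsv_image: (h, w, 3) uint8 array in OpenCV HSV ranges
    (H in [0, 180), S and V in [0, 256)).
    Returns a normalised 1-D histogram in which the lowest-value slice is
    collapsed over hue, since hue is hard to perceive in dark pixels."""
    h = hsv_image[..., 0].astype(int) * hue_bins // 180
    s = hsv_image[..., 1].astype(int) * sat_bins // 256
    v = hsv_image[..., 2].astype(int) * val_bins // 256
    full = np.zeros((hue_bins, sat_bins, val_bins))
    np.add.at(full, (h, s, v), 1)               # accumulate pixel counts per bin
    dark = full[:, :, 0].sum(axis=0)            # darkest value bin: merge the hues
    rest = full[:, :, 1:].reshape(-1)           # full resolution everywhere else
    hist = np.concatenate([dark, rest])
    return hist / hist.sum()
```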
We present a Bayesian framework for content-based image retrieval which models the distribution of color and texture features within sets of related images. Given a user-specified text query (e.g. "penguins") the system first extracts a set of images, from a labelled corpus, corresponding to that query. The distribution over features of these images is used to compute a Bayesian score for each image in a large unlabelled corpus. Unlabelled images are then ranked using this score and the top images are returned. Although the Bayesian score is based on computing marginal likelihoods, which integrate over model parameters, in the case of sparse binary data the score reduces to a single matrix-vector multiplication and is therefore extremely efficient to compute. We show that our method works surprisingly well despite its simplicity and the fact that no relevance feedback is used. We compare different choices of features, and evaluate our results using human subjects.
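The claim that the marginal-likelihood score reduces to one matrix-vector multiplication for sparse binary data can be illustrated with a Beta-Bernoulli model per feature; the hyperparameters and the exact score below are assumptions, not the paper's published values:

```python
import numpy as np

def bayesian_scores(X, query_rows, alpha=2.0, beta=2.0):
    """X: (n_items, n_features) binary feature matrix (0/1 values).
    query_rows: indices of the labelled images matching the text query.
    Scores every item by how well it fits a Beta-Bernoulli model of the
    query set; for binary data the log score is linear in x up to a
    constant, so ranking needs only one matrix-vector product."""
    Q = X[query_rows]
    N = Q.shape[0]
    counts = Q.sum(axis=0)              # per-feature on-counts in the query set
    alpha_t = alpha + counts            # posterior Beta parameters
    beta_t = beta + N - counts
    # Per-feature log-odds weights; constant terms do not affect the ranking.
    w = (np.log(alpha_t) - np.log(alpha)
         - np.log(beta_t) + np.log(beta))
    return X @ w                        # the single matrix-vector multiplication

# Hypothetical toy usage: rank 4 items against a 2-item query set.
X = np.array([[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 0]], dtype=float)
scores = bayesian_scores(X, query_rows=[0, 1])
print(np.argsort(-scores))              # items ordered from best to worst match
```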
Conference Paper
In the medical domain, digital images are produced in ever-increasing quantities and used for diagnostics and therapy. However, low-level features alone are not sufficient to capture the semantic content of an image, and it is difficult to provide good medical image retrieval results for the predefined categories of the medical domain without using medical knowledge. This paper proposes a semantic approach based on the fusion of medical images and reports. Firstly, colour and texture feature vectors are extracted by analysing the low-level features of the image. Secondly, a set of disjoint semantic tokens that appear in medical images is selected to define a visual and medical vocabulary; here the doctor's diagnosis report is used to represent each token in the medical domain. Finally, the low-level features are used as the input vectors of an SVM for training, and the semantic correlation from low-level features to high-level features is constructed according to the visual and medical vocabulary. This semantic correlation method has been used in a retrieval experiment on brain CT images, and the results demonstrate the improvement, effectiveness, and efficiency achieved by the proposed framework.
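The mapping from low-level colour and texture features to semantic tokens is learned with an SVM. A minimal scikit-learn sketch under that assumption; the feature dimensionality, the token names, and the kernel choice are all hypothetical:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical data: one row of colour/texture features per image,
# and one semantic token (taken from the diagnosis report) per image.
features = np.random.rand(60, 32)                        # low-level feature vectors
tokens = np.random.choice(["lesion", "normal", "edema"], size=60)

# SVM that maps low-level features to semantic tokens of the vocabulary.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(features, tokens)

# Retrieval side: predict the semantic token of a new image's features
# and use it to match against the visual and medical vocabulary.
new_image_features = np.random.rand(1, 32)
print(model.predict(new_image_features))
```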
Conference Paper
Full-text available
Segmentation is the first step in managing data for many information retrieval tasks. Automatic audio transcriptions and digital video footage are typically continuous data sources that must be pre-processed for segmentation into logical entities that can be stored, queried, and retrieved. Shot boundary detection is a common low-level video segmentation technique, where a video stream is divided into shots that are typically composed of similar frames. In this paper, we propose a new technique that combines evidence from a fixed-size window of video frames to find cuts, the abrupt transitions that delineate shots. We experimentally show that our techniques are accurate using the well-known TREC experimental testbed.
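The abstract above combines evidence from a fixed-size window of frames rather than thresholding a single frame pair. A minimal sketch of that idea; the outlier test below is illustrative and not the cited paper's actual decision rule:

```python
import numpy as np

def window_cuts(frame_dists, half_window=5, ratio=3.0):
    """frame_dists[i]: distance between frames i and i+1.
    Declares a cut between frames i and i+1 when that distance is both the
    largest in its window and much larger than the window's other distances."""
    d = np.asarray(frame_dists, dtype=float)
    cuts = []
    for i in range(len(d)):
        lo, hi = max(0, i - half_window), min(len(d), i + half_window + 1)
        window = d[lo:hi]
        others = np.delete(window, i - lo)      # the window without d[i]
        if len(others) == 0:
            continue
        if d[i] == window.max() and d[i] > ratio * others.mean():
            cuts.append(i)
    return cuts
```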