Article

Semantics of Video Shots for Content-based Retrieval

... This is because they do not take into account the spatial and temporal orientation of the extracted features. The method discussed in [9,10] uses color intensities to segment the action by manually selecting a region. With this approach a region must be selected every time the scene changes, which undesirably requires human intervention. ...
Article
Full-text available
Recognition of human actions is an emerging need. Various researchers have endeavored to provide a solution to this problem. Some of the current state-of-the-art solutions are either inaccurate or computationally intensive, while others require human intervention. In this paper a solution is provided that is sufficiently accurate yet computationally inexpensive. Image moments, which are translation, rotation, and scale invariant, are computed for each frame. A dynamic neural network is used to identify the patterns within the stream of image moments and hence recognize actions. Experiments show that the proposed model performs better than other competitive models.
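As a rough illustration of the feature stream this abstract describes, the sketch below computes a per-frame vector of Hu moments (translation-, rotation-, and scale-invariant) with OpenCV; the paper's exact moment set, preprocessing, and dynamic network are not specified here, so treat this only as an assumed minimal setup.

    import cv2
    import numpy as np

    def hu_moment_stream(video_path):
        """Yield one 7-D Hu-moment vector per frame of a video."""
        cap = cv2.VideoCapture(video_path)
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            hu = cv2.HuMoments(cv2.moments(gray)).flatten()
            # Log-scale the moments for numerical stability before they are
            # fed, as a temporal sequence, to a recurrent ("dynamic") network.
            yield -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)
        cap.release()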
Article
Full-text available
Development of health information technology has had a dramatic impact on the efficiency and quality of medical care. Developing interoperable health information systems for healthcare providers has the potential to improve the quality and equitability of patient-centered healthcare. In this article, we describe an automated content-based medical video analysis and management service that provides convenient access to relevant medical video content without sequential scanning. The system facilitates effective temporal video segmentation and content-based visual information retrieval that enable a more reliable understanding of medical video content. The system is implemented as a Web- and mobile-based service and has the potential to offer a knowledge-sharing platform for efficient access to medical video content.
Conference Paper
The paper proposes a novel semi-automatic soft collaborative annotation scheme for video semantic indexing. To annotate video data effectively and accurately, we first propose a collaborative soft video annotation scheme that models users' judgments. We then introduce a semi-automatic annotation strategy that combines active learning and self-training in order to reduce the annotators' effort. Experiments conducted on the TRECVID benchmark show that the proposed approach significantly improves the performance of video annotation.
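A minimal sketch of how active learning and self-training can be combined in one annotation round, assuming shot-level feature vectors and a scikit-learn classifier; the paper's actual models, confidence threshold, and query budget are not given here and the values below are illustrative.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def annotation_round(X_lab, y_lab, X_unl, conf_thresh=0.95, budget=10):
        """Self-train on confident predictions; send the most uncertain
        unlabeled shots to the human annotator (active learning)."""
        clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
        proba = clf.predict_proba(X_unl)
        conf = proba.max(axis=1)
        confident = np.where(conf >= conf_thresh)[0]   # pseudo-label these
        pseudo = proba[confident].argmax(axis=1)
        to_annotate = np.argsort(conf)[:budget]        # ask the annotator
        return confident, pseudo, to_annotate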
Conference Paper
Full-text available
Vast quantities of video data are distributed around the world every day. Video content owners would like to be able to automatically detect any use of their material, in any media or representation. We investigate techniques for identifying similar video content in large collections. Current methods are based on related technology, such as image retrieval, but the effectiveness of these techniques has not been demonstrated for the task of locating video clips that are derived from the same original. We propose a new method for locating video clips, shot-length detection, and compare it to methods based on image retrieval. We test the methods in a variety of contexts and show that they have different strengths and weaknesses. Our results show that the shot-based approach is promising, but is not yet sufficiently robust for practical application.
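The following sketch illustrates the general idea behind matching by shot lengths, assuming shot boundaries have already been detected: each clip is reduced to its sequence of shot lengths, and a query signature is slid over a candidate. The authors' actual signature and matching function may differ.

    import numpy as np

    def shot_lengths(cut_positions):
        """Shot lengths (in frames) from a sorted list of cut positions."""
        return np.diff(np.asarray(cut_positions))

    def signature_distance(query_sig, candidate_sig):
        """Best (smallest) mean absolute difference of the query signature
        against every alignment within the candidate signature."""
        q = np.asarray(query_sig, dtype=float)
        c = np.asarray(candidate_sig, dtype=float)
        if len(c) < len(q):
            q, c = c, q
        return min(np.mean(np.abs(c[i:i + len(q)] - q))
                   for i in range(len(c) - len(q) + 1))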
Conference Paper
Full-text available
This paper describes the contribution of the TZI to the shot detection task of the TREC 2003 video analysis track (TRECVID). The approach comprises a feature extraction step and a shot detection step. In the feature extraction, three features are extracted: a frequency-domain approach based on FFT features, a spatial-domain approach based on changes in the image luminance values, and another spatial-domain approach based on gray-level histogram differences. Shot boundary detection then uses adaptive thresholds based on all extracted features of the complete video. The final shot list is a combination of shots that result from an independent examination of all three features.
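As an illustration of one of the three features, the sketch below flags shot boundaries from gray-level histogram differences with a threshold adapted to the statistics of the complete video; the FFT and luminance features, and the exact thresholding rule used by the TZI system, are not reproduced here.

    import numpy as np

    def histogram_differences(gray_frames, bins=64):
        """gray_frames: iterable of 2-D gray-level arrays (values 0-255)."""
        hists = [np.histogram(f, bins=bins, range=(0, 255))[0]
                 for f in gray_frames]
        return np.array([np.abs(a - b).sum()
                         for a, b in zip(hists[:-1], hists[1:])])

    def adaptive_boundaries(diffs, k=5.0):
        """Adaptive threshold derived from the whole video's difference
        statistics; returns indices of the first frame of each new shot."""
        threshold = diffs.mean() + k * diffs.std()
        return np.where(diffs > threshold)[0] + 1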
Article
Full-text available
Our shot boundary determination system consists of three components: a FOI detector, a generalized CUT detector, and a long gradual transition detector. One support vector machine, taking as input a score vector calculated with a graph partition model, is used to detect CUTs. Long gradual transitions are determined by another three support vector machines with multi-resolution score vectors as input. After these detectors make their decisions successively, the locations of the shot boundaries and their corresponding types are obtained. Experiments on the development data show that, by tuning the penalty ratio between the losses of misclassifying positive and negative samples, it is possible to control the trade-off between precision and recall. 31 runs were generated from the same system with the 4 support vector machines trained with different parameters. Among them, 10 runs were submitted for evaluation, and the results show that our system is among the best.
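A hedged sketch of the CUT-detector stage: an SVM is trained on score vectors (generic feature vectors standing in for the graph-partition scores, which are not reproduced here), and the class-weight ratio plays the role of the penalty ratio the authors tune to trade precision against recall.

    from sklearn.svm import SVC

    def train_cut_detector(score_vectors, labels, positive_penalty=4.0):
        """labels: 1 = CUT boundary, 0 = non-boundary. Raising the penalty
        on positives increases recall at the expense of precision."""
        clf = SVC(kernel="rbf", class_weight={1: positive_penalty, 0: 1.0})
        return clf.fit(score_vectors, labels)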
Article
Full-text available
0. STRUCTURED ABSTRACT

Story segmentation

1. Briefly, what approach or combination of approaches did you test in each of your submitted runs?
1_kddi_ss_base1_5: "Baseline" method based on an SVM, which discriminates shots that contain story boundaries.
1_kddi_ss_c+k1_4: Baseline + section-specialized segmentation (SS-S).
1_kddi_ss_all1_3: Baseline + SS-S + anchor shot segmentation (ASS) based on audio classification results.
1_kddi_ss_all1_pfil_1: Baseline + SS-S + ASS and post-filtering (PF) based on audio classification results.
1_kddi_ss_all2_pfil_2: Extended baseline + SS-S + ASS + PF based on audio classification results.
1_kddi_ss_all1nsp07_pfil_6: Baseline + SS-S + ASS + PF by HMM-based non-speech detection.
1_kddi_ss_all2nsp07_pfil_7: Extended baseline + SS-S + ASS + PF by HMM-based non-speech detection.
2_kddi_ss2_all1_pfil_8: Baseline + SS-S + ASS and PF based on "speech segment" information from LIMSI ASR results [1].
2_kddi_ss2_all2_pfil_9: Extended baseline + SS-S + ASS and PF based on "speech segment" information from LIMSI ASR results.
3_kddi_ss3_10: Naive TextTiling-based story segmentation based on LIMSI ASR data.

2. What if any significant differences (in terms of what measures) did you find among the runs?
Overall, section-specialized segmentation worked effectively to detect story boundaries that were overlooked by the baseline method. Anchor shot segmentation enabled the detection of story boundaries that were impossible to detect with the baseline method.

3. Based on the results, can you estimate the relative contribution of each component of your system/approach to its effectiveness?
Our estimate of the contribution of each component in our system is as follows.
Section-specialized segmentation: improved both recall and precision, especially for CNN.
Anchor shot segmentation: enabled the extraction of story boundaries that occur within a single shot, thus improving recall.
Post-filtering: successful in deleting some obviously erroneous story boundary candidates, but also mistakenly omits correct story boundaries; improvement in terms of F-measure was scarce, if any.

4. Overall, what did you learn about runs/approaches and the research question(s) that motivated them?
The major motivation of our participation was to develop a story segmentation method that can be used not only for segmentation of broadcast news, but also for video from non-news domains. By comparison with the results of the other official runs, we showed that the effective use of general low-level features achieves highly accurate story segmentation for news programs. Due to the generality of the extracted features, our method is theoretically applicable to segmentation of non-news video. Another notable point is that, also due to the generality of the features, it was fairly easy to develop various components, such as section-specialized segmentation, which contributed to the overall improvement of story segmentation accuracy.

Shot boundary determination

1. Briefly, what approach or combination of approaches did you test in each of your submitted runs?
kddi_labs_sb_run_07: "Baseline", which corresponds to the TRECVID 2003 approach with newly introduced edge features and a color layout feature.
kddi_labs_sb_run_01: "Baseline" with post-processing for deleting non-CUT candidates, based on a non-CUT learning method using development data.
kddi_labs_sb_run_06: "Baseline" with post-processing for deleting non-CUT candidates (see above) and for adding OTH candidates, based on an SVM.
kddi_labs_sb_run_09: SVM-based method. Two SVMs were built: one based on color histograms, the other based on edge energy. Results from the two SVMs were fused by another SVM.
kddi_labs_sb_run_10: SVM-based method similar to kddi_labs_sb_run_09. This run fused the results from the two SVMs by linear classification.

2. What if any significant differences (in terms of what measures) did you find among the runs?
Compared with our TRECVID 2003 approach, using edge features gives a significant improvement, especially for gradual shot boundary (GRAD) detection. Among the above three "Baseline" runs, there is no significant difference. The SVM-based methods achieved higher accuracy than the "Baseline" methods in cross-validation on the TRECVID 2003 experiment data, but could not achieve high accuracy on the TRECVID 2004 data.

3. Based on the results, can you estimate the relative contribution of each component of your system/approach to its effectiveness?
Edge features contribute to improving the recall of GRAD determination. The maximum improvement rate compared to the TRECVID 2003 method is approximately 20%.

4. Overall, what did you learn about runs/approaches and the research question(s) that motivated them?
The "Baseline" approaches are based on compressed-domain feature analysis, and in this second attempt it became clear that extracting edge features in the compressed domain, i.e. from the DC image, is an easy way to enhance system performance, with very small additional computational cost. For the SVM-based approaches, the generated SVMs seem to have over-adapted to the TRECVID 2003 data, which we consider the main cause of the poor results.
Article
Full-text available
TRECVID (TREC Video Retrieval Evaluation) is sponsored by NIST to encourage research in digital video indexing and retrieval. It was initiated in 2001 as a "video track" of TREC and became an independent evaluation in 2003. AT&T participated in three tasks in TRECVID 2006: shot boundary determination (SBD), search, and rushes exploitation. The proposed SBD algorithm contains a set of finite state machine (FSM) based detectors for pure cut, fast dissolve, fade in, fade out, dissolve, and wipe. A support vector machine (SVM) is applied to the cut and dissolve detectors to further boost SBD performance. AT&T collaborated with Columbia University on the search and rushes exploitation tasks. In this paper, we mainly focus on the SBD system and briefly introduce our effort on search and rushes exploitation. The AT&T SBD system is highly effective and its evaluation results are among the best.
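To make the finite-state-machine idea concrete, here is a toy detector for a fade-out driven by per-frame mean luminance; it is only an assumed, simplified stand-in for the AT&T detectors, whose states and features are richer.

    def fade_out_fsm(mean_luma, black_level=16.0, min_drop=2.0):
        """Return (start, end) frame indices of a detected fade-out, or None.
        States: IDLE -> FALLING -> detected when luminance reaches black."""
        state, start = "IDLE", None
        for i in range(1, len(mean_luma)):
            falling = mean_luma[i] < mean_luma[i - 1] - min_drop
            if state == "IDLE" and falling:
                state, start = "FALLING", i - 1
            elif state == "FALLING":
                if mean_luma[i] <= black_level:
                    return start, i
                if not falling:
                    state, start = "IDLE", None
        return None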
Article
Full-text available
We propose a generic and robust framework for news video indexing which we founded on a broadcast news production model. We identify within this model four production phases, each providing useful metadata for annotation. In contrast to semiautomatic indexing approaches which exploit this information at production time, we adhere to an automatic data-driven approach. To that end, we analyze a digital news video using a separate set of multimodal detectors for each production phase. By combining the resulting production-derived features into a statistical classifier ensemble, the framework facilitates robust classification of several rich semantic concepts in news video; rich meaning that concepts share many similarities in their production process. Experiments on an archive of 120 hours of news video from the 2003 TRECVID benchmark show that a combined analysis of production phases yields the best results. In addition, we demonstrate that the accuracy of the proposed style analysis framework for classification of several rich semantic concepts is state-of-the-art.
Article
Full-text available
Efficient and effective handling of video documents depends on the availability of indexes. Manual indexing is unfeasible for large video collections. In this paper we survey several methods aiming at automating this time- and resource-consuming process. Good reviews on single-modality video indexing have appeared in the literature. Effective indexing, however, requires a multimodal approach in which either the most appropriate modality is selected or the different modalities are used in a collaborative fashion. Therefore, instead of separately treating the different information sources involved, and their specific algorithms, we focus on the similarities and differences between the modalities. To that end we put forward a unifying and multimodal framework, which views a video document from the perspective of its author. This framework forms the guiding principle for identifying index types, for which automatic methods are found in the literature. It furthermore forms the basis for categorizing these different methods.
Article
The standard method for making the full content of audio and video material searchable is to annotate it with human-generated meta-data that describes the content in a way that the search can understand, as is done in the creation of multimedia CD-ROMs. However, for the huge amounts of data that could usefully be included in digital video and audio libraries, the cost of producing this meta-data is prohibitive. In the Informedia Digital Video Library, the production of the meta-data supporting the library interface is automated using techniques derived from artificial intelligence (AI) research. By applying speech recognition together with natural language processing, information retrieval, and image analysis, an interface has been produced that helps users locate the information they want, and navigate or browse the digital video library more effectively. Specific interface components include automatic titles, filmstrips, video skims, word location marking, and representative frames for shots. Both the user interface and the information retrieval engine within Informedia are designed for use with automatically derived meta-data, much of which depends on speech recognition for its production. Some experimental information retrieval results will be given, supporting a basic premise of the Informedia project: That speech recognition generated transcripts can make multimedia material searchable. The Informedia project emphasizes the integration of speech recognition, image processing, natural language processing, and information retrieval to compensate for deficiencies in these individual technologies.
Article
We explore the relation between classical probabilistic models of information retrieval and the emerging language modeling approaches. It has long been recognized that the primary obstacle to effective performance of classical models is the need to estimate a relevance model: probabilities of words in the relevant class. We propose a novel technique for estimating these probabilities using the query alone. We demonstrate that our technique can produce highly accurate relevance models, addressing important notions of synonymy and polysemy. Our experiments show relevance models outperforming baseline language modeling systems on TREC retrieval and TDT tracking tasks. The main contribution of this work is an effective formal method for estimating a relevance model with no training data.
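For reference, one commonly cited form of the relevance-model estimate consistent with this description (probabilities of words in the relevant class estimated from the query q_1 ... q_k alone, by marginalizing over document language models M) is:

    P(w \mid R) \approx \sum_{M} P(w \mid M)\, P(M \mid q_1 \ldots q_k),
    \qquad
    P(M \mid q_1 \ldots q_k) \propto P(M) \prod_{i=1}^{k} P(q_i \mid M).

The paper discusses more than one estimation method; the expression above is only the form most often quoted, shown here for orientation.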
Article
The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
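In its now-standard soft-margin form, the construction described here solves

    \min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{\ell} \xi_i
    \quad \text{subject to} \quad y_i\,(w \cdot \phi(x_i) + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,

with the decision function f(x) = \operatorname{sign}\big(\sum_i \alpha_i y_i K(x_i, x) + b\big), where the kernel K realizes the non-linear mapping \phi into the high-dimensional feature space and the slack variables \xi_i accommodate non-separable training data.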
Article
We present a learning-based approach to the semantic indexing of multimedia content using cues derived from audio, visual, and text features. We approach the problem by developing a set of statistical models for a predefined lexicon. Novel concepts are then mapped in terms of the concepts in the lexicon. To achieve robust detection of concepts, we exploit features from multiple modalities, namely, audio, video, and text. Concept representations are modeled using Gaussian mixture models (GMM), hidden Markov models (HMM), and support vector machines (SVM). Models such as Bayesian networks and SVMs are used in a late-fusion approach to model concepts that are not explicitly modeled in terms of features. Our experiments indicate promise in the proposed classification and fusion methodologies: our proposed fusion scheme achieves more than 10% relative improvement over the best unimodal concept detector.
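A minimal sketch of the late-fusion step, assuming each unimodal detector already outputs a per-shot confidence score; the model choice here (an RBF-kernel SVM as the fusion classifier) is illustrative rather than the paper's exact configuration.

    import numpy as np
    from sklearn.svm import SVC

    def train_fusion(audio_scores, visual_scores, text_scores, labels):
        """Stack unimodal concept scores into one vector per shot and train
        a second-stage classifier on top of them (late fusion)."""
        stacked = np.column_stack([audio_scores, visual_scores, text_scores])
        return SVC(kernel="rbf", probability=True).fit(stacked, labels)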
Article
In texture classification and segmentation, the objective is to partition the given image into a set of homogeneous textured regions. This chapter presents schemes for texture classification and segmentation using features computed from Gabor-filtered images. The texture feature set is derived by filtering the image through a bank of modified Gabor kernels. The particular set of filters forms a multiresolution decomposition of the image. Although there are several viable options, including orthogonal wavelet transforms, Gabor wavelets are chosen for their desirable properties: Gabor functions achieve the theoretical minimum space frequency bandwidth product; a narrow-band Gabor function closely approximates an analytic function; the magnitude response of a Gabor function in the frequency domain is well behaved, having no side lobes; and Gabor functions appear to share many properties with the human visual system.
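A small sketch of a Gabor filter-bank texture feature, using scikit-image's standard Gabor filter rather than the chapter's modified kernels; the frequencies and orientations below are assumed values.

    import numpy as np
    from skimage.filters import gabor

    def gabor_texture_features(image, frequencies=(0.1, 0.2, 0.3, 0.4),
                               n_orientations=4):
        """Mean and standard deviation of the Gabor response magnitude for
        each (frequency, orientation) pair in a small multiresolution bank."""
        features = []
        for freq in frequencies:
            for k in range(n_orientations):
                theta = k * np.pi / n_orientations
                real, imag = gabor(image, frequency=freq, theta=theta)
                magnitude = np.hypot(real, imag)
                features.extend([magnitude.mean(), magnitude.std()])
        return np.asarray(features)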
Conference Paper
This chapter presents a new framework of video analysis and associated techniques to automatically parse long programs, extract story structures, and identify story units. Content-based browsing and navigation in digital video collections have so far centered on sequential and linear presentation of images. To facilitate such applications, nonlinear and nonsequential access into video documents is essential, especially for long programs. For many programs, this can be achieved by identifying underlying story structures, which are reflected both by visual content and by the temporal organization of composing elements. The proposed analysis and representation contribute to the extraction of scenes and story units, each representing a distinct locale or event, which cannot be achieved by shot boundary detection alone. Analysis is performed on MPEG-compressed video and without prior models. In addition, the recovered story structure gives nonsequential and nonlinear access to a featured program and facilitates browsing and navigation. The result is a compact representation that serves as a summary of the story and allows hierarchical organization of video documents. Story units, which represent distinct events or locales, have been successfully segmented from several types of video programs, and the results are promising. The video is decomposed into a hierarchy of story units, scenes (clusters of similar shots), and shots at the lowest level, which helps in further organization.
Article
This paper suggests a theoretical basis for identifying and classifying the kinds of subjects a picture may have, using previously developed principles of cataloging and classification, and concepts taken from the philosophy of art, from meaning in language, and from visual perception. The purpose of developing this theoretical basis is to provide the reader with a means for evaluating, adapting, and applying presently existing indexing languages, or for devising new languages for pictorial materials; this paper does not attempt to invent or prescribe a particular indexing language.
Article
Image and video indexing techniques are crucial in multimedia applications. A number of indexing techniques that operate in the pixel domain have been reported in the literature. The advent of compression standards has led to the proliferation of indexing techniques in the compressed domain. In this paper, we present a critical review of the compressed-domain indexing techniques proposed in the literature. These include transform-domain techniques using the Fourier transform, cosine transform, Karhunen-Loève transform, subbands, and wavelets, and spatial-domain techniques using vector quantization and fractals. In addition, temporal indexing techniques using motion vectors are also discussed.
Article
The comparison of two treatments generally falls into one of the following two categories: (a) we may have a number of replications for each of the two treatments, which are unpaired, or (b) we may have a number of paired comparisons leading to a series of differences, some of which may be positive and some negative. The appropriate methods for testing the significance of the differences of the means in these two cases are described in most of the textbooks on statistical methods.
Article
MPEG-7 Visual Standard specifies a set of descriptors that can be used to measure similarity in images or video. Among them, the Edge Histogram Descriptor describes edge distribution with a histogram based on local edge distribution in an image. Since the Edge Histogram Descriptor recommended for the MPEG-7 standard represents only local edge distribution in the image, the matching performance for image retrieval may not be satisfactory. This paper proposes the use of global and semi-local edge histograms generated directly from the local histogram bins to increase the matching performance. Then, the global, semi-global, and local histograms of images are combined to measure the image similarity and are compared with the MPEG-7 descriptor of the local-only histogram. Since we exploit the absolute location of the edge in the image as well as its global composition, the proposed matching method can retrieve semantically similar images. Experiments on MPEG-7 test images show that the proposed method yields better retrieval performance by an amount of 0.04 in ANMRR, which shows a significant difference in visual inspection.
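The sketch below shows one way global and semi-global histograms can be derived directly from the 16 local sub-image histograms (4 x 4 sub-images, 5 edge types each); the groupings used here (rows, columns, quadrants) are an assumption and not necessarily the paper's exact set of semi-global regions.

    import numpy as np

    def global_and_semiglobal(local_hist):
        """local_hist: array of shape (4, 4, 5) -- 16 sub-images x 5 edge
        types (vertical, horizontal, 45-degree, 135-degree, non-directional)."""
        global_hist = local_hist.sum(axis=(0, 1))                      # 5 bins
        row_hists = local_hist.sum(axis=1)                             # 4 x 5
        col_hists = local_hist.sum(axis=0)                             # 4 x 5
        quad_hists = local_hist.reshape(2, 2, 2, 2, 5).sum(axis=(1, 3))  # 2 x 2 x 5
        semi_global = np.concatenate([row_hists.ravel(),
                                      col_hists.ravel(),
                                      quad_hists.reshape(-1, 5).ravel()])
        return global_hist, semi_global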
Article
The TREC-8 Web Track defined ad hoc retrieval tasks over the 100 gigabyte VLC2 collection (Large Web Task) and a selected 2 gigabyte subset known as WT2g (Small Web Task). Here, the guidelines and resources for both tasks are described and results presented and analysed. Performance on the Small Web was strongly correlated with performance on the regular TREC Ad Hoc task. Little benefit was derived from the use of link-based methods, for standard TREC measures on the WT2g collection. The...
Article
In meetings just prior to the 1997 AIC Congress in Kyoto, CIE TC1-37, chaired by M. Fairchild, established the CIE 1997 Interim Colour appearance Model (Simple Version), known as CIECAM97s. CIECAM97s was formally published in 1998 in CIE publication 131. CIE TC1-37 was dissolved shortly after publication of CIECAM97s at which time, a reportership, R1- 24 held by M. Fairchild, was established to monitor ongoing developments in color appearance modeling and notify CIE Division 1 if it became necessary to form a new TC to consider revision or replacement of CIECAM97s. In the four years between AIC Congresses, there has been much activity, both by individual researchers and within the CIE, aimed at furthering our understanding of color appearance models and deriving improved models for consideration. The aim of this paper is to summarize these activities, report on the current status of CIE efforts on color appearance models, and suggest what the future might hold for CIE color appearance models.
Patent
This patent relates to a method and means for recognizing a complex pattern in a picture. The picture is divided into framelets, each framelet being sized so that any segment of the complex pattern therewithin is essentially a straight line. Each framelet is scanned to produce an electrical pulse for each point scanned on the segment therewithin. Each of the electrical pulses of each segment is then transformed into a separate straight line to form a plane transform in a pictorial display. Each line in the plane transform of a segment is positioned laterally so that a point on the line midway between the top and the bottom of the pictorial display occurs at a distance from the left edge of the pictorial display equal to the distance of the generating point in the segment from the left edge of the framelet. Each line in the plane transform of a segment is inclined in the pictorial display at an angle to the vertical whose tangent is proportional to the vertical displacement of the generating point in the segment from the center of the framelet. The coordinate position of the point of intersection of the lines in the pictorial display for each segment is determined and recorded. The sum total of said recorded coordinate positions is representative of the complex pattern. (AEC)
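For orientation, the voting idea in this patent is what is now known as the Hough transform. A compact modern sketch, in the standard rho-theta parameterization rather than the patent's framelet construction:

    import numpy as np

    def hough_accumulator(edge_points, image_shape, n_theta=180):
        """Each edge point (y, x) votes for all lines passing through it;
        peaks in the accumulator correspond to straight segments."""
        height, width = image_shape
        diag = int(np.ceil(np.hypot(height, width)))
        thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
        acc = np.zeros((2 * diag + 1, n_theta), dtype=np.int32)
        for y, x in edge_points:
            rho = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
            acc[rho + diag, np.arange(n_theta)] += 1
        return acc, thetas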
Article
In a full-text natural-language retrieval system, local feedback is the process of formulating a new, improved search based on clustering terms from the documents returned in a previous search of any given query. Experiments were run on a database of US patents. It is concluded that, in contrast to global clustering, where the size of matrices limits applications to small databases and improvements are doubtful, local clustering is practical also for large databases and appears to improve overall performance, especially if metrical constraints and weighting by proximity are embedded in the local feedback. The local methods adapt themselves to each individual search and produce useful searchonyms: terms which are "synonymous" in the context of one query. Searchonyms lead to new, improved search formulations both via manual and via automated feedback.
Article
It is the intention of this book to provide an introduction to the special theory of relativity that is mathematically rigorous and yet spells out in considerable detail the physical significance of the mathematics. In addition to the material on kinematics, particle dynamics and electromagnetic fields that one would expect to find in any introduction to special relativity, the book contains careful treatment of many topics not ordinarily discussed at the elementary level. These include the Reversed Triangle Inequality, Zeeman's Theorem characterizing causal automorphisms as compositions of translations, dilations and orthochronous orthogonal transformations, Penrose's Theorem on the apparent shape of a relativistically moving sphere, the purely algebraic characterization of null and regular electromagnetic fields and an elementary introduction to the theory of spinors. The only prerequisite for this material is a solid course in linear algebra. This book offers a presentation of the special theory of relativity that is mathematically rigorous and treats, in addition to the menu of topics one is accustomed to finding in introductions to special relativity, a wide variety of results of more contemporary origin. The treatment presumes only a knowledge of linear algebra and, in two appendices, elementary point-set topology.
Article
Today a considerable amount of video data in multimedia databases requires sophisticated indices for its effective use. Manual indexing is the most effective method to do this, but it is also the slowest and the most expensive. Automated methods therefore have to be developed. This paper surveys several approaches and algorithms that have been recently proposed to automatically structure audio-visual data, both for annotation and access.
Article
A subjective scale for the measurement of pitch was constructed from determinations of the half-value of pitches at various frequencies. This scale differs from both the musical scale and the frequency scale, neither of which is subjective. Five observers fractionated tones of 10 different frequencies and the values were used to construct a numerical scale which is proportional to the perceived magnitude of subjective pitch. The close agreement of this pitch scale with an integration of the DL's for pitch shows that, unlike the DL's for loudness, all DL's for pitch are of uniform subjective magnitude. The agreement further implies that pitch and differential sensitivity to pitch are both rectilinear functions of extent on the basilar membrane, and that in cutting a pitch in half, the observer adjusts the tone until it stimulates a position half-way from the original locus to the apical end of the membrane. Measurement of the subjective size of musical intervals (such as octaves) in terms of the pitch scale shows that the intervals become larger as the frequency of the midpoint of the interval increases (except for very high tones).
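A later analytic approximation commonly fitted to this pitch (mel) scale, not given in the original paper but useful as a point of reference, is

    m \approx 2595 \,\log_{10}\!\left(1 + \frac{f}{700}\right),

with f the frequency in hertz and m the pitch in mels.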
Article
Methods for measuring rater agreement and making inferences about the accuracy of dichotomous ratings from agreement data are described. The first section presents a probability model related to latent class analysis that is applicable when ratings are based on a discrete trait. The second section extends these methods to situations in which ratings are based on a continuous trait, using a model related to signal detection theory and item response theory. The values obtained by these methods provide either direct or upper-bound estimates of rating accuracy, depending upon the nature of the rating process. Formulas are shown for combining the opinions of multiple raters to classify cases with greater accuracy than simple majority or unanimous opinion decision rules allow. Additional technical refinements of the probability modeling approach are possible, and it promises to lead to many improvements in the ways that ratings by multiple raters are analyzed and used.
Article
Using a new trichromatic colorimeter a series of colour matches through the spectrum has been made by ten observers. The results have been averaged and a mean set of trichromatic coefficients for the spectral colours derived. These results are compared with previous determinations made by König and Abney. The variations in the coefficients that have been found amongst the ten observers must, as a consequence of a new method of basing the trichromatic units, be attributed to variations in the process of reception, but their magnitude appears to be of a small order. On the other hand, there are big differences in the amount of the macular pigment in different eyes and probably some variation in its dominant hue. These variations have been investigated by matches on a standard white, results for 36 observers being given in the paper and a mean value determined. This value, combined with the mean spectral coefficients, has been used to compute an average locus for the spectral colours in the colour triangle, with white at the centre. Other points discussed in the paper include the technique of colour matching, the range of intensity over which matches remained valid, and variations of luminosity.
Article
An efficient method for the calculation of the interactions of a 2^n factorial experiment was introduced by Yates and is widely known by his name. The generalization to 3^n was given by Box et al. (1). Good (2) generalized these methods and gave elegant algorithms for which one class of applications is the calculation of Fourier series. In their full generality, Good's methods are applicable to certain problems in which one must multiply an N-vector by an N x N matrix which can be factored into m sparse matrices, where m is proportional to log N. This results in a procedure requiring a number of operations proportional to N log N rather than N^2. These methods are applied here to the calculation of complex Fourier series. They are useful in situations where the number of data points is, or can be chosen to be, a highly composite number. The algorithm is here derived and presented in a rather different form. Attention is given to the choice of N. It is also shown how special advantage can be obtained in the use of a binary computer with N = 2^m and how the entire calculation can be performed within the array of N data storage locations used for the given Fourier coefficients. Consider the problem of calculating the complex Fourier series

    X(j) = \sum_{k=0}^{N-1} A(k)\, W^{jk}, \qquad j = 0, 1, \ldots, N-1.   (1)
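To make the operation count concrete: with W = e^{2\pi i/N} and N even, the series splits into even- and odd-indexed halves,

    X(j) = \sum_{m=0}^{N/2-1} A(2m)\, W^{2mj} \;+\; W^{j} \sum_{m=0}^{N/2-1} A(2m+1)\, W^{2mj},

and each half is itself a Fourier series of length N/2, so applying the split recursively when N = 2^m gives on the order of N \log_2 N operations rather than N^2. (The paper treats the general highly composite case; this radix-2 form is only the simplest instance.)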
Article
This paper describes the shot boundary detection and determination system developed at the Fraunhofer Institute for Telecommunications, Heinrich-Hertz-Institut, used for the evaluation at TRECVID 2004. The system detects and determines the position of hard cuts, dissolves, fades, and wipes. It is very fast and has proved to have a very good detection performance. As input for our system, we use luminance pixel values of sub-sampled video data. The hard cut detector uses pixel and edge differences with an adaptive thresholding scheme. Flash detection and slow motion detection lower the false positive rate. Dissolve and fade detection is done with edge energy statistics, pixel and histogram differences, and a linearity measure. Wipe detection works with an evenness factor and double Hough transform. The difference between the submitted runs is basically only different threshold settings in the detectors, resulting in different recall and precision values.
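As an illustration of the flash-detection idea mentioned above, the toy check below accepts a hard-cut candidate only if the large frame-to-frame change persists when the suspected flash frame is skipped; the thresholds and the simple mean-absolute-difference measure are assumptions, not the system's actual features.

    import numpy as np

    def confirmed_hard_cut(prev, curr, following, cut_thresh=30.0,
                           persist_thresh=15.0):
        """prev, curr, following: consecutive luminance frames (2-D arrays).
        A photographic flash produces a large prev->curr difference that
        vanishes again one frame later; a real cut persists."""
        prev, curr, following = (np.asarray(a, dtype=float)
                                 for a in (prev, curr, following))
        d_candidate = np.mean(np.abs(curr - prev))
        d_skip = np.mean(np.abs(following - prev))
        return d_candidate > cut_thresh and d_skip > persist_thresh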