Repetition density-based approach for TV program extraction

Conference Paper · May 2009
DOI: 10.1109/WIAMIS.2009.5031463 · Source: DBLP
Conference: 10th Workshop on Image Analysis for Multimedia Interactive Services, WIAMIS 2009, London, United Kingdom, May 6-8, 2009
REPETITION DENSITY-BASED APPROACH FOR TV PROGRAM EXTRACTION
Gaël Manson and Sid-Ahmed Berrani
Orange Labs - France Telecom R&D
4, rue du Clos Courtel. BP 91226
35510 Cesson-Sévigné. France.
ABSTRACT
This paper addresses the problem of automatic extraction of broadcasted TV programs. The task consists firstly of precisely determining the start and the end of each broadcasted TV program, and then of properly assigning each program a name. The extracted programs can be used to build novel services like TV-on-Demand. The proposed solution is based on a study of the density of repeated audiovisual sequences. This study allows most of the inter-programs to be sorted out from the repeated sequences. The effectiveness of our solution has been shown on two distinct real TV streams lasting 5 days. A comparative evaluation with traditional approaches (metadata-based and silences-and-monochrome-frames-based) has also been performed.
1. INTRODUCTION
TV-on-Demand is a novel service that aims to make previously broadcasted long TV programs available anytime and anywhere. Basically, this service needs to extract and store TV programs. Manual TV program extraction from TV streams is a hard, tedious and very time-consuming task. As a consequence, automatic and efficient techniques are required.
TV channels could in principle know the accurate start and end times of their broadcasted programs. Unfortunately, most TV broadcast chains are complex and not standardized, and the metadata does not remain coherent and complete until the end of the chain. TV channels may also refuse to provide this information for commercial reasons. As an example, the metadata broadcasted with the TV stream by the TV channels, namely the EPG (Electronic Program Guide) or the EIT (Event Information Table), provides only approximate start and end times and the titles of some TV programs. This metadata is, moreover, not always available, and often imprecise and incomplete [1].
Basically, TV program extraction aims to precisely determine the start and end times of each broadcasted TV program. This paper addresses how to perform this extraction automatically. Its main contribution is an efficient and unsupervised approach that relies on studying the density of repeated audiovisual sequences in the TV stream.
A set of supervised and unsupervised techniques related to TV program extraction has already been proposed. Most of these techniques rely on detecting inter-programs (like commercials or trailers), which are broadcasted between two parts of a TV program or between two TV programs. If all inter-programs are properly detected, TV programs (or parts of them) can be easily deduced.
The supervised techniques require a set of manually annotated data. This can be annotated broadcasted video sequences [2] used for perceptual hashing-based recognition; similarly, audio or video fingerprinting can be used [3]. The annotated data can also be more than one year of manually created past TV program guides, which are used to learn and model the TV program guide [4]. The main drawback is that the annotated database has to be manually created for each TV channel and then periodically updated.
There are two kinds of unsupervised techniques:
1. The detection-based techniques use intrinsic features of the inter-programs, like separating monochrome frames, audio changes, action, and the presence of logos [5, 6]. All these approaches are limited to one kind of inter-program (mainly commercials) and are thus not sufficient to achieve a good TV stream segmentation.
2. The repetition-based techniques detect inter-programs as near-identical audiovisual sequences in the TV stream. Indeed, most inter-programs are broadcasted several times. In [7], a hashing-based solution is proposed to detect repeated shots. In [8], a correlation study of audio features is used to find near-identical sequences of a pre-defined size within a buffer. In [9], a clustering-based approach is proposed; it relies on grouping similar keyframes using visual features.
These unsupervised techniques are the most promising. However, a post-processing step is required to select, from the detected repeated sequences, those that are actually inter-programs and that can lead to an accurate automatic TV stream segmentation.
The rest of the paper is organized as follows. Section 2 presents our repetition density-based TV stream segmentation for TV program extraction. The experimental study we conducted to show the effectiveness of our approach is presented in Section 3. Finally, Section 4 concludes the paper and discusses future extensions.
2. THE PROPOSED SOLUTION
The general working scheme of our solution is the following: the TV stream is first accumulated up to a sufficient amount, and it is then continuously received and periodically processed. This unsupervised TV program extraction process is composed of three steps: repeated sequence detection, TV stream segmentation using the repetition density, and segment annotation.
The main contributions of this paper concern the TV stream segmentation step and the experiments validating our approach.
2.1. Repeated sequence detection
The repeated sequence detection technique we use is the one presented in [9]. Repeated audiovisual sequences are, in this context, near-identical audiovisual sequences. Repeated sequences are detected with a micro-clustering approach that first groups near-identical keyframes using 30-dimensional DCT-based visual descriptors. The similarities and temporal diversity of the keyframes within the micro-clusters are then analysed to create the repeated sequences.
A repeated sequence consists of a set of occurrences. We note O the set of all the occurrences of all the detected repeated sequences, and R the set of all the repeated sequences:

    ∀x ∈ O, RS(x) ∈ R is the repeated sequence to which x belongs;
    ∀x ∈ O, IP(x) = 1 if x is an inter-program, and IP(x) = 0 otherwise.
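To make the notation concrete, the sets and functions above can be mapped onto a small data model. This is a hypothetical sketch; the Python types and names are ours, not the paper's:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Occurrence:
    """One occurrence of a repeated sequence: a time interval in the stream."""
    start: float  # seconds from the beginning of the stream
    end: float

@dataclass
class RepeatedSequence:
    """A repeated audiovisual sequence, i.e. the set of its occurrences."""
    occurrences: list = field(default_factory=list)

# O: all occurrences of all detected repeated sequences
# RS: maps each occurrence to the repeated sequence it belongs to
# IP: maps each occurrence to its inter-program label (1) or not (0)
O = []
RS = {}
IP = {}

o1 = Occurrence(10.0, 25.0)
rs = RepeatedSequence()
rs.occurrences.append(o1)
O.append(o1)
RS[o1] = rs
IP[o1] = 0  # unknown / not yet classified as inter-program
```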
2.2. TV stream segmentation using the repetition density
The goal of our solution is to segment the TV stream into programs. We represent the TV stream as a succession of consecutive TV programs. Two consecutive TV programs may (or may not) be separated by a break, and each program may (or may not) contain a break. A break is composed of a succession of one or more inter-programs, which can be trailers, jingle logos, opening/closing commercial break credits or commercials. With this representation of the TV stream, our objective is to determine the start (resp. end) of each TV program, i.e. the start (resp. end) of its first (resp. last) part.
Our solution detects parts of TV programs by detecting breaks in the TV stream, and breaks are in turn detected through their inter-programs. As explained in the introduction, the most promising approach to detect inter-programs is to use their repetition property. Indeed, almost all inter-programs are broadcasted several times in the stream. This hypothesis is validated in [10], where relevant statistical data on the repetition of inter-programs are provided. However, the existing repeated sequence detection technique detects only sequences that repeat in the stream. Some of the detected sequences are actually inter-programs, while others belong to programs (e.g. flashbacks, opening credits and news reports). Therefore, inter-programs have to be sorted out from the whole set of detected repeated sequences.
We propose a technique to classify most of the repeated sequences. The main idea behind our work is based on prior knowledge of TV streams. It follows three hypotheses:
(H1) An occurrence x of a repeated sequence that is surrounded by many other occurrences of repeated sequences is most likely inside a break, together with other inter-programs. It is then considered as an inter-program. We define the repetition density d_w(x) around x as the number of repeated sequence occurrences within a given time window w centered on x. Given a predefined threshold t_d, we propose the following classification rule:

    ∀x ∈ O, d_w(x) > t_d ⇒ IP(x) = 1
(H2) The repeated sequence occurrences in the neighborhood (defined by t_l) of an inter-program occurrence are also inter-programs:

    ∀x ∈ O such that IP(x) = 1, ∀y ∈ O, ||x − y|| < t_l ⇒ IP(y) = 1
(H3) If an occurrence of a repeated sequence has been classified as an inter-program, then all the other occurrences of that repeated sequence are also inter-programs:

    ∀x ∈ O such that IP(x) = 1, ∀y ∈ O, RS(x) = RS(y) ⇒ IP(y) = 1
From these hypotheses, we have built a repetition density-based inter-program filter. For each occurrence of each repeated sequence, the repetition density is computed on the given time window w, and the occurrences with a density greater than the threshold t_d are considered as inter-programs (H1). By extension (H3), all occurrences of a repeated sequence that contains an inter-program occurrence are inter-programs. Moreover (H2), the neighboring repeated sequence occurrences of an inter-program occurrence are also inter-programs. Neighboring occurrences of x are occurrences y whose distance ||x − y|| in the stream is less than t_l seconds.
Parameters t_d and t_l have to be set from prior knowledge for each TV channel. They are then empirically adjusted.
Figure 1 shows the repetition density computed on 6 hours of a real TV stream. The grey negative rectangles represent the breaks in the stream. The black positive histograms represent the computed repetition density, and the dashed line represents the density threshold t_d. This figure shows that high repetition density regions match real breaks (H1). The breaks that do not match any high repetition density region can be detected using hypothesis H3.
Most inter-programs are detected by our repetition density-based filter. Neighboring detected inter-programs (H2) are merged to build the breaks in the TV stream. As a result, the gaps between two breaks form program segments.
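The filter can be sketched as a fixed-point computation over the three hypotheses, followed by a merging pass that builds breaks. This is a minimal illustration, not the authors' implementation: occurrences are reduced to their center times in seconds, and the names `rs_of`, `w`, `t_d`, `t_l` follow the paper's notation:

```python
def density_filter(occurrences, rs_of, w, t_d, t_l):
    """Classify repeated-sequence occurrences as inter-programs (H1-H3).

    occurrences: list of occurrence center times (seconds);
    rs_of: index -> repeated-sequence id.
    Returns the set of indices classified as inter-programs.
    """
    n = len(occurrences)
    ip = set()

    # H1: occurrences with a high repetition density are inter-programs
    for i, x in enumerate(occurrences):
        d_w = sum(1 for y in occurrences if abs(y - x) <= w / 2)
        if d_w > t_d:
            ip.add(i)

    # Iterate H2 and H3 until no new occurrence gets labeled
    changed = True
    while changed:
        changed = False
        # H3: propagate the label to all occurrences of the same repeated sequence
        ip_seqs = {rs_of[i] for i in ip}
        for i in range(n):
            if i not in ip and rs_of[i] in ip_seqs:
                ip.add(i)
                changed = True
        # H2: occurrences within t_l seconds of an inter-program are inter-programs
        for i in range(n):
            if i not in ip and any(abs(occurrences[i] - occurrences[j]) < t_l
                                   for j in ip):
                ip.add(i)
                changed = True
    return ip

def merge_breaks(ip_times, t_l):
    """Merge neighboring inter-program occurrences into breaks (intervals)."""
    breaks = []
    for t in sorted(ip_times):
        if breaks and t - breaks[-1][1] < t_l:
            breaks[-1][1] = t  # extend the current break
        else:
            breaks.append([t, t])  # open a new break
    return breaks
```

For example, a dense cluster of occurrences triggers H1, and an isolated occurrence of the same repeated sequence elsewhere in the stream is then caught by H3 even though its local density is low.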
Fig. 1. Repetition density computed on 6 hours of TV stream (black positive histograms). Gray negative rectangles are the real positions of breaks.
2.3. Segment annotation
The previous segmentation step provides an over-segmentation of the TV stream. The resulting segments then have to be merged and annotated in order to extract the full TV programs. For automatically labeling the segments, the straightforward approach is to use the metadata information broadcasted with the TV stream, like the EPG or the EIT. Algorithms such as Dynamic Time Warping [2] can be used to merge and annotate the segments from the metadata. However, this approach relies heavily on the metadata: its effectiveness mainly depends on the reliability of the metadata, and it requires at least complete and consistent metadata, which is not the case in practice. A deeper analysis of the weaknesses of TV metadata information is given in [1].
Therefore, in order to reduce the reliance on metadata, only three simple rules are used to perform segment annotation: (1) three consecutive segments are merged if the middle segment lasts less than 60 seconds, (2) a detected segment is labeled with the name of the metadata segment that has the best overlap with the detected segment, and (3) consecutive segments with the same label are merged.
Experiments will show that these basic rules are sufficient to achieve a very accurate TV program extraction.
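The three rules above can be rendered as follows. This is an illustrative sketch under our own simplifying assumptions: segments are `(start, end)` pairs in seconds, metadata entries are `(start, end, name)` triples, and both lists are sorted by time:

```python
def annotate(segments, metadata, min_dur=60.0):
    """Apply the three segment annotation rules.

    segments: sorted list of (start, end) pairs in seconds.
    metadata: list of (start, end, name) triples from the EPG/EIT.
    """
    def overlap(a, b):
        return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

    # Rule 1: merge three consecutive segments when the middle one is short
    merged = []
    i = 0
    while i < len(segments):
        if (merged and i + 1 < len(segments)
                and segments[i][1] - segments[i][0] < min_dur):
            merged[-1] = (merged[-1][0], segments[i + 1][1])
            i += 2
        else:
            merged.append(segments[i])
            i += 1

    # Rule 2: label each segment with the best-overlapping metadata entry
    labeled = [(s, e, max(metadata, key=lambda m: overlap((s, e), m))[2])
               for s, e in merged]

    # Rule 3: merge consecutive segments that share the same label
    out = [labeled[0]]
    for s, e, name in labeled[1:]:
        if name == out[-1][2]:
            out[-1] = (out[-1][0], e, name)
        else:
            out.append((s, e, name))
    return out
```

For instance, a 30-second segment sandwiched between two parts of the same program is absorbed by rule 1 before labeling takes place.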
3. EXPERIMENTS
To evaluate our approach, we have performed a set of experiments using real TV broadcast streams from two different channels recorded during 5 days: a French public TV channel (Cpub) and a French private TV channel (Cpv). In order to conduct the experiments, we have created a ground-truth on Cpub and Cpv in which the TV programs have been precisely segmented and annotated. A set of 47 TV programs has been labeled on Cpub and 56 on Cpv. Over the 120 h of recorded TV stream, the total duration of breaks has been 12h 02m 15s on Cpub and 17h 06m 09s on Cpv.
The results are evaluated using the following criteria:
1. the number of extracted programs (All),
2. the number of valid extracted programs (Ok), i.e. those that are correctly labeled,
3. the number of valid extracted programs (2s) with an imprecision of the start and of the end of less than 2 seconds,
4. the number of valid extracted programs (10s) with an imprecision of the start and of the end of less than 10 seconds,
5. the mean (µ) and the standard deviation (σ) of the imprecision of the extracted programs.
The imprecision here is the absolute difference between the obtained start (resp. end) time and the accurate start (resp. end) time given by the ground-truth. The imprecision is only evaluated on the valid extracted programs. Within both Cpv and Cpub, we have focused on the long TV programs in the period between 11 am and 12 pm, which contains the most interesting programs (series, movies and prime-time TV shows).
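These criteria can be computed as follows. This is a sketch with hypothetical input types: extracted and ground-truth programs are keyed by name and represented as `(start, end)` pairs in seconds, and the valid extractions are those whose label appears in the ground-truth:

```python
import statistics

def imprecision_stats(extracted, truth):
    """Start/end imprecision (mean, population std) over valid extractions.

    extracted, truth: dicts mapping program name -> (start, end) in seconds.
    Returns ((start mean, start std), (end mean, end std)).
    """
    valid = [n for n in extracted if n in truth]
    start_err = [abs(extracted[n][0] - truth[n][0]) for n in valid]
    end_err = [abs(extracted[n][1] - truth[n][1]) for n in valid]

    def stats(errors):
        return statistics.mean(errors), statistics.pstdev(errors)

    return stats(start_err), stats(end_err)
```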
In order to perform a comparative study of our solution (Our Sol.), we have considered two other solutions: (1) a metadata-based solution (Meta.) and (2) a monochrome-frames-based solution (Monoch.).
The metadata-based solution uses the approximate start times, end times and names given, when available, in the EPG.
The monochrome-frames-based solution first computes the intersection between silences in the audio TV stream and monochrome frames in the video TV stream, as in [2]. Then, all the detected intersections separated by more than 60 seconds are considered as program segment boundaries.
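This baseline can be sketched as follows. The rendering is our own interpretation of the description above, not the implementation from [2]: silence and monochrome intervals are `(start, end)` pairs in seconds, each non-empty intersection yields a candidate cut point, and cuts closer than 60 seconds to the previous kept cut are discarded:

```python
def monochrome_baseline(silences, monochrome, min_gap=60.0):
    """Baseline: cut the stream where silence and monochrome frames coincide.

    silences, monochrome: sorted lists of (start, end) intervals in seconds.
    Returns the kept cut points (midpoints of the intersections).
    """
    cuts = []
    for s0, s1 in silences:
        for m0, m1 in monochrome:
            lo, hi = max(s0, m0), min(s1, m1)
            if lo < hi:  # non-empty intersection: a candidate break point
                cuts.append((lo + hi) / 2)

    # keep only cuts separated by more than min_gap seconds
    kept = []
    for c in sorted(cuts):
        if not kept or c - kept[-1] > min_gap:
            kept.append(c)
    return kept
```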
We have also built a merged solution (Both.) that combines the monochrome-frames-based detected breaks and the repetition density-based detected breaks.
3.1. Evaluation of our solution on Cpub
The repeated sequence detection technique has first been applied. A set of 477 repeated sequences has been discovered, with a total of 2001 occurrences, on Cpub. We have counted 210 commercial repeated sequences with an average of 4.92 occurrences each, and 34 trailers with an average of 7.41 occurrences each.
Table 1 shows the results obtained on Cpub. These results first show that the metadata-based solution is outperformed by the other techniques. The ratio of valid extracted programs to extracted programs (Ok/All) is almost the same for our solution, the monochrome-frames-based solution and the merged solution. As for the imprecision, our solution is more accurate than the monochrome-frames-based technique. The table also shows that our solution can be improved by using the silence-and-monochrome-frames-detected breaks.
The detection of the start is more accurate than the detection of the end because of the reliance on metadata, which is more accurate on start times. We note that the TV programs that generated the most imprecision were due to "Le Tour de France", a live sports program for which the metadata is completely wrong.
           Ok/All   2s  10s   start µ  start σ    end µ    end σ
Our Sol.    43/48   11   17     11.2s    11.3s    38.5s   134.1s
Monoch.     44/47    2   11     34.8s    55.6s    56.3s   102.8s
Both.       43/47    9   24      9.1s     8.7s    19.1s    90.9s
Meta.       46/47    0    0    186.1s     270s   439.4s   641.9s

Table 1. Evaluation results on Cpub.
3.2. Evaluation of our solution on Cpv
The repeated sequence detection technique has first been applied. A set of 656 repeated sequences has been discovered, with a total of 2679 occurrences, on Cpv. We have counted 316 commercial repeated sequences with an average of 5.72 occurrences each, and 86 trailers with an average of 5.32 occurrences each. This illustrates the main differences between private and public channels: private channels tend to have more commercials, which are repeated more often, while their other, non-commercial inter-programs tend to be repeated less. For TV stream segmentation, non-commercial inter-programs have a greater impact on the imprecision.
Table 2 shows the obtained results. As non-commercial inter-programs are repeated less than on Cpub, our solution has been less effective than on Cpub. However, it is still better than the metadata-based solution. Our solution merged with the monochrome-frames-based breaks has again been the most effective.
           Ok/All   2s  10s   start µ  start σ    end µ    end σ
Our Sol.    56/58    2   15     46.7s   244.3s    80.3s   226.5s
Monoch.     56/58    2    9     30.6s    78.9s   104.9s   198.4s
Both.       56/58   11   23     12.9s    32.7s    61.8s   201.8s
Meta.       55/58    0    0    180.5s   120.1s   469.7s   289.1s

Table 2. Evaluation results on Cpv.
The results obtained on Cpub and Cpv show that automatic TV program extraction is a complex problem. Our best results show that about 45.6% of the TV programs have been effectively extracted with an imprecision of less than 10 seconds.
The successfully extracted TV programs have mainly been prime-time shows, movies and daily programs such as series, news, or game shows. The TV programs extracted with a greater imprecision have been series inside a succession of episodes. For two programs on Cpv, the imprecision has been due to the mis-detection of a sponsoring sequence that does not repeat in the stream.
4. CONCLUSION
This paper shows the importance of the repetition density of inter-programs and how it can be used in a TV stream segmentation process for TV program extraction. Experiments show that the traditional approaches (metadata-based or monochrome-frames-based) are not sufficiently effective to perform an accurate TV segmentation. They can, however, be greatly improved by our merged solution, which can achieve very accurate TV program extraction.
Future extensions will study how our approach can be applied on-line, that is, how to segment the TV stream on-line. This will require performing the repetition detection on-line. We will also address how to remove breaks from within programs.
5. ACKNOWLEDGMENT
The authors would like to gratefully acknowledge X. Naturel
for his help with the monochrome-frames-based solution.
6. REFERENCES
[1] S.-A. Berrani, P. Lechat, and G. Manson, “TV broadcast macro-segmentation: Metadata-based vs. content-based approaches,” in Proc. of the ACM Int. Conf. on Image and Video Retrieval, Amsterdam, The Netherlands, July 2007.
[2] X. Naturel, G. Gravier, and P. Gros, “Fast structuring of large television streams using program guides,” in Proc. of the 4th Int. Workshop on Adaptive Multimedia Retrieval, Geneva, Switzerland, July 2006.
[3] J. Oostveen, T. Kalker, and J. Haitsma, “Feature extraction and a database strategy for video fingerprinting,” in Proc. of the 5th Int. Conf. on Recent Advances in Visual Information Systems, Hsin Chu, Taiwan, March 2002.
[4] J.-P. Poli and J. Carrive, “Modeling television schedules for television stream structuring,” in Proc. of the ACM Int. MultiMedia Modeling Conf., Singapore, January 2007.
[5] R. Lienhart, C. Kuhmunch, and W. Effelsberg, “On the detection and recognition of television commercials,” in Proc. of the IEEE Int. Conf. on Multimedia Computing and Systems, Ottawa, Ontario, Canada, June 1997.
[6] A. Albiol, M.J. Ch, F.A. Albiol, and L. Torres, “Detection of TV commercials,” in Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (vol. 3), Montreal, Quebec, Canada, May 2004.
[7] J. M. Gauch and A. Shivadas, “Finding and identifying unknown commercials using repeated video sequence detection,” Computer Vision and Image Understanding, vol. 103, pp. 80–88, 2006.
[8] C. Herley, “ARGOS: Automatically extracting repeating objects from multimedia streams,” IEEE Transactions on Multimedia, vol. 8, no. 1, pp. 115–129, 2006.
[9] S.-A. Berrani, G. Manson, and P. Lechat, “A non-supervised approach for repeated sequence detection in TV broadcast streams,” Signal Processing: Image Communication, special issue on “Semantic Analysis for Interactive Multimedia Services”, vol. 23, no. 7, pp. 525–537, 2008.
[10] G. Manson and S.-A. Berrani, “TV broadcast macro-segmentation using the repetition property of inter-programs,” in Proc. of the IASTED Int. Conf. on Signal Processing, Pattern Recognition and Applications, Innsbruck, Austria, February 2009.