Repetition density-based approach for TV program extraction.
ABSTRACT This paper addresses the problem of automatic TV broadcasted program extraction. It consists firstly of precisely determining the start and the end of each broadcasted TV program, and then of properly giving each program a name. The extracted programs can be used to build novel services like TV-on-Demand. The proposed solution is based on the density study of repeated audiovisual sequences. This study makes it possible to sort out most of the inter-programs from the repeated sequences. The effectiveness of our solution has been shown on two distinct real TV streams lasting 5 days. A comparative evaluation with traditional approaches (metadata-based and silences-and-monochrome-frames-based) has also been performed.
REPETITION DENSITY-BASED APPROACH FOR TV PROGRAM EXTRACTION
Gaël Manson and Sid-Ahmed Berrani
Orange Labs - France Telecom R&D
4, rue du Clos Courtel. BP 91226
35510 Cesson-Sévigné. France.
TV-on-Demand is a novel service that aims to make previously broadcasted long TV programs available anytime and anywhere. Basically, this service needs to extract and store TV programs from the broadcast stream. Doing this manually is a hard, tedious and very time-consuming task. As a consequence, automatic and efficient techniques are required.
It is possible for TV channels to know the accurate start and end times of their broadcasted programs. Unfortunately, most TV broadcast chains are too complex and not standardized, so the included metadata does not remain coherent and complete until the end of the chain. In addition, TV channels can refuse to give this information for commercial reasons. As an example, the metadata broadcasted with the TV stream and included by the TV channels, namely the EPG (Electronic Program Guide) or the EIT (Event Information Table), provide approximate start and end times and titles of some TV programs. They are, however, not always available, and they are imprecise and incomplete [1].
Basically, TV program extraction aims to precisely determine the start and end times of each broadcasted TV program. This paper addresses how to perform this extraction automatically. Its main contribution is an efficient and unsupervised approach that relies on studying the density of repeated audiovisual sequences in the TV stream.
Several techniques for TV program extraction have already been proposed. Most of these techniques rely on detecting inter-programs (like commercials or trailers), which are broadcasted between two parts of a TV program or between two TV programs. If all inter-programs are properly detected, TV programs (or parts of them) can be easily deduced.
The supervised techniques require a set of manually annotated data. This can be annotated broadcasted video sequences [2] used for perceptual hashing-based recognition. Equally, audio or video fingerprinting can be used [3]. Annotated data can also be more than one year of past manually created TV program guides, which are used to learn and to model the TV program guide [4]. The main drawback is that the annotated database has to be manually created for each TV channel and then periodically updated.
There are two kinds of unsupervised techniques:
1. The detection-based techniques use the intrinsic features of inter-programs, such as monochrome frames, audio changes, action and presence of logos [5, 6]. All these approaches are limited to one kind of inter-program (mainly commercials) and are thus not sufficient to achieve a good TV stream segmentation.
2. The repetition-based techniques detect inter-programs as sequences that are repeated in the stream. Indeed, most inter-programs are broadcasted several times. In [7], a hashing-based solution is proposed to detect repeated shots. In [8], a correlation study of audio features is used to find near-identical sequences of a pre-defined size within a buffer. In [9], a clustering-based approach is proposed. It relies on grouping similar keyframes using visual features.
These last unsupervised techniques are the most promising. However, a post-processing step is required to select, from the detected repeated sequences, those that are actually inter-programs and that can lead to an accurate automatic TV stream segmentation.
The rest of the paper is organized as follows. Section 2 presents the proposed solution for TV program extraction. The experimental study we conducted to show the effectiveness of our approach is presented in Section 3. Finally, Section 4 concludes the paper and discusses future extensions.
2. THE PROPOSED SOLUTION
The general working scheme of our solution is the following: the TV stream is first accumulated to a sufficient amount, and then it is continuously received and periodically processed. This process of performing unsupervised TV program extraction is composed of three steps: repeated sequence detection, TV stream segmentation using the repetition density, and segment annotation.
2.1. Repeated sequence detection
The repeated sequence detection technique we propose to use is the one presented in [9]. In this context, repeated audiovisual sequences are near-identical audiovisual sequences. They are detected by a micro-clustering approach that first groups near-identical keyframes using DCT-based 30-dimensional visual descriptors. The similarities of temporal diversity of keyframes within the micro-clusters are then analysed to create the repeated sequences.
A repeated sequence consists of a set of occurrences. We denote by $O$ the set of all the occurrences of all the detected repeated sequences and by $R$ the set of all the repeated sequences. For each $x \in O$, $RS(x) \in R$ is the repeated sequence to which $x$ belongs, and
$$IP(x) = \begin{cases} 1 & \text{if } x \text{ is an inter-program} \\ 0 & \text{otherwise} \end{cases}$$
2.2. TV stream segmentation using the repetition density
The goal of our solution is to segment the TV stream into programs. We represent the TV stream as a succession of consecutive TV programs. Two consecutive TV programs may or may not be separated by a break, and each program may or may not contain a break. A break is composed of a succession of one or more inter-programs, which can be trailers, jingle logos, opening/closing commercial break credits or commercials. With this representation of the TV stream, our objective is to determine the start (resp. end) of each TV program, i.e. the start (resp. end) of its first (resp. last) part.
Our solution detects parts of TV programs by detecting inter-programs. A natural way to detect inter-programs is to use their repetition property. Indeed, almost all inter-programs are broadcasted several times in the stream. This hypothesis is validated in [10], where relevant statistical data on the repetition of inter-programs are provided. However, the existing technique for repeated sequence detection merely detects sequences that repeat in the stream. Some of the detected sequences are actually inter-programs, but others are parts of programs (e.g. opening credits and news reports). Therefore, inter-programs have to be sorted out from the whole set of detected repeated sequences.
We propose a technique to classify most of the repeated sequences. The main idea behind our work is based on prior knowledge about TV streams. It relies on three hypotheses:
• (H1) An occurrence $x$ of a repeated sequence that is surrounded by a lot of other occurrences of repeated sequences is most likely inside a break with other inter-programs. It is then considered as an inter-program. We define the repetition density $d_w(x)$ around $x$ as the number of repeated sequence occurrences within a given time window $w$ centered on $x$. Given a predefined threshold $t_d$, we propose the following classification rule:
$$\forall x \in O,\quad d_w(x) > t_d \Rightarrow IP(x) = 1$$
• (H2) The repeated sequence occurrences in the neighborhood (defined by $t_l$) of an inter-program sequence are also inter-programs:
$$\forall x \in O \text{ such that } IP(x) = 1,\ \forall y \in O,\quad |x - y| < t_l \Rightarrow IP(y) = 1$$
• (H3) If an occurrence of a repeated sequence has been classified as an inter-program, then all the other occurrences of the same repeated sequence are also inter-programs:
$$\forall x \in O \text{ such that } IP(x) = 1,\ \forall y \in O,\quad RS(x) = RS(y) \Rightarrow IP(y) = 1$$
From these hypotheses, we have built a repetition density-based inter-program filter. For each occurrence of each repeated sequence, the repetition density is computed on the stream. Occurrences whose density is greater than the threshold $t_d$ are considered as inter-programs (H1). By extension (H3), all occurrences of a repeated sequence which has at least one occurrence classified as an inter-program are also considered as inter-programs. Moreover (H2), the neighboring repeated sequences of an inter-program occurrence are also inter-programs. Neighboring occurrences of $x$ are occurrences $y$ whose distance $|x - y|$ in the stream is less than $t_l$ seconds.
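The filter described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Occurrence` class, the use of interval centers to count the density, the interval gap used as the distance between two occurrences, and the single H1, H3, H2 pass are all assumptions made for illustration. The parameters `w`, `t_d` and `t_l` stand for the window, the density threshold and the neighborhood threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    start: float       # seconds from the beginning of the stream
    end: float         # seconds from the beginning of the stream
    sequence_id: int   # identifier of the repeated sequence RS(x)

    @property
    def center(self) -> float:
        return (self.start + self.end) / 2.0


def gap(x: Occurrence, y: Occurrence) -> float:
    """Distance between two occurrences, assumed here to be the gap between intervals."""
    return max(0.0, max(x.start, y.start) - min(x.end, y.end))


def repetition_density(i: int, occurrences: list, w: float) -> int:
    """Number of OTHER occurrences whose center lies in a window of width w centered on occurrence i."""
    center = occurrences[i].center
    return sum(
        1
        for j, y in enumerate(occurrences)
        if j != i and abs(y.center - center) <= w / 2.0
    )


def classify_inter_programs(occurrences: list, w: float, t_d: float, t_l: float) -> set:
    """Return the indices of the occurrences classified as inter-programs."""
    # H1: occurrences lying in a dense region.
    ip = {i for i in range(len(occurrences)) if repetition_density(i, occurrences, w) > t_d}
    # H3: every occurrence of a repeated sequence that has a flagged occurrence.
    flagged = {occurrences[i].sequence_id for i in ip}
    ip |= {i for i, x in enumerate(occurrences) if x.sequence_id in flagged}
    # H2 (single pass, an assumption): neighbors of the occurrences flagged so far.
    seeds = [occurrences[i] for i in ip]
    ip |= {i for i, y in enumerate(occurrences) if any(gap(s, y) < t_l for s in seeds)}
    return ip
```

Whether H2 and H3 should be iterated until no new occurrence is flagged is not stated in the text; the sketch applies each of them once.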
The parameters $t_d$ and $t_l$ have to be set from prior knowledge for each TV channel. They are then empirically adjusted.
Figure 1 shows the repetition density computed on 6 hours of a real TV stream. The grey negative rectangles represent the breaks in the stream. The black positive histograms represent the computed repetition density. The dashed line represents the density threshold $t_d$. This figure shows that high repetition density regions match real breaks (H1). The breaks which do not match any high repetition density region can be detected using hypothesis H3.
Inter-programs are thus selected by the repetition density-based filter. Neighboring detected inter-programs (H2) are merged to build the breaks in the TV stream. As a result, the gaps between two breaks create program segments.
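As an illustration of this last step, the sketch below merges neighboring inter-programs into breaks and then takes the gaps between breaks as program segments. It is only a sketch under assumptions: inter-programs are given as `(start, end)` pairs in seconds, two inter-programs are "neighboring" when the gap between them is below `t_l`, and the function names are hypothetical.

```python
def build_breaks(inter_programs, t_l):
    """Merge inter-program intervals separated by less than t_l seconds into breaks."""
    breaks = []
    for start, end in sorted(inter_programs):
        if breaks and start - breaks[-1][1] < t_l:
            # Close enough to the previous break: extend it.
            breaks[-1] = (breaks[-1][0], max(breaks[-1][1], end))
        else:
            breaks.append((start, end))
    return breaks


def program_segments(breaks, stream_start, stream_end):
    """Return the gaps between consecutive breaks as program segments."""
    segments = []
    cursor = stream_start
    for start, end in breaks:
        if start > cursor:
            segments.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < stream_end:
        segments.append((cursor, stream_end))
    return segments
```

For example, with `t_l = 10`, the inter-programs `(100, 130)` and `(135, 160)` form a single break, while `(900, 930)` forms a second one.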
Fig. 1. Repetition density computed on 6 hours of TV stream (black positive histograms), plotted against time in hours. Gray negative rectangles are the real positions of breaks.
2.3. Segment annotation
The previous step provides a segmentation of the TV stream. The resulting segments then have to be merged and annotated in order to extract the full TV programs. For automatically labeling the segments, the straightforward approach is to use the metadata broadcasted with the TV stream, like the EPG or the EIT. Algorithms can then be used to annotate the segments from the metadata. However, this approach heavily relies on the metadata, and its effectiveness mainly depends on their reliability. It requires at least complete and consistent metadata, which is not the case in practice. A deeper analysis of the weaknesses of TV metadata is given in [1].
Therefore, in order to reduce the reliance on metadata, only three simple rules are used to perform segment annotation: (1) three consecutive segments are merged if the middle segment lasts less than 60 seconds, (2) a detected segment is labeled with the name of the metadata segment that has the best overlap with the detected segment, (3) consecutive segments with the same label are merged.
Experiments will show that these basic rules are sufficient to achieve a very accurate TV program extraction.
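The three rules can be sketched as follows. This is an illustrative reading, not the authors' code: segments are assumed to be time-ordered `(start, end)` pairs in seconds, the program guide is assumed to be a list of `(start, end, title)` entries, "best overlap" is taken as the longest overlap in seconds, and all function names are hypothetical.

```python
def merge_short_middle(segments, min_duration=60.0):
    """Rule 1: fuse three consecutive segments when the middle one is shorter than min_duration."""
    out = list(segments)
    i = 1
    while i < len(out) - 1:
        start, end = out[i]
        if end - start < min_duration:
            out[i - 1:i + 2] = [(out[i - 1][0], out[i + 1][1])]
            i = max(i - 1, 1)  # the fused segment may itself be a short middle segment
        else:
            i += 1
    return out


def overlap(a, b):
    """Length in seconds of the overlap between two intervals (0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def best_label(segment, guide):
    """Rule 2: title of the guide entry that overlaps the segment the most, or None."""
    best = max(guide, key=lambda entry: overlap(segment, entry), default=None)
    if best is None or overlap(segment, best) == 0:
        return None
    return best[2]


def annotate(segments, guide):
    """Apply rules 1 to 3 and return (start, end, label) triples."""
    labeled = []
    for seg in merge_short_middle(segments):
        label = best_label(seg, guide)
        if labeled and label is not None and labeled[-1][2] == label:
            # Rule 3: consecutive segments with the same label are merged.
            labeled[-1] = (labeled[-1][0], seg[1], label)
        else:
            labeled.append((seg[0], seg[1], label))
    return labeled
```

Segments that overlap no guide entry keep the label `None` and are never merged by rule 3; how the original system treats such segments is not stated in the text.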
To evaluate our approach, we have performed a set of experiments using real TV broadcast streams from two different channels recorded during 5 days: a French public TV channel (Cpub) and a French private TV channel (Cpv). In order to conduct the following experiments, we have created a ground-truth on Cpub and Cpv in which TV programs have been precisely segmented and annotated. A set of 47 TV programs has been labeled on Cpub and 56 on Cpv. Over the 120 h of recorded TV stream, the total duration of breaks is 12 h 02 min 15 s on Cpub and 17 h 06 min 09 s on Cpv.
The results are evaluated using the following criteria:
1. the number of extracted programs (All),
2. the number of valid extracted programs (Ok) which are
3. the number of valid extracted programs (2s) with an
imprecision of the start and of the end less than 2 sec,
4. the number of valid extracted programs (10s) with an
imprecision of the start and of the end less than 10 sec,
5. the mean (µ) and the standard deviation (σ) of the im-
precision of the extracted programs.
The imprecision here means the absolute difference between the obtained start (resp. end) time and the accurate start (resp. end) time given by the ground-truth. The imprecision is only evaluated on the valid extracted programs. Within both Cpv and Cpub, we have focused on the long TV programs in the period between 11 am and midnight, which contains the most interesting programs (series, movies and prime-time TV shows).
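The criteria above can be computed as in the following sketch. The input format, the function name, the separate statistics for start and end times, and the choice of the population standard deviation are assumptions made for illustration; the text does not specify them.

```python
from statistics import mean, pstdev


def evaluate(valid_programs, n_extracted):
    """Summarise extraction quality.

    valid_programs: list of (found_start, found_end, true_start, true_end) in seconds,
                    one tuple per valid extracted program.
    n_extracted:    total number of extracted programs (criterion "All").
    """
    start_err = [abs(fs - ts) for fs, _, ts, _ in valid_programs]
    end_err = [abs(fe - te) for _, fe, _, te in valid_programs]

    def within(limit):
        # Both the start and the end must be closer than `limit` seconds.
        return sum(1 for s, e in zip(start_err, end_err) if s < limit and e < limit)

    def stats(errors):
        # (mean, standard deviation); population standard deviation is an assumption.
        return (mean(errors), pstdev(errors)) if errors else (None, None)

    return {
        "All": n_extracted,
        "Ok": len(valid_programs),
        "2s": within(2.0),
        "10s": within(10.0),
        "start": stats(start_err),
        "end": stats(end_err),
    }
```

The imprecision is evaluated only on valid programs, so `n_extracted` enters the summary solely through the "All" count.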
In order to perform a comparative study of our solution (Our. Sol.), we have considered two other solutions: (1) a metadata-based solution and (2) a monochrome-frames-based solution (Monoch.).
The metadata-based solution uses the approximate starts, ends and names given, when available, in the EPG.
The monochrome-frames-based solution detects silences and monochrome frames in the TV stream as in [2]. Then, whenever two consecutive detections are separated by more than 60 seconds, the interval between them is considered as a program segment.
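A minimal sketch of this 60-second rule is given below. It assumes that the silence and monochrome-frame detector has already produced a list of detection timestamps in seconds; the detector itself is not shown, and the function name and input format are hypothetical.

```python
def segments_from_detections(detection_times, min_gap=60.0):
    """Keep the intervals between consecutive detections that exceed min_gap seconds."""
    times = sorted(detection_times)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > min_gap]
```

For instance, detections at 0, 5, 12, 900, 905 and 2000 seconds yield the two program segments (12, 900) and (905, 2000), the shorter intervals being treated as parts of breaks.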
We have also built a merged solution (Both.) that combines the monochrome-frames-based detected breaks and the repetition density-based detected breaks.
3.1. Evaluation of our solution on Cpub
The repeated sequence detection technique has first been applied. A set of 477 repeated sequences has been discovered, with a total of 2001 occurrences on Cpub. We have counted 210 repeated commercial sequences with an average of 4.92 occurrences each, and 34 trailers with an average of 7.41 occurrences each.
Table 1 shows the results obtained on Cpub. They show first that the metadata-based solution is outperformed by the other techniques. Then, the ratio of valid extracted programs to extracted programs (Ok/All) is almost the same for our solution, the monochrome-frames-based solution and the merged solution. As for the imprecision, our solution is more accurate than the monochrome-frames-based technique. This table also shows that our solution can be improved by the use of silences and monochrome frames detection.
The detection of the start is more accurate than the detection of the end because of the reliance on metadata, which are more accurate on the start times. We note that most of the imprecision is due to “Le tour de France”, a live sport program for which the metadata is completely wrong.
Table 1. Evaluation results on Cpub.
3.2. Evaluation of our solution on Cpv
The repeated sequence detection technique has first been applied. A set of 656 repeated sequences has been discovered, with a total of 2679 occurrences on Cpv. We have counted 316 repeated commercial sequences with an average of 5.72 occurrences each, and 86 trailers with an average of 5.32 occurrences each. This illustrates the main differences between private and public channels. Private channels tend to have more commercials, which are repeated more often. However, other non-commercial inter-programs on private channels tend to be repeated less. For TV stream segmentation, non-commercial inter-programs have a greater impact on the imprecision.
Table 2 shows the results obtained on Cpv. As non-commercial inter-programs are repeated less than on Cpub, our solution has been less effective than on Cpub. However, it is still better than the metadata-based solution. Our solution merged with the monochrome-frames-based breaks has again given the best results.
Table 2. Evaluation results on Cpv.
TV program extraction is a complex problem. Our best results show that about 45.6% of the TV programs have been effectively extracted with an imprecision of less than 10 seconds. These programs are mainly prime-time shows, movies and daily programs such as series, news or game shows. The TV programs that have been extracted with a greater imprecision are series inside a succession of episodes. For two programs on Cpv, the imprecision is due to the missed detection of a sponsoring sequence that does not repeat in the stream.
This paper shows the importance of the repetition density of inter-programs and how it can be used in a TV stream segmentation process for TV program extraction. Experiments show that the traditional approaches (metadata-based or monochrome-frames-based) are not effective enough to perform an accurate TV segmentation. They can, however, be greatly improved by our merged solution, which can achieve very accurate TV program extraction.
Future extensions will study how our approach can be applied on-line, that is, how to segment the TV stream on-line. This will require performing the repetition detection on-line. We will also address how to remove breaks
The authors would like to gratefully acknowledge X. Naturel for his help with the monochrome-frames-based solution.
[1] S.-A. Berrani, P. Lechat, and G. Manson, “TV broadcast macro-segmentation: Metadata-based vs. content-based approaches,” in Proc. of the ACM Int. Conf. on Image and Video Retrieval, Amsterdam, The Netherlands, July 2007.
[2] X. Naturel, G. Gravier, and P. Gros, “Fast structuring of large television streams using program guides,” in Proc. of the 4th Int. Workshop on Adaptive Multimedia Retrieval, Geneva, Switzerland, July 2006.
[3] J. Oostveen, T. Kalker, and J. Haitsma, “Feature extraction and a database strategy for video fingerprinting,” in Proc. of the 5th Int. Conf. on Recent Advances in Visual Information Systems, Hsin Chu, Taiwan.
[4] J.-P. Poli and J. Carrive, “Modeling television schedules for television stream structuring,” in Proc. of ACM Int. MultiMedia Modeling Conf., Singapore, January 2007.
[5] R. Lienhart, C. Kuhmunch, and W. Effelsberg, “On the detection and recognition of television commercials,” in Proc. of the IEEE Int. Conf. on Multimedia Computing and Systems, Ottawa, Ontario, Canada.
[6] A. Albiol, M.J. Ch, F.A. Albiol, and L. Torres, “Detection of TV commercials,” in Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (vol. 3), Montreal, Quebec, Canada, May 2004.
[7] J. M. Gauch and A. Shivadas, “Finding and identifying unknown commercials using repeated video sequence detection,” Computer Vision and Image Understanding, vol. 103, pp. 80–88, 2006.
[8] C. Herley, “Argos: automatically extracting repeating objects from multimedia streams,” IEEE Transactions on Multimedia, vol. 8, no. 1, pp. 115–129, 2006.
[9] S.-A. Berrani, G. Manson, and P. Lechat, “A non-supervised approach for repeated sequence detection in TV broadcast streams,” Signal Processing: Image Communication, spec. iss. on “Semantic Analysis for Interactive Multimedia Services”, vol. 23, no. 7, pp. 525–537, 2008.
[10] G. Manson and S.-A. Berrani, “TV broadcast macro-segmentation using the repetition property of inter-programs,” in Proc. of the IASTED Int. Conf. on Signal Processing, Pattern Recognition and Applications, Innsbruck, Austria, February 2009.