REPETITION DENSITY-BASED APPROACH FOR TV PROGRAM EXTRACTION
Gaël Manson and Sid-Ahmed Berrani
Orange Labs - France Telecom R&D
4, rue du Clos Courtel. BP 91226
35510 Cesson-Sévigné, France.
ABSTRACT

This paper addresses the problem of automatic TV broadcasted program extraction. It consists firstly of precisely determining the start and the end of each broadcasted TV program, and then of properly giving each of them a name. The extracted programs can be used to build novel services like TV-on-Demand. The proposed solution is based on a study of the density of repeated audiovisual sequences. This study makes it possible to sort out most of the inter-programs from the repeated sequences. The effectiveness of our solution has been shown on two distinct real TV streams lasting 5 days. A comparative evaluation with traditional approaches (metadata-based and silences-and-monochrome-frames-based) has also been performed.
1. INTRODUCTION

TV-on-Demand is a novel service that aims to make previously broadcasted long TV programs available anytime and anywhere. Basically, this service needs to extract and store TV programs. Manual TV program extraction from TV streams is a hard, tedious and very time-consuming task. As a consequence, automatic and efficient techniques are required.
In principle, TV channels could know the accurate start and end times of their broadcasted programs. Unfortunately, most TV broadcast chains are complex and not standardized, and the included metadata does not remain coherent and complete until the end of the chain. TV channels can also refuse to provide this information for commercial reasons. As an example, the metadata broadcasted with the TV stream and inserted by the TV channels, namely the EPG (Electronic Program Guide) or the EIT (Event Information Table), provides only approximate start and end times and the titles of some TV programs. It is, moreover, not always available, and often imprecise and incomplete.
Basically, TV program extraction aims to precisely determine the start and the end times of each broadcasted TV program. This paper addresses how to perform this extraction automatically. Its main contribution is an efficient and unsupervised approach that relies on studying the density of repeated audiovisual sequences in the TV stream.
A set of supervised and unsupervised techniques related to TV program extraction has already been proposed. Most of these techniques rely on detecting inter-programs (like commercials or trailers), which are broadcasted between two parts of a TV program or between two TV programs. If all inter-programs are properly detected, TV programs (or parts of them) can be easily deduced.
The supervised techniques require a set of manually annotated data. This can be annotated broadcasted video sequences used for perceptual hashing-based recognition [2]. Equally, audio or video fingerprinting can be used [3]. Annotated data can also be more than one year of past, manually created TV program guides, which are used to learn and model the TV program guide [4]. The main drawback of these techniques is that the annotated database has to be manually created for each TV channel and then periodically updated.
There are two kinds of unsupervised techniques:
1. The detection-based techniques use the intrinsic fea-
tures of the inter-programs like separating monochrome
frames, audio changes, action and presence of logos [5,
6]. All these approaches are limited to one kind of
inter-program (mainly commercials) and are thus not
sufﬁcient to achieve a good TV stream segmentation.
2. The repetition-based techniques detect inter-programs as near-identical audiovisual sequences in the TV stream. Indeed, most inter-programs are broadcasted several times. In [7], a hashing-based solution is proposed to detect repeated shots. In [8], a correlation study of audio features is used to find near-identical sequences of a pre-defined size within a buffer. In [9], a clustering-based approach is proposed. It relies on grouping similar keyframes using visual features.
These last unsupervised techniques are the most promising. However, a post-processing step is required to select, from the detected repeated sequences, those that are actually inter-programs and that can lead to an accurate automatic TV stream segmentation.
The rest of the paper is organized as follows. Section 2
presents our repetition density-based TV stream segmentation
for TV program extraction. The experimental study we con-
ducted to show the effectiveness of our approach is presented
in Section 3. Finally, Section 4 concludes the paper and discusses future extensions.
2. THE PROPOSED SOLUTION
The general working scheme of our solution is the following:
the TV stream is ﬁrst accumulated to a sufﬁcient amount and
then it is continuously received and periodically processed.
This process of performing unsupervised TV program extraction is composed of three steps: repeated sequence detection, TV stream segmentation using the repetition density, and segment annotation.
The main contributions of this paper concern the TV stream segmentation step and the experiments validating our approach.
2.1. Repeated sequence detection
The repeated sequence detection technique we propose to use is the one presented in [9]. Repeated audiovisual sequences are, in this context, near-identical audiovisual sequences. Repeated sequences are detected with a micro-clustering approach that first groups near-identical keyframes using DCT-based 30-dimensional visual descriptors. The similarities and the temporal diversity of the keyframes within the micro-clusters are then analysed to create the repeated sequences.
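The paper does not detail the DCT-based descriptor beyond its dimensionality. As an illustration only, one plausible construction (our assumption: keeping the 30 lowest-frequency coefficients of a 2D DCT of a small grayscale keyframe) could be sketched as:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0, :] /= np.sqrt(2.0)
    return m * np.sqrt(2.0 / n)

def keyframe_descriptor(gray, dim=30):
    """Hypothetical descriptor: the `dim` lowest-frequency
    coefficients of the 2D DCT of a grayscale keyframe."""
    coeffs = dct_matrix(gray.shape[0]) @ gray @ dct_matrix(gray.shape[1]).T
    return coeffs[:6, :6].ravel()[:dim]

# Near-identical keyframes yield near-identical descriptors.
rng = np.random.default_rng(0)
frame = rng.random((32, 32))
d1 = keyframe_descriptor(frame)
d2 = keyframe_descriptor(frame + 1e-6)
```

Low-frequency DCT coefficients are robust to small pixel-level changes, which is what grouping near-identical keyframes requires.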
A repeated sequence consists of a set of occurrences. We note O the set of all the occurrences of all the detected repeated sequences, and R the set of all the repeated sequences:

  for each x ∈ O, RS(x) ∈ R is the repeated sequence to which x belongs;
  for each x ∈ O, IP(x) = 1 if x is an inter-program, and IP(x) = 0 otherwise.
2.2. TV stream segmentation using the repetition density
The goal of our solution is to segment the TV stream into pro-
grams. We represent the TV stream as a succession of con-
secutive TV programs. Two consecutive TV programs may
(or not) be separated by a break and each program may (or
not) contain a break. A break is composed of a succession
of one or more inter-programs which can be trailers, jingle
logos, opening/closing commercial break credits or commer-
cials. With this representation of the TV stream, our objec-
tive is to determine the start (resp. end) of each TV program,
i.e. the start (resp. end) of its ﬁrst (resp. last) part.
Our solution detects parts of TV programs by detecting breaks in the TV stream, which are themselves located through their inter-programs. As explained in the introduction, the most promising approach to detect inter-programs is to use their repetition property. Indeed, almost all inter-programs are broadcasted several times in the stream. This hypothesis is validated in [10], where relevant statistical data on the repetition of inter-programs are provided. However, the existing technique for repeated sequence detection only detects sequences that repeat in the stream. Some of the detected sequences are actually inter-programs, while others belong to programs (e.g. flashbacks, opening credits and news reports). Therefore, inter-programs have to be sorted out from the whole set of detected repeated sequences.
We propose a technique to classify most of the repeated sequences. The main idea behind our work is based on prior knowledge of TV streams. It follows three hypotheses:
• (H1) An occurrence x of a repeated sequence that is surrounded by many other occurrences of repeated sequences is most likely inside a break, together with other inter-programs. It is then considered as an inter-program. We define dw(x), the repetition density around x, as the number of repeated sequence occurrences within a given time window w centered on x. Given the predefined threshold td, we propose the following classification:

  for each x ∈ O, dw(x) > td ⇒ IP(x) = 1

• (H2) The repeated sequence occurrences in the neighborhood (defined by tl) of an inter-program occurrence are also inter-programs:

  for each x ∈ O such that IP(x) = 1,
  ∀y ∈ O, ||x − y|| < tl ⇒ IP(y) = 1

• (H3) If an occurrence of a repeated sequence has been classified as an inter-program, then all the other occurrences of that repeated sequence are also inter-programs:

  for each x ∈ O such that IP(x) = 1,
  ∀y ∈ O, RS(x) = RS(y) ⇒ IP(y) = 1
From these hypotheses, we have built a repetition density-based inter-program filter. For each occurrence of each repeated sequence, the repetition density is computed on the given time window w, and the occurrences with a density greater than a threshold td are considered as inter-programs (H1). By extension (H3), all occurrences of a repeated sequence which contains an inter-program occurrence are inter-programs. Moreover (H2), the neighboring repeated sequences of an inter-program occurrence are also inter-programs. Neighboring occurrences of x are occurrences y whose distance ||x − y|| in the stream is less than tl seconds.
Parameters td and tl have to be set from prior knowledge for each TV channel. They are then empirically adjusted.
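The three-hypothesis filter can be sketched as follows. The `Occurrence` structure, the window `w = 600` s, and the thresholds `td` and `tl` are illustrative placeholders, not the paper's actual values:

```python
from dataclasses import dataclass

@dataclass
class Occurrence:
    t: float          # position in the stream (seconds)
    rs: int           # id of its repeated sequence, i.e. RS(x)
    ip: bool = False  # inter-program flag, i.e. IP(x) = 1

def density_filter(occs, w=600.0, td=5, tl=60.0):
    """Label occurrences as inter-programs using hypotheses H1-H3."""
    # H1: occurrences lying in high-density regions are inter-programs.
    for x in occs:
        dw = sum(1 for y in occs if abs(x.t - y.t) <= w / 2)
        if dw > td:
            x.ip = True
    # H2 (neighborhood) and H3 (same repeated sequence) can each trigger
    # the other, so propagate the label until a fixed point is reached.
    changed = True
    while changed:
        changed = False
        for x in occs:
            if not x.ip:
                continue
            for y in occs:
                if not y.ip and (y.rs == x.rs or abs(x.t - y.t) < tl):
                    y.ip = True
                    changed = True
    return occs
```

The fixed-point loop reflects that H2 and H3 feed each other: a sequence flagged through H3 can, through H2, flag its neighbors in turn.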
Figure 1 shows the repetition density computed on 6 hours
of a real TV stream. The grey negative rectangles represent
the breaks in the stream. The black positive histograms rep-
resent the computed repetition density. The dashed-line rep-
resents the density threshold td. This ﬁgure shows that high
repetition density regions match with real breaks (H1). The
breaks which do not match with any high repetition density
regions can be detected using hypothesis H3.
Most inter-programs are detected by our repetition density-based filter. Neighboring detected inter-programs (H2) are merged to build the breaks in the TV stream. As a result, the gaps between two breaks create the program segments.
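Assuming the detected inter-program occurrences are represented as (start, end) intervals, the merging into breaks and the derivation of program segments might look like this sketch (the `max_gap` value is an illustrative placeholder):

```python
def build_breaks(intervals, max_gap=60.0):
    """Merge neighboring inter-program intervals (start, end) into breaks."""
    breaks = []
    for s, e in sorted(intervals):
        if breaks and s - breaks[-1][1] <= max_gap:
            breaks[-1][1] = max(breaks[-1][1], e)   # extend the current break
        else:
            breaks.append([s, e])                   # start a new break
    return [tuple(b) for b in breaks]

def program_segments(breaks, stream_start, stream_end):
    """The gaps between consecutive breaks become program segments."""
    segments, cur = [], stream_start
    for s, e in breaks:
        if s > cur:
            segments.append((cur, s))
        cur = max(cur, e)
    if cur < stream_end:
        segments.append((cur, stream_end))
    return segments
```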
Fig. 1. Repetition density computed on 6 hours of TV stream (black positive histograms, time in hours on the horizontal axis). Gray negative rectangles are the real positions of breaks.
2.3. Segment annotation
The previous segmentation step provides an over-segmentation of the TV stream. The resulting segments have then to be merged and annotated in order to extract the full TV programs. For automatically labeling the segments, the straightforward approach is to use the metadata information broadcasted with the TV stream, such as the EPG or the EIT. Algorithms such as Dynamic Time Warping [2] can be used to merge and annotate the segments from the metadata. However, this approach heavily relies on the metadata: its effectiveness mainly depends on their reliability, and it requires at least complete and consistent metadata, which is not the case in practice. A deeper analysis of the weaknesses of TV metadata information is given in [1].
Therefore, in order to reduce the reliance on metadata,
only three simple rules are used to perform segment annota-
tion: (1) three consecutive segments are merged if the dura-
tion of the middle segment lasts less than 60 seconds, (2) a
detected segment is labeled with the name of the metadata
segment that has the best overlap with the detected segment,
(3) consecutive segments with the same label are merged.
Experiments will show that these basic rules are sufﬁcient
to achieve a very accurate TV program extraction.
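The three rules can be sketched as follows, assuming segments are (start, end) pairs and metadata entries are (start, end, name) triples (this representation is our assumption):

```python
def overlap(a, b):
    """Overlap duration between two (start, end, ...) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def merge_short_middles(segments, min_dur=60.0):
    """Rule 1: fuse three consecutive segments when the middle one
    lasts less than min_dur seconds."""
    segs = sorted(segments)
    out, i = [segs[0]], 1
    while i < len(segs):
        s = segs[i]
        if s[1] - s[0] < min_dur and i + 1 < len(segs):
            out[-1] = (out[-1][0], segs[i + 1][1])  # previous + middle + next
            i += 2
        else:
            out.append(s)
            i += 1
    return out

def annotate(segments, metadata, min_dur=60.0):
    """metadata: (start, end, name) triples, e.g. from the EPG/EIT."""
    labeled = []
    for s, e in merge_short_middles(segments, min_dur):
        # Rule 2: label with the metadata segment of best overlap.
        name = max(metadata, key=lambda m: overlap((s, e), m))[2]
        # Rule 3: merge consecutive segments carrying the same label.
        if labeled and labeled[-1][2] == name:
            labeled[-1] = (labeled[-1][0], e, name)
        else:
            labeled.append((s, e, name))
    return labeled
```

Using the best overlap (rule 2) rather than exact boundaries is what keeps the annotation robust to imprecise metadata start and end times.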
3. EXPERIMENTAL RESULTS

To evaluate our approach, we have performed a set of experiments using real TV broadcast streams from two different channels recorded during 5 days: a French public TV channel (Cpub) and a French private TV channel (Cpv). In order to conduct the following experiments, we have created a ground-truth on Cpub and Cpv in which TV programs have
been precisely segmented and annotated. A set of 47 TV Pro-
grams has been labeled on Cpub and 56 on Cpv . On the 120 h
of recorded TV stream, the total duration of breaks has been
12h 02m 15s on Cpub and 17h 06m 09s on Cpv .
The results are evaluated using the following criteria:
1. the number of extracted programs (All),
2. the number of valid extracted programs (Ok),
3. the number of valid extracted programs (2s) with an imprecision of the start and of the end of less than 2 seconds,
4. the number of valid extracted programs (10s) with an imprecision of the start and of the end of less than 10 seconds,
5. the mean (µ) and the standard deviation (σ) of the imprecision of the extracted programs.
The imprecision here means the absolute difference be-
tween the obtained start (resp. the end) time w.r.t. the ac-
curate start (resp. the end) time given by the ground-truth.
The imprecision is only evaluated on the valid extracted pro-
grams. Within both Cpv and Cpub, we have focused on the
long TV programs in the period between 11 am and 12 pm
that contains the most interesting programs (series, movies
and prime-time TV shows).
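These criteria can be computed as in the following sketch, where the matching of valid extracted programs to ground-truth programs is assumed to be already done:

```python
from statistics import mean, pstdev

def evaluate(matches, tolerances=(2.0, 10.0)):
    """matches: ((ext_start, ext_end), (gt_start, gt_end)) pairs,
    one per valid extracted program."""
    start_err = [abs(e[0] - g[0]) for e, g in matches]
    end_err = [abs(e[1] - g[1]) for e, g in matches]
    # Programs whose start AND end imprecision are below each tolerance.
    within = {t: sum(1 for s, x in zip(start_err, end_err) if s < t and x < t)
              for t in tolerances}
    stats = {"start": (mean(start_err), pstdev(start_err)),
             "end": (mean(end_err), pstdev(end_err))}
    return within, stats
```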
In order to perform a comparative study of our solution
(Our. Sol.), we have considered two other solutions: (1) a
metadata-based solution (Meta.) and (2) a monochrome-frames-
based solution (Monoch.).
The metadata-based solution uses the approximate starts,
ends and names given when available in the EPG.
The monochrome-frames-based solution first computes the intersection between the silences in the audio TV stream and the monochrome frames in the video TV stream, as in [2]. Then, all the detected intersections separated by more than 60 seconds are considered as program segments.
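A minimal sketch of this baseline, assuming the silence and monochrome detectors each output lists of (start, end) intervals, could be:

```python
def intersect(silences, monochromes):
    """Cut points where the stream is both silent and monochrome."""
    cuts = []
    for s1, e1 in silences:
        for s2, e2 in monochromes:
            lo, hi = max(s1, s2), min(e1, e2)
            if lo < hi:
                cuts.append((lo + hi) / 2)  # keep the midpoint as a cut
    return sorted(cuts)

def segments_from_cuts(cuts, min_sep=60.0):
    """Consecutive cuts more than min_sep seconds apart bound a segment."""
    return [(a, b) for a, b in zip(cuts, cuts[1:]) if b - a > min_sep]
```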
We have also built a merged solution (Both.) that com-
bines the monochrome-frames-based detected breaks and the
repetition density-based detected breaks.
3.1. Evaluation of our solution on Cpub
The repeated sequence detection technique has ﬁrst been ap-
plied. A set of 477 repeated sequences has been discovered
with a total number of 2001 occurrences on Cpub . We have
counted 210 commercial repeated sequences with an average
number of 4.92 occurrences. We have also counted 34 trailers
with an average number of 7.41 occurrences.
Table 1 shows the results obtained on Cpub. They show, first, that the metadata-based solution is outperformed by the other techniques. Then, the ratio of valid extracted
programs to the extracted programs (Ok/All) is almost the
same between our solution, monochrome-frames-based solu-
tion and the merged solution. As for the imprecision, our so-
lution is more accurate than the monochrome-frames-based
technique. This table also shows that our solution can be improved by the use of silence and monochrome frame detection.
The detection of the start is more accurate than the detection of the end, because of the reliance on metadata, which are more accurate on the start times. We note that TV programs
that have generated the most imprecision correspond mainly to “Le Tour de France”, which is a live sport program for which the metadata is completely wrong.
          Ok/All  2s  10s  start µ  start σ  end µ   end σ
Our Sol.  43/48   11   17    11.2s    11.3s   38.5s  134.1s
Monoch.   44/47    2   11    34.8s    55.6s   56.3s  102.8s
Both.     43/47    9   24     9.1s     8.7s   19.1s   90.9s
Meta.     46/47    0    0   186.1s   270.0s  439.4s  641.9s

Table 1. Evaluation results on Cpub.
3.2. Evaluation of our solution on Cpv
The repeated sequence detection technique has ﬁrst been ap-
plied. A set of 656 repeated sequences has been discovered
with a total number of 2679 occurrences on Cpv . We have
counted 316 commercial repeated sequences with an average
number of 5.72 occurrences. We have also counted 86 trail-
ers with an average number of 5.32 occurrences. This illus-
trates the main differences between private and public chan-
nels. Private channels tend to have more commercials which
are repeated more often. However, other non-commercial
inter-programs on private channels tend to be repeated less.
For TV stream segmentation, non-commercial inter-programs
have a greater impact on the imprecision.
Table 2 shows the obtained results. As non-commercial inter-programs are repeated less than on Cpub, our solution has been less effective on this channel. However, it is still better
than the metadata-based solution. Our solution merged with the monochrome-frames-based breaks has also been the most accurate.
          Ok/All  2s  10s  start µ  start σ  end µ   end σ
Our Sol.  56/58    2   15    46.7s   244.3s   80.3s  226.5s
Monoch.   56/58    2    9    30.6s    78.9s  104.9s  198.4s
Both.     56/58   11   23    12.9s    32.7s   61.8s  201.8s
Meta.     55/58    0    0   180.5s   120.1s  469.7s  289.1s

Table 2. Evaluation results on Cpv.
The obtained results on Cpub and Cpv show that automatic
TV program extraction is a complex problem. Our best results
show that about 45.6% of the TV programs have been effec-
tively extracted with an imprecision less than 10 seconds.
The successfully extracted TV programs have been mainly prime-time shows, movies and daily programs such as series, news or game shows. The TV programs that have been extracted with a greater imprecision have been series inside a succession of episodes. For two programs on Cpv, the imprecision has been due to the mis-detection of a sponsoring sequence that does not repeat in the stream.
4. CONCLUSION

This paper shows the importance of the repetition density of inter-programs and how it can be used in a TV stream segmentation process for TV program extraction. Experiments show that the traditional approaches (metadata-based or monochrome-frames-based) are not sufficiently effective to perform an accurate TV segmentation. They can, however, be greatly improved by our merged solution, which can achieve very accurate TV program extraction.
Future extensions will study how our approach can be applied on-line, that is, how to segment the TV stream on-line. This will require performing the repetition detection on-line. We will also address how to remove the breaks that remain inside the extracted programs.
ACKNOWLEDGMENTS

The authors would like to gratefully acknowledge X. Naturel for his help with the monochrome-frames-based solution.
REFERENCES

[1] S.-A. Berrani, P. Lechat, and G. Manson, “TV broadcast macro-segmentation: Metadata-based vs. content-based approaches,” in Proc. of the ACM Int. Conf. on Image and Video Retrieval, Amsterdam, The Netherlands, July 2007.
[2] X. Naturel, G. Gravier, and P. Gros, “Fast structuring of large television streams using program guides,” in Proc. of the 4th Int. Workshop on Adaptive Multimedia Retrieval, Geneva, Switzerland, July 2006.
[3] J. Oostveen, T. Kalker, and J. Haitsma, “Feature extraction and a database strategy for video fingerprinting,” in Proc. of the 5th Int. Conf. on Recent Advances in Visual Information Systems, Hsin Chu, Taiwan.
[4] J.-P. Poli and J. Carrive, “Modeling television schedules for television stream structuring,” in Proc. of the ACM Int. MultiMedia Modeling Conf., Singapore, January 2007.
[5] R. Lienhart, C. Kuhmunch, and W. Effelsberg, “On the detection and recognition of television commercials,” in Proc. of the IEEE Int. Conf. on Multimedia Computing and Systems, Ottawa, Ontario, Canada.
[6] A. Albiol, M.J. Ch, F.A. Albiol, and L. Torres, “Detection of TV commercials,” in Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (vol. 3), Montreal, Quebec, Canada, May 2004.
[7] J. M. Gauch and A. Shivadas, “Finding and identifying unknown commercials using repeated video sequence detection,” Computer Vision and Image Understanding, vol. 103, pp. 80–88, 2006.
[8] C. Herley, “Argos: Automatically extracting repeating objects from multimedia streams,” IEEE Transactions on Multimedia, vol. 8, no. 1, pp. 115–129, 2006.
[9] S.-A. Berrani, G. Manson, and P. Lechat, “A non-supervised approach for repeated sequence detection in TV broadcast streams,” Signal Processing: Image Communication, spec. iss. on “Semantic Analysis for Interactive Multimedia Services,” vol. 23, no. 7, pp. 525–537, 2008.
[10] G. Manson and S.-A. Berrani, “TV broadcast macro-segmentation using the repetition property of inter-programs,” in Proc. of the IASTED Int. Conf. on Signal Processing, Pattern Recognition and Applications, Innsbruck, Austria, February 2009.