A System for Analysis and Presentation of
MPEG Compressed Newsfeeds
Guido FALKEMEIER, Gerhard R. JOUBERT and Odej KAO
Technical University of Clausthal, Department of Computer Science,
Julius-Albert-Str. 4, 38678 Clausthal-Zellerfeld, Germany
Tel: +49 5323 727-140; Fax: +49 5323 727-149;
Email: guido.falkemeier@informatik.tu-clausthal.de,
gerhard.joubert@informatik.tu-clausthal.de, odej.kao@informatik.tu-clausthal.de
Abstract¹: In this paper an iterative algorithm for the extraction of key frames from
digitised, MPEG-1 compressed newsfeeds is introduced. Newsfeeds consist of as-
sembled news clips transmitted by news agencies. These are usually recorded in ana-
logue form by subscribing TV stations. Editors analyse the video material through a
sequential and time-intensive search in order to select sequences which can be used
in a newscast. Digital video techniques can enhance this process. In order to find
changes in content, the video stream is examined with a multi-level algorithm,
resulting in a key frame index for each clip. A key frame represents a sequence
of similar frames, and a set of key frames gives an overview of the clip content. Key
frames can be combined with additional information obtained from, for example, the
internet. News editors can interactively navigate through a whole day’s material of
all the newsfeeds recorded.
1. Introduction
Television stations receiving newsfeeds from various news agencies, such as Reuters or
APTN, are confronted with a very large video processing task. Each newsfeed consists of a
large number of assembled news clips.
Processing and selecting news items from a feed is usually done by copying the videos
received via satellite onto conventional tapes. These are then manually scanned by editors
for particular news items. Selected items are subsequently copied and assembled for news
casts. In view of the increasing volume of material to be scanned and the limited number of
professional workstations and copies of newsfeed videos available, it becomes increasingly
difficult for editors to execute their task satisfactorily with present equipment and proce-
dures. Digital processing of videos could facilitate this process. Editors could, for example,
work from their offices using desk computers to extract and view material stored on a cen-
tral server. This reduces the need for expensive specialised professional video processing
stations.
In order to handle the large data volumes in a client/server environment the analogue
videos must be digitised and then compressed into, for example, MPEG-1 format. This
format does not supply broadcast quality material. Thus the newsfeeds are recorded both in
analogue broadcast quality and MPEG-1 compressed form.
Although compression of the digitised videos greatly reduces the storage and commu-
nication requirements, it creates serious difficulties regarding the processing of newsfeeds.
It proves too time- and space-consuming to decompress the videos in order to find a
particular news item in the data stream. It is thus desirable to have methods available which can
detect the beginning of a particular news item in compressed video data. Subsequently only
that particular clip need be decompressed and viewed by the editor.
¹ This work is part of a joint research project with a commercial television station.
The system developed as part of this work—ClaViPS—extracts data from a compressed
video stream and supplies editors with a list giving the start and end points and a set of key
frames of each news clip. This list is used to select, decompress and view particular news
clips. In addition the time code of each frame can be extracted, thus facilitating the frame
accurate selection and assembly of the final news cast from the broadcast quality copies.
Although the availability of digitised videos on a network seems to offer great advan-
tages to editors, this is not necessarily the case in practice. The work flow practices of edi-
tors and their relationship with technicians executing the final assembly of news casts are
directly affected by the digital video processing system. One aspect is, for example, that
analogue material can be viewed at high speed. This is not possible with digitised videos
made available on an affordable computer network. On standard networks the “high speed”
playing of a video implies that only every n-th frame is displayed. This creates a display
which is unacceptable to news editors. It is thus essential that additional instruments are
made available to editors in order to allow them to execute their task at least as efficiently
and effectively as is the case with the traditional systems. In order to achieve this it is
essential to supply editors with the following information as a minimum:
• Start frames of the video clips contained in a newsfeed;
• Key frames of each video clip, and
• A (textual) summary of the news items contained in a continuous sequence of video ma-
terial.
In order to understand the information extraction process it is necessary to understand
the structure of a newsfeed. A feed consists of a continuous stream of news items. The in-
dividual items are separated by a special frame sequence. This sequence can be divided into
three sub-sequences. The first consists of black frames only, followed by a few frames giv-
ing some information about the next news item in the newsfeed (usually title, date, duration).
These frames are again followed by black frames until the subsequent news item starts.
The MPEG-1 compression method, used in the case of the videos considered in this pa-
per, has been standardised and the reader is referred to the literature [2]. To summarise: an
MPEG-compressed video consists of three different frame types. The intra frames (I-
frames) are “base” frames which are directly compressed. I-frames can thus also be decom-
pressed without reference to other frames. The predicted frames (P-frames) and bi-
directionally predicted frames (B-frames) are referenced to I-frames before compression.
They can thus not be directly decompressed. I-frames are selected at regular intervals. The
MPEG standard allows for a certain freedom of choice. In the compressed videos consid-
ered here an I-frame is selected every 12-16 frames.
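To make this structure concrete, the following minimal sketch locates the independently decompressible I-frames in a typical group-of-pictures layout; the 12-frame pattern and the helper function are illustrative assumptions, not part of the standard or of the method described here.

```python
# Sketch: locating I-frames in a typical MPEG-1 group-of-pictures (GOP)
# layout. The 12-frame pattern below is one common encoder choice; as
# noted above, the spacing in the videos considered here is 12-16 frames.
GOP_PATTERN = "IBBPBBPBBPBB"  # repeats for the length of the video

def i_frame_indices(num_frames, pattern=GOP_PATTERN):
    """Return the indices of all I-frames in a video of num_frames frames."""
    return [i for i in range(num_frames)
            if pattern[i % len(pattern)] == "I"]

# Only these frames can be decompressed without reference to other frames.
print(i_frame_indices(30))  # [0, 12, 24]
```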
2. A new method for detecting hard scene cuts
In this paper a new method for detecting hard scene cuts in compressed videos is presented.
Hard scene changes or cuts are the most common form of changing from one scene to an-
other in videos. Soft scene changes, where one scene is gradually faded out and the next
faded in, require special effort and are rarely used by reporters in news clips.
A scene cut detection method should be fast, accurate and complete. The iterative
method proposed here consists of three phases. The first phase comprises a fast global
analysis, which is then refined in subsequent phases to determine the exact cut position.
This results in a considerable speed advantage over presently available methods. The
method always terminates as the ISO-standard for MPEG compressed videos is adhered to.
The global analysis of the first phase considers I-frames only. Two neighbouring I-
frames are compared and, if a scene cut is detected, the method switches to the second
phase. During this phase only the P-frames enclosed by two I-frames are analysed in order
to obtain a closer determination of the position of the scene cut. This results in two
neighbouring frames—for example an I- and a P-frame—which contain the scene cut. Dur-
ing the third phase the included B-frames are analysed in order to determine the exact posi-
tion of the sought frame.
Figure 1: Multi-phase approach
In Figure 1 the iterative process is depicted. The analysis of the I-frames during the first
phase shows that a scene cut exists between the second and third I-frames. The second
phase is then entered and the P-frames between these two I-frames are analysed. This re-
sults in the indication that a scene cut exists between the first and second P-frame. During
the third phase an analysis of the B-frames determines the exact position of the cut.
The process remains the same if the scene cut is located on an I- or a P-frame. In order
to determine a cut located on a P-frame all three phases are executed, with both the first
and second phases signalling the existence of a scene cut. The third phase, however, fails
to determine the position of a cut amongst the enclosed B-frames. This means that two
possibilities exist:
1. The scene cut is located on the first B-frame, or
2. The scene cut is located on the second P-frame.
In the first case all B-frames will mainly reference the following (second) P-frame, as
they are all part of the new scene. On the other hand, if the second case is true, the B-
frames will mainly reference the first P-frame, as they then form part of the previous scene.
Thus, in order to determine at which P-frame the scene cut is located, the references of the
B-frames must be analysed.
If a scene cut is detected during the first, but not the second, phase the cut must be lo-
cated on the I-frame or the B-frames following the last P-frame.
The main advantage of the new approach proposed here as compared to existing meth-
ods is the higher compute speed. This results from the considerably smaller number of
frames which must be investigated. In spite of this there is no additional loss of informa-
tion due to the fact that within a MPEG compressed video approximately every 12th-16th
frame is an I-frame. It is highly unlikely that—at least in the case of newsfeeds—more than
one scene cut occurs within this time frame of just over half a second.
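The narrowing process described above can be sketched as follows; `cut_between` stands in for the I-, P- and B-frame comparison tests of Sections 2.1 and 2.2 and is an assumed placeholder, as is the `(index, type)` frame representation.

```python
# Sketch of the three-phase narrowing. frames is a list of
# (index, type) pairs in display order; cut_between(a, b) is a
# placeholder for the comparison tests described in Sections 2.1/2.2.
def locate_cut(frames, cut_between):
    """Narrow a scene cut down from I-frame level to the exact frame."""
    i_frames = [f for f in frames if f[1] == "I"]
    # Phase 1: compare neighbouring I-frames only.
    for a, b in zip(i_frames, i_frames[1:]):
        if not cut_between(a, b):
            continue
        segment = [f for f in frames if a[0] <= f[0] <= b[0]]
        # Phase 2: analyse the P-frames enclosed by the two I-frames.
        anchors = [f for f in segment if f[1] in ("I", "P")]
        for p, q in zip(anchors, anchors[1:]):
            if not cut_between(p, q):
                continue
            # Phase 3: the B-frames between p and q pin down the cut.
            for f in segment:
                if p[0] < f[0] < q[0] and cut_between(p, f):
                    return f[0]
            # No cut amongst the B-frames: it lies on an anchor frame.
            # The B-frame reference analysis described in the text
            # distinguishes the two cases; here the later anchor is returned.
            return q[0]
    return None

# One GOP "IBBPBBPBBPBB" plus the next I-frame, with an assumed cut at
# frame 7: everything from frame 7 onwards belongs to the new scene.
frames = list(enumerate("IBBPBBPBBPBBI"))
cut_at_7 = lambda a, b: a[0] < 7 <= b[0]
print(locate_cut(frames, cut_at_7))  # 7
```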
2.1. Scene cuts within the I-frames
The I-frames contain the complete image information as transformed by the DCT. The de-
tection of a scene cut is thus based on the analysis of the DC components [11] of all 8×8
blocks of the intra-coded images.
For a video V and the I-frames it contains the following parameters are defined:

    N          The number of I-frames in V.
    DCI_k      The ordered set of DC components of the k-th I-frame. The DC
               components are ordered according to the sequence of the
               associated 8×8 blocks.
    (DCI_k)    The sequence of all DCI_k of V.

The existence of scene cuts is determined by analysing the sets DCI_k and DCI_{k+1}:

    A_{k,k+1} = Σ_{m,n} |x_{m,n} − y_{m,n}|,
    with x_{m,n} ∈ DCI_k, y_{m,n} ∈ DCI_{k+1}, k = 1, ..., N−1,

and m, n ranging over the rows and columns respectively. The result of this analysis is a
sequence of difference images between the two sets DCI_k and DCI_{k+1} for all I-frames
contained in V. A number of different norms [5-7] are available for calculating the
differences. Many of these are very compute intensive. The norm proposed here requires
relatively little compute effort and has been tested with success in the case of compressed
videos [9-11].
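Assuming the DC components of each I-frame have been extracted into a 2-D array per frame, the difference sequence can be sketched as follows (the array shapes are illustrative):

```python
import numpy as np

# Sketch: difference sequence between the DC components of consecutive
# I-frames. dci is a list of 2-D arrays, one per I-frame, holding the
# DC coefficient of every 8x8 block (the 2x2 shapes are illustrative).
def difference_sequence(dci):
    """A[k] = summed absolute DC difference between I-frames k and k+1."""
    return [float(np.abs(a - b).sum()) for a, b in zip(dci, dci[1:])]

# Two nearly identical frames followed by a very different one: the
# large second value marks a candidate scene cut.
dci = [np.full((2, 2), 10.0), np.full((2, 2), 11.0), np.full((2, 2), 90.0)]
print(difference_sequence(dci))  # [4.0, 316.0]
```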
If a scene cut exists between two I-frames the calculated distance between the two must
be large compared to the average distance between I-frames. In practice a scene cut spans a
number of I-frames and it is thus reasonable to assume that within a sequence of difference
images a large value will be bounded by comparatively small values. In the case of a scene
cut between DCI_k and DCI_{k+1} the following must hold:
1.  A_{k,k+1} > x · A_avg,   where   A_avg = (1 / (N−1)) Σ_{n=1}^{N−1} A_{n,n+1}
    and   x ∈ {1, 1.1, ..., 3.9, 4}

2.  A_{k,k+1} > y · (1 / (2l)) Σ_{n=k−l, n≠k}^{k+l} A_{n,n+1},
    for   l ∈ {1, 2, 3}   and   y ∈ {1, 1.1, ..., 3.9, 4}
The values of the parameters x, y and l are determined through an adaptive correction
[4]. The first condition detects images which show a greater difference than their neigh-
bourhood. This formula ensures a greater robustness than if the simple maximum norm
were used. The second condition ensures that no additional large difference images exist in
the direct vicinity of the frame considered. This prevents the false interpretation of move-
ment as scene cuts.
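On the difference sequence, the two conditions can be sketched as follows; the fixed thresholds x, y and window half-width l are illustrative, whereas the paper determines them through the adaptive correction [4].

```python
# Sketch of the two scene-cut conditions. A[k] holds the difference
# between I-frames k and k+1; the threshold values are fixed here for
# illustration (the paper adapts them within {1, 1.1, ..., 4}).
def is_scene_cut(A, k, x=2.0, y=2.0, l=2):
    # Condition 1: A[k] exceeds x times the global average difference.
    avg = sum(A) / len(A)
    if A[k] <= x * avg:
        return False
    # Condition 2: A[k] also exceeds y times the average of its (up to)
    # 2l neighbours, excluding A[k] itself; this guards against motion.
    neighbours = [A[n] for n in range(k - l, k + l + 1)
                  if n != k and 0 <= n < len(A)]
    return A[k] > y * sum(neighbours) / len(neighbours)

A = [3.0, 4.0, 50.0, 5.0, 3.0, 4.0]  # isolated spike at k = 2
print([k for k in range(len(A)) if is_scene_cut(A, k)])  # [2]
```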
2.2. Scene cuts within the P- and B-frames
Referencing macro blocks can be used for the detection of scene cuts in the case of P-
frames [4], in addition to the intra-coded blocks. In the referencing macro blocks up to five
8×8 blocks can be intra-coded. If a scene cut exists between the P-frame considered and
the reference frame, a large number of macro blocks will be completely intra-coded and
many referencing macro blocks will be partly intra-coded. This results from the big differ-
ence between the P-frame and the reference frame. In summary, a large number of 8×8
blocks will be intra-coded. The number of these blocks can thus be used as a measure to
indicate the existence of a scene cut.
In cases where extensive movements are depicted in a scene this method may break
down. This is due to the fact that in such situations the differences between the reference
and the P-frames within the scene are relatively large. A clear distinction between extensive
movement and scene cuts cannot then be made. This results in the possible non-detection of
scene cuts within or bordering on such scenes. In order to achieve a higher degree of ro-
bustness of the method the average of all differences, excluding the maximum value, is
used.
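A sketch of this measure, including the robustness tweak of excluding the maximum value from the baseline; the per-frame counts of intra-coded blocks and the threshold factor are assumed inputs, not taken from the paper.

```python
# Sketch of the P-frame measure: flag P-frames whose share of
# intra-coded blocks is far above the average share, where the average
# excludes the maximum value so that the cut frame itself does not
# inflate the baseline. Inputs are assumed to be extracted per frame
# from the compressed stream; factor is an illustrative threshold.
def p_frame_cut_candidates(intra_counts, total_blocks, factor=3.0):
    fractions = [c / total_blocks for c in intra_counts]
    baseline = (sum(fractions) - max(fractions)) / (len(fractions) - 1)
    return [k for k, f in enumerate(fractions) if f > factor * baseline]

# 396 macro blocks per CIF frame; a cut at frame 3 makes almost every
# block there intra-coded.
counts = [20, 35, 25, 380, 30]
print(p_frame_cut_candidates(counts, 396))  # [3]
```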
In the case of the B-frames it does not make sense to base the scene cut detection on the
number of intra-coded blocks. This is due to the fact that B-frames can use a preceding or
subsequent P- or I-frame within the same scene as reference. This results in the generation
of a very small number of intra-coded blocks.
One possibility to work around this problem is to calculate the difference between the
number of forward and backward reference blocks for each frame investigated. This results
in a sequence of numbers in which a sign change indicates that a scene cut may be in-
cluded in the frame sequence. This is not a sufficient condition as sign changes may also
result from rounding errors [10].
In addition to the construction of the sequence of differences a second norm is thus in-
troduced. For each frame the number of macro blocks which contain references to both
preceding and subsequent reference frames is determined. This number is then divided by
the total number of macro blocks in the frame. A small value of this quotient indicates the
existence of a scene cut. The combination of these two metrics was shown to greatly in-
crease the accuracy with which scene cuts can be detected.
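Combining the two B-frame metrics can be sketched as follows; the per-frame reference-block counts and the 10% bidirectional threshold are illustrative assumptions.

```python
# Sketch combining the two B-frame metrics: (1) a sign change in the
# sequence of forward-minus-backward reference-block counts, and (2) a
# small share of bidirectionally referenced macro blocks at that point.
# Counts per B-frame are assumed inputs; 0.1 is an illustrative bound.
def b_frame_cut_index(fwd, bwd, bidir, total_blocks, max_bidir=0.1):
    """Return the index of the first B-frame at which a cut is indicated."""
    diffs = [f - b for f, b in zip(fwd, bwd)]
    for k in range(len(diffs) - 1):
        sign_change = (diffs[k] > 0 >= diffs[k + 1]) or (diffs[k] <= 0 < diffs[k + 1])
        few_bidir = bidir[k + 1] / total_blocks < max_bidir
        if sign_change and few_bidir:
            return k + 1
    return None

# Before the cut the B-frames reference mainly the preceding anchor
# (forward), after it mainly the following anchor (backward), and few
# blocks at the cut are referenced in both directions.
fwd, bwd, bidir = [90, 85, 5, 8], [10, 12, 92, 88], [60, 55, 4, 6]
print(b_frame_cut_index(fwd, bwd, bidir, total_blocks=100))  # 2
```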
This algorithm was tested on a number of newsfeeds from various news agencies. In the
case of Reuters’ newsfeeds, for example, the methods described succeeded in completely
segmenting the newsfeed into individual clips. Within each clip at least 80% of scene cuts
were detected. This gives news editors a good overview of the clips and scenes contained
in a feed.
3. ClaViPS: A presentation system for newsfeeds
The iterative algorithm described above forms the basis for the ClaViPS-System which
analyses and presents MPEG-1 compressed newsfeeds over a computer network. With this
system a TV news editor can quickly obtain an overview of recently received news mate-
rial. This information can be combined with additional related data from a database or a
search of the internet. The information is presented in the form of an HTML page on the
editor’s workstation.
At present the system comprises the following five components:
1. Detection of start and end frames of each video clip in the newsfeed;
2. Detection and extraction of key frames from each clip scene;
3. Detection, extraction and analysis of text frames inserted between clips with a specially
developed OCR system;
4. Key word search in local databases, newspaper archives and the internet;
5. Visualisation of the collected information on individual workstations.
In the first step the MPEG compressed newsfeeds are separated into clips by an algo-
rithm which identifies black separation frames between individual contributions. The algo-
rithm [8] works directly on the compressed data, thus obviating a decompression proce-
dure. This enables real-time processing on standard PCs connected to a network. A frame
within the black frame sequence which contains textual information about the subsequent
clip content is also detected, extracted and saved. The textual information is then extracted
with an OCR method, which was developed for recognising text in low quality images.
The key words extracted are used as input for the integrated search tools.
Subsequently the proposed iterative algorithm is applied in order to detect and extract
key frames—i.e. representative images of scene content—from the individual clips. A key
frame sequence gives a quick overview of clip content and enables the news editor to de-
cide which clips he wishes to select for closer examination. This is based on the assump-
tion that scene content can usually be represented equally well by any frame from a scene.
Thus any I-frame within a scene can be selected and decompressed. The advantage of this
approach is that I-frames are self contained and can be decompressed with little effort as
no references to other frames need be considered. One key frame is generated for each
scene detected in the clip.
Figure 2: Typical page for a news item in ClaViPS
The key word search is executed in two stages. During the first stage a search is exe-
cuted to find topical information relating to the theme. In addition to current comments ear-
lier news items or information pertaining to similar cases can be used to enhance a particu-
lar piece of news. The second stage supports the archiving of newsfeeds. The available in-
formation to be archived can, for example, be enhanced by adding newspaper articles and
comments. Thus additional key words which can be used for search operations are ob-
tained.
The compiled descriptions of news items are collected and stored in a central database
and presented on the distributed workstations of the respective news editors by an HTML-
based system developed for this purpose.
The start page of this system gives an overview of the available video material, sorted
according to the times when the individual items were received. The attached link refers to
an index page which lists the individual news items together with short content summaries.
The editor can select interesting items from this list. Each item on the list has a separate
WWW page assigned to it. This contains title, location, time, language, key words, ex-
tracted key frames, additional information as well as a link to the MPEG compressed video
clip. Figure 2 shows such a sample page.
4. Conclusions
In this paper an iterative algorithm which reliably extracts information from compressed
newsfeeds is presented. Working directly on the compressed material enables the imple-
mentation of the methods on standard computer networks due to the low demands made on
computing and communication performance. The proposed method is already in regular
use.
Future investigations will be directed at generalising the detection methods to enable their
application to video material compressed using MPEG-2, etc. Furthermore, it is to be inves-
tigated whether the additional use of statistical methods can improve the detection accuracy
by reducing the sensitivity to changes in the characteristics of the original material.
The possible reduction in storage requirements as well as support for the archiving of vid-
eos are further areas of future investigation.
References
[1] S. Geisler and U. Luengjiranothai, Automatische Erkennung von Themenwechseln in MPEG kodierten
Videos (Automatic detection of topic changes in MPEG-encoded videos), student project, TU Clausthal,
February 1998.
[2] ISO/IEC 11172-1 (2,4), Information technology --- Coding of moving pictures and associated audio for
digital storage media at up to about 1,5 Mbit/s ---, Part 1 (2,4): Systems (Video, Compliance Testing),
1993.
[3] B.-L. Yeo and B. Liu, Rapid scene analysis on compressed video, IEEE Transactions on Circuits and
Systems for Video Technology, vol. 5, pp. 533-544, December 1995.
[4] G. Falkemeier, Speicherplatzreduzierung und Informationsanalyse von MPEG-komprimierten Videos
(Storage reduction and information analysis of MPEG-compressed videos), Papierflieger Verlag, 1998.
[5] R.M. Ford, C. Robson, D. Temple and M. Gerlach, Metrics for scene change detection in digital video
sequences, Proceedings IEEE International Conference on Multimedia Computing and Systems '97, pp.
610-611, 1997.
[6] I.K. Sethi and N. Patel, A statistical approach to scene change detection, Proceedings of the Society of
Photo-Optical Instrumentation Engineers, vol. 2420, pp. 329-338, 1995.
[7] F. Irrgang, Schnitterkennung in digitalisierten Filmsequenzen (Cut detection in digitised film
sequences), Diplom thesis, TU Clausthal, 1996.
[8] G. Falkemeier, G.R. Joubert and O. Kao, Analysis and Processing of Compressed Newsfeeds, in J.-Y.
Roger, B. Stanford-Smith, P.T. Kidd (eds.): Technologies for the Information Society: Developments and
Opportunities, IOS Press, 1998.
[9] K. Shen and E. Delp, A spatial-temporal parallel approach for real-time MPEG video compression,
Image Compression and Graphics, pp. 100-107, 1996.
[10] K. Shen and E. Delp, A fast algorithm for video parsing using MPEG compressed sequences, Proceed-
ings International Conference on Image Processing, IEEE Computer Society Press, pp. 252-255, 1995.
[11] B.-L. Yeo and B. Liu, Rapid scene analysis on compressed video, IEEE Transactions on Circuits and
Systems for Video Technology, vol. 5, pp. 533-544, December 1995.