A Two-Stage Approach to Detect Abandoned Baggage in
Bhargav Kumar Mitra, Waqas Hassan, Philip Birch, Akber Gardezi, Rupert Young,
Industrial Informatics Research Group
School of Engineering and Design
University of Sussex
Falmer, Brighton BN1 9QT.
ABSTRACT
Baggage abandoned in public places can pose a serious security threat. In this paper a two-stage approach
that works on video sequences captured by a single immovable CCTV camera is presented. At first, foreground
objects are segregated from static background objects using brightness and chromaticity distortion parameters
estimated in the RGB colour space. The algorithm then locks on to binary blobs that are static and of ‘bag’ sizes;
the size constraints used in the scheme are chosen based on empirical data. Parts of the background frame and
current frames covered by a locked mask are then tracked using a 1-D (unwrapped) pattern generated using a
bi-variate frequency distribution in the rg chromaticity space. Another approach that uses edge maps instead of
patterns generated using the fragile colour information is discussed. In this approach the pixels that are part of
an edge are marked using a novel scheme that utilizes four 1-D Laplacian kernels; tracking is done by calculating
the total entropy in the intensity images in the sections encompassed by the binary edge maps. This makes the
process broadly illumination invariant. Both algorithms have been tested on the iLIDS dataset (produced
by the Home Office Scientific Development Branch in partnership with the Security Service, United Kingdom) and
the results obtained are encouraging.
Keywords: computational model, chromaticity space, edge detection, entropy
1. INTRODUCTION
A lot of work has been reported in the video-surveillance literature on detecting abandoned objects in public
places. Some of these works are based on single-camera vision [1-3] and the rest rely on multi-camera tracking [4-8].
In general, the authors of these papers have tried to develop fully automatic systems capable of analysing the
scene, and pointing out suspicious events like bags being left for long periods of time by their owners. However,
a busy scene, change of illumination and strict real-time constraints seriously jeopardize the proper functioning
of such systems. It has to be understood that, given current-day infrastructure, it is nearly impossible to
have a fully automatic intelligent system that can analyse a real-life scene and identify suspicious activities
with 100% accuracy. In this paper a practical two-stage approach to detect abandoned bag(s) in public areas
like underground tube stations is described. The approach serves two purposes: (a) to aid personnel monitoring
several screens to pick up suspicious activities, and (b) to reduce the number of false detections caused by change
of illumination. In other words, the algorithm detects and locks onto bag-size stationary blobs in the frames,
and subsequently calls for human intervention so that a decision can be taken and appropriate action initiated. In
its first stage, the approach segments foreground objects from the background using a computational model that works in the
RGB colour space [11]. The computational model is used as it not only can detect foreground objects but also
can eliminate moving shadows (false targets) [9-12]. It should, however, be mentioned that the algorithm works
well only on Lambertian surfaces; in practice, it has been noted to give rise to false segmentations because, in a real-life
scene, object surfaces are far from perfect. The algorithm then locks onto bag-size objects that remain stationary for
a few frames.

(Further author information: Send correspondence to Bhargav Kumar Mitra. E-mail: B.K.Mitra@sussex.ac.uk, Telephone: +44 1273 877207.)

To verify whether the locked blob is an actual target blob or not, a 1-D pattern is generated from
the bi-variate frequency distribution in the rg chromaticity space. This step of the algorithm makes use
of the colour information only, discarding the brightness information; the inclusion of this 1-D pattern-matching step
makes the overall approach broadly illumination invariant. However, the algorithm fails badly if someone enters
the scene carrying a bag which has a colour close to that of the background. To have a method that does not
depend on fragile colour information and is broadly immune to illumination change, an edge-map based entropy
estimation method is considered as the second stage of the approach. The edges of the image encompassed by
the locked binary mask are marked using a novel edge detector that works using four 1-D Laplacian kernels.
It is intuitive that the entropy of the background frame covered by the edge map will differ from that of the
current frame in the presence of a foreground object. The edge detector picks up high-frequency
spatial components from the images sectioned out by the mask and this makes the process, to a great extent,
illumination invariant. The two versions of the two-stage approach have been tested on the UK Home Office
iLIDS dataset; the results validate the practical usefulness of the scheme.
The rest of the paper is organized as follows: Section 2 briefly describes the computational model that is used in
the first stage of the algorithm to segment the true targets (foreground objects, with moving shadows removed) from the
background; Section 3 elaborates how the 1-D pattern is generated from the bivariate frequency distribution in
the rg chromaticity space to do colour content based detection and tracking; Section 4 describes the novel edge
detector, and talks about edge-map entropy based matching method used in the algorithm; results are shown
and discussed in Section 5, and finally conclusions are drawn in Section 6.
2. COMPUTATIONAL MODEL
The computational model employed works in the RGB colour space and discriminates a pixel as a foreground
pixel based on brightness and chromaticity distortions of the pixel with respect to the corresponding pixel in
the background frame. The model also exploits the fact that a shadow can be considered as ‘a semi-transparent
region in the image, which retains a representation of the underlying surface pattern, texture or colour value’ to
eliminate the shaded regions from the segmented blobs [9-12]. If $E_{i,j}$ is the expected vector
($E_{i,j} = [\bar{r}_{i,j}\;\bar{g}_{i,j}\;\bar{b}_{i,j}]^{T}$) for the $(i,j)$th pixel, and $I_{i,j}$ is the current vector
($I_{i,j} = [r^{c}_{i,j}\;g^{c}_{i,j}\;b^{c}_{i,j}]^{T}$) for the same pixel, then the brightness
distortion, $\rho_{i,j}$, and the chromaticity distortion, $\theta_{i,j}$, are estimated as follows:

$$\rho_{i,j} = |E_{i,j}| - \langle I_{i,j}, \hat{e}_{i,j}\rangle \quad (1)$$

$$\theta_{i,j} = \cos^{-1}\!\left(\frac{\langle E_{i,j}, I_{i,j}\rangle}{|E_{i,j}|\,|I_{i,j}|}\right) \quad (2)$$

where, in equation (1), $\hat{e}_{i,j}$ is the unit vector in the direction of $E_{i,j}$.
A pixel $\Lambda_{i,j}$ is then treated either as a foreground pixel $\Lambda^{f}_{i,j}$, a shadow pixel
$\Lambda^{s}_{i,j}$, or a background pixel $\Lambda^{b}_{i,j}$ according to the following rule:

$$\Lambda_{i,j} = \begin{cases} \Lambda^{f}_{i,j} & \text{if } (\rho_{i,j} > \tau_{b}) \cap (\theta_{i,j} > \tau_{\theta}) \\ \Lambda^{s}_{i,j} & \text{if } (\rho_{i,j} > \tau_{b}) \cap (\theta_{i,j} < \tau_{\theta}) \\ \Lambda^{b}_{i,j} & \text{otherwise} \end{cases} \quad (3)$$
where, in equation (3), $\tau_{b}$ and $\tau_{\theta}$ are the brightness distortion and chromaticity distortion thresholds respectively;
note that the thresholds have been coarse-tuned using the threshold-selection techniques discussed in [10, 12], and then fine-tuned interactively.
It should also be noted that shadow pixels (false moving targets) are eventually replaced by the corresponding
background pixels, i.e.

$$\Lambda_{i,j} = \Lambda^{b}_{i,j}, \quad \text{if } \Lambda_{i,j} = \Lambda^{s}_{i,j} \quad (4)$$
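As an illustration, the classification rule above can be sketched as follows; the threshold values and the per-pixel array layout are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def classify_pixels(current, background, tau_b=25.0, tau_theta=0.05):
    """Classify each pixel as background (0), foreground (1) or shadow (2)
    using the brightness distortion rho and chromaticity distortion theta.
    The threshold defaults here are illustrative assumptions only."""
    E = background.astype(np.float64)            # expected (background) RGB vectors
    I = current.astype(np.float64)               # current RGB vectors
    norm_E = np.linalg.norm(E, axis=2) + 1e-9    # |E| per pixel
    norm_I = np.linalg.norm(I, axis=2) + 1e-9    # |I| per pixel
    e_hat = E / norm_E[..., None]                # unit vector along E
    rho = norm_E - np.sum(I * e_hat, axis=2)     # brightness distortion, eq. (1)
    cos_t = np.clip(np.sum(E * I, axis=2) / (norm_E * norm_I), -1.0, 1.0)
    theta = np.arccos(cos_t)                     # chromaticity distortion, eq. (2)
    labels = np.zeros(rho.shape, dtype=np.uint8)        # background by default
    labels[(rho > tau_b) & (theta > tau_theta)] = 1     # foreground, rule (3)
    labels[(rho > tau_b) & (theta <= tau_theta)] = 2    # shadow, rule (3)
    return labels
```

A shadow pixel is darker than the background along the same colour direction (large $\rho$, small $\theta$), while a genuine foreground pixel also changes the colour direction (large $\theta$).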
The described method is applied to generate binary masks containing all the true foreground targets. The
algorithm then locks onto ‘bag-sized’ stationary blobs in the generated mask. Note that the thresholds used to identify
‘bag-sized’ blobs have been selected based on empirical data. It should also be mentioned that the computational
model has been developed to work on ‘Lambertian’ surfaces; hence, it generates false segmentation results
because surfaces in real-life scenes are far from perfect. In addition to this, change of illumination stymies the
working of the first stage of the algorithm. Also, note here that studies based on empirical data have shown
that brightness distortion estimates play a greater role than chromaticity distortion estimates in segmenting the
foreground scene from the background. This makes the first stage of the approach more vulnerable to changes
of illumination.

To overcome the above-mentioned problems, i.e., to reduce the number of false detections, a broadly illumination-invariant
method has been conceived as the second stage of the algorithm. The second stage, in turn, can be
implemented in two different ways; the method that works in the rg chromaticity space by taking the brightness
factor out of the RGB triplet is described next.
3. GENERATING A 1-D PATTERN FROM THE BIVARIATE FREQUENCY
DISTRIBUTION IN THE rg CHROMATICITY SPACE
The r and g values for the chromaticity space are obtained by dividing the first two elements of the RGB triplet
by R + G + B, i.e.

$$r = \frac{R}{R+G+B} \quad (5)$$

$$g = \frac{G}{R+G+B} \quad (6)$$
Such an operation takes away the brightness information from the elements of the RGB triplet; what is left is
the proportion of the colour content in r, g and b. Also, note that the transform is irreversible, i.e. from rgb
values we cannot recover the RGB triplet. Since the r, g and b values obtained in this way add up to 1, one
element in the rgb chromaticity space can be dropped and a bi-variate frequency table generated using the other
two; we dropped b while developing our algorithm.
The individual elements in the chromaticity space have a range: [0,1]; in the adopted approach, the r and g values
are scaled up by multiplying each of them by 255. Eight equal sized bins are then drawn for r and g respectively,
and a bi-variate frequency distribution is generated by filling up each of the 64 cells of the distribution by the
corresponding frequency values. The 64 cells of the distribution are then encoded using a 6-bit binary pattern.
A 1-D pattern is then generated by plotting the frequency values of the cells against an increasing 6-bit binary
pattern (signatures of the cells).
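As an illustrative sketch, the pattern generation can be written as follows; the 3-bits-r/3-bits-g layout of the 6-bit cell signature is an assumption, since the paper only states that the 64 cells are encoded with 6-bit signatures.

```python
import numpy as np

def rg_pattern(image, bins=8):
    """Generate the 1-D pattern from the bivariate r-g frequency distribution.
    `image` is an H x W x 3 RGB array; returns a 64-element frequency vector
    indexed by an increasing 6-bit cell signature (assumed layout: 3 bits for
    the r bin, 3 bits for the g bin)."""
    rgb = image.astype(np.float64)
    s = rgb.sum(axis=2) + 1e-9                 # R + G + B per pixel
    r = rgb[..., 0] / s                        # r chromaticity in [0, 1]
    g = rgb[..., 1] / s                        # g chromaticity (b is dropped)
    # scale to [0, 255] and quantise into 8 equal-sized bins per axis
    r_bin = np.minimum((r * 255).astype(int) // 32, bins - 1)
    g_bin = np.minimum((g * 255).astype(int) // 32, bins - 1)
    signature = (r_bin << 3) | g_bin           # 6-bit cell signature, 0..63
    # frequency of each cell, plotted against increasing signature value
    return np.bincount(signature.ravel(), minlength=bins * bins)
```

Two regions can then be compared by matching their 64-element patterns; the paper does not specify the matching metric, so any standard similarity measure (e.g. correlation) would be a design choice.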
It should be kept in mind that taking the brightness component out of the elements of the RGB triplet makes
the approach broadly illumination invariant. However, it has been observed that the method is sensitive only to
inter-class colour variation: simply put, it generates a different pattern if the background is grey in colour and the
foreground object is coloured, but it fails to discriminate between different shades of the same colour. It has to
be accepted that the method will fail if applied to many practical scenes, and so will any method based on fragile
colour information.
Another way to make the second stage of the approach illumination invariant is to pick up the high-frequency
components, since an illumination change from an omnidirectional source in a bag-sized region of a frame
can be roughly modelled as a blob, and hence contributes mainly low-frequency components. The high-frequency
components of the image encompassed by a locked bag-sized mask are picked up using a novel edge detector
that utilizes four 1-D Laplacian kernels. The next section briefly describes the edge detector, and how edge-map
based entropy estimate has been used to reduce the false detection rate, and track the abandoned object.
4. EDGE DETECTION AND TRACKING USING EDGE-MAP BASED ENTROPY
A novel edge detector has been developed that picks up the high frequency components of the image covered by
a locked binary-mask. The window of the edge detector comprises 1-D Laplacian kernels in the four directions as
shown in Fig. 1. It can be argued that if an edge exists in one of the four directions spanned by the 1-D Laplacian
kernels, then the absolute convolution sum in the other directions will return a high value. The maximum value
returned by the sub-windows can be checked against a threshold, and if the value is higher than the chosen
threshold the corresponding pixel can be marked as an edge pixel. Going by the same reasoning, it also becomes
obvious that in a flat image region the outcome of each of the kernels will be 0, or a value close to 0, and then
the output of the edge detector will also be 0. In short, a 5×5 edge detector with four Laplacian sub-windows is
scanned through the image covered by the locked binary mask. The maximum of the four absolute convolution
sums is determined, and if the maximum value is more than the chosen threshold, the corresponding pixel is
marked as an edge pixel. It should be mentioned here that an advantage of the described edge detector is
that the same architecture can be used to detect impulses in an impulse-noise-corrupted image. In
fact, Zhang and Karim in [14] have used a similar structure, together with a minimum-finding operation, to switch the
standard square median filter when denoising images contaminated with impulse noise. It should, however, be pointed
out that use of Laplacian kernels results in multi-pixel thick edge lines [15-16]. In addition to this, it is also
known that second differential operators lose the sense of edge direction, and are extremely sensitive to noise
[15-16]. The sensitivity of these operators, though, can prove to be advantageous in some situations; such second
differential operators can be used to pick up weak edges in any particular direction. It should also be noted that
zero-crossings cannot be utilized to localise the edges, as the filter window is broken up into 4 unidirectional sub-
windows, and because of the fact that finding out the maximum of the sub-filter-window output is a non-linear
operation. Integer arithmetic has been used in the developed edge detection algorithm to restrict the spread of
the edge-lines in some cases.
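A minimal sketch of the described detector is given below, under the assumption that each direction uses the 5-point zero-sum Laplacian kernel [-1, -1, 4, -1, -1]; the paper does not give the actual kernel coefficients or the threshold, so both are illustrative.

```python
import numpy as np

# Assumed 5-point 1-D Laplacian (coefficients sum to zero); the paper does
# not specify the kernel, so this choice and the threshold are illustrative.
LAP_1D = np.array([-1.0, -1.0, 4.0, -1.0, -1.0])

def edge_map(gray, threshold=40.0):
    """Mark a pixel as an edge pixel when the maximum absolute response of
    the four directional 1-D Laplacian kernels (horizontal, vertical, and
    the two diagonals of a 5x5 window) exceeds the threshold."""
    gray = gray.astype(np.float64)
    h, w = gray.shape
    out = np.zeros((h, w), dtype=bool)
    idx = np.arange(-2, 3)
    for y in range(2, h - 2):
        for x in range(2, w - 2):
            responses = (
                np.dot(LAP_1D, gray[y, x - 2:x + 3]),    # horizontal
                np.dot(LAP_1D, gray[y - 2:y + 3, x]),    # vertical
                np.dot(LAP_1D, gray[y + idx, x + idx]),  # main diagonal
                np.dot(LAP_1D, gray[y - idx, x + idx]),  # anti-diagonal
            )
            # in a flat region every response is ~0, so no edge is marked
            out[y, x] = max(abs(r) for r in responses) > threshold
    return out
```

In a production setting the four responses would be computed with vectorised convolutions rather than an explicit pixel loop; the loop form is kept here to mirror the window-scanning description in the text.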
Figure 1. The four one-directional sub-filter windows of the edge detector.
Figure 2. Camera view of a London underground tube station platform.

Edge-maps are generated for those parts of the expected background and current frames that are covered by a
locked mask. Edge-map based entropies are determined for both frames to verify whether the segmented blob
actually contains a foreground object or not; this reduces false segmentations that are, by and large, caused by
changes of illumination. If the entropies are different then the two entropies are registered for future tracking.
For every successive frame edge-map based entropy is estimated for the same region covered by the locked mask.
If the estimated entropy matches with the entropy of the foreground object, then it is assumed that the object
(abandoned bag) is still there. If it matches with that of the background frame, then it is assumed that the
object is removed. A fuzzy-state has also been considered in the algorithm. If the edge-map based entropy of
the current frame neither matches with that of the abandoned bag-sized foreground object nor with that of the
background, then it is assumed that the camera-view is obstructed, and human intervention is called for. This
step ensures that the approach does not lose the abandoned object even if the view is obstructed with people
standing in front of the bag. A first level alarm is generated once the entropy of a foreground object along with
the entropy of the background (for the same region) are registered. A more serious level of alarm, that calls for
human intervention, is generated if the object is not removed for 60 sec.
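The tracking logic above can be summarised in a short sketch. The entropy measure (Shannon entropy over the grey-level histogram of the pixels lying on the edge map) and the matching tolerance are assumptions, since the paper does not define the exact entropy estimate or the matching criterion.

```python
import numpy as np

def edge_map_entropy(gray, edge_mask):
    """Shannon entropy (in bits) of the grey-level histogram of the pixels
    that lie on the binary edge map; an assumed form of the paper's
    edge-map based entropy."""
    values = gray[edge_mask].astype(np.int64)
    if values.size == 0:
        return 0.0
    hist = np.bincount(values, minlength=256).astype(np.float64)
    p = hist[hist > 0] / values.size          # non-empty bin probabilities
    return float(-(p * np.log2(p)).sum())

def track_state(h_current, h_object, h_background, tol=0.1):
    """Three-way decision for a locked blob: object still present, object
    removed, or the fuzzy (obstructed) state that calls for human help."""
    if abs(h_current - h_object) < tol:
        return "object present"
    if abs(h_current - h_background) < tol:
        return "object removed"
    return "view obstructed"                  # fuzzy state
```

Registering both the object entropy and the background entropy for the same region is what allows the third, fuzzy outcome to be distinguished from a simple removal.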
5. RESULTS AND DISCUSSION
As mentioned earlier, the two-step approach has been tested on all the sequences of the Imagery Library for
Intelligent Detection Systems (iLIDS) database. Fig. 2 shows a typical expected background scene (underground
tube station in London) and Fig. 3 a busy scene with people waiting on the station, and a train approaching.
Also included in Fig. 3 is a window showing the scene segmented using the computational model described in
Section 2. Fig. 4 shows an abandoned bag being registered after an edge-map based entropy check against that
for the same region in the background. A serious level of alarm that calls for human intervention is generated if
the object is left for more than 60 sec; this is shown in Fig. 5. Fig. 6 illustrates the fuzzy-state when the camera
view is obstructed by a person walking in front of the bag; note that the algorithm recognises the fuzzy condition
and does not lose track of the bag. Finally, it should be mentioned that the algorithm has been tested on
all the sequences of the iLIDS dataset; the results validate the practical use of the two-step approach.
6. CONCLUSIONS
The paper describes a practical two-step approach to detect abandoned objects in moderately busy public places.
The method first segments a scene using a computational method, and then locks onto stationary bag-size blobs.
It then uses an edge-detector to pick up the high frequency components from the background and current frames
covered by the locked mask. Edge-map based entropy estimation is then used to verify whether the segmented
blob actually corresponds to a foreground object (e.g. a left bag) or not. If the answer is affirmative, the
registered foreground object is tracked using the edge-map based entropy matching method. Obstruction of the
camera view is treated as a fuzzy state, and the track of the object is never lost. This stage of the approach
makes the overall method broadly illumination invariant and thus reduces the number of false alarms generated
by other available methods that do not accommodate illumination variance.

Figure 3. A moderately busy scene of a London underground tube station; the small window shows the binary mask of
the scene segmented using the first stage of the algorithm.
Figure 4. A static object (abandoned bag) is registered after applying the edge-map based entropy matching process.
Figure 5. A serious alarm is generated after tracking the abandoned bag for 60 seconds.
Figure 6. The track of the registered abandoned object is not lost even if the view is obstructed.
REFERENCES
1. S.-N. Lim and L. Davis, “A one-threshold algorithm for detecting abandoned packages under severe oc-
clusions using a single camera,” Tech. Rep. CS-TR-4784, University of Maryland, College Park, CS Dept.,
2. F. Porikli, “Detection of temporarily static regions by processing video at different frame rates,” in Proc.
IEEE International Conference on Advanced Video and Signal-Based Surveillance, (London, UK), September 2007.
3. Y. Tian and R. Feris and A. Hampapur, “Real-time detection of abandoned and removed objects in Complex
Environments,” in Proc. IEEE International Workshop on Video Surveillance in conjunction with ECCV’08,
(Marseille, France), 2008.
4. E. Auvinet, E. Grossmann, C. Rougier, M. Dahmane, and J. Meunier, “Left-luggage detection using homo-
graphies and simple heuristics,” in PETS, pp. 51–58, 2006.
5. S. Guler, J. A. Silverstein, and I. H. Pushee, “Stationary objects in multiple object tracking,” in Proc. IEEE
International Conference on Advanced Video and Signal-Based Surveillance, (London, UK), September 2007.
6. P. T. N. Krahnstoever, T. Sebastian, A. Perera, and R. Collins, “Multiview detection and tracking of
travellers and luggage in mass transit environments,” in PETS, pp. 67–74, 2006.
7. J. M. del Rincn, J. E. Herrero-Jaraba, J. R. Gomez, and C. Orrite-Urunuela, “Automatic left luggage
detection and tracking using multiple cameras,” in PETS, pp. 59–67, 2006.
8. K. Smith, P. Quelhas, and D. Gatica-Perez, “Detecting abandoned luggage items in public space,” in PETS,
pp. 75–82, 2006.
9. B. K. Mitra, P. Birch, I. Kypraios, R. Young, and C. Chatwin, “On a method to eliminate moving shad-
ows from video sequences,” in Proc. SPIE Photonics Europe- Optical and Digital Image Processing, 7000,
pp. 700012–1:9, (Strasbourg, France), April 2008.
10. B. K. Mitra, R. Young, and C. R. Chatwin, “On shadow elimination after moving region segmentation based
on different threshold selection strategies,” Optics and Lasers in Engineering 45, pp. 1088–1093, November 2007.
11. T. Horprasert, D. Harwood, and L. S. Davis, “A statistical approach for real-time robust background subtraction
and shadow detection,” in Proc. IEEE International Conference on Computer Vision (’99 FRAME-RATE Workshop), 1999.
12. P. L. Rosin and T. Ellis, “Image difference threshold strategies and shadow detection,” in Proc. of the sixth
British Machine Vision Conference, pp. 347–356, 1995.
13. G. Bradski and A. Kaehler, Learning OpenCV, O’Reilly, USA, 2008.
14. S. Zhang and M. A. Karim, “A new impulse detector for switching median filters,” IEEE Signal Processing
Letters 9, pp. 360–363, 2002.
15. J. C. Russ, The Image Processing Handbook, CRC Press, Canada, fifth ed., 2007.
16. R. C. Gonzalez and R. E. Woods, Digital Image Processing, Addison-Wesley Longman Publishing Co., Inc.,
Boston, MA, USA, 1992.