On Image Storage and Retrieval
O. Kao
Department of Computer Science, TU Clausthal
Julius-Albert-Strasse 4, D-38678 Clausthal-Zellerfeld, Germany
e-mail: okao@informatik.tu-clausthal.de
Abstract: This paper presents an overview of the current technology for the realisation of image databases. The state-of-the-art approach uses a-priori extracted features and limits the applicability of such databases, as a detailed search for objects and for other important elements in an image cannot be performed. Therefore, the idea of a dynamic retrieval based on on-line feature extraction and processing is proposed and described by considering a simple template matching operation as an example. The analysis of the related memory, time and computational requirements of an image database with such capabilities shows that powerful parallel architectures are necessary for the solution of the performance problem. Therefore, an implemented prototype for the realisation of an image database on a Beowulf cluster is introduced in the paper.
1. Introduction
The development of information technology in the 1990s is often described as a
multimedia revolution. Multimedia is the synchronised association of
• Time-dependent (dynamic) media: video and audio sequences, graphic animations
• Time-independent (static) media: text, graphics and images.
Multimedia documents offer significant advantages in information representation and
communication. Furthermore, the capacities and the performance of I/O devices, memories,
networks, CPUs etc. are improving continuously and are nowadays widely available.
Nevertheless, 90% of all information is still on paper. One of the reasons is the lack of reliable
methods for content analysis of the different media types; thus basic mechanisms and
technologies for the realisation of multimedia database management systems are not yet
available. Many existing relational and object-oriented databases handle multimedia objects as
BLOBs (Binary Large Objects) [1,2] and describe their content by a manually compiled and
limited set of keywords. The retrieval is then realised by a full-text search in the assigned set
of keywords. A main disadvantage of this approach is the reduction of the complex
media content to a few keywords.
This paper focuses on image databases, as they form the main part of general multimedia
databases. Such systems for the archival and retrieval of images can be found in many areas,
for example medical applications, remote sensing, news agencies, authorities, museums etc.
Image databases contain general sets of images and allow a search for a number of images
that are similar to a given sample image.
2. Image Description
A complete image description requires the extraction and analysis of various kinds of
information regarding the objects, persons, textures etc. in the scene. Subsequently, a
relationship between the image objects and real-world entities has to be established, for
example: this image shows President Clinton in Berlin together with Chancellor Schroeder.
The possible image information can be classified into the following groups:
• Raw image data is the matrix with the pixel colour values, which can be stored in
various file formats.
• Technical information describes the image resolution, the number of colours used, the
file format etc.
• Results of image processing: this group encloses all extracted image features, such as
objects and regions, statistical characteristics and topological data, which define the
spatial relationships between the image objects.
• Knowledge-based information describes the relationship between the image elements
and real-world entities, for example who or what is shown in the image.
• World-oriented information usually concerns photo time and date, photographer, etc.
The raw image data and the technical and world-oriented information can be represented by
well-known data structures and stored in existing databases. The user can either browse the
database (similar to a printed image catalogue, for example in an art gallery) or search the
database by entering keywords such as the name of the artist or the picture, the date and other
related information. These are compared with the technical and world-oriented information.
Examples of such databases with images are widely available on the Internet, for example as
part of e-Business and entertainment sites, web presentations of museums, art galleries,
trademarks etc., as well as in programs for the management and archiving of photo
collections (for example vacations, special occasions etc.).
These primitive retrieval and annotation methods have many disadvantages, in particular the
time-intensive and difficult selection of appropriate keywords. Therefore, an image database
should support improved querying, retrieval and annotation methods, which enable a
content-based similarity search in a general set of images.
Interfaces for querying an image database include visual methods like query-by-pictorial-
example (QBPE) and query-by-painting/sketching, as well as standard methods like browsing
a given set of sample images [3,4].
The state-of-the-art approach for the creation and retrieval of image databases is based on the
extraction and comparison of a-priori defined features. These are directly derivable from the
raw data and represent properties related to the dominant colours in the image and their
distribution, important shapes and textures, as well as the global layout. The extracted features
can be combined and weighted in different ways, resulting in logical (advanced) features,
which represent the image content on a higher abstraction level. Recent work in this area
considers the image semantics and tries to define and integrate different kinds of emotions [5].
The similarity degree of a query image and the target images is determined by calculating a
distance (for example the Euclidean distance L2) between the corresponding features. The result
is a sorted list of n distances, where n is the number of stored images. The first k elements
correspond to the most similar images in the database; thus the raw data of those images is
presented to the user as retrieval hits.
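This ranking step can be sketched as follows. The sketch is a minimal illustration, not the actual code of any of the cited systems; the function names and the 3-bin colour histograms used as features are assumptions for the example:

```python
import math

def euclidean(a, b):
    """L2 (Euclidean) distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k_hits(query_features, database, k):
    """Rank all n stored images by their distance to the query features
    and return the k closest ones as (image id, distance) pairs."""
    distances = [(image_id, euclidean(query_features, feats))
                 for image_id, feats in database.items()]
    distances.sort(key=lambda pair: pair[1])
    return distances[:k]

# Hypothetical 3-bin colour histograms as a-priori extracted features
db = {"img1": [0.2, 0.5, 0.3],
      "img2": [0.9, 0.05, 0.05],
      "img3": [0.25, 0.45, 0.3]}
hits = top_k_hits([0.2, 0.5, 0.3], db, 2)  # img1 matches exactly, img2 is far off
```

Since only these short pre-extracted vectors are touched at query time, no processing of the raw image data is needed, which is precisely the source of the fast response times of this approach.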
Satisfactory recognition rates are reported for a number of implemented image retrieval
systems and prototypes. The measurements are often performed on a small number of image
classes, where the intra-class distance is small (for example landscape images, faces, etc.)
and the inter-class distance is large. Acceptable system response times are achieved because
no further processing of the image raw data is necessary during the retrieval process, resulting
in an immense reduction of computing time. The easy integration into existing database
systems is a further advantage of this approach.
The extraction of simple features, however, results in a disadvantageous reduction of the image
content. Important details like objects, topological information etc. are not sufficiently
considered in the retrieval process; thus a precise detail search is not possible. Furthermore, it
is not clear whether the known, relatively simple features can be combined in the right way for
the retrieval of all kinds of images. For the realisation of queries like "Show all images
containing the marked object X", more powerful methods with dynamic feature extraction are
necessary.
Definition 1: Image Retrieval with Dynamically Extracted Features -- short: dynamic retrieval --
is the process of analysis, extraction and description of any manually selected image
elements, which are subsequently compared to all image sections in the database.
An example of this operation is template matching, a simple and often used method for
searching for particular patterns/objects in images. The region of interest is represented by a
minimal bounding rectangle and subsequently correlated with all images in the database. The
colour values, texture features and contour form build the foundation for the similarity
measurements. These attributes can be combined and weighted with heuristically determined
coefficients. In contrast to the previous approach, image features and attributes cannot be
defined and extracted a-priori; thus all images in the database have to be processed.
Figure 1 depicts this approach.
Figure 1: Example of image retrieval with dynamically extracted features: (a) creation of a region of
interest; (b) search for the region in an unsuitable image; (c) person found in another image by
template matching
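The exhaustive search illustrated in Figure 1 can be sketched as follows; this is a simplified illustration on grey values, without the size and rotation variations discussed in the next section, and all names are assumptions for the example:

```python
def template_match(image, template):
    """Slide the template over every position of the image and return
    the position with the smallest sum of squared differences, i.e.
    the best match. Both arguments are 2D lists of grey values."""
    M, N = len(image), len(image[0])
    Ma, Na = len(template), len(template[0])
    best_pos, best_score = None, float("inf")
    for r in range(M - Ma + 1):
        for c in range(N - Na + 1):
            score = sum((image[r + i][c + j] - template[i][j]) ** 2
                        for i in range(Ma) for j in range(Na))
            if score < best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score

# A 2x2 pattern hidden in a 4x4 image: the perfect match scores 0
image = [[0, 0, 0, 0], [0, 9, 8, 0], [0, 7, 9, 0], [0, 0, 0, 0]]
pos, score = template_match(image, [[9, 8], [7, 9]])  # → (1, 1), 0
```

The two nested position loops over the full image are what makes this operation so expensive when every image in the database has to be processed.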
3. Computational complexity of dynamic retrieval
Dynamic feature extraction and the detail search significantly increase the time and
computational complexity of the query processing. In order to estimate these requirements,
we take the template matching method as a representative of the algorithms for dynamic feature
extraction. For each position, the template window is compared with the corresponding image
region, for example by calculating the differences between the colour values of the pixels. This
process is repeated with different sizes and rotation angles for all positions and all available
images, resulting in a list of probabilities for the existence of the searched object in a
certain image. Thereby, distortions caused by rotation and deviations regarding size,
colours, textures and contour shape have to be considered. The maximal range of these
transformations as well as the stride depend on the concrete application. The wider the range,
the more complex, time and compute intensive the template matching becomes. For example,
the number of operations T_op for an image b with dimensions M x N and a template a with
dimensions M_a x N_a amounts to
T_op = (M - M_a) * (N - N_a) * M_a * N_a * Scale * i        (1)
where Scale is the number of sizes and i the number of rotations to be considered. Processing
a 1024x768 image with a 100x100 template, with only two sizes and without rotation,
necessitates approximately 12344 million additions. The effort for processing a 512x512
image with a 20x20 template amounts to 193 million additions.
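As a check, Equation (1) can be evaluated directly; the following sketch reproduces the two examples from the text (function and parameter names are illustrative):

```python
def operation_count(M, N, Ma, Na, scale, rotations):
    """Number of additions for exhaustive template matching of an
    Ma x Na template against an M x N image, following Equation (1)."""
    return (M - Ma) * (N - Na) * Ma * Na * scale * rotations

# 1024x768 image, 100x100 template, two sizes, no rotation:
big = operation_count(1024, 768, 100, 100, 2, 1)   # 12,344,640,000 additions
# 512x512 image, 20x20 template, two sizes, no rotation:
small = operation_count(512, 512, 20, 20, 2, 1)    # 193,651,200 additions
```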
Summarising the analysis results, it becomes clear that the time, computation and memory
requirements of an image database with a dynamic retrieval option exceed the performance
capabilities of modern computer architectures with a single processing element (PE).
Therefore, we examine the properties and the suitability of modern parallel computer
architectures for the realisation of dynamic image retrieval.
4. Parallel architectures for dynamic image retrieval
Parallel architectures represent key components of modern computer technology with a
growing impact on the development of information systems. From the viewpoint of the
database community, parallel architectures are classified into [6]:
• Shared-nothing architecture: the nodes are replicated, interconnected whole computers
• Shared-everything architecture: all processing elements have equal access to the
memory and the I/O subsystem
• Shared-disk architecture: all nodes of the parallel computer share the I/O
subsystem.
Shared-nothing systems are usually used for the realisation of databases distributed over wide
area networks. Each node has a separate copy of the database management system and its own
file system. All operations are performed on the local data, and the inter-node
communication is usually based on the client/server paradigm with conventional network
techniques. The insufficient bandwidth limits the parallel speedup.
Shared-everything systems are the main platform for parallel database systems, and most
vendors offer parallel extensions for their relational and object-oriented databases.
Independent query parts are distributed over a number of processing elements. Fast
communication and synchronisation as well as easy management are advantages of this
architecture class. Disadvantages concern the fault tolerance and the limited extensibility of
shared-everything systems.
Shared-disk systems host databases distributed in local area networks. They usually replace
shared-everything systems that are not powerful enough for the performance requirements
of new database functions such as data mining. Another important issue of these systems is
the fault tolerance for critical applications. Each node has its own database management
system, and all nodes are connected over a shared file system.
The important question is now which computer architecture is suitable for the
implementation of dynamic retrieval. Images and other multimedia objects represent large
data blocks; thus the communication between the I/O subsystem, memory and CPU is an
important factor. The applied operations may vary significantly in processing time:
some of them, like edge detection, are relatively simple and require only a split second of
computing time, whereas a template matching operation with many parameters necessitates
more than a minute of processing time per image. A retrieval process ends with the creation of
a hit ranking; thus a one-time unification of all sub-results is necessary.
In the case of shared-disk and shared-everything architectures, the I/O subsystem and the
transfer of the huge image data to the main memory might be a bottleneck. On the other hand,
the workload distribution and the synchronisation can be realised easily and efficiently.
Cluster architectures have the advantage that each node has its own I/O subsystem; thus the
transfer effort is shared by a number of nodes. Moreover, the reasonable price per node
enables the creation of systems with a large number of processing elements. Open problems
concern workload balancing, synchronisation and data distribution as well as general
cluster problems like the missing single system image and the large maintenance effort. The next
section describes a prototype for image retrieval with dynamic feature extraction, developed
on a Beowulf cluster architecture.
5. Dynamic image retrieval on cluster architecture
PFISTER defines in [8] a cluster as a parallel or distributed system consisting of a collection of
interconnected whole computers, which is used as a single, unified computing resource. A
"whole computer" is a stand-alone computer with CPU, memory and all other necessary
architectural components, an operating system, etc. Furthermore, clusters are built out of
commodity components. The nodes of a cluster system may be PCs, workstations or any
parallel architecture with some UNIX-like operating system. The interconnection structures of
cluster architectures distinguish them from parallel virtual machines, as usually high performance
networks like Myrinet, SCI or ATM are used. Cluster nodes follow the message passing
paradigm; thus the bandwidth and latency of the network have an important impact on the
overall system performance. The best-known cluster is Beowulf, a multi-computer
architecture usually consisting of one server and many client nodes connected via Ethernet or
some other network. It is a system built using commodity hardware components and it is
trivially reproducible. Beowulf uses software like Linux, PVM and MPI. The server node
controls the whole cluster and serves files to the client nodes. It is also the cluster's console
and gateway to the outside world [9,10].
Clusters of SMPs (CLUMPs) are an interesting architecture for efficient image retrieval, because
they combine the advantages of two parallel paradigms: the easily
programmable SMP model with the scalability and data distribution over many nodes of
architectures with distributed memory. Therefore, we constructed a cluster-based prototype for
efficient, affordable image retrieval with dynamic feature extraction [11].
The hardware infrastructure for the realisation of our prototype is a Beowulf cluster with 8
Dual Pentium III 667 MHz computing nodes with 512 MByte of memory each, connected with a
Myrinet network. The nodes are subdivided according to their functionality into the following groups:
• Query stations host the web-based user interfaces for accessing the database and
visualise the retrieval results.
• The master node controls the cluster, receives the query requests and broadcasts the
algorithms with the search parameters as well as the sample image and features to the
computing nodes.
• Computing nodes: each of these nodes contains a part of the image material and
executes the feature extraction and the comparison on the local data. The results are
sent to the media server.
• The media server has multiple functions. Firstly, it is a redundant storage server and
contains the whole image database. Secondly, it receives and compares the results of
all computing nodes and sends the k best hits to the user on the query station.
Figure 2 shows a simplified graphic representation of this cluster architecture.
Figure 2: Schematics of the proposed cluster architecture
The software for image retrieval consists of a graphical user interface, algorithms for feature
extraction, a relational database system and components for the parallel extraction and
comparison of dynamic features.
The query stations contain various web-based interfaces for the communication with the
retrieval system. Standard querying concepts like query by pictorial example, query by sketch,
browsing etc. and presentation methods for the visualisation of the retrieval results are
installed. The querying results may be used as input for a new, more detailed search. The
relational database stores static, a-priori extracted features, like histograms and wavelet
coefficients, together with the corresponding index structures. For the modelling and retrieval
of this information, conventional data structures and techniques are sufficient. A component
called the transaction manager analyses the queries. If only static features are considered, the
time-intensive search of all images is replaced by a relatively simple nearest-neighbour search
on the index structures.
Otherwise, all images in the system must be processed. In this case, the transaction manager
transmits the query to the distribution manager, which is installed on the cluster's master node
and initiates the search on the computing nodes. The extraction algorithms with the querying
parameters and possibly a list of images to be processed are broadcast to all cluster nodes
and received by the computing managers. A node i performs the image processing operations
on its local image set, determines the local ki best hits and sends these, e.g. characteristic
values, overlay degrees etc., together with the unique image identifiers, to the result manager
running on the media server. The value of the constant ki depends on the concrete application
and the size of the image database and is usually set by the user. The media server combines
all results in order to select the k best hits of the global image set and finally passes the
retrieved images to the user on the query station.
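This scatter/gather flow can be sketched as follows. The function names are illustrative, and precomputed per-image similarity scores stand in for the results of the actual feature extraction and comparison on each node:

```python
import heapq

def local_search(local_scores, k_i):
    """Computing node i: rank its local image set and return the k_i
    best local hits as (score, image id) pairs; higher score = better."""
    return heapq.nlargest(k_i, ((s, img) for img, s in local_scores.items()))

def merge_results(per_node_hits, k):
    """Result manager on the media server: combine the local hit lists
    of all computing nodes and select the global k best hits."""
    merged = [hit for hits in per_node_hits for hit in hits]
    return heapq.nlargest(k, merged)

# Hypothetical similarity scores per image, partitioned over two nodes
node1 = {"a": 0.9, "b": 0.2}
node2 = {"c": 0.7, "d": 0.95}
partial = [local_search(n, 2) for n in (node1, node2)]
best = merge_results(partial, 2)  # → [(0.95, "d"), (0.9, "a")]
```

Only the short local hit lists travel over the network, while the large raw images stay on the computing nodes, which keeps the communication effort low.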
Critical parts of the system are the partitioning of the image data and the distribution of the
workload over the nodes. Furthermore, the communication effort between the nodes must be
minimised, as the transfer of large images is very time intensive. The simplest form of
workload distribution is achieved when all nodes have approximately the same amount of
material, measured in MBytes: each of the n nodes stores 1/n of the global material on its
local hard disks. An update component calculates the total storage capacity used on each node
and sends a new image to the node with the minimal volume. In a dedicated cluster
architecture, similar run times of each part can be expected, resulting in an efficient retrieval.
Advantages of this strategy are the simple realisation and maintenance. The disadvantage is
that each query necessitates the processing of all images. A much better
approach can be realised if the images are partitioned according to a certain feature or a set
thereof. This is not possible at the moment, because no reliable features for the separation of
different kinds of images exist.
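The placement rule of the update component can be sketched in a few lines; node names, sizes and the bookkeeping structure are hypothetical:

```python
def assign_image(node_usage, placement, image_id, image_size_mb):
    """Send a new image to the node whose local store currently holds
    the least material (in MBytes) and update the bookkeeping."""
    target = min(node_usage, key=node_usage.get)
    node_usage[target] += image_size_mb
    placement[image_id] = target
    return target

# Hypothetical per-node storage volumes in MBytes
usage = {"node1": 120, "node2": 95, "node3": 130}
placement = {}
chosen = assign_image(usage, placement, "new_photo", 2)  # → "node2"
```

Balancing by stored volume keeps the run times of the nodes similar in a dedicated cluster, at the price that every query still has to touch all images.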
The advantages of the proposed cluster architecture are the combination of existing parallel
image processing methods for SMP architectures with a low communication effort and the simple
workload distribution resulting from the subdivision of the images over the nodes. Furthermore,
the low hardware costs and the availability of Linux PCs with 2 or 4 processing elements could
lead to a wide acceptance of the architecture. Disadvantages result from the centralised master
node and media server: if one of these nodes fails, the whole system goes down.
6. Conclusions
This paper presents an overview of the current technology for the realisation of an image
database as an important subsystem of multimedia databases. The idea of a dynamic retrieval
based on on-line feature extraction and processing is discussed and described by
considering a simple template matching operation as an example. The analysis of the related
memory, time and computation requirements of an image database with such capabilities
shows that powerful parallel architectures are necessary for the solution of the performance
problem. In the last part of this paper, a cluster prototype for the realisation of an image
database is discussed. The image data is distributed over all available nodes, which perform
the operations on the local data and send the results to the media server. This node determines
the k best hits and presents them to the user.
Our future work includes the development of more flexible strategies for the workload
distribution and data placement. A database partitioning based on orthogonal features can
enable a workload concentration on a subset of the nodes; thus several queries on different
image sets could be performed simultaneously. Shorter response times may be obtained if
the image groups are subdivided and distributed over all nodes. Furthermore, extensive
performance measurements have to be executed.
References
[1] S. Khoshafian, A. B. Baker: MultiMedia and Imaging Databases. Morgan Kaufmann
Publishers, 1996.
[2] W.I. Grosky, R. Jain, R. Mehrota (Eds.): The Handbook of Multimedia information
management. Prentice Hall, 1997, 365-404
[3] J. Ashley et al.: Automatic and semi-automatic methods for image annotation and
retrieval in QBIC. In Proceedings of Storage and Retrieval for Image and Video
Databases III, pp 24-35, 1995.
[4] O. Kao, I. la Tendresse: CLIMS - A system for image retrieval by using colour and
wavelet features. In Proceedings of the First Biennial International Conference on
Advances in Information Systems (ADVIS'2000), to be published
[5] A. Del Bimbo: Expressive Semantics for Automatic Annotation and Retrieval of Video
Streams, In Proceedings of the IEEE Conference on Multimedia & Expo, 2000
[6] W. Klas, K. Aberer: Multimedia and its Impact on Database System Architectures. In
P.M.G. Apers, H.M. Blanken, M.A.W. Houtsma (Eds.): Multimedia Databases in
Perspective, pp 31-62. Springer Verlag, 1997.
[7] A. Reuter: Methods for parallel execution of complex database queries. Journal of
Parallel Computing, Volume 25, pp 2177-2188, 1999.
[8] G.F. Pfister, In Search of Clusters: The ongoing battle in lowly parallel computing.
Prentice Hall, 1998
[9] T. Sterling, D. Becker, D.F. Savarese: BEOWULF: A Parallel Workstation for Scientific
Computation, Proceedings of the International Conference on Parallel Processing, 1995,
pp 11-14, Beowulf Web Site: http://www.beowulf.org/
[10] D.F. Savarese, T. Sterling: Beowulf. in R. Buyya (Edt.): High Performance Cluster
Computing - Architectures and Systems, Prentice Hall, 1999, 625-645
[11] O. Kao: Towards Cluster Based Image Retrieval, in Proceedings of Conference on
Parallel and Distributed Processing Techniques and Applications (PDPTA), 2000, pp
1307-1315, CSREA