
A Multi-Sensor System for Mobile Services with Vision Enhanced Object and Location Awareness

Authors:
  • TTTech Computertechnik AG

Patrick Luley, Alexander Almer, Christin Seifert, Gerald Fritz and Lucas Paletta
JOANNEUM RESEARCH, Wastiangasse 6,
A-8010 Graz, Austria
{patrick.luley, alexander.almer, christin.seifert,
gerald.fritz, lucas.paletta}@joanneum.at
http://www.joanneum.at/dib
Abstract. We describe a system that includes a multi-sensor object
awareness and positioning solution to enable stable location awareness for a
mobile service in urban areas. The system offers outdoor vision-based object
recognition technology that extends state-of-the-art location- and
context-aware services towards object-based awareness in urban environments.
In the proposed application scenario, tourist pedestrians are equipped with a
GPRS- or UMTS-capable camera phone. They want to know whether their field of
view contains tourist sights for which more detailed information is available.
Multimedia data about the related history can be explored by a mobile user
who intends to learn within the urban environment. Ambient learning is
achieved in this way by pointing the device towards the urban sight, capturing
an image, and consequently receiving information about the object on site and
within the focus of attention, i.e., the user’s current field of view. The
described mobile system offers multiple opportunities for application in both
mobile business and commerce, and is currently being developed towards an
industrial prototype.
1 Introduction
Location based services require knowledge of the actual location of the
user, the current user context and geo-referenced information about areas and
points of interest. Different technologies can be used to fulfil these
requirements. In general, the digital service can be provided as either an
offline or an online solution. Location awareness can be provided based on GPS,
using wireless network technologies such as GSM and WLAN, or by self-location,
e.g. based on street names and house numbers. Furthermore, location awareness
can be realized based on the known locations of geo-referenced objects of
interest, which allows the user position to be determined.
The presentation of geo-referenced information on mobile devices requires a data
transfer to the mobile device. This can be implemented as an offline service, by
storing the required data on the phone’s memory card, or as an online service, by
transferring the data to the phone on demand using wireless network infrastructure
(e.g. GSM, GPRS, UMTS and WLAN). Both solutions have advantages and
disadvantages and can be used for the development of mobile applications [1].
Using GPS alone, location awareness for a mobile service in urban areas
cannot be assured everywhere and at any time, because of the known weaknesses
of GPS signal availability in urban areas. Therefore, mobile systems operating in
urban environments must take advantage of contexts arising from the spatial and
situated information at the current location of the pedestrian user. Today,
location based services are in principle able to provide the nomadic user with
access to rich sources of information and knowledge. However, the kind of
location awareness that they provide is not intuitive and requires reference to
maps and addresses, i.e., the information is not directly mediated via the
object of interest.
In contrast, the proposed system takes a decisive step towards matching
the user's current intention to relate information to their current sensory
experience, e.g., the object in their line of sight. In this way, the system
can respond to the user’s focus of attention, e.g., for the purpose of tourist
information systems. A camera attached to the mobile system (PDA or camera
phone) and pointed towards the object of interest (e.g., a building or a
statue) captures images on demand, so that objects in the tourist user's view
can be found automatically. The images are then transmitted to a server that
automatically extracts the object information, associates it with
geo-referenced content, and sends the resulting data back to the mobile user.
‘Mobile vision’ refers here to mobile visual data that are processed in an
automated way to provide additional information to the nomadic client in
real time.
A location based service in urban areas has to offer area-wide location awareness
to allow spatially oriented access to information. A mobile application has to
focus on the thematic requirements and also on the target groups. The following
sections describe a mobile application system (see Fig. 1) which is based on a
common smart-phone and uses image based object recognition as a tool to support
location awareness in combination with a GPS module.
2 User Scenario
This section briefly describes a user scenario for a city-tourist type
pedestrian, focusing on the service of image based object recognition.
The common way of city sightseeing is to use a printed city map with
integrated sightseeing tours that lead the tourist along a pre-defined path from
one sight to the next. Brief descriptions of the sights can be found on the back
of the map.
With the image based object recognition service the tourist gains the
freedom to explore the city without any pre-defined sightseeing tours.
The tourist moves completely freely through an unknown area and, if he is
interested in any object (e.g. a historical building or a statue), he just has
to take a picture of it with his camera-phone, with or without a GPS device
connected, and press the “Identify” button. As a result he gets a detailed
description of the object containing multimedia tourist information. As a second
benefit he also gets the position of the identified object, which can be used
for navigation.
Fig. 1. Scenario: taking a picture of the object to be identified by the system
3 System Overview
This section briefly describes how a common smart-phone with a built-in digital
camera can be used for image based object recognition. A GPS device, built-in or
connected to the phone via cable or Bluetooth, can help to accelerate the
recognition process considerably. The overall concept consists of three main
phases.
In the first phase a software client is activated by the user on his personal
smart-phone. The software can be downloaded directly from a website with the
internet enabled smart-phone and then installed on the device. The software
client offers functionality to take a picture of an object the user wants to
identify. Next, if available, the smart-phone reads the actual position from the
GPS device. If the GPS cannot obtain a position for any reason, the cell
information of the phone-network provider can be used to approximate the user
location instead. The picture of the object and the position of the snapshot
location are put together into a SOAP message and sent to the image-recognition
web-service, running on a dedicated server, over a common wireless internet
connection such as GPRS or UMTS.
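The first phase can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the client software of the system: the names `Position`, `choose_position` and `build_request`, the field names, and the plain dictionary payload are all hypothetical, and the real system wraps the data in a SOAP message whose schema is not given here.

```python
import base64
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Position:
    lat: float   # latitude in degrees
    lon: float   # longitude in degrees
    source: str  # "gps" or "cell" (hypothetical labels)


def choose_position(gps_fix: Optional[Tuple[float, float]],
                    cell_estimate: Optional[Tuple[float, float]]) -> Optional[Position]:
    """Prefer the GPS fix; fall back to the coarser cell-based estimate."""
    if gps_fix is not None:
        return Position(gps_fix[0], gps_fix[1], "gps")
    if cell_estimate is not None:
        return Position(cell_estimate[0], cell_estimate[1], "cell")
    return None  # the picture can still be sent without any position


def build_request(image_bytes: bytes, position: Optional[Position]) -> dict:
    """Bundle picture and position into one payload (stand-in for the SOAP body)."""
    payload = {"image": base64.b64encode(image_bytes).decode("ascii")}
    if position is not None:
        payload["position"] = {
            "lat": position.lat,
            "lon": position.lon,
            "source": position.source,
        }
    return payload
```

Keeping the position optional mirrors the scenario in which the camera-phone is used with or without a connected GPS device.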
In the second phase the web-service reads the request from the client
(smart-phone) and extracts the picture and the GPS position. The picture is then
analysed by an image recognition algorithm to obtain representative features of
the picture such as edges and significant surfaces or colour transitions. Next,
these features are compared with an object database. The database contains
pictures of different objects, together with the pre-processed features and the
position of the snapshot point. The object is then identified by matching the
features of the user picture with the features in the database. With a big
database containing many objects, this matching process is very time consuming,
and that is where the GPS position can help. To accelerate the process, the
objects in the database can be filtered by the user's GPS position: only those
database objects whose pictures have a snapshot point near the user's position
are considered in the matching process (see Fig. 3).
Once the object is identified, the object database is queried for information
containing text and pictures. This information is integrated into a SOAP message
and is then sent to the smart-phone client as the response.
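The geographic pre-filtering of the second phase can be illustrated as follows. This is a minimal sketch under assumptions: the database is modelled as a list of dictionaries with hypothetical keys `object_id`, `lat` and `lon` (one entry per stored snapshot point), and the search radius is a free parameter that the paper does not specify.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def candidates_near(user_pos, database, radius_m):
    """Return the IDs of objects with a snapshot point within radius_m of the user."""
    lat, lon = user_pos
    return sorted({
        entry["object_id"]
        for entry in database
        if haversine_m(lat, lon, entry["lat"], entry["lon"]) <= radius_m
    })
```

Only the objects returned by `candidates_near` would then enter the feature matching step; a larger radius would be appropriate when the position comes from the coarser cell information.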
In the third phase the web-service response is presented to the user on the
smart-phone. In addition to the quick information about the object, containing
text and pictures, there is a URL which the user can follow to obtain detailed
information about the object. The URL can be viewed in a common smart-phone
internet browser such as “Opera” or “Pocket Internet Explorer”. The URL
contains, as a parameter, the unique identification number of the desired object
and links to a dynamic website, which is generated at runtime on the server. The
layout of the website is optimized for the requesting hardware platform, because
different smart-phones have different display resolutions.
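As a small illustration of such a URL, the sketch below builds a link that carries the object identifier as a query parameter. The base address, the parameter names `id`, `w` and `h`, and the idea of passing the display resolution in the URL are assumptions made for this example only; the paper states just that the object ID is a parameter and that the layout is adapted to the requesting platform.

```python
from urllib.parse import urlencode


def detail_url(base_url: str, object_id: int, display_width: int, display_height: int) -> str:
    """Build the link to the dynamically generated detail page (hypothetical parameters)."""
    query = urlencode({"id": object_id, "w": display_width, "h": display_height})
    return f"{base_url}?{query}"
```

For example, `detail_url("http://example.org/object", 15, 320, 240)` would yield a link that a server-side script could use to select both the object and a suitable page layout.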
The following Figure shows the whole technical concept and its three phases.
Fig. 2. Overview of the mobile application systems concept
The aim of this client-server architecture is to bring the image based object
recognition service to any person using a common camera-phone and to gain
scalability with respect to the number of objects in the database and the
complexity of the image-recognition algorithms.
Fig. 3. Geo-Context, e.g., from GPS based position estimates (‘M’ with blue uncertainty
radius), can set priors by geographically indexing into a number of object hypotheses (‘X’s are
coordinates of user positions while capturing images about objects of interest).
4 Image-based Object Recognition
Our proposed image based service for object awareness requires visual object
recognition that is both robust and fast on typically low-quality outdoor
images. We therefore applied a methodology that is highly suited for mobile
vision applications, i.e., the Informative Features approach [2] on the
state-of-the-art local SIFT descriptor [3], which is designed to be rotation and
scale invariant and, to some extent, invariant to illumination changes. Our
resulting i-SIFT approach [4] tackles the standard SIFT bottleneck, i.e.,
extensive nearest neighbour indexing, by (i) significantly reducing the
descriptor dimensionality, (ii) decreasing the size of the object representation
by one order of magnitude, and (iii) performing matching exclusively on attended
descriptors, rejecting the majority of irrelevant descriptors [2]. Experiments
on images of the TSG-20 database (download at http://dib.joanneum.at/cape/TSG-20)
showed high recognition accuracy, even on low resolution images (320x240 pixels).
4.1 The Informative Visual Features Approach
According to the Informative Features approach, the information content of a
descriptor with respect to a specific task, i.e. object recognition, is determined from
the posterior distribution. In contrast to costly global optimization, we expect that it
is sufficiently accurate to estimate local information content, by computing it from
the posterior distribution within a sample test point's local neighbourhood in feature
space [5]. Using only the informative features in further processing steps leads to a
significant speed-up compared to the normal SIFT based object recognition.
The object recognition task is applied to sample local descriptors $f_i$ in
feature space $F$, $f_i \in \mathbb{R}^{|F|}$, where $o_i$ denotes an object
hypothesis from a given object set $\Omega$. We need to estimate the entropy
$H(O \mid f_i)$ of the posteriors

$$P(o_k \mid f_i), \quad k = 1 \ldots \Omega, \qquad (1)$$

where $\Omega$ is the number of instantiations of the object class variable $O$.
The Shannon conditional entropy is

$$H(O \mid f_i) \equiv -\sum_k P(o_k \mid f_i) \log P(o_k \mid f_i). \qquad (2)$$
We approximate the posteriors at $f_i$ using only samples $f_j$ inside a Parzen
window of a local neighbourhood $\varepsilon$,

$$\| f_i - f_j \| \le \varepsilon, \quad j = 1 \ldots J. \qquad (3)$$
We weight the contributions of specific samples $f_{j,k}$ - labelled by object
$o_k$ - that should increase the posterior estimate $P(o_k \mid f_i)$ by a
Gaussian kernel function value $N(\mu, \sigma)$ in order to favour samples with
smaller distance to observation $f_i$, with

$$\mu = f_i \quad \text{and} \quad \sigma = \varepsilon / 2. \qquad (4)$$
The estimate of the conditional entropy $\hat{H}(O \mid f_i)$ then provides a
measure of ambiguity, characterizing the information content with respect to
object identification within a single local observation $f_i$.
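The local posterior and entropy estimate can be sketched as follows, assuming Euclidean distance between descriptors and a base-2 logarithm. The function names, the representation of samples as `(descriptor, object_index)` pairs, and the uniform fallback for an empty neighbourhood are assumptions of this example, not details taken from the paper.

```python
import math


def parzen_posterior(f_i, samples, epsilon, n_objects):
    """Estimate P(o_k|f_i) from labelled samples inside the Parzen window.

    samples: iterable of (descriptor, object_index) pairs.
    Each neighbour within distance epsilon contributes a Gaussian weight
    with mean f_i and sigma = epsilon / 2.
    """
    sigma = epsilon / 2.0
    weights = [0.0] * n_objects
    for f_j, k in samples:
        d = math.dist(f_i, f_j)
        if d <= epsilon:
            weights[k] += math.exp(-0.5 * (d / sigma) ** 2)
    total = sum(weights)
    if total == 0.0:
        # Assumption: with no neighbours, nothing is known, so return uniform.
        return [1.0 / n_objects] * n_objects
    return [w / total for w in weights]


def entropy_bits(posterior):
    """Entropy estimate in bits for an already normalised posterior."""
    return -sum(p * math.log2(p) for p in posterior if p > 0)
```

The normalisation constant of the Gaussian cancels when the weights are divided by their sum, so only the exponential term is needed.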
We obtain sparse instead of extensive object representations if we store
only selected feature information that is relevant for classification purposes,
i.e., discriminative $f_i$ with

$$\hat{H}(O \mid f_i) \le \Theta. \qquad (5)$$
A specific choice of the threshold $\Theta$ consequently determines both storage
requirements and recognition accuracy. For efficient memory indexing of nearest
neighbour candidates we use the adaptive K-d tree method.
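The effect of the threshold on the size of the object representation can be illustrated with a short sketch. The triple layout `(descriptor, object_index, entropy_estimate)` and both function names are hypothetical; the k-d tree indexing mentioned above is not reproduced here.

```python
def select_informative(features, theta):
    """Keep only descriptors whose estimated entropy is at most theta.

    features: list of (descriptor, object_index, entropy_estimate) triples.
    Returns the retained (descriptor, object_index) pairs.
    """
    return [(f, k) for f, k, h in features if h <= theta]


def retained_fraction(features, theta):
    """Fraction of descriptors that would be stored for a given threshold."""
    if not features:
        return 0.0
    return len(select_informative(features, theta)) / len(features)
```

Lowering the threshold stores fewer but more discriminative descriptors; raising it keeps more descriptors at the price of a larger representation and more nearest neighbour candidates.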
4.2 Object Recognition Using Informative SIFT Descriptors
The SIFT based descriptor showed good performance, measured in terms of
repeatability, with respect to matching distinctiveness, invariance to blur, image
rotation, and illumination changes [6], [7]. As described in [3] the descriptor is
calculated in four stages:
1. Determine scale-space extrema as keypoint candidates.
2. Keypoint localization with subpixel accuracy and rejection of unstable
keypoints.
3. Orientation assignment, calculated from local keypoint context.
4. Calculation of the orientation histogram in a local keypoint environment
relative to keypoint orientation.
A SIFT descriptor consists of 128 floating point numbers. To speed up the
nearest neighbour calculation, we applied PCA to the SIFT descriptors. The
informative approach described above is thus applied to vectors in the
40-dimensional sub-eigenspace.
We then applied the Informative Visual Features approach to the SIFT descriptors
(resulting in the i-SIFT approach), selecting only those SIFT responses from the
image that provided sufficient information content with respect to the task of
object recognition.
In Figure 3 the whole object recognition process is schematised. First, the SIFT
descriptors are calculated for a given input image. Second, after applying PCA,
the lower dimensional features are fed into a decision tree [8] which has been
trained to estimate the local entropy of a feature. Third, the informative
features determined in this way are used for nearest neighbour matching against
the image database, whereby the object hypothesis is built only on features with
a distance to their nearest neighbour below a given threshold.
Fig. 3. Object Recognition using i-SIFT descriptors. Standard SIFT descriptors are first
extracted within the test image. Then the entropy of the descriptors is determined and decision
making is performed only on informative descriptors. Majority voting is then used to integrate
local votes into a global classification.
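The final voting stage can be sketched as follows, assuming the informative descriptors of the test image have already been selected. The brute-force nearest neighbour search stands in for the k-d tree indexing, and the normalisation of vote counts into a posterior is an assumption of this example; the function name and data layout are hypothetical.

```python
import math
from collections import Counter


def recognise(informative_descriptors, model, max_nn_distance, n_objects):
    """Vote for objects using nearest neighbour matches of informative descriptors.

    informative_descriptors: descriptors of the test image judged informative.
    model: list of (descriptor, object_index) pairs from the training images.
    Returns (map_object, posterior); map_object is None if no descriptor voted.
    """
    votes = Counter()
    for f in informative_descriptors:
        # Brute-force nearest neighbour; an index structure would replace this.
        nn_descriptor, nn_object = min(model, key=lambda m: math.dist(f, m[0]))
        if math.dist(f, nn_descriptor) < max_nn_distance:
            votes[nn_object] += 1  # one local vote per accepted match
    total = sum(votes.values())
    if total == 0:
        return None, [0.0] * n_objects
    posterior = [votes[k] / total for k in range(n_objects)]
    map_object = max(range(n_objects), key=lambda k: posterior[k])
    return map_object, posterior
```

A flat posterior returned by such a procedure would indicate an image that matches no trained object well, which is the situation described for background images.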
4.3 Experimental Results
The TSG-20 database includes images of 20 objects, i.e. facades of buildings in
the city of Graz, Austria. Most of these images contain a tourist sight, together
with background information from surrounding buildings, trees, pedestrians, etc.
The images contain severe changes in 3D viewpoint, partial occlusion, scale
changes due to varying distances from exposure, and various illumination changes
due to different weather situations and times of day. The images were first
subsampled to a size of 320x240 pixels. For each object, we then selected 2
images taken with a viewpoint change of approximately ±30° at a similar distance
to the object for training, to determine the i-SIFT based object representation.
2 additional frontal views at different distances were taken for testing
purposes, giving 40 test images in total. Figure 4 shows one view of each of the
20 objects contained in the database.
Fig. 4. The TSG-20 database, consisting of images from 20 tourist relevant buildings in the city
of Graz.
In the experiments, on average only 178 out of 711 (31%) descriptors per object
were retained for object representation. The descriptor dimensionality was also
reduced from 128 to 40 dimensions. The threshold on the entropy criterion for
attentive matching was set to $\Theta = 1.8$, reducing nearest neighbour
processing to only 40%. The recognition accuracy according to MAP (Maximum A
Posteriori) classification was 100%, and the average entropy in the posterior
distribution was $\hat{H}(O \mid f_i) \approx 1.4$. Figure 5 shows two sample
building recognitions, whereas Figure 6 shows results for background images
(buildings not in the training database).
Fig. 5. Sample building recognition for objects $o_{15}$ and $o_{18}$: (top down)
training images, entropy coded (blue: low, red: high) SIFT descriptors (without
selection), entropy coded informative SIFT descriptors, and corresponding
posterior distribution for i-SIFT based descriptor recognition.
Fig. 6 Detection of background from high entropy in the posteriors.
5 Conclusion and Outlook
Location based services will be a default service for many themes in the future.
Location awareness is a key issue for such mobile applications. Overcoming
shortcomings in the provision of position information (“everywhere and
anytime”) for the mobile user may be a crucial requirement for the acceptance
of mobile services in urban areas. The integration of contexts arising from the
spatial information at the current location will support area-wide location
awareness in combination with GPS functionality or the cell information of the
phone-network provider. This allows an individual and also spontaneous
information service. Such a mobile service will offer new ways of visualising
spatial information and customized data access for a mobile user in an urban
environment.
In this paper, we provide an overview of a mobile system which includes GPS
functionality and vision enhanced context awareness to offer user friendly
digital services for pedestrians in urban areas. The client-server architecture
of the system brings the image based object recognition service to any person
using a common camera-phone. This enables ubiquitous use of the mobile system in
many business and commerce relevant situations.
Further work will focus on the integration of different sensors for the
realization of smart mobile vision services in urban environments. These
developments will be realized within the framework of both the EU project MOBVIS
(Vision Technologies and Intelligent Maps for Mobile Attentive Interfaces in
Urban Scenarios) and the national, ASA funded project Mobile City Explorer.
References
1. Baldzer J., Boll S., Krösche J., Rump N., Scheibner H. and Thieme S.: Location-Based
Geodata Broadcast. 1st Workshop on Positioning, Navigation and Communication 2004 –
WPNC’04; March 26th, 2004. Shaker Verlag Aachen (2004); pp 67-73.
2. Fritz, G., Seifert, C., Paletta, L., and Bischof, H., Rapid Object Recognition from
Discriminative Regions of Interest, Proc. 19th National Conference on Artificial
Intelligence, AAAI 2004, San Jose, CA, July 25-29, 2004, pp. 444-449.
3. Lowe, D., Distinctive image features from scale-invariant keypoints, International Journal of
Computer Vision 60(2), pp. 91-110, 2004.
4. Paletta, L., Fritz, G., Seifert, C. Informative SIFT Descriptors for Object Detection,
submitted to CVPR 2005, San Diego, CA
5. Fritz, G., Paletta, L., and Bischof, H., Object Recognition using Local Information Content,
Proc. International Conference on Pattern Recognition, ICPR 2004, Cambridge, UK, August
22-26 2004, Vol. II, pp. 15-18.
6. Mikolajczyk K, and Schmid, C. A performance evaluation of local descriptors. Proc.
Computer Vision and Pattern Recognition, CVPR, Madison, WI, 2003.
7. Mikolajczyk K, and Schmid, C. A performance evaluation of local descriptors. submitted to
PAMI, http://lear.inrialpes.fr/pubs/2004/MS04a, 2004.
8. Quinlan, J. R. C4.5 Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA,
1993.
... Cognitive radio with cognitive vision systems can have a capability to convert the acquired scene state into text, image or voice formats depending on the applications. Consequently, numerous image-based location and environment-aware applications [20,21] can be developed. However, it is a challenging task to embody such advanced cognitive vision systems due to low power, cost, and size limitations. ...
... pattern analysis and machine intelligence algorithms [19] ) to construct the scene state in the desired formats; text, image, video, and voice. One of the well-known visual location sensing techniques is scene analysis192021. Scene analysis simply is a pattern matching based location sensing technique similar to RF pattern matching based methods (e.g. RF fingerprinting) [25]. ...
... 4. A conceptual model for environment awareness engine. enhanced object and location awareness method for mobile services in [21]. In the proposed method, a mobile user (e.g. ...
Article
Location and environment awareness are two prominent features of cognitive radios and networks enabling them to interact with and learn the operating environment. A cognitive radio architecture with location and environment awareness engines is introduced in this paper. Architectural framework of both engines along with their components are presented. The proposed architecture is a promising model to support advanced and autonomous location and environment-aware applications (e.g. advanced location-based services (LBS)). Implementation options, design challenges, issues, and potential solutions for the realization of both engines are discussed. An underlying method for both engines, which is range accuracy adaptation, is presented. Finally, concluding remarks with future research directions are provided.
... pattern analysis and machine intelligence algorithms [27]) to construct the scene state in the desired formats: text, image, video, and voice. One of the well known visual location sensing techniques is scene analysis [27], [28], [29]. Scene analysis simply is a pattern matching based location sensing technique similar to RF pattern matching based methods (e.g. ...
... There are some efforts towards the realization of topographical map such as the Google Maps TM . Another example is the proposed vision enhanced object and location awareness method for mobile services in [29]. In the proposed method, mobile user (e.g. ...
Article
Location awareness in cognitive radio networks
... Installing sensor networks for sensing and reading the chips for irregular events does have some serious economic considerations. Another approach is having an object recognition system where a picture, which usually is a land object or structure, is taken using a built-in camera common in any mobile phone to identify their location with respect to the picture taken [5]. GPS is used to read the actual location if available. ...
... Object recognition has been proposed, a landmark such a picture, which is taken using a built-in camera in any common mobile phone. The landmark is used to identify their location based on the picture they have taken (Luley et.al., 2005). The mobile device also uses the GPS to read the actual position if the connection is available, However if the data cannot be obtained, it will make an approximate estimation of the cell information of the phone-network provider. ...
Article
Hajj is a huge congregation of Muslims from all over the world which happens annually in Makkah, Kingdom of Saudi Arabia (KSA). It is one of the pillars of Islam and every able Muslim must perform this act at least once in their lifetime. Many challenges are faced by the organizers as well as the pilgrims during this massive religious gathering. Cases of missing Hajj pilgrims are not uncommon and although several tracking and navigation devices have been introduced, there is still a need for a better solution in overcoming the issue. There are several factors that prevent widespread use of the system, such as the operational costs, availability of the connections and the use in uncommon platform. This paper proposes a framework for tracking Hajj pilgrims in a crowded pervasive environment using a system called HajjLocator. A discussion on the prototype of HajjLocator, as a system to track and monitor pilgrims while performing Hajj and to save lives, with an SOS mechanism, is also presented in this paper.
... Another approach is by having object recognition where a picture, usually landmark, is taken, using a built-in camera in any common mobile phone, to identify their location based on the picture they taken [9]. It also used the GPS to read the actual position if available and if the data cannot be obtained, it uses an approximate estimation of the cell information of the phone-network provider. ...
Conference Paper
Full-text available
Hajj, an annual Muslim pilgrimage, is one of the pillars of Islam and every able Muslim must perform this act at least once in their lifetime. During the pilgrimage, millions of Muslims from all over the world congregate for religious rituals in Makkah, Kingdom of Saudi Arabia. The cases of missing Hajj pilgrims are not uncommon and although several tracking and navigation devices have been introduced, there is still a need for a better solution in overcoming the issue. There are several factors that prevent a widespread use of the system, such as the operational costs, availability of the connections and the use in uncommon platform. Thus, this paper proposes a HajjLocator framework for Hajj Pilgrim tracking based on mobile phone environments as it is reasonably affordable and is extensively used by people. The prototype of HajjLocator, as a system to track and monitor pilgrims while performing Hajj, is also discussed.
... In the advent of wireless technologies, ubiquitous devices and wireless sensors are used to gather and process data for location information. The interaction of these devices provides location-awareness and necessary information to mobile users [1]. Location-awareness is an evolution of mobile computing, location sensing and wireless technology [2] where a mobile device like PDA is used as information service of the location. ...
Conference Paper
Full-text available
Location-aware services using data mining techniques are recent re- search topics where rules from the data are extracted to provide interesting in- formation. In addition, multi-agent systems are applied in location-based ser- vice for autonomous interaction of the system. Different data mining techniques are applied for knowledge discovery from location-based services. However, wireless environment limits the transmission of large data and possible for er- rors. This work presents a multi-agent framework for the location-based service using data mining. To support the data mining, a data compressor agent (DCA) based on neuro-fuzzy classifier is proposed. DCA performs data preprocessing where it merges the less frequent dataset by using neuro-fuzzy classifier before sending the data. User agent processes the knowledge discovery by using data mining like association rule mining. The result shows the proposed neuro-fuzzy data compression is more efficient compressor.
Article
Full-text available
this paper, we present broadcasting technology and the new DVB standard and investigate thespecial requirements of broadcasting geodata. We show how broadcasting of geodata can actuallybe integrated into mobile location-based applications, using our Niccimon platform as a developmentframework. The paper concludes with an outlook to upcoming broadcasting standards and future workin geodata broadcasting in the Niccimon project
Conference Paper
Full-text available
Object recognition and detection represent a relevant compo- nent in cognitive computer vision systems, such as in robot vision, intelligent video surveillance systems, or multi-modal interfaces. Object identification from local information has recently been investigated with respect to its potential for ro- bust recognition, e.g., in case of partial object occlusions, scale variation, noise, and background clutter in detection tasks. This work contributes to this research by a thorough analysis of the discriminative power of local appearance pat- terns and by proposing to exploit local information content to model object representation and recognition. We identify discriminative regions in the object views from a posterior en- tropy measure, and then derive object models from selected discriminative local patterns. For recognition, we determine rapid attentive search for locations of high information con- tent from learned decision trees. The recognition system is evaluated by various degrees of partial occlusion and Gaus- sian image noise, resulting in highly robust recognition even in the presence of severe occlusion effects.
Conference Paper
Full-text available
We propose reliable outdoor object detection on mobile phone imagery from o-the-shelf devices. With the goal to provide both robust object detection and reduction of computational complexity for situated interpretation of urban imagery, we propose to apply the 'Informative Descriptor Approach' on SIFT features (i-SIFT descriptors). We learn an attentive matching of i-SIFT keypoints, resulting in a signican t im- provement of state-of-the-art SIFT descriptor based keypoint matching. In the o-line learning stage, rstly , standard SIFT responses are eval- uated using an information theoretic quality criterion with respect to object semantics, rejecting features with insucien t conditional entropy measure, producing both sparse and discriminative object representa- tions. Secondly, we learn a decision tree from the training data set that maps SIFT descriptors to entropy values. The key advantages of in- formative SIFT (i-SIFT) to standard SIFT encoding are argued from observations on performance complexity, and demonstrated in a typical outdoor mobile vision experiment on the MPG-20 reference database.
Article
Full-text available
In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context, steerable filters, PCA-SIFT, differential invariants, spin images, SIFT, complex filters, moment invariants, and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.
Conference Paper
Object identification from local information has recently been investigated with respect to its potential for robust recognition, e.g., in case of partial object occlusions, scale variation, noise, and background clutter in detection tasks. This work contributes to this research by a thorough analysis of the discriminative power of local appearance patterns and by proposing to exploit local information content for object representation and recognition. In a first processing stage, we localize discriminative regions in the object views from a posterior entropy measure, and then derive object models from selected discriminative local patterns. Object recognition is then applied to test patterns with associated low entropy using an efficient voting process. The method is evaluated by various degrees of partial occlusion and Gaussian image noise, resulting in highly robust recognition even in the presence of severe occlusion effects.
Article
This paper introduces the weighted-Parzen-window classifier. The proposed technique uses a clustering procedure to find a set of reference vectors and weights which are used to approximate the Parzen-window (kernel-estimator) classifier. The weighted-Parzen-window classifier requires less computation and storage than the full Parzen-window classifier. Experimental results showed that significant savings could be achieved with only minimal, if any, error rate degradation for synthetic and real data sets.
Conference Paper
In this paper we describe an image-based approach to finding location-based information from camera-equipped mobile devices. We introduce a point-by-photograph paradigm, where users can specify a location simply by taking pictures. Our technique uses content-based image retrieval methods to search the web or other databases for matching images and their source pages to find relevant location-based information. In contrast to conventional approaches to location detection, our method can refer to distant locations and does not require any physical infrastructure beyond mobile internet service. We have developed a prototype on a camera phone and conducted user studies to demonstrate the efficacy of our approach.
Article
This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Fritz, G., Seifert, C., Paletta, L., and Bischof, H., Learning Informative SIFT Descriptors for Attentive Object Recognition, Proc. 1st Austrian Cognitive Vision Workshop, ACVW 2005, Zell an der Pram, Austria, in print.