
Landmark recognition in VISITO: VIsual Support to Interactive TOurism in Tuscany


Abstract

We present the VIsual Support to Interactive TOurism in Tuscany (VISITO Tuscany) project, which offers an interactive guide, accessible via smartphones, for tourists visiting cities of art. The distinguishing feature of the system is that user interaction takes place mainly through images: to receive information on a particular monument, users just have to take a picture of it. Using image analysis and content recognition techniques, VISITO Tuscany automatically recognizes the photographed monument and displays pertinent information to the user. In this paper we illustrate how landmark recognition from mobile devices can provide tourists with relevant and customized information about various types of objects in cities of art.
Giuseppe Amato ISTI-CNR
via G.Moruzzi, 1
Pisa, Italy
Paolo Bolettieri ISTI-CNR
via G.Moruzzi, 1
Pisa, Italy
Fabrizio Falchi ISTI-CNR
via G.Moruzzi, 1
Pisa, Italy
Categories and Subject Descriptors
H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing; H.3.5 [Information Systems]: On-line Information Services - Commercial services
General Terms
Experimentation, Algorithms
Keywords
landmark recognition, image classification, interactive tourism
Introduction
In the last few years, the problem of recognizing landmarks has received growing attention from the research community. For example, Google presented its approach to building a web-scale landmark recognition engine [6], which was also used to implement the Google Goggles service [5].
Figure 1: Tourist information on a smartphone
VISITO Tuscany (VIsual Support to Interactive TOurism in Tuscany) also addresses this problem: it investigates and implements technologies able to offer an interactive, customized, advanced tour guide service for visiting the cities of art in Tuscany. More specifically, it focuses on offering services to be used (see Figure 2):
During the tour – through the use of new-generation mobile devices, in order to improve the quality of the experience. As shown in Figure 1, the user employs the mobile device to get detailed information about what he is looking at, or about the context he is in. By taking pictures of monuments, places, and other nearby objects, the user indicates what he finds most interesting. Each picture is processed by the system to infer the user's interests and to provide him with relevant and customized information. For example, if a user takes a picture of Giotto's bell tower, he can get detailed information describing the bell tower, its structural techniques, etc.
Before the tour – to better plan the visit. The information and experiences shared by other users can be employed by the user to plan his own visit, together with the information already included in the system database and, more generally, on the web. The interaction will take place through advanced methods based on 3D graphics.
After the tour – to keep the memory alive and share it with other people. The user can access the pictures and the itinerary he followed through advanced modes of interaction based on 3D graphics. Moreover, he might share his information and experiences with other users by creating social
Figure 2: The VISITO Tuscany project services.
Although the general objective of the VISITO Tuscany project is broader, in this demonstration we focus mainly on the use of the smartphone to obtain information during a visit to a tourist site. The user can obtain information on a monument by simply pointing the embedded camera at the landmark of interest and taking a picture. The acquired image is analyzed and the landmark is recognized, so that the user can be provided with related information.
The demonstrated system is composed of three main components: a client application that runs on a mobile phone, an image classifier that recognizes landmarks contained in pictures, and a digital library containing descriptions of various monuments. At the time of writing, we have created recognizers for hundreds of monuments in three cities in Tuscany: Florence, Pisa, and San Gimignano. For these monuments we have also populated the digital library with descriptions consisting of text and images that can be easily read on a mobile device. When the user takes a picture of a monument, the picture is first sent to the classifier, which checks whether one of the available monuments is recognized. If a monument is recognized, its description is retrieved from the digital library and sent back to the mobile device.
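The request flow described above can be sketched as follows. This is only an illustrative sketch, not the project's implementation: the names `Description`, `handle_picture`, the monument identifier, and the stub classifier are all hypothetical, and the digital library is assumed to be a simple in-memory mapping.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Description:
    """Hypothetical digital-library entry for one monument."""
    title: str
    text: str


def handle_picture(
    picture: bytes,
    classify: Callable[[bytes], Optional[str]],
    library: Dict[str, Description],
) -> Optional[Description]:
    """Return the description of the recognized monument, or None."""
    # Step 1: the classifier checks whether a known monument is recognized.
    monument_id = classify(picture)
    if monument_id is None:
        return None  # nothing recognized, so nothing is sent back
    # Step 2: the description is retrieved from the digital library.
    return library.get(monument_id)


# Toy usage with a stub classifier (a real one would analyze the image).
library = {
    "giotto_bell_tower": Description("Giotto's bell tower", "Placeholder text."),
}


def stub_classifier(picture: bytes) -> Optional[str]:
    return "giotto_bell_tower" if picture else None


result = handle_picture(b"jpeg-bytes", stub_classifier, library)
print(result.title if result else "No monument recognized")
```

Passing the classifier in as a function keeps the three components (client, classifier, digital library) separable, mirroring the architecture described in the text.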
Landmark recognition is performed using local features and kNN based classification algorithms. We defined a new approach that relies on a revision of the single label kNN classification algorithm. More specifically, as discussed in more detail in [1, 2], we propose an algorithm that first assigns a label to each local feature of a query image. The label of the image is then assigned on the basis of the labels and confidences assigned to its local features. In other words, our kNN approach is based on the similarity between the local features of the query image and those in the training set, rather than on the similarity between whole images. Even though we do not rely on an Image-to-Class distance, our approach is similar to the one described in [3]. Moreover, for bag-of-words approaches, the importance of considering relations between local features belonging to different images of the same class has recently been studied in [4], where visual synonyms are considered for landmark image retrieval.
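The two-stage idea (label each local feature, then label the image from those labels and confidences) can be illustrated with a small sketch. The specific choices here are assumptions made for illustration and are not taken from [1, 2]: Euclidean distance on toy 2D descriptors, a majority vote over the k nearest training features, the vote fraction as confidence, summed confidences as the image score, and a rejection threshold `min_score`.

```python
import math
from collections import Counter, defaultdict


def knn_label(feature, training, k=3):
    """Label one local feature by majority vote among its k nearest
    training features; the vote fraction serves as the confidence."""
    nearest = sorted(training, key=lambda item: math.dist(feature, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / len(nearest)


def classify_image(query_features, training, k=3, min_score=0.5):
    """Combine per-feature labels and confidences into one image label.
    Returns (label, score); label is None if the score is below min_score."""
    scores = defaultdict(float)
    for feature in query_features:
        label, confidence = knn_label(feature, training, k)
        scores[label] += confidence
    if not scores:
        return None, 0.0
    best = max(scores, key=scores.get)
    score = scores[best] / len(query_features)
    return (best, score) if score >= min_score else (None, score)


# Toy training set: (descriptor, class label) pairs for individual local
# features, not for whole images.
training = [
    ((0.0, 0.0), "tower"), ((0.1, 0.0), "tower"), ((0.0, 0.1), "tower"),
    ((1.0, 1.0), "duomo"), ((1.1, 1.0), "duomo"), ((1.0, 1.1), "duomo"),
]
query = [(0.05, 0.05), (0.0, 0.05), (0.95, 1.0)]

print(classify_image(query, training))                 # two of three features vote "tower"
print(classify_image(query, training, min_score=0.8))  # rejected: score below threshold
```

The rejection threshold stands in for the case where no available monument is recognized. A brute-force sort is used for clarity; a real system would need an index for efficient similarity search over local features.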
Acknowledgments
This work was partially supported by the VISITO Tuscany project, funded by Regione Toscana in the POR FESR 2007-2013 program, action line 1.1.d. VISITO Tuscany is coordinated by ISTI-CNR. Its consortium includes three ISTI-CNR laboratories (Networked Multimedia Information Systems, Visual Computing, High Performance Computing), the security laboratory of IIT-CNR, and three private companies: Alinari24Ore, Hyperborea, and 3Logic MK.
We also thank the municipalities of Florence, Pisa, and San Gimignano, which provided us with all the permissions needed to build the demonstrator.
References
[1] G. Amato and F. Falchi. kNN based image classification relying on local feature similarity. In SISAP '10: Proceedings of the Third International Conference on SImilarity Search and APplications, pages 101–108, New York, NY, USA, 2010. ACM.
[2] G. Amato and F. Falchi. Local feature based image similarity functions for kNN classification. In Proceedings of the 3rd International Conference on Agents and Artificial Intelligence (ICAART 2011), 2011. To appear.
[3] O. Boiman, E. Shechtman, and M. Irani. In defense of nearest-neighbor based image classification. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR08), pages 1–8, 2008.
[4] E. Gavves and C. G. M. Snoek. Landmark image retrieval using visual synonyms. In ACM International Conference on Multimedia, 2010.
[5] Google. Goggles, 2011. Last accessed on 3-March-2011.
[6] Y. Zheng, M. Zhao, Y. Song, H. Adam, U. Buddemeier, A. Bissacco, F. Brucher, T.-S. Chua, and H. Neven. Tour the world: Building a web-scale landmark recognition engine. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), pages 1085–1092, 2009.