Photo Context as a Bag of Words

Conference Paper · December 2008
DOI: 10.1109/ISM.2008.15 · Source: DBLP
Conference: Tenth IEEE International Symposium on Multimedia (ISM2008), December 15-17, 2008, Berkeley, California, USA
Windson Viana, Samira Hammiche, Marlène Villanova-Oliver, Jérôme Gensel, Hervé Martin
LIG - Laboratoire Informatique de Grenoble BP 72
38402 Saint Martin d’Hères
Grenoble, France
{carvalho, hammiche, villanova, gensel, martin}
Abstract

In recent years, photo context metadata (e.g., date, GPS coordinates) have proved useful in the management of personal photos. However, these metadata are still poorly considered in photo retrieval systems. In order to overcome this limitation, we propose an approach to incorporate contextual metadata in a keyword-based photo retrieval process. We use metadata about the photo shot context (address location, nearby objects, season, light status…) to generate a bag of words for indexing each photo. We extend the Vector Space Model in order to transform these shot context words into document-vector terms. In addition, spatial reasoning and geographical ontologies are used to infer new indexing terms. This facilitates the query-document matching process and also allows semantic comparison between query terms and photo annotations.
1. Introduction
In image management systems, most users add
metadata such as place, people’s names, temporal
information, and event description in order to describe
their photos [1, 2]. These metadata are related to the
context of the photo shot (i.e. information about users’
situation when they snapped the photo) and form what
we call the photo contextual metadata [3]. To relieve users of the tedious task of manually adding annotations, research on automatic context-based annotation has emerged [1, 2, 3, 4]. These works are mainly devoted to photo organisation and tag suggestion. However, contextual metadata are still poorly considered in the photo retrieval process, even though their use would help users find more relevant results, since a large proportion of user queries contain words related to the shot context of a photo [2, 5].
This paper deals with keyword-based retrieval of
photos. We propose a photo retrieval framework that
incorporates photo metadata, spatial and temporal
ontologies in the generation of keyword-based indexes.
The main goal of our framework is to support a
semantic search of photos based on content and
contextual annotations. First, we use an evolution of the
annotation system PhotoMap described in [3], in order
to produce the enriched photo metadata, such as
location, nearby objects, season, light status, nearby
friends and friends in the photo. Second, we use an
ontology-based representation for photo annotation that
provides an explicit semantics about the relationships
among each of these annotations and the photo (e.g.,
Photo1 has been taken in London-UK near Big Ben).
We then propose an adaptation of the Vector Space Model (VSM) [7, 9] that exploits this annotation representation to overcome the limitations of purely syntactic comparison by incorporating explicit semantics in the retrieval process. We extract from each
annotation a bag of words for indexing the photos. Each
index term is composed of a word and a semantic stamp
that represents the relationship between the photo and
the annotation. The main idea is to use the VSM for
computing the matching score between textual queries
and keyword-based indexes, without losing the
semantics of each keyword. As a result of this indexing process, users can query photos using precise descriptions like “photos of flowers taken in Paris near the Eiffel Tower, during Spring, where Karol appears in the photo”. Moreover, we define spatial similarity measures for expanding the initial photo indexes to include other terms potentially related to the photo shot context. Methods for calculating the relevance weight of index terms and a ranking algorithm are also defined.
The rest of this paper is organized as follows:
Section 2 presents an overview of image management
and retrieval systems, especially those based on
contextual annotations. A brief description of our
framework is presented in Section 3. In Section 4, we formalise the term weighting, the query formulation and the ranking processes. In Section 5, we describe the index expansion approach. Finally, we conclude in Section 6 with future work.
2. Photo Management and Retrieval
The amount of photos in personal digital collections has
grown so rapidly that finding a specific photo has
become a frustrating activity. Hence, it is important to offer rich interfaces and efficient retrieval processes in order to help users easily find their desired photos.
Generally speaking, two main approaches have been
explored: Content-Based Image Retrieval (CBIR) and
Keyword-Based Image Retrieval. The main problem of the CBIR approach is the semantic gap, i.e., the divergence between low-level visual features and high-level semantic meanings [3]. To alleviate this problem, keyword-based approaches use simple and comprehensible words to describe photos; a query is then formulated as a combination of keywords. The main drawbacks of this approach are the time-consuming task of manual annotation and the purely syntactic matching: keyword query engines do not consider the semantic relations that hold between query terms and photo index terms. To overcome the limitations of classical image retrieval systems, automatic context-based annotation systems can be used. Indeed, a large proportion of user queries contain words related to the context of a photo shot, such as names of places and events [2, 5]. Moreover, with camera-enabled mobile devices that contain built-in sensors (e.g., GPS), information about the user's situation can be automatically captured and enriched. Contextual metadata can then be used to index the photos and to allow shot context queries.
Context-based Annotation Systems
Many systems have successfully used contextual image metadata (e.g., date, location) for photo organisation, publication and visualisation. Examples of such systems are PhotoCompas [2], ZoneTag [1], MediAssist [4], FlickrMap, and PhotoMap [3]. Most of these systems use the captured spatial and temporal context (e.g., GPS coordinates and date/time of the snapshot) to automatically organise photo collections, and to offer context-based interfaces that allow users to navigate in their photo collections. For example, our system
PhotoMap [3] offers a map-based interface for photo
browsing. It also uses contextual metadata to suggest
keyword annotations (i.e., tags) to describe an image.
Moreover, this system accesses Web resources such as
gazetteers and weather forecast services, to infer rich
information about the shot context (e.g., address
location, weather, nearby objects and people).
Context-based image retrieval
Most of the previously mentioned systems do not consider contextual metadata during the retrieval process. Only the MediAssist system addresses the retrieval phase. It offers a text-based retrieval interface,
and the inferred image annotation is transformed into
keywords index for each image [5]. The main drawback
of this approach is the lack of consideration of the
semantic closeness of query terms during the matching
process. Only syntactical matching is performed.
Moreover, when users want to retrieve a photo, they rarely formulate precise queries. Since the sought photos were often taken long before the search, users tend to remember only partial information about the context (such as the day period, the city and the season), or more general information (such as the region visited or the commonly used name of a place). In addition, what they remember about the photo context can be partially incorrect or differ from the metadata automatically generated by the annotation system. For these types of queries, syntactic comparison also fails.
3. Context-based Photo Retrieval
The first step in our approach is to extract candidate
keywords from the image annotation for indexing the
photos and to store together with these keywords the
semantic relations that hold between them and the
image. The image metadata contains an explicit
segmentation of the relation types between annotation
and images. For example, we can describe formally that
an annotation “Paris, France” represents the place
where a photo was taken. With semantic keyword-based
indexes derived from this segmentation, we allow users
to create keyword-based queries containing both
contextual and content conditions. To clarify our proposal, we describe a photo retrieval scenario that integrates context semantic similarity measures.
3.1. Photo retrieving example
Let us consider a mobile user, Bob, taking a photo of
the Stade de France (i.e., a stadium in the inner suburbs
of Paris). He uses a context-based annotation system, such as PhotoMap, to tag his photo automatically. He is within 200 meters of the stadium. By accessing a gazetteer service, the annotation system transforms the GPS coordinates into an address description (e.g., Avenue du Général de Gaulle, Saint-Denis, France). The camera phone also detects nearby Bluetooth devices and infers that one of the detected devices belongs to Bob's friend April (see [3] for more details). Moreover, Bob adds “stadium” as a manual
annotation, and he publishes the photo in his personal
space on a collection-based photo site. Later, Bob wants to retrieve the photo of the Stade de France where April was present. To initiate his search, Bob gives “stadium”, “April” and “Paris” as query keywords. First, let us
consider two possibilities: 1) the retrieving system only
indexes photos using manual tags and 2) the system
uses text-based retrieval approach of MediAssist [5]. In
the first case, the system will return photos of the Stade
de France mixed up with photos of stadiums where
Paris Saint Germain football team has played.
Moreover, the photo of April will probably be badly
ranked in the results. In the second case, the results will be even worse, since photos taken in April (i.e., the month) will be ranked higher.
Let us now assume that the image system has added content and context photo metadata during the indexing process. This system can now propose advanced query interfaces that exploit the whole metadata. For example, Bob can state that the keyword “Paris” indicates a location tag, “stadium” a content one, and “April” a person tag. Moreover, treating the search term “Paris” as a geographical concept, a spatial similarity between the place name and the location metadata of the photos can be calculated by combining neighbourhood connectivity and geographical distance as a closeness measure (e.g., Saint-Denis is a neighbour of Paris). Bob's photo will now be correctly ranked, since the system will consider the photo location as close to Paris.
3.2. Framework Overview
As mentioned previously, contextual metadata are useful in keyword-based image retrieval, especially for retrieving personal photos. However, contextual metadata should not simply be transformed into plain text tags. A retrieval system must consider: i) the semantics of each generated tag; ii) a matching process that takes this semantics into account; iii) the inclusion of context and content similarity measures in the matching process; and iv) a query model, and the corresponding query interface, that takes advantage of the annotation segmentation. In Figure 1, we show our proposed indexing architecture for a collection-based system that addresses these four concerns.
Figure 1 – Overview of the indexation framework.
The key concepts of the framework are:
- Formal representation and photo metadata
enriching process. A common vocabulary is required
to describe photo metadata. In our system, we have built
the OWL-DL ontology “ContextPhoto” for photo
annotation [3]. In ContextPhoto, we classify metadata
into two categories: content and contextual metadata. In
addition, the contextual image metadata is divided into
five context dimensions: spatial, temporal,
spatiotemporal, social, and computational. Some of these metadata, such as the geographic position of the device and the camera properties, are captured automatically by our PhotoMap system [3]. The PhotoMap server then exploits the captured metadata to infer further contextual information such as nearby friends, address information, and nearby objects.
- Multi-model indexation. We exploit the photo metadata generated by PhotoMap to create spatial, temporal, and keyword-based indexes. Mainly, we extract from each photo annotation a bag of words in order to build keyword-based indexes. We use the manual annotations provided by users, augmented with text surrogates selected from the contextual metadata. In addition, we add a suffix to each text surrogate that acts as a semantic “stamp” and indicates the context or content dimension of the generated keyword (see Section 4.1).
- Semantic Index Expansion. Image retrieval systems may return few results when the user's query is partially incorrect. In this case, most text-based image retrieval systems try to propose alternative query terms, using query expansion or query rewriting processes. In such proposals, query processing makes use of semantic resources like ontologies and thesauri (e.g., WordNet) in order to rewrite a query into several approximated queries that include more related terms (like synonyms, hypernyms, and hyponyms) [8]. One drawback of this approach is the increased query response time due to the rewriting process and the execution of several expanded queries [8]. To bypass this shortcoming, we propose a semantic expansion of our stamped keyword indexes. We enhance the image bag of words by using spatial resources: we combine the original photo context metadata with spatial ontologies in order to derive an expanded list of terms.
Figure 2 - Retrieving framework.
- Context-based Query Model. Our photo indexing process lets users state the relationship between the desired photos and the keywords they type in a search. Our query model is intended as a simple text-based query interface for accessing photo collections. Hence, we have chosen a simplified form-based interface that guides users in expressing the keyword semantics. We offer labelled text fields where
the labels indicate the relationship contained in each
keyword. The available relationships are inspired from
the ubiquitous computing vision of context dimensions
[6]: What (image subject), When (temporal properties
like season and date), Where (location of the photo
snapshot), and Who (person present). Figure 2 shows
an overview of the retrieving framework.
4. Photo Indexation Process
4.1. Term Generation
In our retrieval system, a term is the combination of one
word and one semantic stamp. Words are extracted
from the manual and the automatic annotations of a
photo. A semantic stamp represents both the annotation
type (content or context) and some knowledge about the
annotation dimension (e.g., spatial annotation). For
example, Bob's photo will have the terms: Stadium.what, Saint_Denis.locatedin, France.locatedin, Europe.locatedin, Weekday.when, Afternoon.when, Sunset.when, Autumn.when, 2007.when, December.when, April.who, and Stade_de_France.nearby. Hence, we use the original content and contextual annotations to generate a bag of stamped words. The main idea is to keep using the VSM for computing the matching score between textual queries and keyword-based indexes without losing the semantics of each keyword. Initially, we use only the following stamps: locatedIn, nearby, when, what, and who.
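The stamped-term generation described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation; the metadata field names are hypothetical stand-ins for the ContextPhoto dimensions.

```python
# Hypothetical sketch: turn enriched photo metadata into a bag of
# stamped words, each index term being "word.stamp". The dict keys
# below are invented for illustration.

def stamped_terms(metadata):
    """Generate 'word.stamp' index terms from a photo's annotations."""
    terms = []
    for tag in metadata.get("content", []):       # manual/content tags
        terms.append(f"{tag}.what")
    for place in metadata.get("address", []):     # address hierarchy
        terms.append(f"{place}.locatedin")
    for obj in metadata.get("nearby", []):        # nearby objects
        terms.append(f"{obj}.nearby")
    for cue in metadata.get("temporal", []):      # season, day period, year...
        terms.append(f"{cue}.when")
    for person in metadata.get("people", []):     # friends present
        terms.append(f"{person}.who")
    return terms

# Bob's photo from the running example
bob = {
    "content": ["Stadium"],
    "address": ["Saint_Denis", "France", "Europe"],
    "nearby": ["Stade_de_France"],
    "temporal": ["Weekday", "Afternoon", "Sunset", "Autumn", "2007"],
    "people": ["April"],
}
print(stamped_terms(bob))
```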
We can now define a corpus P as a set of photos and T as the set of stamped words used for annotating them. Each photo p has a set ST(p) = {t_1, t_2, ..., t_n} of stamped words, i.e., the words extracted from the ontology annotation of the photo p. We also define the hierarchy H(p) that contains the geographic concepts used to generate the address of the photo p. For instance, for Bob's photo p we have H(p) = (Europe, France, Saint-Denis).

After the creation of the set ST(p), we expand the set of terms that index each photo. In order to increase the number of terms related to a photo p, we use semantic resources (i.e., spatial ontologies) for finding concepts semantically close to the shot context of a photo. First, we retrieve from our knowledge base the concept c that corresponds to the deepest concept of the hierarchy H(p) (e.g., Saint-Denis for Bob's photo). The other concepts in the hierarchy will be used to eliminate possible ambiguities (e.g., two cities with the same name). After that, we apply a spatial similarity measure (Section 5) in order to find in this same ontology other instances c_i that are relevant for indexing the photo p. With each instance c_i whose similarity to c exceeds a threshold λ, we generate a new term t_i as the combination of the instance name of c_i and the stamp locatedin. We can then define EST(p) = {t_1, ..., t_m} as the set of the stamped terms derived from this process.
After the construction of the sets ST(p) and EST(p), we begin the creation of our inverted indexes. Let d(p) = (w_1, w_2, ..., w_|T|) be the |T|-dimensional weighted vector of the photo p, where w_i is defined as the relevance weight of the term t_i for the photo p. This weight is computed from ipf(t_i), the inverse photo frequency of the term t_i, the analogue of the IDF used in text-based document retrieval:

ipf(t_i) = log( |P| / f(t_i) )

where |P| is the size of the photo corpus and f(t_i) computes how many times the term t_i was used to annotate photos. For the expanded terms (i.e., t_i in EST(p)), the weight w_i is the product of ipf(t_i) and the similarity score of the instance c_i that originates t_i.
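The inverse photo frequency above can be sketched as follows, assuming the corpus is represented as a list of per-photo term sets (this storage layout is an assumption for illustration, not the paper's):

```python
# Minimal sketch of ipf(t) = log(|P| / f(t)), the IDF analogue
# described in Section 4.1. The corpus layout is an assumption.
import math

def ipf(term, corpus):
    """Inverse photo frequency of a stamped term over a photo corpus."""
    f = sum(1 for terms in corpus if term in terms)  # photos annotated with term
    return math.log(len(corpus) / f) if f else 0.0

corpus = [
    {"Stadium.what", "Saint_Denis.locatedin"},
    {"Tower.what", "Paris.locatedin"},
    {"Stadium.what", "Paris.locatedin"},
    {"Flower.what", "Paris.locatedin"},
]
# "Stadium.what" appears in 2 of 4 photos, so ipf = log(4/2) = log 2
print(ipf("Stadium.what", corpus))
```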
4.2. Query and Matching Processes
In our query model, users specify the semantic relations between the desired photos and the query keywords by using labelled text fields (i.e., what, where, who, when). Advanced options are also available to let users state more refined relations such as “people in the photo” or “photos nearby an object”. A query q is then a set of pairs Q(q) = {(k_1, sr_1), ..., (k_n, sr_n)}, where n is the number of written keywords k_i, and sr_i is the semantic relation of the keyword k_i. We transform Q(q) into a vector d(q) = (w_1, ..., w_|T|), where w_i, in this case, represents the discrimination power of a query term for discerning relevant from irrelevant documents. In order to construct d(q), the first step is to create the set of stamped terms ST(q) = {t_1, ..., t_n} from the tuples of Q(q). As we have indexed our photos with terms generated from the combination of a word and a semantic stamp, for each keyword k_i we use its semantic relation sr_i to derive the related semantic stamp. Once the term generation is over, we compute the weights of d(q): each w_i is a real number in the range [0, 1] expressing a discriminating factor of a term among the others in a query. The objective is to offer ways of establishing priorities among query terms. Studies of user behaviour while searching for photos have revealed that the most important cues for remembering photos are “who”, “where”, and “when”, in that order [2]. Hence, the discriminating factor can be used to express this priority order in a query. After the computation of the query weights, we calculate the score of each photo in the corpus using the cosine similarity:

score(p, q) = cos(d(p), d(q)) = (d(p) · d(q)) / (||d(p)|| × ||d(q)||)
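The matching step can be illustrated with a small cosine-similarity sketch over sparse term-weight vectors; the weights below are invented for illustration and are not the paper's values:

```python
# Sketch of the cosine matching between a photo vector d(p) and a
# query vector d(q), both sparse maps from stamped term to weight.
import math

def cosine(dp, dq):
    """Cosine similarity between two sparse term->weight vectors."""
    dot = sum(w * dq.get(t, 0.0) for t, w in dp.items())
    norm_p = math.sqrt(sum(w * w for w in dp.values()))
    norm_q = math.sqrt(sum(w * w for w in dq.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

# Illustrative weights only; shared terms are Stadium.what and April.who
photo = {"Stadium.what": 0.7, "Saint_Denis.locatedin": 0.5, "April.who": 0.6}
query = {"Stadium.what": 1.0, "Paris.locatedin": 0.8, "April.who": 0.9}
print(cosine(photo, query))
```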
5. Index Expansion Process
Spatial terms that we consider for term expansion are those with the stamp locatedIn. In order to expand the spatial terms in the photo shot context, we propose to calculate the relevance weight of the potentially related terms using two spatial criteria: semantics and distance. The global spatial similarity is calculated as:

SpatialSim(c, c_i, p) = α × SemSim(c, c_i) + (1 − α) × DistSim(p, c_i)

where α and 1 − α represent the relevance weights of the semantic and the distance similarities, respectively.
The semantic dimension of the similarity accounts
for the semantic closeness of spatial terms like:
equivalence, inclusion, neighbourhood and
containment, whereas the distance is used to calculate
the physical distance between two places [11]. In order
to consider the semantic relations, we need to use a
spatial ontology, which represents the geographical
concepts of a region, their properties and their relations.
The manual construction of such ontologies is a time- and energy-consuming process. However, recent works have succeeded in the automatic generation of spatial ontologies [10]. We can assume that spatial ontologies will become more and more available in the future.

Figure 3 - The spatial ontology.
In our approach, the geographical ontology is based on the GeoNames model. We define a generic model, which we instantiate according to the available geospatial data. This model (see Figure 3) represents the geographical concept “Place”, its properties (name, equivalentName and geometry), and the relations that may exist among places (e.g., neighbour, partOf and capitalOf). We start our similarity computation by searching in the spatial ontology for the concept c that generated the term c.locatedin of a photo. If we find c in our ontology, we begin the computation of the spatial similarity between c and the other concepts of the same type (e.g., region, city). The steps of the process are:
First, we use the spatial ontology in order to retrieve the known equivalent or alternative names of a place. The alternative names generate terms with the same relevance weight as the original term c.locatedin. In a second step, we use the spatial ontology instances and a SPARQL query in order to retrieve the set of places potentially related to the place c. The close places are the neighbours of the place c and those places that belong to the same geographical unit as c. With the retrieved concepts, we create a set of potentially related concepts R.

In the next steps, our goal is to reduce the initial set of related concepts and to keep only the most similar ones. To do so, we calculate the spatial similarity between the initial concept c and each concept c_i in R. Spatial similarity is calculated as follows:
- For each c_i in R, we calculate the semantic similarity SemSim(c, c_i) by using an adaptation of the Wu & Palmer measure [12]. This similarity is asymmetric if one of the spatial concepts is the capital of a geographical unit, i.e., SemSim(c, capital) differs from SemSim(capital, c). In this measure, lsb is the lowest super bound of the two concepts c and c_i in the ontology; depth(lsb) represents the distance that separates the lsb concept from the top ontology concept, while depth(c) and depth(c_i) represent the distances of the concepts c and c_i to the root of the concept hierarchy.
- δ is a factor that we introduce in order to differentiate the similarity between two neighbouring spatial terms: the similarity between two neighbour cities has to be higher than the similarity between two distant cities.
- γ is a factor that we introduce in order to give more importance to the capital of a geographical unit, since a region capital is generally better known by users; the factor applied to a capital is higher than the one applied to the other places.
- For each c_i in R, we calculate the physical spatial similarity DistSim(p, c_i). Ideally, this measure should consider the travel time between the two places; however, that depends on the availability of such information. In our approach, we propose to calculate the Euclidean distance between the photo coordinates and the boundary of the place c_i.
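The weighted combination of the two criteria can be checked numerically with a short sketch. Here the DistSim form 1/(1 + d_km) is an assumption that happens to be consistent with the value 0.285 reported below for a 2.5 km distance; α = 0.65 and the SemSim value 0.8 are taken from the worked example:

```python
# Sketch of SpatialSim = alpha*SemSim + (1-alpha)*DistSim from Section 5.
# The DistSim helper is an assumed form, not stated in the paper.

def dist_sim(d_km):
    """Map a Euclidean distance (km) to a [0, 1] similarity (assumed form)."""
    return 1.0 / (1.0 + d_km)

def spatial_sim(sem, dist, alpha=0.65):
    """Weighted combination of semantic and distance similarity."""
    return alpha * sem + (1.0 - alpha) * dist

# Worked example values: SemSim(Saint-Denis, Paris) = 0.8,
# DistSim(p, Paris) = 0.285 for a 2.5 km distance.
score = spatial_sim(sem=0.8, dist=0.285)
print(score)  # ~0.62; the paper reports the truncated value 0.619
```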
After the calculation, we keep only the concepts for which the similarity is higher than a threshold λ.
For example, if we want to calculate the extended terms of Bob's photo, we first generate the set of concepts related to Saint-Denis (i.e., the city where Bob's photo was taken): R = {Paris, Vincennes, Courneuve} (see Figure 3). Then, we compute the spatial similarity between Saint-Denis and each concept of the set R. We give an example of the calculation in the case of the concept Paris. The values used are α = 0.65 and δ = 1.2 (see Formulas IV and V):

SemSim(Saint-Denis, Paris) = 0.8
distance(p, Paris) = 2.5 km, DistSim(p, Paris) = 0.285
SpatialSim(Saint-Denis, Paris, p) = 0.65 × 0.8 + 0.35 × 0.285 = 0.619

Similarly, SpatialSim(Saint-Denis, Vincennes) = 0.428 and SpatialSim(Saint-Denis, Courneuve) = 0.545.
The following step is to select the place concepts for which the similarity is higher than the threshold λ, which we fixed to the value 0.5. Hence, the city of Vincennes will not be included in the index of Bob's photo. However, Paris and Courneuve are potential spatially related concepts. The weight of Paris will be bigger than that of Courneuve, since the border of Paris is nearer to the photo and Paris is the agglomeration's capital.
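This selection step amounts to a simple threshold filter over the similarity scores reported for Bob's photo:

```python
# Threshold filtering of expansion candidates (lambda = 0.5),
# using the SpatialSim values from the worked example.
scores = {"Paris": 0.619, "Vincennes": 0.428, "Courneuve": 0.545}
LAMBDA = 0.5

kept = {place: s for place, s in scores.items() if s > LAMBDA}
print(sorted(kept))  # Vincennes falls below the threshold
```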
6. First Experiments and Conclusion
We have presented an approach that exploits contextual metadata in the photo retrieval process, based on an adaptation of the classical vector space model.
Five dimensions are used to classify the contextual
metadata and to build the photo index using keywords:
spatial, temporal, spatiotemporal, manual and social.
Hence, users can formulate queries, which combine
keywords from the different dimensions and keep some
of the relation semantics in each keyword.
Figure 4 – Query result with Spatial Expansion
In order to validate our approach, we have extended PhotoMap with a keyword-based search interface. We have used PostgreSQL for indexing a reduced set of PhotoMap images and a large set of geotagged Flickr photos. In Figure 4, we can see on a map the results of the query “Paris, stadium” with the semantic stamps locatedin and what. The solid line encircles the results without spatial expansion. The dotted regions show the photo markers when we use the spatial expansion. As expected, the photos of Saint-Denis are also included. Further experiments will be carried out in order to evaluate the effectiveness of contextual metadata in the retrieval of personal photos, and we will also measure the impact of the spatial expansion on the recall/precision of user queries.
7. References
1. Ames, M., Naaman, M., Why We Tag: Motivations for
Annotation in Mobile and Online Media. Proc. of
Conference on Human Factors (CHI 2007), 2007.
2. Naaman, M., Harada, S., Wang, Q., Garcia-Molina, H.
and Paepcke, A., 2004. Context data in geo-referenced
digital photo collections. Proc. of 12th ACM international
Conference on Multimedia (MULTIMEDIA '04), 196-203
3. Viana, W., Bringel Filho, J., Gensel, J., Villanova-Oliver,
M., Martin, H. 2007. “PhotoMap: automatic spatiotemporal
annotation for mobile photos”. In Proceedings of the 7th
International Symposium on Web and Wireless GIS. UK.
Lecture Notes in Computer Science , Vol. 4857 . 2007
4. N. O’Hare, H. Lee, S. Cooray, C. Gurrin, G. Jones, J. Malobabic, N. O’Connor, A. F. Smeaton, and B. Uscilowski. MediAssist: Using content-based analysis and context to manage personal photo collections. In CIVR, pages 529–532, Tempe, Arizona, USA, 2006.
5. Neil O'Hare, Cathal Gurrin, Gareth J. F. Jones, Hyowon Lee, Noel E. O'Connor, Alan F. Smeaton: Using text search for personal photo collections with the MediAssist system. In Proceedings of ACM SAC 2007.
6. Abowd D., and Mynatt D., “Charting past, present, and
future research in ubiquitous computing”, ACM
Transactions on Computer-Human Interaction 7(1), 2000.
7. Pablo Castells, Miriam Fernández, David Vallet, "An
Adaptation of the Vector-Space Model for Ontology-Based
Information Retrieval," IEEE Transactions on Knowledge
and Data Engineering ,vol. 19, pp.261-272, 2007.
8. Hammiche, S., Lopez, B., Benbernou, S., Hacid, M.-S., "Query Rewriting for Semantic Multimedia Data Retrieval". Advances of Computational Intelligence in Industrial Systems, pp. 351–372, 2008.
9. G. Salton, A. Wong, and C. S. Yang (1975), "A Vector
Space Model for Automatic Indexing," Communications of
the ACM, vol. 18, n° 11, pages 613–620.
10. D. Buscaldi, P. Rosso, and P. Peris. Inferring geographical
ontologies from multiple resources for geographical
information retrieval. In C. Jones and R. Purves, editors,
Proceedings of 3rd SIGIR Workshop on Geographical
Information Retrieval, August 2006.
11. Jones, C. B., Alani, H., and Tudhope, D. 2001.
Geographical Information Retrieval with Ontologies of
Place. In Proceedings of the international Conference on
Spatial information theory. Lecture Notes In Computer
Science, vol. 2205. Springer-Verlag, London, 322-335.
12. Peng-Yuan Liu, Tie-Jun Zhao. Application-oriented
comparison and evaluation of six semantic similarity
measures based on WordNet. International Conference on
Machine Learning and Cybernetics, 2006.