Conference Paper

Abstract

The rise of affordable wearable devices makes visual lifelog data easier to collect. Such data can then be retrieved as an auxiliary memory, useful both for people with memory impairments and for anyone who wants to be more efficient in daily life. This paper presents the retrieval system designed for the Lifelog Search Challenge 2019 competition. The system offers two search levels: a semantic, text-based one and an image-based, query-by-example one. We describe the data used for indexing at both levels (semantic data for the text-based method, and combinations of HSV histograms and BRIEF features for the example-based one) as well as the architecture of the overall system. Results from our experiments are promising. The text-based search engine yields uneven results that depend strongly on the keywords used in the query. For the example-based search engine, the best features combine both HSV histograms and BRIEF descriptors. Despite these encouraging results, processing time remains a limitation.
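As a concrete illustration of the image signature named in the abstract, the sketch below extracts an HSV colour histogram and BRIEF binary descriptors with OpenCV (BRIEF ships in the opencv-contrib-python package). The bin counts, the choice of FAST as keypoint detector, and the keypoint budget are illustrative assumptions, not the authors' settings.

```python
# Sketch of the image signature from the abstract: an HSV colour histogram
# plus BRIEF binary descriptors. Parameters are assumptions for illustration.
import cv2

def hsv_histogram(image_bgr, bins=(8, 8, 8)):
    """Flattened, L1-normalised HSV histogram of a BGR image."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, bins, [0, 180, 0, 256, 0, 256])
    return cv2.normalize(hist, hist, norm_type=cv2.NORM_L1).flatten()

def brief_descriptors(image_bgr, n_keypoints=100):
    """BRIEF descriptors at FAST keypoints (BRIEF itself has no detector)."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    fast = cv2.FastFeatureDetector_create()
    brief = cv2.xfeatures2d.BriefDescriptorExtractor_create()  # needs opencv-contrib
    keypoints = sorted(fast.detect(gray, None), key=lambda k: -k.response)[:n_keypoints]
    _, descriptors = brief.compute(gray, keypoints)
    return descriptors  # uint8 matrix, one 32-byte row per keypoint

image = cv2.imread("lifelog_frame.jpg")  # placeholder file name
signature = (hsv_histogram(image), brief_descriptors(image))
```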

... With respect to scene detection models, the Places365 dataset is used to train ResNet152 [7], DenseNet [2], and Places365CNN [8]. To enrich the information extracted from images, different methods are applied, such as removing blurred images [2], creating tags from image captions [11], building habit concepts [7], and constructing a keyframe from a group of images [9,10]. In [5], the Exquisitor system applies a clustering approach over high-dimensional feature vectors to group similar images into semantic clusters. ...
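A minimal sketch of the Places365-style scene classification mentioned above, in PyTorch: a ResNet-18 whose final layer is sized for the 365 Places365 categories. The checkpoint filename is a placeholder for any Places365-trained weights saved as a plain state dict, not a real distributed file.

```python
# Scene-classification sketch: ResNet-18 resized to the 365 Places365 classes.
import torch
from torchvision import models, transforms
from PIL import Image

model = models.resnet18(num_classes=365)
state = torch.load("resnet18_places365.pth", map_location="cpu")  # hypothetical checkpoint
model.load_state_dict(state)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

image = preprocess(Image.open("lifelog_frame.jpg")).unsqueeze(0)
with torch.no_grad():
    probs = torch.softmax(model(image), dim=1)
top5 = torch.topk(probs, 5)  # indices into the Places365 category list
```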
... For text-based queries, the query text is parsed into noun, verb, and adjective components, and then their synonyms are found to form a bag of words. These bags of words are compared to a library of words stored in the dataset [7], ranked tags generated through the Microsoft Vision API [11], similar concept words [2], or labels and captions in the dataset [14], or are processed with the Free-text Ranking Algorithm [8]. For visual-based queries, images with similar semantic content are used as inputs to the systems developed by the China team [11], VIRET [10], and Le et al. [8], while sketched images are used by lifeXplore [9] and vitrivr [14]. Moreover, to take advantage of metadata, most systems construct a group of filters that let users form a new query integrating the original query with metadata-based search keys from those filters, as introduced by the HCMUS team [7], lifeXplore [9], VIRET [10], and the Taiwan team [2]. ...
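The query-expansion pattern described above (keep nouns, verbs, and adjectives, then add synonyms) can be sketched with NLTK and WordNet. This is a hedged illustration of the general pattern, not any cited system's code.

```python
# Expand a text query into a bag of words: keep nouns/verbs/adjectives and
# add their WordNet synonyms, for matching against per-image tags or captions.
import nltk
from nltk.corpus import wordnet

nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")
nltk.download("wordnet")

KEEP = {"NN": wordnet.NOUN, "VB": wordnet.VERB, "JJ": wordnet.ADJ}

def expand_query(query):
    bag = set()
    for word, tag in nltk.pos_tag(nltk.word_tokenize(query.lower())):
        pos = KEEP.get(tag[:2])  # Penn tags: NN*, VB*, JJ*
        if pos is None:
            continue
        bag.add(word)
        for synset in wordnet.synsets(word, pos=pos):
            bag.update(lemma.name().replace("_", " ") for lemma in synset.lemmas())
    return bag

print(expand_query("eating breakfast in a red kitchen"))
```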
Preprint
Full-text available
In this paper, we introduce an interactive multimodal lifelog retrieval system with the search engine built by utilizing the attention mechanism. The algorithm upon which the system relies is constructed by applying two observations: (1) most of the images belonged to one event probably contain cues (e.g., objects) that relate to the content of queries. These cues contribute to the representative of the event, and (2) instances of one event can be associated with the content and context of such an event. Hence, when we can determine the seed (by leveraging the first observation), we can find all relevant instances (by utilizing the second observation). We also take a benefit of querying by samples (e.g., images) by converting text query to images using the attention-based mechanism. Thus, we can enrich and add more semantic meaning into the simple text query of users towards having more accurate results, as well as discovering hidden results that cannot reach by using only text queries. The system is designed for both novice and expert users with several filters to help users express their queries from general to particular descriptions and to polish their results.
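A hedged sketch of the seed-then-expand idea from this abstract: score every image against a query embedding, take the best match in each event as the seed, and return that event's members ranked by seed score. The embedding model is assumed (any joint text-image embedding would do); the paper's attention mechanism is not reproduced here.

```python
# Seed-then-expand retrieval sketch over pre-computed, L2-normalised embeddings.
import numpy as np

def retrieve(query_vec, image_vecs, event_ids, top_events=3):
    """image_vecs: (N, D) embeddings; event_ids: length-N event labels."""
    scores = image_vecs @ query_vec          # cosine similarity
    event_ids = np.asarray(event_ids)
    results = {}
    for event in set(event_ids.tolist()):
        members = np.where(event_ids == event)[0]
        seed = members[np.argmax(scores[members])]   # observation (1): the seed
        results[event] = (scores[seed], members)     # observation (2): its event
    best = sorted(results.items(), key=lambda kv: -kv[1][0])[:top_events]
    return [(event, members.tolist()) for event, (score, members) in best]
```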
... In 2019, different approaches were used to enrich the provided dataset. In [14], the authors used the Microsoft Vision API for tagging and captioning, and HSV histograms combined with BRIEF features [1] for visual search. While we have also used commercially available APIs such as the Google Vision API in the past, we have decided to move towards open-source solutions as part of the larger effort to make research both more accessible and reproducible. ...
Conference Paper
Full-text available
The variety and amount of data being collected in our everyday life pose unique challenges for multimedia retrieval. In the Lifelog Search Challenge (LSC), multimedia retrieval systems compete in finding events based on descriptions containing hints about structured, semi-structured, and unstructured data. In this paper, we present the multimedia retrieval system vitrivr, with a focus on the changes and additions made for the new dataset and on our successful participation at LSC 2019. Specifically, we show how the new dataset can be used for retrieval in different modalities without sacrificing efficiency, describe two recent additions, temporal scoring and staged querying, and discuss the deep learning methods used to enrich the dataset.
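The abstract names temporal scoring without detail; the sketch below shows one plausible reading, assumed rather than taken from vitrivr's code: a two-part query in which a pair of matches scores higher when the second follows the first within a time window, with the combined score decaying as the gap grows.

```python
# Hedged temporal-scoring sketch: combine two result lists so that pairs
# occurring close together in time rank higher. Details are assumptions.
def temporal_score(results_a, results_b, max_gap=300.0):
    """results_*: lists of (timestamp_seconds, score). Returns ranked pairs."""
    combined = []
    for t_a, s_a in results_a:
        for t_b, s_b in results_b:
            gap = t_b - t_a
            if 0 <= gap <= max_gap:
                # decay the summed score as the temporal gap widens
                combined.append((t_a, t_b, (s_a + s_b) * (1 - gap / (2 * max_gap))))
    return sorted(combined, key=lambda x: -x[2])
```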
Conference Paper
Full-text available
With the growing popularity of wearable devices recording biometric data comes the readiness to capture and combine even more personal information as a form of digital diary. Lifelogging is practiced ever more today and can fall anywhere between an informative hobby and a life-changing experience. From an information processing point of view, analyzing the entirety of such multi-source data is immensely challenging, which is why the first Lifelog Search Challenge 2018 competition was brought into being, to encourage the development of efficient interactive data retrieval systems. Answering this call, we present a retrieval system based on our video search system diveXplore, which has successfully been used in the Video Browser Showdown 2017 and 2018. Due to the different task definition and available data corpus, the base system was adapted and extended for this new challenge. The resulting lifeXplore system is a flexible retrieval and exploration tool that offers various easy-to-use yet powerful search and browsing features, optimized for lifelog data and for usage by novice users. Besides efficient presentation and summarization of lifelog data, it includes searchable feature maps, concept and metadata filters, similarity search, and sketch search.
Conference Paper
Full-text available
Fast and robust image matching is a very important task with various applications in computer vision and robotics. In this paper, we compare the performance of three image matching techniques, SIFT, SURF, and ORB, against different kinds of transformations and deformations such as scaling, rotation, noise, fisheye distortion, and shearing. For this purpose, we manually apply different types of transformations to the original images and compute matching evaluation parameters such as the number of keypoints, the matching rate, and the execution time required for each algorithm, and we show which algorithm is the most robust against each kind of distortion. Index terms: image matching, scale-invariant feature transform (SIFT), speeded-up robust features (SURF), binary robust independent elementary features (BRIEF), oriented FAST and rotated BRIEF (ORB).
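A minimal version of that comparison, assuming OpenCV 4.x: SIFT and ORB run on an image and a rotated copy, reporting keypoint counts, a matching rate, and timing. SURF is omitted because it only ships in non-default contrib builds; the rotation angle and file name are placeholders.

```python
# Compare SIFT and ORB on a rotation: keypoints, matching rate, and time.
import time
import cv2

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder file name
h, w = img.shape
rot = cv2.warpAffine(img, cv2.getRotationMatrix2D((w / 2, h / 2), 30, 1.0), (w, h))

for name, detector, norm in [("SIFT", cv2.SIFT_create(), cv2.NORM_L2),
                             ("ORB", cv2.ORB_create(), cv2.NORM_HAMMING)]:
    start = time.perf_counter()
    kp1, des1 = detector.detectAndCompute(img, None)
    kp2, des2 = detector.detectAndCompute(rot, None)
    matches = cv2.BFMatcher(norm, crossCheck=True).match(des1, des2)
    elapsed = time.perf_counter() - start
    rate = len(matches) / min(len(kp1), len(kp2))
    print(f"{name}: {len(kp1)} keypoints, matching rate {rate:.2f}, {elapsed:.3f}s")
```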
Conference Paper
Full-text available
Tags of social images play a central role for text-based social image retrieval and browsing tasks. However, the original tags annotated by web users could be noisy, irrelevant, and often incomplete for describing the image contents, which may severely deteriorate the performance of text-based image retrieval models. In this paper, we aim to overcome the challenge of social tag ranking for a corpus of social images with rich user-generated tags by proposing a novel two-view learning approach. It can effectively exploit both textual and visual contents of social images to discover the complicated relationship between tags and images. Unlike the conventional learning approaches that usually assume some parametric models, our method is completely data-driven and makes no assumption of the underlying models, making the proposed solution practically more effective. We formally formulate our method as an optimization task and present an efficient algorithm to solve it. To evaluate the efficacy of our method, we conducted an extensive set of experiments by applying our technique to both text-based social image retrieval and automatic image annotation tasks, in which encouraging results showed that the proposed method is more effective than the conventional approaches.
Conference Paper
Full-text available
Social media sharing web sites like Flickr allow users to annotate images with free tags, which significantly facilitates Web image search and organization. However, the tags associated with an image generally appear in a random order without any importance or relevance information, which limits their effectiveness in search and other applications. In this paper, we propose a tag ranking scheme aiming to automatically rank the tags associated with a given image according to their relevance to the image content. We first estimate initial relevance scores for the tags based on probability density estimation, and then perform a random walk over a tag similarity graph to refine the relevance scores. Experimental results on a 50,000-photo Flickr collection show that the proposed tag ranking method is both effective and efficient. We also apply tag ranking to three applications: (1) tag-based image search, (2) tag recommendation, and (3) group recommendation, demonstrating that the proposed tag ranking approach boosts the performance of social-tagging-related applications.
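The two-stage scheme above maps onto a small amount of code. Below is a hedged sketch, not the paper's implementation: initial tag scores are refined by a random walk over a tag-similarity matrix until the ranking stabilizes; the damping factor and convergence threshold are assumptions.

```python
# Refine initial tag relevance scores by a random walk over a tag-similarity
# graph: scores propagate between similar tags, damped toward the priors.
import numpy as np

def random_walk_rank(initial_scores, similarity, alpha=0.85, iters=100):
    """initial_scores: (T,) vector; similarity: (T, T) non-negative matrix."""
    # column-normalise the similarity matrix into transition probabilities
    P = similarity / np.maximum(similarity.sum(axis=0, keepdims=True), 1e-12)
    prior = initial_scores / initial_scores.sum()
    r = prior.copy()
    for _ in range(iters):
        r_next = alpha * P @ r + (1 - alpha) * prior
        if np.abs(r_next - r).sum() < 1e-9:  # converged
            break
        r = r_next
    return r  # higher score = tag more relevant to the image
```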
Article
Full-text available
Content-based image retrieval (CBIR) is an approach for retrieving similar images from an image database based on automatically derived image features. The quality of a retrieval system depends on the features used to describe image content. In this paper, we propose an image clustering system that takes a database of images as input and clusters them using the k-means clustering algorithm, taking into consideration color, texture, and shape features. Experimental results show that combining the three features yields higher accuracy and precision.
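A sketch of that pipeline under assumed feature choices (an HSV histogram for colour, Hu moments for shape, and a crude grey-level deviation for texture; the paper's exact descriptors are not specified here), clustered with scikit-learn's k-means:

```python
# Cluster images on a concatenated colour + shape + texture feature vector.
import cv2
import numpy as np
from sklearn.cluster import KMeans

def features(path):
    img = cv2.imread(path)
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    color = cv2.calcHist([hsv], [0, 1], None, [8, 8], [0, 180, 0, 256]).flatten()
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    shape = cv2.HuMoments(cv2.moments(gray)).flatten()  # simple shape cue
    texture = np.array([gray.std()])                    # crude texture proxy
    return np.concatenate([color / color.sum(), shape, texture])

paths = ["img1.jpg", "img2.jpg", "img3.jpg", "img4.jpg"]  # placeholder database
X = np.stack([features(p) for p in paths])
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
```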
Article
Full-text available
Memory is a key human facility to support life activities, including social interactions, life management, and problem solving. Unfortunately, our memory is not perfect. Normal individuals will have occasional memory problems, which can be frustrating, while those with memory impairments can often experience a greatly reduced quality of life. Augmenting memory has the potential to make normal individuals more effective, and to give those with significant memory problems a higher general quality of life. Current technologies now make it possible to automatically capture and store daily life experiences over an extended period, potentially even over a lifetime. This type of data collection, often referred to as a personal life log (PLL), can include data such as continuously captured pictures or videos from a first-person perspective, scanned copies of archival material such as books, electronic documents read or created, and emails and SMS messages sent and received, along with context data such as time of capture and access, and location via GPS sensors. PLLs offer the potential for memory augmentation. Existing work on PLLs has focused on the technologies of data capture and retrieval, but little work has been done to explore how these captured data and retrieval techniques can be applied to actual use by normal people in supporting their memory. In this paper, we explore normal people's needs for memory augmentation, based on the psychology literature on the mechanisms behind memory problems, and discuss the possible functions that PLLs can provide to support these memory augmentation needs. Based on this, we also suggest guidelines for data capture, retrieval needs, and computer-based interface design. Finally, we introduce our work-in-progress prototype PLL search system in the iCLIPS project as an example of augmenting human memory with PLLs and computer-based interfaces.
Article
Full-text available
In this paper we examine the potential of pervasive computing to create widespread sousveillance, which will complement surveillance, through the development of life-logs—sociospatial archives that document every action, every event, every conversation, and every material expression of an individual’s life. Reflecting on emerging technologies, life-log projects, and artistic critiques of sousveillance, we explore the potential social, political, and ethical implications of machines that never forget. We suggest, given that life-logs have the potential to convert exterior generated oligopticons to an interior panopticon, that an ethics of forgetting needs to be developed and built into the development of life-logging technologies. Rather than seeing forgetting as a weakness or a fallibility, we argue that it is an emancipatory process that will free pervasive computing from burdensome and pernicious disciplinary effects.
Conference Paper
Lifelogging is becoming an increasingly important topic of research and this paper highlights the thoughts of the three panelists at the LSC - Lifelog Search Challenge at ICMR 2018 in Yokohama, Japan on June 11, 2018. The thoughts cover important topics such as the need for challenges in multimedia access, the need for a better user interface and the challenges in building datasets and organising benchmarking activities such as the LSC.
Conference Paper
Known-item search in multimodal lifelog data represents a challenging task for present search engines. Since sequences of temporally close images make up a significant part of the provided data, an interactive video retrieval tool with a few extensions can take on known-item search tasks at the Lifelog Search Challenge. We present an update of the SIRET interactive video retrieval tool that recently won the Video Browser Showdown 2018. As the tool relies on frame-based representations and retrieval models, it can be used directly for images from lifelog cameras. The updates comprise mostly visualization and navigation methods for the high number of visually similar scenes representing repetitive daily activities.
Conference Paper
Lifelogging data provides useful insight into our daily activities. It is therefore essential to develop systems that help users retrieve events or memories from lifelogging data using ad-hoc text queries. In this paper, we first propose a method to process lifelogging data by grouping images into visual shots and clusters, then extracting semantic concepts covering scene categories and attributes, entities, and actions. We then develop a query system that supports four main types of query conditions: temporal, spatial, entity and action, and extra data criteria. Our system is expected to efficiently assist users in searching for past moments in daily logs.
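A minimal sketch of how such condition types can combine, with field names invented for illustration: each condition is a predicate over a lifelog record, and a moment matches only when every stated condition holds.

```python
# Combine temporal, spatial, entity/action, and extra conditions as predicates.
from dataclasses import dataclass
from datetime import datetime, time

@dataclass
class Moment:                 # hypothetical lifelog record
    timestamp: datetime
    location: str
    entities: set
    actions: set

def matches(m, start=None, end=None, location=None, entity=None, action=None):
    if start is not None and m.timestamp.time() < start:
        return False
    if end is not None and m.timestamp.time() > end:
        return False
    if location is not None and location.lower() not in m.location.lower():
        return False
    if entity is not None and entity not in m.entities:
        return False
    if action is not None and action not in m.actions:
        return False
    return True

# "having coffee in a cafe in the morning":
# hits = [m for m in log if matches(m, start=time(7), end=time(11),
#                                   location="cafe", action="drinking")]
```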
Conference Paper
With advances in image capturing devices, image data is being generated in high volume. Grouping images into meaningful categories to reveal useful information is a challenging and important problem. Content-based image retrieval addresses the problem of retrieving images relevant to the user's needs from image databases on the basis of low-level visual features that can be derived from the images. Due to the semantic gap between low-level image features and the richness of human semantics, a core challenge is extracting meaning from the data images contain. Image mining deals with the extraction of implicit knowledge, image data relationships, or other patterns not explicitly stored in the images. The proposed framework focuses on color and texture features: color moments and Gabor filters are used to extract features from the image dataset, and the k-means and hierarchical clustering algorithms are applied to group the images into clusters.
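A sketch of those descriptors, with kernel sizes and orientations as assumptions: colour moments (per-channel mean and standard deviation in HSV) concatenated with the mean and deviation of Gabor filter responses at four orientations, then clustered hierarchically with SciPy.

```python
# Colour moments + Gabor responses, followed by agglomerative clustering.
import cv2
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def color_moments(img_bgr):
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV).reshape(-1, 3).astype(np.float64)
    return np.concatenate([hsv.mean(axis=0), hsv.std(axis=0)])  # 1st/2nd moments

def gabor_features(gray, thetas=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    feats = []
    for theta in thetas:
        kernel = cv2.getGaborKernel((21, 21), sigma=4.0, theta=theta,
                                    lambd=10.0, gamma=0.5)
        response = cv2.filter2D(gray.astype(np.float32), cv2.CV_32F, kernel)
        feats += [response.mean(), response.std()]
    return np.array(feats)

def describe(path):
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    return np.concatenate([color_moments(img), gabor_features(gray)])

X = np.stack([describe(p) for p in ["a.jpg", "b.jpg", "c.jpg"]])  # placeholders
clusters = fcluster(linkage(X, method="ward"), t=2, criterion="maxclust")
```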
Article
This paper provides a comprehensive survey of the technical achievements in the research area of image retrieval, especially content-based image retrieval, an area that has been so active and prosperous in the past few years. The survey includes 100+ papers covering the research aspects of image feature representation and extraction, multidimensional indexing, and system design, three of the fundamental bases of content-based image retrieval. Furthermore, based on the state-of-the-art technology available now and the demand from real-world applications, open research issues are identified and future promising research directions are suggested.
Conference Paper
Collaborative tagging has become an increasingly popular means for sharing and organizing Web resources, leading to a huge amount of user generated metadata. These tags represent quite a few different aspects of the resources they describe and it is not obvious whether and how these tags or subsets of them can be used for search. This paper is the first to present an in-depth study of tagging behavior for very different kinds of resources and systems - Web pages (Del.icio.us), music (Last.fm), and images (Flickr) - and compares the results with anchor text characteristics. We analyze and classify sample tags from these systems, to get an insight into what kinds of tags are used for different resources, and provide statistics on tag distributions in all three tagging environments. Since even relevant tags may not add new information to the search procedure, we also check overlap of tags with content, with metadata assigned by experts and from other sources. We discuss the potential of different kinds of tags for improving search, comparing them with user queries posted to search engines as well as through a user survey. The results are promising and provide more insight into both the use of different kinds of tags for improving search and possible extensions of tagging systems to support the creation of potentially search-relevant tags.
Article
This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
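The matching pipeline this abstract describes is straightforward to reproduce with OpenCV. Below is a hedged sketch: SIFT keypoints matched by a two-nearest-neighbour search with Lowe's ratio test, then geometric verification. OpenCV's RANSAC homography stands in for the paper's Hough clustering and least-squares pose solution, a deliberate substitution; file names are placeholders.

```python
# SIFT matching with the ratio test, then RANSAC homography verification.
import cv2
import numpy as np

img1 = cv2.imread("object.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# two nearest neighbours per descriptor; keep matches passing the ratio test
knn = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)
good = [m for m, n in knn if m.distance < 0.75 * n.distance]

if len(good) >= 4:
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    print(f"{int(mask.sum())} geometrically consistent matches")
```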
Bernd Munzer, Andreas Leibetseder, Sabrina Kletz, Manfred Jurgen Primus, and Klaus Schoeffmann. 2018. lifeXplore at the Lifelog Search Challenge 2018.
Duc-Tien Dang-Nguyen, Klaus Schoeffmann, and Wolfgang Hurst. 2018. LSC2018 Panel - Challenges of Lifelog Search and Access. (June 2018). https://doi.org/10.1145/3210539.32105405