Conference Paper

An eye-tracking-based approach to facilitate interactive video search


Abstract

This paper investigates the role of gaze movements as implicit user feedback during interactive video retrieval tasks. In this context, we use a content-based video search engine to perform an interactive video retrieval experiment, during which we record the user gaze movements with the aid of an eye-tracking device and generate features for each video shot based on aggregated past user eye fixation and pupil dilation data. We then employ support vector machines in order to train a classifier that can identify shots relevant to a new query topic submitted by new users. The positive results provided by the classifier are used as recommendations for future users who search for similar topics. The evaluation shows that important information can be extracted from aggregated gaze movements during video retrieval tasks, while the involvement of pupil dilation data improves the performance of the system and facilitates interactive video search.
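The pipeline described in the abstract can be illustrated compactly. The sketch below (Python, scikit-learn) is a minimal, hypothetical version of the idea: per-shot feature vectors aggregated from past users' fixations and pupil dilation train an SVM whose positive predictions are surfaced as recommendations. The feature names and values are illustrative assumptions, not the authors' actual feature set.

    # Minimal sketch of a gaze-based relevance classifier; features are assumptions.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline

    # Each row aggregates past-user gaze behaviour on one video shot:
    # [total fixation duration, fixation count, mean pupil dilation, max pupil dilation]
    X_train = np.array([
        [2.4, 11, 0.12, 0.31],   # shot judged relevant in past sessions
        [0.3,  2, 0.01, 0.05],   # shot skipped quickly
        [1.9,  8, 0.09, 0.22],
        [0.5,  3, 0.02, 0.04],
    ])
    y_train = np.array([1, 0, 1, 0])  # 1 = relevant to the topic, 0 = not relevant

    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
    clf.fit(X_train, y_train)

    # For a new query on a similar topic, shots whose aggregated gaze features are
    # classified as positive are recommended to the new user.
    X_new = np.array([[2.1, 9, 0.10, 0.27], [0.4, 2, 0.01, 0.03]])
    recommended = np.where(clf.predict(X_new) == 1)[0]
    print("recommended shot indices:", recommended)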


... For instance, if a user performing interactive video search views a video directly after submitting the query "car", it is highly likely that this video contains a car and could be annotated with this tag. Toward exploiting this fact, several research works rely upon graph-based representations in order to link viewed or clicked items with queries [12,13,38], while others focus on identifying user interest by considering gaze movements [18,39] and predefined (known) topics. ...
... Recently, approaches for performing relevance feedback based on eye features have been proposed in [44] and [23], while a gaze-based relevance feedback approach for region-based image search is presented in [30]. In [39] the authors propose the generation of recommendations based on an SVM classifier trained with fixation- and pupil-dilation-based features, while [40] extends this work by performing query clustering based on dominant sets. Other recent works in image retrieval attempt to combine image features with eye movements, either by using a ranking SVM approach [10], or by identifying areas of interest in an image to extract local visual features [24]. ...
... These works focus on more controlled retrieval scenarios: they do not deal with predicting the relevance of results for a new query by a new user [18], they do not consider unknown search topics [39], and they do not incorporate gaze movements to support query clustering [40]. Also, none of them considers the combination of gaze movements and click-through data for video retrieval (e.g., [1,23,30] focus on image search). ...
Article
Full-text available
In recent years, the rapid increase of the volume of multimedia content has led to the development of several automatic annotation approaches. In parallel, the high availability of large amounts of user interaction data revealed the need for developing automatic annotation techniques that exploit the implicit user feedback during interactive multimedia retrieval tasks. In this context, this paper proposes a method for automatic video annotation by exploiting implicit user feedback during interactive video retrieval, as this is expressed with gaze movements, mouse clicks and queries submitted to a content-based video search engine. We exploit this interaction data to represent video shots with feature vectors based on aggregated gaze movements. This information is used to train a classifier that can identify shots of interest for new users. Subsequently, we propose a framework that during testing: a) identifies topics (expressed by query clusters) for which new users are searching, based on a novel clustering algorithm, and b) associates multimedia data (i.e., video shots) to the identified topics using supervised classification. The novel clustering algorithm is based on random forests and is driven by two factors: first, the distance measures between different sets of queries and, second, the homogeneity of the shots viewed within each query cluster defined by the clustering procedure; this homogeneity is inferred from the performance of the gaze-based classifier on these shots. The evaluation shows that aggregated gaze data can be exploited for video annotation purposes.
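As a rough illustration of the testing flow described above, the sketch below groups queries into topics; each topic's viewed shots would then be handed to the gaze-based classifier. Note the stand-in: the paper's clustering is random-forest based and also uses shot homogeneity, whereas this sketch uses plain TF-IDF similarity with k-means purely to show the overall flow, on invented queries.

    # Stand-in for the topic-identification step: cluster queries by text similarity.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    queries = ["red sports car", "car on a highway",
               "football match goal", "soccer player scores a goal"]
    tfidf = TfidfVectorizer().fit_transform(queries)

    # Group the queries into two search topics.
    topics = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(tfidf)
    print(dict(zip(queries, topics)))

    # Shots viewed under the queries of one topic would then be passed to the
    # gaze-based classifier (see the earlier SVM sketch) to decide which of them
    # to annotate as relevant to that topic.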
... In the related literature, implicit tagging has been used for direct annotation of data (such as images, video and music) with predefined sets of implicit tags (such as affective labels for describing emotion elicitation) [34] [38], assessment of explicit tag quality and correctness [1] [15] [33], user profiling by tracking personal preferences [11] and content summarization based on implicitly obtained feedback used mainly for re-ranking of results [41]. We limit our documentation of related work to research concerning the tagging of visual content such as images and videos, as they are more relevant to our image-oriented approach and experiments described in this paper, but we encourage readers to refer to the works of [2] [4] [28] concerning topical relevance of textual search results, as well as research on implicit characterization of musical scores [29], for a complete overview on emerging methodologies concerning IHCT. ...
... More specifically, the authors are able to extract a level of interest ranging from 0 (no interest) to 1 (fully interested) using a fuzzy-logic-based gaze inference system, reporting an accuracy of 53%. The potential of exploiting implicit gaze feedback data for the improvement of query-specific recommendations for movie clips is also explored in Vrochidis et al. [41]. In their experiments using a content-based video search engine, they recorded past user gaze fixation and pupil dilation data using an eye tracking system in order to generate a set of features that describes each video being gazed at. ...
Article
Full-text available
In this paper, a framework for implicit human-centered tagging is presented. The proposed framework draws its inspiration from the psychologically established process of attribution. The latter strives to explain affect-related changes observed during an individual's participation in an emotional episode, by bestowing the corresponding affect changing properties on a selected perceived stimulus. Our framework tries to reverse-engineer this attribution process. By monitoring the annotator's focus of attention through gaze-tracking, we identify the stimulus attributed as the cause for the observed change in core affect. The latter is analyzed from the user's facial expressions. Experimental results attained by a lightweight, cost-efficient application based on the proposed framework show promising accuracy in both the assessment of topical relevance and direct annotation scenarios. These results are especially encouraging given the fact that the behavioral analyzers used to obtain user affective response and eye gaze lack the level of sophistication and high cost usually encountered in the related literature.
... Not until recently has eye tracking technology been applied in the field of information retrieval (Li et al., 2016; Papadopoulos et al., 2014; Vrochidis et al., 2011). Oyekoya and Stentiford (2005) compared visual input with mouse input and found that in a target recognition task, eye tracking was faster than the use of the mouse. ...
Article
Full-text available
Satisfying a user's actual underlying needs in the image retrieval process is a difficult challenge facing image retrieval technology. The aim of this study is to improve the performance of a retrieval system and provide users with optimized search results using the feedback of eye movement. We analyzed the eye movement signals of the user’s image retrieval process from cognitive and mathematical perspectives. Data collected for 25 designers in eye tracking experiments were used to train and evaluate the model. In statistical analysis, eight eye movement features were statistically significantly different between selected and unselected groups of images (p < 0.05). An optimal selection of input features resulted in overall accuracy of the support vector machine prediction model of 87.16%. Judging the user’s requirements in the image retrieval process through eye movement behaviors was shown to be effective.
... As for pupil velocity, we also generated the mean, stdev, min and max. Previous work has used pupil velocity to infer users' search intentions in video retrieval tasks [56], as well as reading comprehension [44]. To account for potential physiological differences in pupil size among individual users, measured pupil dilation values (in µm) for each user are adjusted with respect to their baseline using the percentage change in pupil size (PCPS), which [32] defines as: PCPS = (measured pupil size − baseline pupil size) / baseline pupil size ...
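The quoted PCPS definition is simple enough to show directly. A minimal sketch, with invented sample values, computing PCPS and the summary statistics of pupil velocity mentioned above:

    # PCPS and pupil-velocity summary features; sample values are illustrative only.
    import numpy as np

    def pcps(measured, baseline):
        """Percentage change in pupil size relative to a per-user baseline."""
        return (measured - baseline) / baseline

    pupil_sizes = np.array([3120.0, 3180.0, 3290.0, 3350.0])  # e.g. micrometres
    baseline = 3100.0

    pcps_values = pcps(pupil_sizes, baseline)
    pupil_velocity = np.diff(pupil_sizes)  # change between consecutive samples

    features = {
        "pcps_mean": pcps_values.mean(),
        "vel_mean": pupil_velocity.mean(),
        "vel_std": pupil_velocity.std(),
        "vel_min": pupil_velocity.min(),
        "vel_max": pupil_velocity.max(),
    }
    print(features)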
Conference Paper
In this paper we investigate using a variety of behavioral measures collectible with an eye tracker to predict a user's skill acquisition phase while performing various information visualization tasks with bar graphs. Our long-term goal is to use this information in real time to create user-adaptive visualizations that can provide personalized support to facilitate visualization processing based on the user's predicted skill level. We show that leveraging two additional content-independent data sources, namely information on a user's pupil dilation and head distance to the screen, yields a significant improvement in the predictive accuracy of skill acquisition compared to predictions made using content-dependent information related to user eye gaze attention patterns, as was done in previous work. We show that including features from both pupil dilation and head distance to the screen improves the ability to predict users' skill acquisition state, beating both the baseline and a model using only content-dependent gaze information.
... The hybrid summarization process suggested candidates for text summarization using the prediction of visual attention and word semantics analysis. Vrochidis et al. (2011) studied the potential of eye movements as a source of implicit feedback in video retrieval tasks. They built a recommendation system for finding similar topics in videos based on Support Vector Machines. ...
Chapter
Inference about high-level cognitive states during interaction is a fundamental task in building proactive intelligent systems that would allow effective offloading of mental operations to a computational architecture. We introduce an improved machine-learning pipeline able to predict user interactive behavior and performance using real-time eye-tracking. The inference is carried out using a support-vector machine (SVM) on a large set of features computed from eye movement data that are linked to concurrent high-level behavioral codes based on think aloud protocols. The differences between cognitive states can be inferred from overt visual attention patterns with accuracy over chance levels, although the overall accuracy is still low. The system can also classify and predict performance of the problem-solving users with up to 79 % accuracy. We suggest this prediction model as a universal approach for understanding of gaze in complex strategic behavior. The findings confirm that eye movement data carry important information about problem solving processes and that proactive systems can benefit from real-time monitoring of visual attention.
Article
Eye-tracking research is beneficial for better understanding user behaviour in search engines. The present paper presents a comprehensive narrative literature review of eye-tracking studies examining factors influencing users’ viewing behaviour on results pages of search engines. Discipline-specific databases from Psychology, Computer Science, and Library and Information Science, as well as one multidisciplinary database have been searched for relevant articles. Criteria for inclusion were that a paper reported empirical results from an eye-tracking study in which effects of a specific factor on users’ viewing behaviour on search engine results pages (SERPs) were examined, with inferential statistical results being reported. This led to a set of 41 papers that were further examined. The papers were grouped into three categories according to three types of factors that may affect individuals’ web search activities: contextual factors, resource factors, and individual factors. Papers were assigned to these categories and subsequently to sub-categories. Overall, while for some sub-categories robust findings can be reported, we found results in many sub-categories to be inconclusive. For future research, we recommend a shift from small-scale studies examining single factors to more comprehensive and theory-driven research using larger sample sizes.
Chapter
Social Signal Processing is the first book to cover all aspects of the modeling, automated detection, analysis, and synthesis of nonverbal behavior in human-human and human-machine interactions. Authoritative surveys address conceptual foundations, machine analysis and synthesis of social signal processing, and applications. Foundational topics include affect perception and interpersonal coordination in communication; later chapters cover technologies for automatic detection and understanding such as computational paralinguistics and facial expression analysis and for the generation of artificial social signals such as social robots and artificial agents. The final section covers a broad spectrum of applications based on social signal processing in healthcare, deception detection, and digital cities, including detection of developmental diseases and analysis of small groups. Each chapter offers a basic introduction to its topic, accessible to students and other newcomers, and then outlines challenges and future perspectives for the benefit of experienced researchers and practitioners in the field.
Article
Full-text available
The modern era of search technology has changed the way information is searched and retrieved compared to the previous decade of search engines. Today's search engine has evolved as a way of shifting the locus of control over information dissemination closer to the consumers of that content. Information retrieval, being a vast field, has many applications related to it. In this paper we analyze various fields in which IR is used as an application. We divide the applications into seven categories: Communication, Databases, Natural Language Processing, Multimedia, Document Ranking, Semantic Web and Software Engineering. The importance of IR in these various fields can be observed from the sheer number of categories it supports. The more widely it is used, the more it will change the way mankind looks at information and the world at large.
Article
Relevance feedback is an efficient approach to improve the performance of content-based image retrieval systems, and implicit relevance feedback approaches, which gather users' feedback with biometric devices (e.g. an eye tracker), have been extensively investigated in recent years. This paper proposes a novel image retrieval system with implicit relevance feedback, named the eye-tracking-based relevance feedback system (ETRFs). ETRFs is composed of three main modules: an image retrieval subsystem based on a bag-of-words architecture; a user relevance assessment module that implicitly acquires relevant images with the help of a modern eye tracker; and a relevance feedback module that applies a weighted query expansion method to fuse users' relevance feedback. ETRFs operates online and in real time, which clearly distinguishes it from other offline systems. Ten subjects participated in our experiments on the Oxford Buildings and UKBench datasets. The experimental results demonstrate that ETRFs achieves notable improvement in image retrieval performance.
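The weighted query expansion step can be illustrated with a small sketch: the original bag-of-words query vector is blended with the vectors of images the eye tracker marked as relevant. The blending weight alpha and the vectors below are assumptions; the actual ETRFs fusion may differ.

    # Rocchio-style weighted query expansion with gaze-selected relevant images.
    import numpy as np

    def expand_query(query_vec, relevant_vecs, alpha=0.6):
        """Blend the original BoW query with the mean of gaze-selected relevant images."""
        if len(relevant_vecs) == 0:
            return query_vec
        feedback = np.mean(relevant_vecs, axis=0)
        expanded = alpha * query_vec + (1 - alpha) * feedback
        return expanded / np.linalg.norm(expanded)

    query = np.array([0.0, 1.0, 0.0, 2.0, 1.0])
    gaze_relevant = [np.array([0.0, 2.0, 1.0, 1.0, 0.0]),
                     np.array([1.0, 1.0, 0.0, 2.0, 0.0])]
    print(expand_query(query, gaze_relevant))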
Article
Relevance feedback is one approach to improving the performance of content-based image retrieval systems, and implicit feedback approaches, which gather users' feedback with biometric devices (e.g. an eye tracker), have been extensively investigated in recent years. This paper proposes a novel image retrieval system with eye tracking (IRSET). IRSET is composed of three modules: an image retrieval module based on a standard bag-of-words model, an eye tracking module to obtain a user's fixation data and infer feedback information, and a query expansion module that fuses the user's feedback and the input query to form a richer latent query. The implicit feedback of IRSET is implemented online and in real time, which clearly distinguishes IRSET from other systems with implicit feedback. We conducted experiments on the Oxford Buildings dataset with ten participants. The experimental results demonstrate that IRSET is an attractive interface for image retrieval and improves retrieval performance.
Conference Paper
Users react differently to non-relevant and relevant tags associated with content. These spontaneous reactions can be used for labeling large multimedia databases. We present a method to assess the relevance of tags to images using non-verbal bodily responses, namely electroencephalogram (EEG), facial expressions, and eye gaze. We conducted experiments in which 28 images were shown to 28 subjects, once with correct tags and once with incorrect tags. The goal of our system is to detect the responses to non-relevant tags and consequently filter them out. Therefore, we trained classifiers to detect tag relevance from bodily responses. We evaluated the performance of our system using a subject-independent approach. The precision at the top 5% and top 10% of detections was calculated, and results of different modalities and different classifiers were compared. The results show that eye gaze outperforms the other modalities in tag relevance detection, both overall and for top-ranked results.
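The "precision at top 5% and top 10% detections" measure can be computed as in the following sketch; the scores and labels are fabricated for illustration and are not the study's data.

    # Precision over the top-scoring fraction of detections.
    import numpy as np

    def precision_at_top(scores, labels, fraction):
        k = max(1, int(len(scores) * fraction))
        top = np.argsort(scores)[::-1][:k]   # indices of the k highest-scoring detections
        return labels[top].mean()            # share of true positives among them

    scores = np.array([0.91, 0.85, 0.40, 0.77, 0.30, 0.66, 0.12, 0.58])
    labels = np.array([1,    1,    0,    1,    0,    0,    0,    1])  # 1 = non-relevant tag correctly flagged
    print(precision_at_top(scores, labels, 0.10), precision_at_top(scores, labels, 0.05))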
Conference Paper
Tags are an effective form of metadata which help users to locate and browse multimedia content of interest. Tags can be generated by users (user-generated explicit tags), automatically from the content (content-based tags), or assigned automatically based on non-verbal behavioral reactions of users to multimedia content (implicit human-centered tags). This paper discusses the definition and applications of implicit human-centered tagging. Implicit tagging is an effortless process by which content is tagged based on users' spontaneous reactions. It is a novel but growing research topic which is attracting more attention with the growing availability of built-in sensors. This paper discusses the state of the art in this novel field of research and provides an overview of publicly available relevant databases and annotation tools. We finally discuss in detail challenges and opportunities in the field.
Conference Paper
This paper proposes a framework for automatic video annotation by exploiting gaze movements during interactive video retrieval. In this context, we use a content-based video search engine to perform video retrieval, during which, we capture the user eye movements with an eye-tracker. We exploit these data by generating feature vectors, which are used to train a classifier that could identify shots of interest for new users. The queries submitted by new users are clustered in search topics and the viewed shots are annotated as relevant or non-relevant to the topics by the classifier. The evaluation shows that the use of aggregated gaze data can be utilized effectively for video annotation purposes.
Article
Full-text available
Different modes of human-computer interaction will play a major part in making computing increasingly pervasive. More natural methods of interaction are in demand to replace devices such as the keyboard and the mouse, and it is becoming more important to develop the next generation of human-computer interfaces that can anticipate the user's intended actions. Human behaviour depends on highly developed abilities to perceive and interpret visual information and provides a medium for the next generation of image retrieval interfaces. If the computer can correctly interpret the user's eye gaze behaviour, it will be able to anticipate the user's objectives and retrieve images and video extremely rapidly and with a minimum of thought and manual involvement.
Technical Report
Full-text available
In this paper, we give an overview of the four tasks submitted to TRECVID 2007 by COST292. In the shot boundary (SB) detection task, four SB detectors have been developed and the results are merged using two merging algorithms. The framework developed for the high-level feature extraction task comprises four systems. The first system transforms a set of low-level descriptors into the semantic space using Latent Semantic Analysis and utilises neural networks for feature detection. The second system uses a Bayesian classifier trained with a "bag of subregions". The third system uses a multi-modal classifier based on SVMs and several descriptors. The fourth system uses two image classifiers based on ant colony optimisation and particle swarm optimisation respectively. The system submitted to the search task is an interactive retrieval application combining retrieval functionalities in various modalities with a user interface supporting automatic and interactive search over all queries submitted. Finally, the rushes task submission is based on a video summarisation and browsing system comprising two different interest curve algorithms and three features.
Conference Paper
Full-text available
One important user-oriented facet of digital video retrieval research involves how to abstract and display digital video surrogates. This study reports on an investigation of digital video results pages that use textual and visual surrogates. Twelve subjects selected relevant video records from results lists containing titles, descriptions, and three keyframes for ten different search tasks. All subjects were eye-tracked to determine where, when, and how long they looked at text and image surrogates. Participants looked at and fixated on titles and descriptions statistically reliably more than on the images. Most people used the text as an anchor from which to make judgments about the search results and the images as confirmatory evidence for their selections. No differences were found whether the layout presented text or images in left to right order.
Conference Paper
Full-text available
We study a new task, proactive information retrieval by combining implicit relevance feedback and collaborative filtering. We have constructed a controlled experimental setting, a prototype application, in which the users try to find interesting scientific articles by browsing their titles. Implicit feedback is inferred from eye movement signals, with discriminative hidden Markov models estimated from existing data in which explicit relevance feedback is available. Collaborative filtering is carried out using the User Rating Profile model, a state-of-the-art probabilistic latent variable model, computed using Markov Chain Monte Carlo techniques. For new document titles the prediction accuracy with eye movements, collaborative filtering, and their combination was significantly better than by chance. The best prediction accuracy still leaves room for improvement but shows that proactive information retrieval and combination of many sources of relevance feedback is feasible.
Conference Paper
Full-text available
We examine the effect of incorporating gaze-based attention feedback from the user on personalizing the search process. Employing eye tracking data, we keep track of document parts the user read in some way. We use this information on the subdocument level as implicit feedback for query expansion and reranking. We evaluated three different variants incorporating gaze data on the subdocument level and compared them against a baseline based on context on the document level. Our results show that considering reading behavior as feedback yields powerful improvements of the search result accuracy of ca. 32% in the general case. However, the extent of the improvements varies depending on the internal structure of the viewed documents and the type of the current information need.
Conference Paper
Full-text available
We introduce GaZIR, a gaze-based interface for browsing and searching for images. The system computes on-line predictions of relevance of images based on implicit feedback, and when the user zooms in, the images predicted to be the most relevant are brought out. The key novelty is that the relevance feedback is inferred from implicit cues obtained in real-time from the gaze pattern, using an estimator learned during a separate training phase. The natural zooming interface can be connected to any content-based information retrieval engine operating on user feedback. We show with experiments on one engine that there is sufficient amount of information in the gaze patterns to make the estimated relevance feedback a viable choice to complement or even replace explicit feedback by pointing-and-clicking.
Conference Paper
Full-text available
It is natural in a visual search to look at any object that is similar to the target so that it can be recognised and a decision made to end the search. Eye tracking technology offers an intimate and immediate way of interpreting users' behaviours to guide a computer search through large image databases. This work describes experiments carried out to explore the relationship between gaze behaviour and a visual attention model that identifies regions of interest in image data. Results show that there is a difference in behaviour on images that do and do not contain a clear region of interest.
Conference Paper
Full-text available
Relevance feedback (RF) mechanisms are widely adopted in Content-Based Image Retrieval (CBIR) systems to improve image retrieval performance. However, there exist some intrinsic problems: (1) the semantic gap between high-level concepts and low-level features and (2) the subjectivity of human perception of visual contents. The primary focus of this paper is to evaluate the possibility of inferring the relevance of images based on eye movement data. In total, 882 images from 101 categories are viewed by 10 subjects to test the usefulness of implicit RF, where the relevance of each image is known beforehand. A set of measures based on fixations is thoroughly evaluated, including fixation duration, fixation count, and the number of revisits. Finally, the paper proposes a decision tree to predict the user's input during image searching tasks. The prediction precision of the decision tree is over 87%, which sheds light on a promising integration of natural eye movement into CBIR systems in the future.
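A toy version of such a fixation-based decision tree, using scikit-learn and invented data for the three measures named above (fixation duration, fixation count, revisits):

    # Decision tree over fixation-based features; numbers are fabricated for illustration.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    # columns: [total fixation duration (s), fixation count, revisits]
    X = np.array([
        [1.8, 9, 3], [0.2, 1, 0], [2.5, 12, 4], [0.4, 2, 0],
        [1.1, 6, 2], [0.3, 2, 1], [2.0, 10, 3], [0.5, 3, 0],
    ])
    y = np.array([1, 0, 1, 0, 1, 0, 1, 0])  # 1 = image relevant to the search task

    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
    print(tree.predict([[1.5, 7, 2], [0.3, 1, 0]]))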
Conference Paper
Full-text available
Image retrieval technology has been developed for more than twenty years. However, current image retrieval techniques cannot achieve satisfactory recall and precision. To improve the effectiveness and efficiency of an image retrieval system, a novel content-based image retrieval method combining image segmentation and eye tracking data is proposed in this paper. In the method, eye tracking data is collected by a non-intrusive table-mounted eye tracker at a sampling rate of 120 Hz, and the corresponding fixation data is used to locate the human's Regions of Interest (hROIs) on the segmentation result from the JSEG algorithm. The hROIs are treated as important informative segments/objects and used in image matching. In addition, the relative gaze duration of each hROI is used to weight the similarity measure for image retrieval. The similarity measure proposed in this paper is based on a retrieval strategy emphasizing the most important regions. Experiments on 7346 Hemera color images annotated manually show that the retrieval results from our proposed approach compare favorably with conventional content-based image retrieval methods, especially when the important regions are difficult to locate based on visual features.
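The gaze-duration weighting of the similarity measure might look roughly like the following sketch; the region descriptors and weights are invented, and the real method matches hROIs against JSEG-segmented regions rather than raw vectors.

    # Weight per-region matches by the relative gaze duration of each hROI.
    import numpy as np

    def weighted_region_similarity(query_regions, db_regions, gaze_weights):
        """For each query hROI, take its best-matching database region and weight
        the match by how long the user looked at that hROI."""
        sims = []
        for q in query_regions:
            dists = [np.linalg.norm(q - r) for r in db_regions]
            sims.append(1.0 / (1.0 + min(dists)))   # closer region -> higher similarity
        return float(np.dot(gaze_weights, sims))

    query_regions = [np.array([0.2, 0.7, 0.1]), np.array([0.9, 0.1, 0.3])]
    db_regions    = [np.array([0.25, 0.65, 0.1]), np.array([0.5, 0.5, 0.5])]
    gaze_weights  = np.array([0.8, 0.2])            # relative gaze duration per hROI
    print(weighted_region_similarity(query_regions, db_regions, gaze_weights))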
Conference Paper
Full-text available
In this paper, we give an overview of the four tasks submitted to TRECVID 2008 by COST292. The high-level feature extraction framework comprises four systems. The first system transforms a set of low-level descriptors into the semantic space using Latent Semantic Analysis and utilises neural networks for feature detection. The second system uses a multi-modal classifier based on SVMs and several descriptors. The third system uses three image classifiers based on ant colony optimisation, particle swarm optimisation and a multi-objective learning algorithm. The fourth system uses a Gaussian model for singing detection and a person detection algorithm. The search task is based on an interactive retrieval application combining retrieval functionalities in various modalities with a user interface supporting automatic and interactive search over all queries submitted. The rushes task submission is based on a spectral clustering approach for removing similar scenes based on the eigenvalues of a frame similarity matrix, and a redundancy removal strategy that depends on semantic feature extraction such as camera motion and faces. Finally, the submission to the copy detection task is conducted by two different systems. The first system consists of a video module and an audio module. The second system is based on mid-level features that are related to the temporal structure of videos.
Article
Full-text available
We introduce a new search strategy, in which the information retrieval (IR) query is inferred from eye movements measured when the user is reading text during an IR task. In the training phase, we know the users' interest, that is, the relevance of training documents. We learn a predictor that produces a "query" given the eye movements; the target of learning is an "optimal" query that is computed based on the known relevance of the training documents. Assuming the predictor is universal with respect to the users' interests, it can also be applied to infer the implicit query when we have no prior knowledge of the users' interests. The result of an empirical study is that it is possible to learn the implicit query from a small set of read documents, such that relevance predictions for a large set of unseen documents are ranked significantly better than by random guessing.
Article
Full-text available
It has long been documented that emotional and sensory events elicit a pupillary dilation. Is the pupil response a reliable marker of a visual detection event while viewing complex imagery? In two experiments where viewers were asked to report the presence of a visual target during rapid serial visual presentation (RSVP), pupil dilation was significantly associated with target detection. The amplitude of the dilation depended on the frequency of targets and the time of target presentation relative to the start of the trial. Larger dilations were associated with trials having fewer targets and with targets viewed earlier in the run. We found that dilation was influenced by, but not dependent on, the requirement of a button press. Interestingly, we also found that dilation occurred when viewers fixated a target but did not report seeing it. We will briefly discuss the role of noradrenaline in mediating these pupil behaviors.
Conference Paper
Full-text available
In order to help users navigate an image search system, one could provide explicit information on a small set of images as to which of them are relevant or not to their task. These rankings are learned in order to present a user with a new set of images that are relevant to their task. Requiring such explicit information may not be feasible in a number of cases, so we consider the setting where the user provides implicit feedback, eye movements, to assist when performing such a task. This paper explores the idea of implicitly incorporating eye movement features in an image ranking task where only images are available during testing. Previous work had demonstrated that combining eye movement and image features improved retrieval accuracy when compared to using each of the sources independently. Despite these encouraging results, the proposed approach is unrealistic, as no eye movements will be available a priori for new images (i.e. only after the ranked images are presented would one be able to measure a user's eye movements on them). We propose a novel search methodology which combines image features together with implicit feedback from users' eye movements in a tensor ranking Support Vector Machine and show that it is possible to extract the individual source-specific weight vectors. Furthermore, we demonstrate that the decomposed image weight vector is able to construct a new image-based semantic space that achieves better retrieval accuracy than solely using the image features.
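The paper's model is a tensor ranking SVM over combined image and eye-movement features; the sketch below shows only the simpler pairwise-ranking idea that underlies ranking SVMs, using synthetic feature vectors rather than the paper's data.

    # Pairwise ranking with a linear SVM: train on feature differences of
    # (preferred, non-preferred) pairs; all vectors are synthetic.
    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    relevant     = rng.normal(1.0, 0.5, size=(20, 6))   # combined image+gaze features
    non_relevant = rng.normal(0.0, 0.5, size=(20, 6))

    # Differences relevant - non_relevant get label +1, the reversed differences -1.
    diffs = np.vstack([relevant - non_relevant, non_relevant - relevant])
    labels = np.hstack([np.ones(20), -np.ones(20)])

    ranker = LinearSVC(C=1.0).fit(diffs, labels)
    scores = relevant @ ranker.coef_.ravel()             # higher score = ranked higher
    print(scores[:3])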
Article
Full-text available
Several recent studies have reported success in applying EEG-based signal analysis to achieve accurate single-trial classification of responses to visual target detection. Pupil responses are proposed as a complementary modality that can support improved accuracy of single-trial signal analysis. We develop a pupillary response feature-extraction and -selection procedure that helps to improve the classification performance of a system based only on EEG signal analysis. We apply a two-level linear classifier to obtain cognitive-task-related analysis of EEG and pupil responses. The classification results based on the two modalities are then fused at the decision level. Here, the goal is to support increased classification confidence through the inherent modality complementarities. The fusion results show significant improvement over classification performance based on a single modality.
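Decision-level fusion of the two modalities can be illustrated with a short sketch; the classifiers, features and fusion weights below are synthetic stand-ins for the EEG and pupil pipelines described above.

    # Two independently trained classifiers fused at the decision level.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    y = rng.integers(0, 2, size=200)                       # 1 = target detected
    eeg_features   = y[:, None] * 0.8 + rng.normal(0, 1.0, size=(200, 5))
    pupil_features = y[:, None] * 0.5 + rng.normal(0, 1.0, size=(200, 3))

    eeg_clf   = LogisticRegression().fit(eeg_features, y)
    pupil_clf = LogisticRegression().fit(pupil_features, y)

    p_eeg   = eeg_clf.predict_proba(eeg_features)[:, 1]
    p_pupil = pupil_clf.predict_proba(pupil_features)[:, 1]
    fused   = 0.7 * p_eeg + 0.3 * p_pupil                  # weighted decision-level fusion
    print("fused accuracy:", ((fused > 0.5) == y).mean())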
Article
Full-text available
We investigate how users interact with the results page of a WWW search engine using eye-tracking. The goal is to gain insight into how users browse the presented abstracts and how they select links for further exploration. Such understanding is valuable for improved interface design, as well as for more accurate interpretations of implicit feedback (e.g. clickthrough) for machine learning. The following presents initial results, focusing on the amount of time spent viewing the presented abstracts, the total number of abstracts viewed, and measures of how thoroughly searchers evaluate their results set.
Article
LIBSVM is a library for Support Vector Machines (SVMs). We have been actively developing this package since the year 2000. The goal is to help users to easily apply SVM to their applications. LIBSVM has gained wide popularity in machine learning and many other areas. In this article, we present all implementation details of LIBSVM. Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Article
LIBSVM is a library for support vector machines (SVM). Its goal is to help users to easily use SVM as a tool. In this document, we present all its imple-mentation details. For the use of LIBSVM, the README file included in the package and the LIBSVM FAQ provide the information.
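LIBSVM is usually reached through one of its wrappers; scikit-learn's SVC, for instance, is built on top of it. A minimal usage sketch with toy data:

    # Training and querying an RBF-kernel SVM (LIBSVM via scikit-learn's SVC).
    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.1], [0.9, 1.0]])
    y = np.array([0, 0, 1, 1])

    model = SVC(kernel="rbf", C=1.0, gamma="scale", probability=True)
    model.fit(X, y)
    print(model.predict([[0.05, 0.1]]), model.predict_proba([[0.95, 1.05]]))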
Conference Paper
Query formulation and efficient navigation through data to reach relevant results are undoubtedly major challenges for image or video retrieval. Queries of good quality are typically not available and the search process needs to rely on relevance feedback given by the user, which makes the search process iterative. Giving explicit relevance feedback is laborious, not always easy, and may even be impossible in ubiquitous computing scenarios. A central question then is: Is it possible to replace or complement scarce explicit feedback with implicit feedback inferred from various sensors not specifically designed for the task? In this paper, we present preliminary results on inferring the relevance of images based on implicit feedback about users' attention, measured using an eye tracking device. It is shown that, in reasonably controlled setups at least, already fairly simple features and classifiers are capable of detecting the relevance based on eye movements alone, without using any explicit feedback.
Conference Paper
In this paper we propose an implicit relevance feedback method with the aim of improving the performance of known Content-Based Image Retrieval (CBIR) systems by re-ranking the retrieved images according to users' eye gaze data. This represents a new mechanism for implicit relevance feedback; in fact, the sources usually taken into account for image retrieval are based on the natural behavior of the user in his/her environment, estimated by analyzing mouse and keyboard interactions. In detail, after the retrieval of images by querying CBIR systems with a keyword, our system computes the most salient regions (where users look with greater interest) of the retrieved images by gathering data from an unobtrusive eye tracker, such as the Tobii T60. According to the features, in terms of color and texture, of these relevant regions, our system is able to re-rank the images initially retrieved by the CBIR system. A performance evaluation, carried out on a set of 30 users using Google Images and the keyword "pyramid", shows that about 87% of the users are more satisfied with the output images when the re-ranking is applied.
Article
Reviews studies of eye movements in reading and other information-processing tasks such as picture viewing, visual search, and problem solving. The major emphasis of the review is on reading as a specific example of the more general phenomenon of cognitive processing. Basic topics discussed are the perceptual span, eye guidance, integration across saccades, control of fixation durations, individual differences, and eye movements as they relate to dyslexia and speed reading. In addition, eye movements and the use of peripheral vision and scan paths in picture perception, visual search, and pattern recognition are discussed, as is the role of eye movements in visual illusion. The basic theme of the review is that eye movement data reflect the cognitive processes occurring in a particular task. Theoretical and practical considerations concerning the use of eye movement data are also presented.
Article
(1) In cats with a light-driven pupillary response, dilatation-correlated single neurons have been recorded near, but not in, the Edinger-Westphal nucleus. The firing rates of these units range from 1 to 80 pitts (spikes/sec). (2) Focal electrical stimulation was performed, confirming the dilatation action of these cells as well as the constriction activity of the Edinger-Westphal region itself. (3) During light stimulation, the rate of change of the average neural response is faster than the concurrently averaged pupil area response.
Conference Paper
This paper explains a method for leveraging the standard video timeline widget as an interactive visualization of image features. An eye-tracking experiment is described with results that indicate that such a widget increases task efficiency without increasing complexity while being easily learned by experiment participants.
Article
A model of the pathways controlling the size of the human pupil is presented. Computer simulation of this model demonstrates the role played by each of the elements in the pupil pathways. Simulations of the effects of drugs and a few common abnormalities in the system also help to illustrate the workings of the internal processes. Computer models of this type can be used as teaching aids or as tools for testing of hypotheses regarding the system.
Measuring the utility of gaze detection for task modeling: A preliminary study
  • P. Brooks
  • K. Y. Phang
  • R. Bradley
  • D. Oard
  • R. White
  • F. Guimbretière

A qualitative look at eye-tracking for implicit relevance feedback. In Proceedings of the 2nd International Workshop on Context-Based Information Retrieval.
  • K. K. Moe
  • J. M. Jensen
  • B. Larsen