
Jean Martinet, Université Côte d'Azur · I3S Research Laboratory
PhD
83
Publications
12,119
Reads
393
Citations
Publications (83)
Why do neurons communicate through spikes? By definition, spikes are all-or-none neural events which occur at continuous times. In other words, spikes are on the one hand binary, existing or not without further detail, and on the other hand can occur at any asynchronous time, without the need for a centralized clock. This stands in stark contrast to the a...
Foveation can be defined as the organic action of directing the gaze towards a visual region of interest, to acquire relevant information selectively. With the recent advent of event cameras, we believe that taking advantage of this visual neuroscience mechanism would greatly improve the efficiency of event-data processing. Indeed, applying foveati...
In this paper, we introduce a novel approach for face stereo reconstruction in a passive stereo vision system. Our approach is based on the generation of a facial disparity map, requiring neither expensive devices nor generic face models. It consists of incorporating face properties in the disparity estimation to enhance the 3D face reconstruction. A...
Visual attention can be defined as the behavioral and cognitive process of selectively focusing on a discrete aspect of sensory cues while disregarding other perceivable information. This biological mechanism, more specifically saliency detection, has long been used in multimedia indexing to drive the analysis only on relevant parts of images or vi...
Several approaches have been proposed in the area of Automatic Image Annotation (AIA) in order to exploit the relationships between words that are extracted from image categories, and to automatically generate annotation words for a given image. Other methods exploit ontologies, where the annotation keywords are derived from an ontology to improve im...
Image classification is one of the most important topics in computer vision. It has become crucial for large image datasets. Several image classification approaches have been proposed in the literature. In this context, the Bag-of-Visual-Words (BoVW) model has been widely used. The BoVW model relies on building a visual vocabulary, and images are represented as hi...
The bio-inspired concept of Spike-Timing-Dependent Plasticity (STDP) is derived from neurobiology and increasingly used in Spiking Neural Networks (SNNs) nowadays. Mostly found in unsupervised learning, though recent work has shown its usefulness in supervised or reinforced paradigms too, STDP is a key element to understanding SNN architectures' le...
With the rapid growth of image collections, image classification and annotation have been active areas of research with notable recent progress. The Bag-of-Visual-Words (BoVW) model, which relies on building a visual vocabulary, has been widely used in this area. Recently, attention has shifted to the use of advanced architectures which are character...
Content-based image retrieval systems are meant to retrieve from a collection the images most similar to a query image. One of the best-known models widely applied for this task is the bag of visual words (BoVW) model. In this paper, we introduce a study of different information gain models used for the construction of a visual vocabulary. In th...
We introduce a new test collection named FoxFaces, dedicated to researchers in face recognition and analysis. The creation of this dataset was motivated by gaps encountered in the existing 3D/4D datasets. FoxFaces contains 3 face datasets obtained with several devices. Faces are captured with different changes in pose, expression and illumination...
With the availability of massive amounts of digital images in personal and on-line collections, effective techniques for navigating, indexing and searching images become more crucial. In this article, we rely on the image visual content as the main source of information to represent images. Starting from the bag of visual words (BOW) representation...
The annotation of video streams by automatic content analysis is a growing field of research. The possibility of recognising persons appearing in TV shows makes it possible to automatically structure ever-growing video archives. We propose a new descriptor to re-identify persons featured in videos, that is to say, to spot all occurrences of persons throughout...
This paper introduces a novel person track dataset dedicated to person re-identification. The dataset is built from a set of real-life TV shows broadcast on the French TV channels BFMTV and LCP, provided during the REPERE challenge. It contains a total of 4,604 persontracks (short video sequences featuring an individual with no background) from 266...
Automatic landmark identification is one of the hot research topics in the computer vision domain. Efficient and robust identification of landmark points is a challenging task, especially in a mobile context. This paper addresses the pruning of near-duplicate images for creating representative training image sets to minimize overall query processing co...
In this paper, we propose a novel gender recognition framework based on a fuzzy inference system (FIS). Our main objective is to study the gain brought by FIS in the presence of various visual sensors (e.g., hair, mustache, inner face). We use inner and outer facial features to extract input variables. First, we define the fuzzy statements and then we...
This article presents a system for identifying people in multimedia streams. The system was entered in the REPERE challenge, co-organized by the ANR and the DGA, which ended in 2014. The main task of the challenge was to identify individuals appearing in at least one of the modalities carried by the video, whether...
This paper introduces a bi-modal face recognition approach. The objective is to study how combining depth and intensity information can increase face recognition precision. In the proposed approach, local features based on LBP (Local Binary Pattern) and DLBP (Depth Local Binary Pattern) are extracted from intensity and depth images respectively. Ou...
In this paper, we propose an original content-based image retrieval method using bag-of-words, dedicated to matching buildings on mobile devices. In the literature, the repetitiveness of visual words in natural scenes, and especially in building images, has been demonstrated. Assuming images are composed of a set of elementary blocks, we represent th...
In this paper, we introduce a novel approach for face depth estimation in a passive stereo vision system. Our approach is based on rapid generation of facial disparity maps, requiring neither expensive devices nor generic face models. It consists in incorporating face properties into the disparity estimation process to enhance the 3D face reconstru...
This paper presents a novel descriptor for face depth images, generalizing the well-known Local Binary Pattern (LBP), in order to enhance its discriminative power for smooth depth images. The proposed descriptor is based on detecting shape patterns from face surfaces and enables accurate and fast description of shape variation in depth images. It i...
This paper describes a multimodal person recognition system for video broadcast, developed to participate in the Defi-Repere challenge. The main track of this challenge targets the identification of all persons occurring in a video either in the audio modality (speakers) or the image modality (faces). This system is developed by the PERCOL team i...
In content-based image retrieval, one of the most important steps is the construction of image signatures. To do so, some state-of-the-art approaches build a visual vocabulary. In this paper, we propose a new methodology for visual vocabulary construction that obtains high retrieval results. Moreover, it is computationally inexpensiv...
The goal of the PERCOL project is to participate in the REPERE multimodal evaluation program by building a consortium combining different scientific fields (audio, text and video) in order to perform person recognition in video documents. The two main scientific challenges we are addressing are firstly multimodal fusion algorithms for automatic per...
This paper presents an automatic way to discover pixels in a face image that improve facial expression recognition results. The main contribution of our study is to provide a practical method to improve the classification performance of classifiers by selecting the best pixels of interest. Our method exhaustively searches for the best and worst feature wi...
Our goal is to automatically identify faces in TV content without a pre-defined dictionary of identities. Most methods are based on identity detection (from OCR and ASR) and require a propagation strategy based on visual clustering. In TV content, people appear with many variations, making the clustering very difficult. In this case, identifying sp...
We present an application of gaze tracking to image and video indexing, in the form of a model for selecting and weighting Regions of Interest (RoIs). Image/video indexing refers to the process of creating a synthetic representation of the media, for instance for retrieval purposes. It usually consists in labeling the media with semantic keywords d...
Identifying and naming, at every instant of a video, all the people present in the image or speaking on the soundtrack is one of these new data mining tools. From a scientific point of view, person recognition in audiovisual documents is a difficult problem because of the various ambiguities that...
The popular "bag of visual words" approach to representing and retrieving visual documents consists of describing images (or video frames) using sets of descriptors, which correspond to discretized low-level features. Most existing approaches that use visual words...
In this article, we introduce a new approach for 3D face reconstruction in a passive stereo vision system. The approach aims to generate the facial disparity map without requiring expensive equipment or generic face models. The proposed algorithm consists of performing...
This work proposes a method for automatically detecting the regions that contribute most to the correct classification of faces with respect to predefined expressions: joy, surprise, etc. Our method determines the regions with the most (respectively the least) discriminative power using a neural network...
In this article, we present an approach for re-identifying persons in television news, that is, establishing an identity correspondence across all occurrences of persons. Our approach is based on spatio-temporal histograms, which are histograms that contain, in addition to pixel counts in a...
In this paper, we introduce a novel approach for face stereo reconstruction based on stereo vision. The approach is based on real-time generation of a facial disparity map, requiring neither expensive devices nor generic face models. An algorithm based on incorporating topological information of the face in the disparity estimation process is proposed...
In this article, we present an approach to person re-identification in news video. It consists of matching the identity of all occurrences of a person. Our approach is based on space-time histograms, which are histograms containing, in addition to pixel counts in a video, their position in space and time. Space-time histograms allow a higher precision th...
We present in this chapter a classification of image descriptors, from the low level to the high level, introducing the notion of intermediate level. This level denotes a representation level lying between low-level features - such as color histograms, texture or shape descriptors, and high-level features - semantic concepts. In a chain of process...
Having effective methods to access the desired images is essential nowadays with the availability of a huge amount of digital images. We propose a higher-level visual representation that enhances the traditional part-based Bag of Visual Words (BOW) representation in two aspects. Firstly, we introduce a new multilayer semantic significance analysis...
In this paper, we lay out a relational approach for indexing and retrieving photographs from a collection. The increase of digital image acquisition devices, combined with the growth of the World Wide Web, requires the development of information retrieval (IR) models and systems that provide fast access to images searched by users in databases. The...
Having effective methods to access images containing a desired object is essential nowadays with the availability of a huge amount of digital images. We propose a semantic higher-level visual representation which improves the traditional part-based bag-of-words image representation in two aspects. First, we propose a semantic model to generate a semanti...
Having effective methods to access the desired images is essential nowadays with the availability of a huge amount of digital images. The proposed approach is based on an analogy between content-based image retrieval and text retrieval. The aim of the approach is to build a meaningful mid-level representation of images to be used later for matching b...
Having effective methods to access the desired images is essential nowadays with the availability of a huge amount of digital images. The proposed approach is based on an analogy between the retrieval of images containing desired objects (object-based image retrieval) and text retrieval. We propose a higher-level visual representation, for object-based image...
In this chapter we illustrate how the MPEG-7 and MPEG-21 standards can serve the implementation of domain-dependent content aggregation and delivery frameworks. In the scope of an ITEA2 project called CAM4Home, we have conceived a metadata metamodel enhancing the aggregation and context-dependent delivery of complex content and services. The metamod...
Having effective methods to access the desired images is essential nowadays with the availability of a huge amount of digital images. The proposed approach is based on an analogy between content-based image retrieval and text retrieval. The aim of the approach is to build a meaningful mid-level representation of images to be used later on for match...
In this paper, we develop a novel image representation method which is based firstly on constructing visual words based on a local patch extraction and a fusion of descriptors. The spatial constitution of an image is represented with a mixture of n Gaussians in the feature space. The new spatial weighting scheme consists in weighting visual words a...
This paper describes a new generic metadata model, called CAM Metamodel, that merges information about content, services, and the physical and technical environment in order to enable homogeneous delivery and consumption of content. We introduce a metadata model that covers all these aspects and which can be easily extended so as to absorb new ty...
For the experiments, we used a support vector machine (SVM) with a radial basis function kernel, a K-nearest-neighbor (KNN) algorithm with K=10, and the Frobenius distance. Lablack et al. (2008) note that head pose recognition accuracy increases with the number of training samples, which is consistent with the typical
Visual media are among the most widely used media in our societies. With the increasing demand for digital image and video technologies in applications such as communication, advertising, or entertainment, there is a growing need for assessment tools to evaluate the quality of visual media understanding. It is necessary to quantify the adequacy of an aud...
Eye movements are arguably the most natural and repetitive movements of a human being. The most mundane activity, such as watching television or reading a newspaper, involves this automatic activity which consists of shifting our gaze from one point to another. Identification of the components of eye movements (fixations and saccades) is an essentia...
The increase of digital image and video acquisition devices, combined with the growth of the World Wide Web, requires the definition of user-relevant similarity matching methods providing meaningful access to documents searched by users among large amounts of data. The aim of our work is to define media objects for document description suited to im...
Definition: Motion saliency helps to detect moving objects whose motion is discontinuous with respect to their background.
Definition: Tracking in videos consists in following the successive locations of a given object region. We present an application of covariance-based features used as robust image descriptors, and related algorithms for object tracking in video.
This paper proposes a method for generating image-based quizzes from news video archives. Although there are many types of quizzes, in this work we focus on matching quizzes in which an image is to be matched to one of several choices that are statements. The key to making a successful quiz of this type is to extract choice statements that are simi...
This paper presents a study of the use of association rules in document representation. It consists of an approach for building a compact and meaningful representation of low level features in the documents of a multimedia database. The proposed method is based on the discovery and the study of intra-modal association rules between individual objec...
This paper presents a contribution in the domain of automatic visual document indexing based on inter-modal analysis, in the form of a statistical indexing model. The approach is based on inter-modal document analysis, which consists in modeling and learning some relationships between several modalities from a data set of annotated documents in ord...