Pierre-Francois Marteau

  • Professor at Université Bretagne Sud, Vannes, France

About

151
Publications
26,131
Reads
1,440
Citations
Current institution
Université Bretagne Sud, Vannes, France
Current position
  • Professor
Additional affiliations
January 2012 - present
Institute for Research in IT and Random Systems
Position
  • Professor (Full)
January 1999 - present
Université de Bretagne Sud
Position
  • Professor (Full)

Publications (151)
Preprint
Full-text available
We address in this article the quality of the WikiNER corpus, a multilingual Named Entity Recognition corpus, and provide a consolidated version of it. The annotation of WikiNER was produced in a semi-supervised manner, i.e. no manual verification was carried out a posteriori. Such a corpus is called silver-standard. In this paper we propose...
Preprint
Full-text available
We introduce and detail an atypical neural network architecture, called time elastic neural network (teNN), for multivariate time series classification. The novelty compared to classical neural network architecture is that it explicitly incorporates time warping ability, as well as a new way of considering attention. In addition, this architecture...
Article
Full-text available
More and more applications require early decisions, i.e. taken as soon as possible from partially observed data. However, the later a decision is made, the more its accuracy tends to improve, since the description of the problem to hand is enriched over time. Such a compromise between the earliness and the accuracy of decisions has been particularl...
Article
Numerous studies show that meteorological conditions have an impact on the emission, dispersion and suspension of pollens in the air. Several allergenic species permanently threaten the health of millions of people in France, and it can be extrapolated that this is the case in most parts of the world. Hence, preventive information on the risk of pol...
Chapter
Natural language resources are essential for integrating linguistic engineering components into information processing suites. However, the resources available in French are scarce and do not cover all possible tasks, especially for specific business applications. In this context, we present a dataset of French newsletters and their use to predict...
Preprint
Full-text available
This paper has been published in SIGKDD Explorations Newsletter (December 2022). More and more applications require early decisions, i.e. taken as soon as possible from partially observed data. However, the later a decision is made, the more its accuracy tends to improve, since the description of the problem to hand is enriched over time. Suc...
Article
Full-text available
In this paper, we propose DiFF-RF, an ensemble approach composed of random partitioning binary trees to detect point-wise and collective (as well as contextual) anomalies. Thanks to a distance-based paradigm used at the leaves of the trees, this semi-supervised approach solves a drawback that has been identified in the isolation forest (IF) algorit...
Preprint
In this paper, we propose DiFF-RF, an ensemble approach composed of random partitioning binary trees to detect point-wise and collective (as well as contextual) anomalies. Thanks to a distance-based paradigm used at the leaves of the trees, this semi-supervised approach solves a drawback that has been identified in the isolation forest (IF) algorit...
Conference Paper
Graphs increasingly stand out as an essential data structure in the field of data sciences. To study graphs, or sub-graphs, that characterize a set of observations, it is necessary to describe them formally, in order to characterize equivalence relations that make sense in the scope of the considered application domain. Hence we seek to define a ca...
Presentation
Full-text available
Oral presentation of the paper "Scott: A Method for Representing Graphs as Rooted Trees for Graph Canonization" (10.1007/978-3-030-36687-2_48), given at Complex Networks 2019 in Lisbon. Project website: https://theplatypus.github.io/scott/
Preprint
Full-text available
In this article we address the problem of separation of shape and time components in time series. The concept of shape that we tackle is termed temporally "neutral" to consider that it may possibly exist outside of any temporal specification, as it is the case for a geometric form. We propose to exploit and adapt a probabilistic temporal alignment...
Conference Paper
Full-text available
Targeted opinion mining is a complex task that can benefit from a variety of approaches. Our experiments test combinations of methods on a corpus of online user reviews of books. On these data, and as far as opinion polarity is concerned, promising results were obtained by an app...
Article
Full-text available
A method called iTEKA, which stands for iterative time elastic kernel averaging, was successfully used for averaging time series. In this paper, we adapt it to GPS trajectories. The key contribution is a denoising procedure that includes an over-sampling scheme, the detection and removal of outlier trajectories, a kernelized time elastic averaging...
Article
Full-text available
In the light of regularized dynamic time warping kernels, this paper reconsiders the concept of time elastic centroid for a set of time series. We derive a new algorithm based on a probabilistic interpretation of kernel alignment matrices. This algorithm expresses the averaging process in terms of a stochastic alignment automaton. It uses an iterativ...
Article
Full-text available
This paper introduces a new similarity measure, the covering similarity, that we formally define for evaluating the similarity between a symbolic sequence and a set of symbolic sequences. A pair-wise similarity can also be directly derived from the covering similarity to compare two symbolic sequences. An efficient implementation to compute the cov...
Article
Full-text available
Endowing animated virtual characters with emotionally expressive behaviors is paramount to improving the quality of the interactions between humans and virtual characters. Full-body motion, in particular, with its subtle kinematic variations, represents an effective way of conveying emotionally expressive content. However, before synthesizing expre...
Conference Paper
Full-text available
This paper presents a new corpus, called EMO&LY (EMOtion and AnomaLY), composed of speech and facial video records of subjects that contains controlled anomalies. As far as we know, to study the problem of anomaly detection in discourse by using machine learning classification techniques, no such corpus exists or is available to the community. In E...
Conference Paper
Full-text available
The European "Tenders Electronic Daily" (TED) is a large source of semi-structured and multilingual data that is very valuable to the Natural Language Processing community. This data set can effectively be used to address complex machine translation, multilingual terminology extraction, text-mining, or to benchmark information retrieval systems. D...
Article
Full-text available
This paper introduces the sequence covering similarity, that we formally define for evaluating the similarity between a symbolic sequence (string) and a set of symbolic sequences (strings). From this covering similarity we derive a pair-wise distance to compare two symbolic sequences. We show that this covering distance is a semimetric. Few example...
Preprint
Full-text available
This paper introduces a new similarity measure, the covering similarity, that we formally define for evaluating the similarity between a symbolic sequence and a set of symbolic sequences. A pair-wise similarity can also be directly derived from the covering similarity to compare two symbolic sequences. An efficient implementation to compute the cov...
Article
Full-text available
Temporal data are naturally everywhere, especially in the digital era that sees the advent of big data and the internet of things. One major challenge that arises during temporal data analysis and mining is the comparison of time series or sequences, which requires determining a proper distance or (dis)similarity measure. In this context, the Dynamic...
Preprint
Full-text available
Temporal data are naturally everywhere, especially in the digital era that sees the advent of big data and the internet of things. One major challenge that arises during temporal data analysis and mining is the comparison of time series or sequences, which requires determining a proper distance or (dis)similarity measure. In this context, the Dynamic...
Chapter
Full-text available
In the scope of gestural action recognition, the size of the feature vector representing movements is in general quite large, especially when full body movements are considered. Furthermore, this feature vector evolves during the movement performance so that a complete movement is fully represented by a matrix M of size DxT, whose element \(M_{i,j}...
Conference Paper
This paper presents an evaluation of three different anomaly detection methods over different feature sets. The three anomaly detectors are based respectively on a Gaussian Mixture Model (GMM), a One-Class SVM and an Isolation Forest. The considered feature sets are built from personality evaluation and audio signal. Personality evaluations are extracted f...
Conference Paper
Full-text available
This paper presents the design of an anomaly detector based on three different sets of features, one corresponding to some prosodic descriptors and two extracted from Big Five traits. Big Five traits correspond to a simple but efficient representation of a human personality. They are extracted from a manual annotation while prosodic features are ex...
Conference Paper
Full-text available
https://www.isca-speech.org/archive/pdfs/interspeech_2017/fayet17_interspeech.pdf
Article
Full-text available
Having identified a drawback in the Isolation Forest (IF) algorithm that limits its use for anomaly detection, we propose two extensions that, first, overcome this limitation and, second, provide the algorithm with some supervised learning capability. The resulting Hybrid Isolation Forest (HIF) that we propos...
Article
Full-text available
Data mining techniques play an increasing role in intrusion detection by analyzing network data and classifying it as 'normal' or 'intrusion'. In recent years, several data mining techniques such as supervised, semi-supervised and unsupervised learning have been widely used to enhance intrusion detection. This work proposes a hybrid intrusion det...
Conference Paper
Full-text available
In this article we present a study that uses machine learning methods for sequential structures to extract semantic relations from texts drawn from call-for-tender databases. One of the relations we consider concerns the footprint of a development project, characterized by an association between the concept...
Preprint
In the light of regularized dynamic time warping kernels, this paper reconsiders the concept of time elastic centroid for a set of time series. We derive a new algorithm based on a probabilistic interpretation of kernel alignment matrices. This algorithm expresses the averaging process in terms of a stochastic alignment automaton. It uses an iterativ...
Conference Paper
Full-text available
Dynamic Time Warping (DTW) is considered as a robust measure to compare numerical time series when some time elasticity is required. However, speed is a known major drawback of DTW due to its quadratic complexity. Previous work has mainly considered designing speed optimization based on early-abandoning strategies applied to nearest-neighbor classi...
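To make the quadratic complexity mentioned in this abstract concrete, here is a minimal sketch of the textbook DTW recurrence in Python. It is not the speed-up studied in the paper; the function name `dtw_distance` and the absolute-difference local cost are assumptions chosen for illustration.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW between two numeric sequences (illustrative sketch).

    Assumption: the local cost is the absolute difference |a[i] - b[j]|.
    The double loop fills an (n+1) x (m+1) table, hence O(n * m) time.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] matched again with b[j-1] (stretch b)
                cost[i][j - 1],      # b[j-1] matched again with a[i-1] (stretch a)
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]
```

For instance, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` evaluates to `0.0`, because the repeated `2` is absorbed by the elastic alignment.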
Conference Paper
In this paper we characterize timpani gestures by temporal kinematic features, containing most information responsible for the sound-producing actions. In order to evaluate the feature sets, a classification approach is conducted under three main attack categories (legato, accent and vertical accent) and sub-categories (dynamics, striking position)...
Conference Paper
Recent results in the affective computing sciences point towards the importance of virtual characters capable of conveying affect through their movements. However, in spite of all advances made on the synthesis of expressive motions, almost all of the existing approaches focus on the translation of stylistic content rather than on the generation of...
Chapter
Designing and controlling virtual characters endowed with expressive gestures requires the modeling of multiple processes, involving high-level abstract representations to low-level sensorimotor models. An expressive gesture is here defined as a meaningful bodily motion which intrinsically associates sense, style, and expressiveness. The main chall...
Article
Full-text available
Current advances in robotics offer considerable opportunities, not only for private individuals but also for professionals on duty, such as firefighters or military personnel. However, operators of this kind must stay focused on their mission and therefore expect a means of control that is minimally intrusive and very intuitive. Moreover,...
Conference Paper
Virtual characters capable of showing emotional content are considered as more believable and engaging. However, in spite of the numerous psychological studies and machine learning applications trying to decode the most salient features in the expression and perception of affect, there is still no common understanding about how affect is conveyed t...
Article
Full-text available
Dynamic Time Warping (DTW) is considered as a robust measure to compare numerical time series when some time elasticity is required. Even though its initial formulation can be slow, extensive research has been conducted to speed up the calculations. However, those optimizations are not always available for multidimensional time series. In this pape...
Article
Full-text available
In the light of regularized dynamic time warping kernels, this paper reconsiders the concept of time elastic centroid (TEC) for a set of time series. From this perspective, we show first how TEC can easily be addressed as a preimage problem. Unfortunately, this preimage problem is ill-posed and may suffer from over-fitting, especially for long time serie...
Article
Full-text available
We address in this paper the co-clustering and co-classification of bilingual data lying in two linguistic similarity spaces when a comparability measure defining a mapping between these two spaces is available. A new approach, which we can characterize as a three-mode analysis scheme, is proposed to mix the comparability measure with the two simil...
Conference Paper
Full-text available
We propose a similarity measure between sentences which combines a knowledge-based measure, that is a lighter version of ESA (Explicit Semantic Analysis), and a distributional measure, Rouge. We used this hybrid measure with two French domain-orientated corpora collected from the Web and we compared its similarity scores to those of human judges. I...
Conference Paper
Full-text available
While human communication involves rich, complex and expressive gestures, available corpora of captured motions used for the animation of virtual characters contain actions ranging from locomotion to everyday life motions. We aim at creating a novel corpus of expressive and meaningful gestures, and we focus on body movements and gestures involved i...
Article
Full-text available
In the field of gestural action recognition, many studies have focused on dimensionality reduction along the spatial axis, to reduce both the variability of gestural sequences expressed in the reduced space, and the computational complexity of their processing. It is noticeable that very few of these methods have explicitly addressed the dimensiona...
Article
Full-text available
This paper proposes some extensions to the work on kernels dedicated to string or time series global alignment based on the aggregation of scores obtained by local alignments. The extensions we propose make it possible to construct, from the classical recursive definition of elastic distances, recursive edit distance (or time-warp) kernels that are positive defin...
Article
Full-text available
Following the pioneering work by (Li and Gaussier, 2010), we address in this paper the analysis of a family of quantitative comparability measures dedicated to the construction and evaluation of topical comparable corpora. After recalling the definition of the quantitative comparability measure proposed by (Li and Gaussier, 2010), we develop some v...
Article
Full-text available
In the field of isolated gesture recognition, many studies have addressed dimensionality reduction along the spatial axis to reduce both the algorithmic complexity and the variability of gestural realizations. It is rather surprising that few of these methods have explicitly addressed the reduction of...
Article
Full-text available
Due to the increasing number of graphical applications, the generation of human life-like characters has become an important research topic. Different approaches have been proposed, the combination of motion capture data and machine learning methods being the dominant trend in the last years. Despite the good results produced by these approaches, t...
Article
Full-text available
In the presence of bilingual comparable corpora it is natural to embed the data in two distinct linguistic representation spaces in which a "computational" notion of similarity is potentially defined. Insofar as these bilingual data are comparable in the sense of a comparability measure that is itself computable (Li and Gaussier, 2010), we can establish a c...
Article
Full-text available
The similarity search problem is one of the main problems in time series data mining. Traditionally, this problem was tackled by sequentially comparing the given query against all the time series in the database, and returning all the time series that are within a predetermined threshold of that query. But the large size and the high dimensionality...
Conference Paper
Full-text available
Using kernels to embed non-linear data into high dimensional spaces where linear analysis is possible has become standard practice. In the case of the Gaussian kernel, however, data are distributed on a hypersphere in the corresponding Reproducing Kernel Hilbert Space (RKHS). Inspired by previous works in non-linear statistics, this article investig...
Preprint
Full-text available
This paper proposes a framework dedicated to the construction of what we call discrete elastic inner product allowing one to embed sets of non-uniformly sampled multivariate time series or sequences of varying lengths into inner product space structures. This framework is based on a recursive definition that covers the case of multiple embedded tim...
Article
Full-text available
This paper proposes a framework dedicated to the construction of what we call discrete elastic inner product allowing one to embed sets of non-uniformly sampled multivariate time series or sequences of varying lengths into inner product space structures. This framework is based on a recursive definition that covers the case of multiple embedded tim...
Article
Full-text available
In the last two decades, many protein 3D shapes have been discovered, characterized and made available thanks to the Protein Data Bank (PDB), which is nevertheless growing very quickly. New scalable methods are thus urgently required to search through the PDB efficiently. This paper presents an approach called LNA (Laplacian Norm Alignment) th...
Conference Paper
Full-text available
Research on signed languages still strongly dissociates linguistic issues related to phonological and phonetic aspects from gesture studies for recognition and synthesis purposes. This paper focuses on the intertwining of motion and meaning for the analysis, synthesis and evaluation of sign language gestures. We discuss the relevance and interest...
Article
Full-text available
The rapid development of Linked Data leads to a proliferation of errors in published data, primarily related to inconsistencies between data instances and their related ontologies. This problem alters the reliability of Semantic Web applications when they involve the analysis or the exploitation of heterogeneous RDF data sets. We focus in this ar...
Article
Full-text available
The rapid development of the Web is plagued by a growing number of errors in published data, mainly related to inconsistencies with domain ontologies. This problem can undermine application reliability when the analysis of independent and heterogeneous RDF data is involved. Therefore, it is important to improve the quality of the published data to promote the dev...
Article
Full-text available
We propose in this paper a framework dedicated to the construction of what we call time elastic inner products that allows embedding sets of non-uniformly sampled multivariate time series of varying lengths into vector space structures. This framework is based on a recursive definition that covers the case of multiple embedded time elastic dimensio...
Conference Paper
Full-text available
Fast retrieval of time series that are similar to a given pattern in large databases is a problem which has received a lot of attention in the last decade. The high dimensionality and large size of time series databases make sequential scanning inefficient to handle the similarity search problem. Several dimensionality reduction techniques have bee...
Conference Paper
Full-text available
In this paper we present a new generic frame that boosts the performance of different time series dimensionality reduction techniques by using a fast-and-dirty filter that we combine with the lower bounding condition of the dimensionality reduction technique to increase the pruning power. This fast-and-dirty filter is based on an optimal approximat...
Conference Paper
Full-text available
Similarity search in time series data mining is a problem that has attracted increasing attention recently. The high dimensionality and large volume of time series databases make sequential scanning inefficient to tackle this problem. There are many representation techniques that aim at reducing the dimensionality of time series so that the search...
Conference Paper
Full-text available
We propose a new multi-resolution indexing and retrieval method of the similarity search problem in time series databases. The proposed method is based on a fast-and-dirty filtering scheme that iteratively reduces the search space using several resolution levels. For each resolution level the time series are approximated by an appropriate function....
Article
This paper proposes some extensions to the work on kernels dedicated to string alignment (biological sequence alignment) based on the summing up of scores obtained by local alignments with gaps. The extensions we propose make it possible to construct, from classical time-warp distances, what we call summative time-warp kernels that are positive definite if...
Article
Full-text available
Local link analysis of topical graphs on the Web makes it possible to experiment with focused crawling strategies in a detailed way. In this scope, models, parameters and metrics used to orientate the crawler can be better understood, tuned and evaluated. We develop a methodological and experimental approach that exploits link analysis in order to determine what c...
Conference Paper
Full-text available
A Tabu move merge split (TMMS) algorithm is proposed for the polygonal approximation problem. TMMS incorporates a tabu principle to avoid premature convergence into local minima. TMMS is compared to optimal, near to optimal top down multi-resolution (TDMR) and classical split and merge heuristics solutions. Experiments show that potential improveme...
Article
Full-text available
We develop a top-down multiresolution algorithm (TDMR) to solve iteratively the problem of polygonal curve approximation. This algorithm provides nested polygonal approximations of an input curve. We show theoretically and experimentally that, if the simplification algorithm \({\mathcal{A}}\) used between any two successive levels of resolution sat...
Conference Paper
Full-text available
We propose a data replication scheme on a random Apollonian P2P overlay that benefits from the small world and scale free properties. The proposed algorithm features a replica density estimation and a space filling mechanism designed to avoid redundant messages. Not only does it provide uniform replication of the data stored in the network, but it als...
Article
Full-text available
We present and study in this paper a simple algorithm that produces so-called growing Parallel Random Apollonian Networks (P-RAN) in any dimension d. Analytical derivations show that these networks still exhibit small-world and scale-free characteristics. To characterize further the structure of P-RAN, we introduce new parameters that we refer to as...
Article
Full-text available
In a way similar to the string-to-string correction problem, we address discrete time series similarity in light of a time-series-to-time-series-correction problem for which the similarity between two time series is measured as the minimum cost sequence of edit operations needed to transform one time series into another. To define the edit operatio...
Conference Paper
Full-text available
We introduce a new P2P exploration strategy based on an extension of space filling curve principles. This strategy is exhaustive and does not generate redundant messages. Initiated at the source node of a search query, a walker is sent to explore the neighborhood of this node. This walker carries a growing list of visited nodes as an Ariadne...
Article
Full-text available
The aim of this paper is the supervised classification of semi-structured data. A formal model based on Bayesian classification is developed while addressing the integration of the document structure into classification tasks. We define what we call the structural context of occurrence for unstructured data, and we derive a recursive formulation in...
Conference Paper
Full-text available
The use of multi-agent topical Web crawlers based on the endogenous fitness model raises the problem of controlling the population of agents. We tackle this question through an energy based model to balance the reproduction/life expectancy of agents. Our goal is to simplify the tuning of parameters and to optimize the use of resources available for...
Conference Paper
Full-text available
Similarity search of time series has attracted many researchers recently. In this scope, reducing the dimensionality of data is required to scale up the similarity search. Symbolic representation is a promising technique of dimensionality reduction, since it allows researchers to benefit from the richness of algorithms used for textual databases. T...
Conference Paper
Full-text available
The problem of similarity search has attracted increasing attention recently, because it has many applications. Time series are high dimensional data objects. In order to utilize an indexing structure that can effectively handle large time series databases, we need to reduce the dimensionality of these data objects. One of the promising techniques...
Conference Paper
Full-text available
This paper describes a method to analyze human motion, based on the reduction of multidimensional captured motion data. A Dynamic Programming Piecewise Linear Approximation model is used to automatically extract in an optimal way key-postures distributed along the motion data. This non-uniform sub-sampling can be exploited for motion compressio...
Article
Similarity search is a fundamental problem in information technology. The main difficulty of this problem is the high dimensionality of the data objects. In large time series databases, it's important to reduce the dimensionality of these data objects, so that we can manage them. Symbolic representation is a promising technique of dimensionality re...
Article
Full-text available
We introduce a decentralized replication strategy for peer-to-peer file exchange based on exhaustive exploration of the neighborhood of any node in the network. The replication scheme lets the replicas evenly populate the network mesh, while regulating the total number of replicas at the same time. This is achieved by self adaptation to entering or...
Article
Full-text available
This technical report details a family of time warp distances on the set of discrete time series. This family is constructed as an editing distance whose elementary operations apply on linear segments. A specific parameter allows controlling the stiffness of the elastic matching. It is well suited for the processing of event data for which each dat...
Article
Full-text available
This paper addresses the problem of reconstructing partially observed stochastic processes. The L1 convergence of the filtering and smoothing densities in state space models is studied, when the transition and emission densities are estimated using non parametric kernel estimates. An application to real data is proposed, in which a wave time series...
Article
Full-text available
We propose specific data structures designed for the indexing and retrieval of information elements in heterogeneous XML databases. The indexing scheme is well suited to the management of various contextual searches, expressed either at a structural level or at an information content level. The approximate search mechanisms are based on a modified...
Conference Paper
Full-text available
We present in this article a method to wisely replicate information in an unstructured peer-to-peer network. We make no assumption on the network topology. Thematic agents move randomly on the network and estimate the level of redundancy of the specific information they are dealing with. They can delete or create replicas if this estimated redundan...
