J.-C. Lamirel, Lorrain de Recherche en Informatique et Ses Applications | Loria
PhD with Research Accreditation
182 Publications
21,074 Reads
848 Citations
Additional affiliations
September 1997 - September 2015
September 1997 - present
September 1990 - present
Publications (182)
Mining the content of scientific publications is increasingly used to investigate the practice of science and the evolution of research domains. Topic models, among which LDA (statistical bag-of-words approach) and Top2Vec (embeddings approach), have notably been shown to provide rich insights into the thematic content of disciplinary fields, their...
In this paper we address one specific problem of Context-Adaptive Document Analysis: the generation of specific learning data. While many Document Analysis solutions exist for well-described cases, it is often difficult to adapt them to other contexts. We present an ongoing research methodology for generating synthetic documents that are not...
The aim of spatial econometrics is to analyze and/or predict the relationship between a dependent variable Y and other variables by building a model that takes spatial dependence into account. Usual spatial econometric models are based on a neighbourhood matrix whose elements are linked to geographical distances. We propose to use distances be...
Chatbots represent a promising tool to automate the processing of requests in a business context. However, despite major progress in natural language processing technologies, constructing a dataset deemed relevant by business experts is a manual, iterative and error-prone process. To assist these experts during modelling and labelling, the authors...
This paper focuses on using feature salience to evaluate the quality of a partition in hard clustering. It is based on the hypothesis that a good partition is an easy-to-label partition, i.e. a partition in which each cluster is made of salient features. This approach is mostly compared to usual approaches relying on distances betwe...
The influence of authors is mostly based on their capacity to form specific self-contained and/or active research communities or topics while also inspiring fruitful spin-off research derived from those communities or topics. Accurately estimating author influence and its indirect effect in inspiring external research creativity must thus be based...
While graph embedding aims at learning low-dimensional representations of nodes that capture the graph topology, word embedding focuses on learning word vectors that encode semantic properties of the vocabulary. The former finds applications in tasks such as link prediction and node classification, while the latter is systematically considered in natur...
The 15th International Conference on Webometrics, Informetrics and Scientometrics & 20th COLLNET Meeting (COLLNET 2019) was held on November 6-8, 2019 in Dalian (China). It was hosted by the WISE Lab at Dalian University of Technology (DLUT) and co-organized by the “Committee of Theory of Science of Science and Discipline Construction, Chinese...
In the first part of this paper, we shall discuss the historical context of Science of Science both in China and at world level. In the second part, we use the unsupervised combination of GNG clustering with feature maximization metrics and associated contrast graphs to present an analysis of the contents of selected academic journal papers in Scie...
The complexity of urban segregation challenges researchers to develop powerful and complex mathematical tools for assessing it. With increasingly fine-grained and massive data becoming available in recent years, individual-based models are now feasible in practice. Very recently, a mathematical object called multiscalar fingerprint [1], cont...
Introduced in the context of machine learning, the Feature F-measure is a parameter-free statistical feature selection metric that describes classes through a set of salient features. It has been shown to be efficient for classification, cluster labeling and clustering model quality measurement. In this paper, we introduce the Node F-measure, its...
Feature Maximization is a feature selection method that deals efficiently with textual data and makes it possible to design systems that are language-agnostic, parameter-free and require no additional corpora to function. We propose to evaluate its use in text summarization, in particular in cases where documents are structured. We first experiment this...
Feature maximization is an alternative measure, as compared to usual distributional measures relying on entropy or on Chi-square metric or vector-based measures, like Euclidean distance or correlation distance. One of the key advantages of this measure taking inspiration both from Galois lattice theory and information retrieval is that it is operat...
The Feature F-measure is a parameter-free statistical feature selection metric that has shown good performance for classification, cluster labeling and cluster quality measurement. In this article, we propose to evaluate its use in the context of real-world graphs and their community structure...
Feature maximization (F-max) is an unbiased quality estimation metric of unsupervised classification (clustering) that favours clusters with a maximal feature F-measure value. In this article we show that an adaptation of this metric within the framework of supervised classification allows efficient feature selection and feature contrasting to be p...
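The abstract above does not spell out how a feature F-measure is computed. As a rough illustration only, the sketch below assumes one common formulation: for a feature f and a cluster c, "recall" is the share of f's total weight that falls in c, "predominance" is the share of c's total weight carried by f, and the feature F-measure is their harmonic mean. The function name `feature_f_measure`, the input layout and the toy data are all hypothetical and are not taken from the publication.

```python
from typing import Dict, List


def feature_f_measure(weights: List[List[float]], labels: List[int]) -> Dict[int, List[float]]:
    """Return, for each cluster, an F-measure for every feature.

    weights[d][f] is the non-negative weight of feature f in data item d;
    labels[d] is the cluster assigned to item d.
    The recall/predominance definitions are assumptions made for this sketch.
    """
    n_features = len(weights[0])
    clusters = sorted(set(labels))

    # Sum of each feature's weights inside each cluster.
    in_cluster = {c: [0.0] * n_features for c in clusters}
    for row, c in zip(weights, labels):
        for f, w in enumerate(row):
            in_cluster[c][f] += w

    # Total weight of each feature over all clusters.
    feature_total = [sum(in_cluster[c][f] for c in clusters) for f in range(n_features)]

    result: Dict[int, List[float]] = {}
    for c in clusters:
        cluster_total = sum(in_cluster[c])
        scores = []
        for f in range(n_features):
            s = in_cluster[c][f]
            recall = s / feature_total[f] if feature_total[f] else 0.0
            predominance = s / cluster_total if cluster_total else 0.0
            denom = recall + predominance
            # Harmonic mean of recall and predominance (0 when both are 0).
            scores.append(2 * recall * predominance / denom if denom else 0.0)
        result[c] = scores
    return result


# Toy data: feature 0 is typical of cluster 0, feature 1 of cluster 1,
# and feature 2 is spread evenly over both clusters.
weights = [
    [3.0, 0.0, 1.0],
    [2.0, 0.0, 1.0],
    [0.0, 4.0, 1.0],
    [0.0, 3.0, 1.0],
]
labels = [0, 0, 1, 1]
scores = feature_f_measure(weights, labels)
```

Under these assumptions, a feature scores high in a cluster only when it is both concentrated in that cluster and prominent within it, which is the intuition behind favouring clusters with maximal feature F-measure values. In the toy data, the evenly spread feature 2 scores lower in each cluster than that cluster's characteristic feature.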
This paper deals with a major challenge in clustering: optimal model selection. It presents new efficient clustering quality indexes relying on feature maximization, which is an alternative measure to usual distributional measures relying on entropy or on Chi-square metric or vector-based measures such as Euclidean distance or correlation di...
This paper deals with a major challenge in clustering: optimal model selection. It presents new efficient clustering quality indexes relying on feature maximization, which is an alternative measure to usual distributional measures relying on entropy, Chi-square metric or vector-based measures such as Euclidean distance or correlation distanc...
In this paper we first review the state of the art in methods for the visualization and interpretation of textual data, and in particular of scientific data. We then briefly present our contributions to this field in the form of original methods for the automatic classification of documents and easy interpretation of their content throug...
In this article, we first review the state of the art in methods for the visualization and interpretation of textual data, and in particular of scientific data. We then present our contributions to this field, in the form of original methods for the automatic classification of documents and the easy interpretation...
This paper presents new cluster quality indexes which can be efficiently applied for a low-to-high dimensional range of data and which are tolerant to noise. These indexes rely on feature maximization, which is an alternative measure to usual distributional measures relying on entropy or on Chi-square metric or vector-based measures such as Eucli...
In this paper, we aim to give insights into the self-organization of scientific collaboration. To that end, we describe a new framework to monitor the evolution of a collaboration graph that models the co-authorship of research papers. We use community structure of the network as a high-level description of its self-organization and thus c...
Given the evolution of the concept of text and the continuous growth of textual information of many kinds available online, one of the important issues for linguists and information analysts in building assumptions and validating models is to exploit efficient tools for textual analysis, able to adapt to large volumes of...
Classifications which group together verbs and a set of shared syntactic and semantic properties have proven to be useful in both linguistics and Natural Language Processing tasks. However, most existing approaches for automatically acquiring verb classes fail to associate the verb classes produced with an explicit characterisation of the syntactic...
Feature maximization is an alternative measure, as compared to usual distributional measures relying on entropy or on Chi-square metric or vector-based measures, like Euclidean distance or correlation distance. One of the key advantages of this measure is that it is operational in an incremental mode both on clustering and on traditional classifica...
Labelling maximization (F-max) is an unbiased metric for estimating the quality of unsupervised classification (clustering) that favours clusters with a maximal feature F-measure value. In this paper, we show that an adaptation of this metric to supervised classification makes it possible to select features and to calculat...
Feature maximization is a cluster quality metric which favors clusters with maximum feature representation with regard to their associated data. In this paper we show that a simple adaptation of such metric can provide a highly efficient feature selection and feature contrasting model in the context of supervised classification. The method is experie...
Feature maximization is a cluster quality metric which favors clusters with maximum feature representation with regard to their associated data. In this paper we go one step further, showing that a straightforward adaptation of such metric can provide a highly efficient feature selection and feature contrasting model in the context of supervised class...
This paper focuses on a subtask of the QUAERO research program, a major innovative research project related to the automatic processing of multimedia and multilingual content. The objective discussed in this paper is to propose a new method for the classification of scientific papers, developed in the context of an international patents classifica...
The development of dynamic information analysis methods, like incremental clustering, concept drift management and novelty detection techniques, is becoming a central concern in a range of applications whose main goal is to deal with information that varies over time. These applications relate to very diverse and highly strategic do...
Given the evolution of the notion of text and the continual growth of textual information of many kinds available online, one of the important issues for linguists, and also for specialists in didactics, in supporting or building hypotheses and validating models, is to have access to textual analysis tools...
This paper deals with a new feature selection and feature contrasting approach for enhancing classification of both numerical and textual data. The method is tested on different types of reference datasets. The paper illustrates that the proposed approach provides a very significant performance increase in all the studied cases clearly figurin...
Analysis of the evolutions and interactions between scientific domains: GRAFSEL, combining feature selection and graphical representation. VSST 2013 - 7e colloque de Veille Stratégique, Scientifique et Technologique, Oct 2013, Nancy, France. Abstract: This article presents the application of a new feature selection method...
This paper deals with a new feature selection and feature contrasting approach for classification of highly imbalanced textual data with a high degree of similarity between associated classes. An example of such classification context is illustrated by the task of classifying bibliographic references into a patent classification scheme. This task r...
The development of dynamic information analysis methods, like incremental clustering and novelty detection techniques, is becoming a central concern in a range of applications whose main goal is to deal with large volumes of textual information that varies over time.
The purpose of the analysis and diachronic mapping is to track, for a given do...
Feature maximization is a cluster quality metric which favors clusters with maximum feature representation with regard to their associated data. In this paper we go one step further, showing that a straightforward adaptation of such metric can provide a highly efficient feature selection and feature contrasting model in the context of supervised class...
In the process of textual information analysis, as in technology watch through patent analysis or in the tracking of emerging research through the analysis of research papers, the complexity of the studied concepts and the precision of the questions to be answered may often lead analysts to partition their reasoning into viewp...
To cope with the current defects of existing incremental clustering methods, an alternative approach for accurately analyzing textual information evolving over time consists in performing diachronic analysis. This type of analysis is based on the application of a clustering method on data associated with two, or more, successive periods of time, an...
The IGNGF (Incremental Growing Neural Gas with Feature maximisation) method is a recent neural clustering method in which the standard distance measure used to determine a winner is replaced by cluster feature maximization. One main advantage of this method compared to competing methods is that the maximized features used during...
This paper focuses on a subtask of the QUAERO research program, a major innovative research project related to the automatic processing of multimedia and multilingual content. The objective discussed in this article is to propose a new method for the classification of scientific papers, developed in the context of an international patents classific...
The disambiguation of named entities is a challenge in many fields such as scientometrics, social networks, record linkage, citation analysis, the semantic web, etc. Name ambiguities can arise from misspellings, typographical or OCR mistakes, abbreviations, omissions, and so on. Searching for the names of persons or organizations is therefore difficult, a single na...
Automated classification and summarization of websites, as well as knowledge retrieval from web contents, are central challenges for performing accurate and focused webometrics studies. As global approaches based on open web and full webpages content fail to cope with such challenges, in this paper we first focus our approach on organizational and...
The disambiguation of named entities is a challenge in many fields such as scientometrics, social networks, record linkage, citation analysis, the semantic web, etc. Name ambiguities can arise from misspellings, typographical or OCR mistakes, abbreviations, omissions, and so on. Searching for the names of persons or organizations is therefore difficult, a single name...
The objective of this paper is to propose a new unsupervised incremental approach to follow the evolution of research themes for a given scientific discipline in terms of emergence or decline. Such behaviors can be detected by various filtering methods. However, we choose to exploit neural clustering methods in a mul...
We present a novel approach to the automatic acquisition of a VerbNet-like classification of French verbs which involves the use (i) of a neural clustering method which associates clusters with features, (ii) of several supervised and unsupervised evaluation metrics and (iii) of various existing syntactic and semantic lexical resources. We evaluate...
This paper focuses on a subtask of the QUAERO research program, a major innovative research project related to the automatic processing of multimedia and multilingual content. The objective discussed in this article is to propose a new method for the classification of scientific papers, developed in the context of an international classification pl...
Learning algorithms have proved their ability to deal with large amounts of data. Most statistical approaches use learning sets of fixed size and produce static models. However, in specific situations such as active or incremental learning, the learning task starts with very little data. In that case, looking for algorithms able to produce models with on...
Neural clustering algorithms show high performance in the general context of the analysis of homogeneous textual datasets. This is especially true for the recent adaptive versions of these algorithms, like the incremental growing neural gas algorithm (IGNG) and the labeling maximization based incremental growing neural gas algorithm (IGNG-F). In t...
This paper attempts to shed some light on the quality and the defects of some recent clustering methods, whether incremental or not, on “real world data”. An extended evaluation of the methods is achieved through the use of textual datasets of increasing complexity. The third test dataset is a highly polythematic dataset th...
Neural clustering algorithms show high performance in the general context of the analysis of homogeneous textual datasets. This is especially true for the recent adaptive versions of these algorithms, like the incremental growing neural gas algorithm (IGNG) and the label maximization based incremental growing neural gas algorithm (IGNG-F). In this p...
Traditional quality indexes (Inertia, DB, …) are known to be method-dependent indexes that cannot properly estimate clustering quality in several cases, such as that of complex data like textual data. We thus propose an alternative approach for clustering quality evaluation based on unsupervised measures of Recall, Precision...
With the growing number of documents needed by companies and administrations, and the profusion of data available via the Internet, automatic data mining methods (text mining, data mining) have become indispensable. They draw on disciplines such as linguistics, data analysis (statist...
The acquisition of new scientific knowledge and the evolving needs of society regularly call into question the orientations of research. Means to recall and visualize these evolutions are thus necessary. The existing tools for research survey give only one fixed vision of the research activity, which does not make it possible to perform tasks of d...
The main subject of our habilitation work concerns the extension of the systemic approach, initially implemented in the NOMAD Information Retrieval System, which was the subject of our thesis work, in order to establish a new general paradigm of data analysis based on multiple viewpoints, a paradigm that we have...
In the context of strategic watch or foresight analysis, it is very common to resort to clustering methods to process large volumes of textual data. Clustering algorithms generally perform well when the corpora to be processed are homogeneous in nature. This is particularly true for the algori...
This paper introduces metadata issues in the framework of the WICRI project, a network of semantic wikis for communities in research and innovation, in which a wiki can be related to an institution, a research field or a regional entity. Metadata and semantic items play a strategic role in handling the quality and the consistency of the network, th...
This paper proposes an alternative approach to classical clustering quality indexes. Our Macro- and Micro- Recall/Precision indexes are inspired by symbolic classification models and exploit the distribution of the properties of the data associated with the classes. We illustrate an application to the analysis of textual data that...
We present an alternative approach for evaluating the quality of unsupervised text classifications, based on unsupervised recall, precision and F-measure criteria that exploit the descriptors associated with the classes. The experimental comparison of the behaviour of classical criteria with our approach is carried out on...
Neural clustering algorithms show high performance in the usual context of the analysis of homogeneous textual datasets. This is especially true for the recent adaptive versions of these algorithms, like the incremental neural gas algorithm (IGNG). Nevertheless, this paper clearly highlights the drastic decrease of performance of these algorithms, a...
Our work on a new unsupervised classification method (Germen) led us to question the quality of the results obtained. The problem is to estimate whether one clustering method is 'better' than another for the type of data we deal with (textual data). As a first step, after presenting a state of...
In the context of unsupervised classification, or clustering, the lack of a reference classification is a serious handicap when evaluating the performance of algorithms. For their part, traditional quality indexes (Inertia, DB...) cannot properly estimate clustering quality in several cases, such as that of...