J.-C. Lamirel

Lorrain de Recherche en Informatique et Ses Applications | Loria

PhD with Research Accreditation

About

182
Publications
21,074
Reads
848
Citations
Additional affiliations
September 1997 - September 2015
Lorrain de Recherche en Informatique et Ses Applications
Position
  • Senior Researcher
September 1997 - present
University of Strasbourg
Position
  • Lecturer
September 1990 - present
University of Lorraine
Position
  • Lecturer

Publications (182)
Article
Mining the content of scientific publications is increasingly used to investigate the practice of science and the evolution of research domains. Topic models, among which LDA (statistical bag-of-words approach) and Top2Vec (embeddings approach), have notably been shown to provide rich insights into the thematic content of disciplinary fields, their...
Chapter
In this paper we are addressing one specific problem of Context-Adaptive Document Analysis: the generation of specific learning data. While many Document Analysis solutions exist for well described cases, it is often difficult to adapt them to other contexts. We present an ongoing research methodology for generating synthetic documents that are not...
Chapter
The aim of spatial econometrics is to analyze and/or predict the relationship between one dependent variable Y and other variables, building a model that takes into account the spatial dependence. Usual spatial econometric models are based on a neighbourhood matrix whose elements are linked to geographical distances. We propose to use distances be...
Article
Full-text available
Chatbots represent a promising tool to automate the processing of requests in a business context. However, despite major progress in natural language processing technologies, constructing a dataset deemed relevant by business experts is a manual, iterative and error-prone process. To assist these experts during modelling and labelling, the authors...
Article
Full-text available
This paper focuses on using feature salience to evaluate the quality of a partition when dealing with hard clustering. It is based on the hypothesis that a good partition is an easy-to-label partition, i.e. a partition in which each cluster is made of salient features. This approach is mostly compared to usual approaches relying on distances betwe...
Article
The influence of authors is mostly based on their capacity to form specific self-contained and/or active research communities or topics while also inspiring fruitful spin-off research derived from those communities or topics. Accurately estimating author influence and its indirect effect in inspiring external research creativity must thus be based...
Chapter
Full-text available
While graph embedding aims at learning low-dimensional representations of nodes encompassing the graph topology, word embedding focuses on learning word vectors that encode semantic properties of the vocabulary. The former finds applications in tasks such as link prediction and node classification while the latter is systematically considered in natur...
Article
The 15th International Conference on Webometrics, Informetrics and Scientometrics & 20th COLLNET Meeting (COLLNET 2019) was held on November 6-8, 2019 in Dalian (China). It was hosted by the WISE Lab at Dalian University of Technology (DLUT), co-organized by the “Committee of Theory of Science of Science and Discipline Construction, Chinese...
Article
In the first part of this paper, we shall discuss the historical context of Science of Science both in China and at world level. In the second part, we use the unsupervised combination of GNG clustering with feature maximization metrics and associated contrast graphs to present an analysis of the contents of selected academic journal papers in Scie...
Chapter
The complexity of urban segregation challenges researchers to develop powerful and complex mathematical tools for assessing it. With increasingly fine-grained and massive data becoming available in recent years, individual-based models are now possible in practice. Very recently, a mathematical object called multiscalar fingerprint [1], cont...
Chapter
Full-text available
Introduced in the context of machine learning, the Feature F-measure is a parameter-free statistical feature selection metric that allows classes to be described through a set of salient features. It has been shown to be efficient for classification, cluster labeling and clustering model quality measurement. In this paper, we introduce the Node F-measure, its...
Article
Full-text available
Feature Maximization is a feature selection method that deals efficiently with textual data, making it possible to design systems that are language-agnostic, parameter-free and do not require additional corpora to function. We propose to evaluate its use in text summarization, in particular in cases where documents are structured. We first experiment this...
Chapter
Feature maximization is an alternative measure, as compared to usual distributional measures relying on entropy or on Chi-square metric or vector-based measures, like Euclidean distance or correlation distance. One of the key advantages of this measure taking inspiration both from Galois lattice theory and information retrieval is that it is operat...
Conference Paper
Full-text available
The feature F-measure is a parameter-free statistical feature selection metric that has shown good performance for classification, cluster labelling and cluster quality measurement. In this paper, we propose to evaluate its use in the context of real-world graphs and their community structure...
Chapter
Feature maximization (F-max) is an unbiased quality estimation metric of unsupervised classification (clustering) that favours clusters with a maximal feature F-measure value. In this article we show that an adaptation of this metric within the framework of supervised classification allows efficient feature selection and feature contrasting to be p...
Conference Paper
This paper deals with a major challenge in clustering that is optimal model selection. It presents new efficient clustering quality indexes relying on feature maximization, which is an alternative measure to usual distributional measures relying on entropy or on Chi-square metric or vector-based measures such as Euclidean distance or correlation di...
Conference Paper
This paper deals with a major challenge in clustering that is optimal model selection. It presents new efficient clustering quality indexes relying on feature maximization, which is an alternative measure to usual distributional measures relying on entropy, Chi-square metric or vector-based measures such as Euclidean distance or correlation distanc...
Conference Paper
In this paper we first present a state of the art of the methods for the visualization and interpretation of textual data, and in particular of scientific data. We then briefly present our contributions to this field in the form of original methods for the automatic classification of documents and easy interpretation of their content throug...
Article
Full-text available
In this paper, we first present a state of the art of the methods for the visualization and interpretation of textual data, and in particular of scientific data. We then present our contributions to this field, in the form of original methods for the automatic classification of documents and the interpretation...
Chapter
This paper presents new cluster quality indexes which can be efficiently applied for a low-to-high dimensional range of data and which are tolerant to noise. These indexes rely on feature maximization, which is an alternative measure to usual distributional measures relying on entropy or on Chi-square metric or vector-based measures such as Eucli...
Conference Paper
Full-text available
In this paper, we aim to give insights into the self-organization of scientific collaboration. To that end, we describe a new framework to monitor the evolution of a collaboration graph that models co-authorship among the authors of research papers. We use community structure of the network as a high-level description of its self-organization and thus c...
Conference Paper
Given the evolution of the concept of text and the continuous growth of online textual information of many kinds, one of the important issues for linguists and information analysts, in building up assumptions and validating models, is to exploit efficient tools for textual analysis, able to adapt to large volumes of...
Article
Classifications which group together verbs and a set of shared syntactic and semantic properties have proven to be useful in both linguistics and Natural Language Processing tasks. However, most existing approaches for automatically acquiring verb classes fail to associate the verb classes produced with an explicit characterisation of the syntactic...
Chapter
Feature maximization is an alternative measure, as compared to usual distributional measures relying on entropy or on Chi-square metric or vector-based measures, like Euclidean distance or correlation distance. One of the key advantages of this measure is that it is operational in an incremental mode both on clustering and on traditional classifica...
Conference Paper
Full-text available
Labelling maximization (F-max) is an unbiased metric for estimating the quality of unsupervised classification (clustering) that promotes the clusters with a maximum value of feature F-measure. In this paper, we show that an adaptation of this metric within supervised classification makes it possible to perform a selection of features and to calculat...
Article
Feature maximization is a cluster quality metric which favors clusters with maximum feature representation with regard to their associated data. In this paper we show that a simple adaptation of such a metric can provide a highly efficient feature selection and feature contrasting model in the context of supervised classification. The method is experie...
Conference Paper
Feature maximization is a cluster quality metric which favors clusters with maximum feature representation with regard to their associated data. In this paper we go one step further, showing that a straightforward adaptation of such a metric can provide a highly efficient feature selection and feature contrasting model in the context of supervised class...
Article
This paper focuses on a subtask of the QUAERO research program, a major innovative research project related to the automatic processing of multimedia and multilingual content. The objective discussed in this paper is to propose a new method for the classification of scientific papers, developed in the context of an international patents classifica...
Conference Paper
Full-text available
The development of dynamic information analysis methods, like incremental clustering, concept drift management and novelty detection techniques, is becoming a central concern in a range of applications whose main goal is to deal with information that varies over time. These applications relate to a wide variety of highly strategic do...
Article
Full-text available
Given the evolution of the notion of text and the continual growth of online textual information of many kinds, one of the important challenges for linguists, and also for specialists in language teaching, in order to support or build hypotheses and validate models, is to have access to tools for textual analysis...
Conference Paper
This paper deals with a new feature selection and feature contrasting approach for enhancing classification of both numerical and textual data. The method is tested on different types of reference datasets. The paper illustrates that the proposed approach provides a very significant performance increase in all the studied cases clearly figurin...
Article
Full-text available
Analysis of the evolution of, and interactions between, scientific domains: GRAFSEL, combining feature selection and graphical representation. VSST 2013 - 7e colloque de Veille Stratégique, Scientifique et Technologique, Oct 2013, Nancy, France. Abstract: This paper presents the application of a new method of selection...
Conference Paper
This paper deals with a new feature selection and feature contrasting approach for classification of highly imbalanced textual data with a high degree of similarity between associated classes. An example of such classification context is illustrated by the task of classifying bibliographic references into a patent classification scheme. This task r...
Conference Paper
Full-text available
The development of dynamic information analysis methods, like incremental clustering and novelty detection techniques, is becoming a central concern in a range of applications whose main goal is to deal with large volumes of textual information that varies over time. The purpose of the analysis and diachronic mapping is to track, for a given do...
Conference Paper
Full-text available
Feature maximization is a cluster quality metric which favors clusters with maximum feature representation with regard to their associated data. In this paper we go one step further, showing that a straightforward adaptation of such a metric can provide a highly efficient feature selection and feature contrasting model in the context of supervised class...
Article
Full-text available
In the process of textual information analysis, as in the domain of technological survey through patent analysis, or in the domain of emerging research tracking through research paper analysis, the complexity of the studied concepts and the accuracy of the questions to be answered may often lead the analyst to partition his reasoning into viewp...
Article
To cope with the current defects of existing incremental clustering methods, an alternative approach for accurately analyzing textual information evolving over time consists in performing diachronic analysis. This type of analysis is based on the application of a clustering method on data associated with two, or more, successive periods of time, an...
Article
The IGNGF (Incremental Growing Neural Gas with Feature maximisation) method is a recent neural clustering method in which the use of a standard distance measure for determining a winner is replaced by cluster feature maximization. One main advantage of this method as compared to competing methods is that the maximized features used during...
Conference Paper
Full-text available
This paper focuses on a subtask of the QUAERO research program, a major innovative research project related to the automatic processing of multimedia and multilingual content. The objective discussed in this article is to propose a new method for the classification of scientific papers, developed in the context of an international patents classific...
Conference Paper
Full-text available
The disambiguation of named entities is a challenge in many fields such as scientometrics, social networks, record linkage, citation analysis, semantic web, etc. Name ambiguities can arise from misspelling, typographical or OCR mistakes, abbreviations, omissions, etc. The search for names of persons or of organizations is therefore difficult, a single na...
Conference Paper
Full-text available
Automated classification and summarization of websites, as well as knowledge retrieval from web contents, are central challenges for performing accurate and focused webometrics studies. As global approaches based on open web and full webpages content fail to cope with such challenges, in this paper we first focus our approach on organizational and...
Book
Full-text available
The disambiguation of named entities is a challenge in many fields such as scientometrics, social networks, record linkage, citation analysis, semantic web, etc. Name ambiguities can arise from misspelling, typographical or OCR mistakes, abbreviations, omissions, etc. The search for names of persons or of organizations is therefore difficult, a single name...
Article
The objective of this paper is to propose a new unsupervised incremental approach in order to follow the evolution of research themes for a given scientific discipline in terms of emergence or decline. Such behaviors are detectable by various methods of filtering. However, we chose to exploit neural clustering methods in a mul...
Conference Paper
Full-text available
We present a novel approach to the automatic acquisition of a VerbNet-like classification of French verbs which involves the use (i) of a neural clustering method which associates clusters with features, (ii) of several supervised and unsupervised evaluation metrics and (iii) of various existing syntactic and semantic lexical resources. We evaluate...
Article
Full-text available
This paper focuses on a subtask of the QUAERO research program, a major innovative research project related to the automatic processing of multimedia and multilingual content. The objective discussed in this article is to propose a new method for the classification of scientific papers, developed in the context of an international classification pl...
Article
Full-text available
Learning algorithms have proved their ability to deal with large amounts of data. Most statistical approaches use fixed-size learning sets and produce static models. However, in specific situations such as active or incremental learning, the learning task starts with only very little data. In that case, looking for algorithms able to produce models with on...
Article
Neural clustering algorithms show high performance in the general context of the analysis of homogeneous textual datasets. This is especially true for the recent adaptive versions of these algorithms, like the incremental growing neural gas algorithm (IGNG) and the labeling maximization based incremental growing neural gas algorithm (IGNG-F). In t...
Conference Paper
This paper represents an attempt to throw some light on the quality and on the defects of some recent clustering methods, whether incremental or not, on “real world data”. An extended evaluation of the methods is achieved through the use of textual datasets of increasing complexity. The third test dataset is a highly polythematic dataset th...
Conference Paper
Neural clustering algorithms show high performance in the general context of the analysis of homogeneous textual datasets. This is especially true for the recent adaptive versions of these algorithms, like the incremental growing neural gas algorithm (IGNG) and the label maximization based incremental growing neural gas algorithm (IGNG-F). In this p...
Conference Paper
Traditional quality indexes (Inertia, DB, …) are known to be method-dependent indexes that do not allow the quality of the clustering to be properly estimated in several cases, such as that of complex data like textual data. We thus propose an alternative approach for clustering quality evaluation based on unsupervised measures of Recall, Precision...
Article
Full-text available
With the growing number of documents required by companies and administrations, as well as the profusion of data available via the Internet, automatic mining methods (text mining, data mining) have become essential. They draw on disciplines such as linguistics, data analysis (statist...
Conference Paper
The acquisition of new scientific knowledge and the evolution of the needs of the society regularly call into question the orientations of research. Means to recall and visualize these evolutions are thus necessary. The existing tools for research survey give only one fixed vision of the research activity, which does not allow performing tasks of d...
Article
Full-text available
The main subject of our habilitation work concerns the extension of the systemic approach, initially implemented in the NOMAD Information Retrieval System, which was the subject of our PhD thesis, in order to establish a new general data analysis paradigm based on multiple viewpoints, a paradigm that we have...
Article
Full-text available
In the context of strategic monitoring or prospective analysis, it is very common to use clustering methods to process large volumes of textual data. Clustering algorithms generally perform well when the corpora to be processed are homogeneous in nature. This is particularly true for the algori...
Article
Full-text available
This paper introduces metadata issues in the framework of the WICRI project, a network of semantic wikis for communities in research and innovation, in which a wiki can be related to an institution, a research field or a regional entity. Metadata and semantic items play a strategic role in handling the quality and the consistency of the network, th...
Article
Full-text available
This paper proposes an alternative approach to classical clustering quality indexes. Our Macro- and Micro- Recall/Precision indexes are inspired by symbolic classification models and exploit the distribution of the properties of the data associated with the classes. We illustrate an application to the analysis of textual data that...
Article
Full-text available
We present an alternative approach for evaluating the quality of unsupervised text classifications, based on unsupervised recall, precision and F-measure criteria that exploit the descriptors associated with the classes. The experimental comparison of the behaviour of the classical criteria with our approach is carried out on...
Conference Paper
Full-text available
Neural clustering algorithms show high performance in the usual context of the analysis of homogeneous textual datasets. This is especially true for the recent adaptive versions of these algorithms, like the incremental neural gas algorithm (IGNG). Nevertheless, this paper clearly highlights the drastic decrease of performance of these algorithms, a...
Article
Full-text available
Our work on a new unsupervised classification method (Germen) led us to question the quality of the results obtained. The problem is to estimate whether one clustering method is 'better' than another for the type of data we process (textual data). First, after reviewing the state of...
Article
Full-text available
In the context of unsupervised classification, or clustering, the lack of a reference classification is a serious handicap when evaluating the performance of the algorithms. For their part, traditional quality indexes (Inertia, DB...) do not allow the quality of the clustering to be properly estimated in several cases, such as that of...
Article
Full-text available
In the context of unsupervised classification, or clustering, the lack of a reference classification is a serious handicap when evaluating the performance of the algorithms. For their part, traditional quality indexes (Inertia, DB...) do not allow the quality of the clustering to be properly estimated in several cases, such as that of...