About
Publications: 31
Reads: 5,831
Citations: 95
Publications (31)
Natural language tasks like Named Entity Recognition (NER) in the clinical domain on non-English texts can be very time-consuming and expensive due to the lack of annotated data. Cross-lingual transfer (CLT) is a way to circumvent this issue thanks to the ability of multilingual large language models to be fine-tuned on a specific task in one langu...
Without any explicit cross-lingual training data, multilingual language models can achieve cross-lingual transfer. One common way to improve this transfer is to perform realignment steps before fine-tuning, i.e., to train the model to build similar representations for pairs of words from translated sentences. But such realignment methods were found...
Some Transformer-based models can perform cross-lingual transfer learning: those models can be trained on a specific task in one language and give relatively good results on the same task in another language, despite having been pre-trained on monolingual tasks only. However, there is no consensus yet on whether those Transformer-based models learn uni...
Previous literature has shown that it is possible to align word embeddings from different languages with unsupervised methods based on a distance-preserving mapping, with the assumption that the embeddings are isometric. However, these methods seem to work only when both embeddings are trained on the same domain. Nonetheless, we hypothesize that th...
This paper presents a new generative unsupervised learning algorithm based on a representation of the clusters distribution by histograms. The main idea is to reduce the model complexity through cluster-defined projections of the data on independent axes. The results show that the proposed approach performs efficiently compared with other algorithm...
Collaborative clustering is a data mining task whose aim is to use several clustering algorithms to analyze different aspects of the same data. The aim of collaborative clustering is to reveal the common underlying structure of data spread across multiple data sites by applying clustering techniques. The idea of collaborative clustering is t...
Among the variety of algorithms that have been developed for clustering, prototype-based approaches are very popular due to their low computational complexity, allowing real-life applications. In such algorithms, the data set is summarized by a small set of prototypes. Each prototype usually represents a cluster of objects. However, the definition...
The analysis of dynamic data is challenging. Indeed, the structure of such data changes over time, potentially at a very fast pace. In addition, the objects in such data-sets are often complex. In this paper, our practical motivation is to perform user profiling, i.e. to follow users’ geographic location and navigation logs to detect changes in...
The research work presented in this thesis concerns the development of unsupervised learning approaches adapted to large relational and dynamic data-sets. The combination of these three characteristics (size, complexity and evolution) is a major challenge in the field of data mining and few satisfactory solutions exist at the moment, despite the ob...
Visualization methods are important to describe the underlying structure of a data set. When the data is not described as a vector of numerical values, a visualization can be obtained through the reordering of the corresponding similarity matrix. Although several methods of reordering exist, they all need the complete similarity matrix in memory. H...
In this paper, we propose an algorithm for the discovery and the monitoring of clusters in dynamic datasets. The proposed method is based on a Growing Neural Gas and simultaneously learns the prototypes and their segmentation, using an estimation of the local density of data to detect the boundaries between clusters. The quality of our algorithm is...
The performance of cluster ensemble approaches is now known to be tightly related to both the quality and the diversity of the input base clusterings. Cluster ensemble selection (CES) refers to the process of filtering the raw set of base clusterings in order to select a subset of high-quality and diverse clusterings. Most existing CES approaches apply one inde...