Nistor Grozavu

Université Paris 13 Nord | Paris 13 Nord · Galilee Institute

PhD


73
Publications
10,055
Reads
399
Citations


Publications (73)
Article
Full-text available
This paper introduces the concept of near duplicate dataset, a quasi-duplicate version of a dataset. This version has undergone an unknown number of row and column insertions and deletions (modifications on schema and instance). This concept is interesting for data exploration, data integration and data quality. To formalise these insertions and de...
Article
Non-negative matrix factorization (NMF) is an unsupervised algorithm for clustering where a non-negative data matrix is factorized into (usually) two matrices with the property that all the matrices have no negative elements. This factorization raises the problem of instability, which means whenever we run NMF for the same dataset, we get different...
Chapter
This paper introduces the concept of near duplicate dataset, a quasi-duplicate version of a dataset. This version has undergone an unknown number of row and column insertions and deletions (modifications on schema and instance). This concept is interesting for data exploration, data integration and data quality. To formalise these insertions and de...
Chapter
The feature selection process is a difficult task that can be tackled by various algorithms. Our work uses a subclass of metaheuristic algorithms called genetic algorithms (GA) to select the best subset of features that has given, for a machine learning algorithm, the best results (based on accuracy). GA are easy to implement and understand, and th...
Chapter
Semi Non-negative Matrix Factorization (SNMF) is a machine learning algorithm that is used to decompose large data matrices where the data matrix is unconstrained (i.e., it may have mixed signs). We develop the quantum version of SNMF using quantum gradient descent, and we show that the quantum version of SNMF provides an exponential speedup compar...
Article
Numerous parameters impact apatite (U-Th-Sm)/He (AHe) thermochronological dates, such as radiation damage, chemical content, crystal size and geometry, and their knowledge is essential for better geological interpretations. The present study investigates a new method based on advanced data mining techniques, to unravel the parameters that could pla...
Chapter
Full-text available
Quantum machine learning is a new area of research with the recent work on quantum versions of supervised and unsupervised algorithms. In recent years, many quantum machine learning algorithms have been proposed providing a speed-up over the classical algorithms. In this paper, we propose an analysis and a comparison of three quantum distances for...
Chapter
Full-text available
Non-negative matrix factorization is a machine learning technique that is used to decompose large data matrices imposing the non-negativity constraints on the factors. This technique has received a significant amount of attention as an important problem with many applications in different areas such as language modeling, text mining, clustering, mu...
Article
Collaborative Clustering is a data mining task the aim of which is to use several clustering algorithms to analyze different aspects of the same data. The aim of collaborative clustering is to reveal the common underlying structure of data spread across multiple data sites by applying clustering techniques. The idea of collaborative clustering is t...
Conference Paper
Graph clustering techniques are very useful for detecting densely connected groups in large graphs. Many existing graph clustering methods mainly focus on the topological structure, but ignore the vertex properties. Existing graph clustering methods have recently been extended to deal with node attributes. In this paper we propose a new method whic...
Chapter
Data anonymization is the process of de-identifying sensitive data while preserving its format and data type. The masked data can be a realistic or a random sequence of data, depending on the technique used for anonymization. Individual privacy can be at risk if a published data set is not properly de-identified. The best-known approach to anonymiz...
Conference Paper
Full-text available
Collaborative clustering is a recent learning paradigm concerned with the unsupervised analysis of complex multi-view data using several algorithms working together. Well known applications of collaborative clustering include multi-view clustering and distributed data clustering, where several algorithms exchange information in order to mutually i...
Conference Paper
Full-text available
Graph clustering techniques are very useful for detecting densely connected groups in large graphs. Many existing graph clustering methods mainly focus on the topological structure, but ignore the vertex properties. Existing graph clustering methods have recently been extended to deal with node attributes. First we motivate the interest in the stud...
Article
Full-text available
Collaborative filtering is a well-known technique for recommender systems. Collaborative filtering models use the available preferences of a group of users to make recommendations or predictions of the unknown preferences for other users. Collaborative filtering suffers from the data sparsity problem when users only rate a small set of items which...
Article
Full-text available
The number of websites is increasing rapidly, and clients purchase their website from the enterprise that suggests the best domain name at a good price. To offer relevant domain names, enterprises are always eager to have a good suggestion system that suits the client's request. Recommender systems have been an effective key solution...
Article
Full-text available
Recent studies have shown that the use of a priori knowledge can significantly improve the results of unsupervised classification. However, capturing and formatting such knowledge as constraints is not only very expensive requiring the sustained involvement of an expert but it is also very difficult because some valuable information can be lost whe...
Conference Paper
Full-text available
In this paper, we present a new approach combining topological unsupervised learning with ontology-based reasoning to achieve both: (i) automatic interpretation of clustering, and (ii) scaling ontology reasoning over large datasets. The interest of such an approach lies in the use of expert knowledge to automate cluster labeling and gives them high...
Conference Paper
Tag recommendation has become one of the most important ways for an organization to index online resources like articles, movies, and music in order to recommend them to potential users. Since recommendation information is usually very sparse, effective learning of the content representation for these resources is crucial to accurate the recommendatio...
Conference Paper
One of the dimension reduction (DR) methods for data visualization, t-distributed stochastic neighbor embedding (t-SNE), has drawn increasing attention. t-SNE gives better visualization than conventional DR methods by relieving the so-called crowding problem. The crowding problem is one of the curses of dimensionality, which is caused by discrepanc...
Article
Full-text available
Collaborative clustering is a recent field of Machine Learning that shows similarities with both ensemble learning and transfer learning. Using a two-step approach where different clustering algorithms first process data individually and then exchange their information and results with a goal of mutual improvement, collaborative clustering has show...
Conference Paper
Full-text available
The use of a priori knowledge can greatly improve unsupervised classification. Injecting this knowledge in the form of constraints on the data is among the most effective techniques in the literature. However, generating the constraints is very costly and requires the involvement of an expert; the semanti...
Article
Full-text available
The use of a priori knowledge can greatly improve unsupervised classification. Injecting this knowledge in the form of constraints on the data is among the most effective techniques in the literature. However, generating the constraints is very costly and requires the involvement of an expert; the semanti...
Article
In this paper, we present a new approach combining topological unsupervised learning with ontology-based reasoning to achieve both: (i) automatic interpretation of clustering, and (ii) scaling ontology reasoning over large datasets. The interest of such an approach lies in the use of expert knowledge to automate cluster labeling and gives them high l...
Conference Paper
Full-text available
Collaborative clustering is a recent field of Machine Learning that shows similarities with both transfer learning and ensemble learning. It uses two-step approaches where different clustering algorithms first process data individually and then exchange their information and results with a goal of mutual improvement. In this article, we introduce a...
Conference Paper
This paper introduces a new approach for clustering large datasets based on spectral clustering and topological unsupervised learning. The spectral clustering method needs to construct an adjacency matrix and calculate the eigen-decomposition of the corresponding Laplacian matrix [4], which are computationally expensive and not easy to apply on large-s...
Conference Paper
Recommendation systems provide the facility to understand a person’s taste and find new, desirable content for them based on aggregation between their likes and ratings of different items. In this paper, we propose a recommendation system that predicts the rating given by a user to an item. This recommendation system is mainly based on unsupervised top...
Conference Paper
Full-text available
The aim of collaborative clustering is to reveal the common underlying structures found by different algorithms while analyzing data. The fundamental concept of collaboration is that the clustering algorithms operate locally but collaborate by exchanging information about the local structures found by each algorithm. In this framework, the one purp...
Article
Opinion Mining is the field of computational study of people's emotional behavior expressed in text. The purpose of this article is to introduce a new framework for emotion (opinion) mining based on topological unsupervised learning and hierarchical clustering. In contrast to supervised learning, the problem of clustering characterization in the con...
Conference Paper
Full-text available
The aim of collaborative clustering is to reveal the common structure of data which are distributed on different sites. The topological collaborative clustering, based on Self-Organizing Maps (SOM) is an unsupervised learning method which is able to use the output of other SOMs from other sites during the learning. This paper investigates the impac...
Patent
Full-text available
A system for information retrieval within a database of large size includes a first module for extracting the descriptors associated with each object in the database, and for constructing a table containing the objects and the value of a descriptor associated with an object. The system also includes a second module for applying a number of classifi...
Conference Paper
Full-text available
In many cases, databases are in constant evolution, with new data arriving continuously. Data streams pose several unique problems that make the application of classical data analysis methods obsolete. Indeed, these databases are constantly on-line, growing with the arrival of new data. In addition, the probability distribution associated with the d...
Conference Paper
A common phenomenon in biological experiments is that it is not possible to obtain complete measurements for all the samples. Note that some microarrays are very informative, but very expensive to obtain for all the samples. However, we can use publicly available background knowledge about the potential links between the components of different...
Conference Paper
Full-text available
The purpose of this article is to introduce a new collaborative multi-view clustering approach based on a probabilistic model. The aim of collaborative clustering is to reveal the common underlying structure of data spread across multiple data sites by applying clustering techniques. The strength of the collaboration between each pair of data repos...
Conference Paper
The aim of collaborative clustering is to reveal the common structure of data distributed on different sites. In this paper, we present a new approach for the topological collaborative clustering using a generative model, which is the Generative Topographic Mappings (GTM). In this case, maps representing different sites could collaborate without re...
Article
Full-text available
The aim of collaborative clustering is to reveal the common structure of data distributed on different sites. In this paper, we present a formalism of topological collaborative clustering using prototype-based clustering techniques; in particular we formulate our approach using Kohonen's Self-Organizing Maps. Maps representing different sites could...
Conference Paper
Full-text available
In this paper, we propose a study on the use of weighted topological learning and matrix factorization methods to transform the representation space of a sparse dataset in order to increase the quality of learning, and adapt it to the case of transfer learning. The matrix factorization allows us to find latent variables, weighted topological learni...
Conference Paper
Full-text available
This paper addresses the problem of detecting a subset of the most relevant features and observations from a dataset through a local weighted learning paradigm. We introduce a new learning approach, which simultaneously provides a Self-Organizing Map (SOM) and double local weighting. The proposed approach is computationally simple, and learns a diffe...
Conference Paper
Full-text available
The aim of collaborative clustering is to reveal the common structure of data which are distributed on different sites. The topological collaborative clustering (based on Kohonen Self-Organizing Maps) makes it possible to take other maps into account without recourse to the data in unsupervised learning. In this paper, the approach is presented in the case...
Article
Full-text available
This paper introduces a new topological clustering formalism, dedicated to categorical data arising in the form of a binary matrix or a sum of binary matrices. The proposed approach is based on the principle of the Kohonen's model (conservation of topological order) and uses the Relational Analysis formalism by optimizing a cost function defined as...
Conference Paper
Full-text available
This paper addresses the problem of cluster characterization by selecting a subset of the most relevant features for each cluster from a categorical dataset in an autonomous way. The proposed autonomous model is based on the Relational Topological Clustering (RTC) associated with a statistical test which makes it possible to detect the most important variable...
Article
Full-text available
This paper introduces a relational topological map model, dedicated to multidimensional categorical data (or qualitative data) arising in the form of a binary matrix or a sum of binary matrices. This approach is based on the principle of Kohonen's model (conservation of topological order) and uses the Relational Analysis formalism by maximizing a mo...
Conference Paper
Full-text available
Newman and Girvan [12] recently proposed an objective function for graph clustering called the Modularity function which allows automatic selection of the number of clusters. Empirically, higher values of the Modularity function have been shown to correlate well with good graph clustering. In this paper we propose an extended Modularity measure for...
Conference Paper
Full-text available
This paper introduces a new topological clustering formalism, dedicated to categorical data arising in the form of a binary matrix or a sum of binary matrices. The proposed approach is based on the principle of the Kohonen's model (conservation of topological order) and uses the Relational Analysis formalism by optimizing a cost function defined as...
Chapter
Full-text available
In this chapter, we formally defined the problem of clustering and we presented an original and new approach of fusion/ensemble/consensus/aggregation clustering. The main idea was to find a clustering (or partition) of observations that represents the best consensus between several other clusterings related to the same data set. The goal of the prop...
Article
Full-text available
In this paper we propose a new automatic learning model which allows simultaneous topological clustering and feature selection for quantitative datasets. We explore a new topological organization algorithm for categorical data clustering and visualization named RTC (Relational Topological Clustering). Generally, it is more difficult to perfor...
Article
This paper studies the extension of the Modularity measure for categorical data clustering. It first shows the relational data presentation and establishes the relationship between the extended Modularity and the Relational Analysis criterion. Two extensions are presented in this work: the early integration and the intermediate integration approach...
Conference Paper
The Internet offers its users an ever-increasing amount of information. Within it, multimodal data (images, text, video, sound) are widely requested by users, and there is a strong need for effective ways to process and manage them. Most existing algorithms/frameworks perform only image annotation and the search is doing b...
Conference Paper
Full-text available
This paper presents a new learning strategy for the clustering algorithms based on Self-Organizing Map. Our contribution relies on the competitive phase of this unsupervised learning algorithm and proposes a new strategy for choosing the most active cell/neuron. This new strategy is to choose the most active neuron taking into account its historica...
Conference Paper
Full-text available
We introduce a new learning approach, which simultaneously provides a self-organizing map (SOM) and a local weight vector for each cluster. The proposed approach is computationally simple, and learns a different feature weight vector for each cell (relevance vector). Based on the self-organizing map approach, we present two new simultaneously cluster...
Conference Paper
Full-text available
This paper addresses the problem of selecting a subset of the most relevant features from a dataset through a weighted learning paradigm. We propose two automated feature selection algorithms for unlabeled data. In contrast to supervised learning, the problem of automated feature selection and feature weighting in the context of unsupervised learnin...