Djamel Abdelkader Zighed

University of Lyon, Lyon, France · Human Science Institute

PhD

About

188
Publications
23,782
Reads
1,965
Citations
Additional affiliations
January 2011 - December 2015
Lumière University Lyon 2
Position
  • Professor, Executive Director
January 1984 - present
Lumière University Lyon 2
Position
  • Professor

Publications (188)
Preprint
Full-text available
In scientific digital libraries, some papers from different research communities can be described by community-dependent keywords even if they share a semantically similar topic. Articles that are not tagged with enough keyword variations are poorly indexed in any information retrieval system, which limits potentially fruitful exchanges between...
Conference Paper
In this article we address the problem of expanding the set of papers that researchers encounter when conducting bibliographic research on their scientific work. Using classical search engines or recommender systems in digital libraries, some interesting and relevant articles could be missed if they do not contain the same search key-phrases that t...
Conference Paper
This paper proposes a novel approach that incorporates several kinds of metadata, such as citations, co-authorship, titles, and keywords, to identify real authors in the author disambiguation task. Classification schemes make use of these variables to identify authorship. The methodology followed in this paper is: (1) coarse grouping of articles by the use of focu...
Article
This paper focuses on the detection of likely mislabeled instances in a learning dataset. In order to detect potentially mislabeled samples, two solutions are considered which are both based on the same framework of topological graphs. The first is a statistical approach based on Cut Edges Weighted statistics (CEW) in the neighborhood graph. The se...
Book
This book is a collection of representative and novel works done in Data Mining, Knowledge Discovery, Clustering and Classification that were originally presented in French at the EGC'2012 Conference held in Bordeaux, France, in January 2012. This conference was the 12th edition of this event, which takes place each year and which is now successful...
Conference Paper
Full-text available
This paper describes SONDY, a tool for analysis of trends and dynamics in online social network data. SONDY addresses two audiences: (i) end-users who want to explore social activity and (ii) researchers who want to experiment and compare mining techniques on social data. SONDY helps end-users like media analysts or journalists understand social ne...
Article
Full-text available
Online social networks play a major role in the spread of information at very large scale. Much effort has been made to understand this phenomenon, ranging from popular topic detection to information diffusion modeling, including the identification of influential spreaders. In this article, we present a survey of representative methods deali...
Book
Full-text available
Statistical implicative analysis (SIA) is a non-symmetric data analysis method devised by Régis Gras more than thirty years ago. Through theses, journal articles, books and conferences, it has been developed, and is still being developed, by him, by doctoral students, or in collaboration with university research teams in France and abroad...
Book
Full-text available
The recent and novel research contributions collected in this book are extended and reworked versions of a selection of the best papers that were originally presented in French at the EGC’2011 Conference held in Brest, France, in January 2011. EGC stands for "Extraction et Gestion des connaissances" in French, and means "Knowledge Discovery and Man...
Article
Full-text available
In many application domains, the choice of a proximity measure directly affects the result of classification, comparison or the structuring of a set of objects. For any given problem, the user is obliged to choose one proximity measure among the many existing ones. However, this choice depends on many characteristics. Indeed, according to the notion of...
Conference Paper
Full-text available
Online discussions became increasingly widespread with the Web 2.0: no matter the distance, whether you know the person or not, you can discuss and exchange ideas with people all over the world through forums, blogs, and newsgroups. News websites have made extensive use of forums to encourage the reader to be a real participant in the infor...
Article
Full-text available
The expansion of web user roles is, nowadays, a fact due to the ability of users to interact, discuss, exchange ideas and opinions, and form social networks through the web. The interaction level among users leads to the appearance of several social roles which can be characterized as positions, behaviors, or virtual identities. These roles may be...
Book
Full-text available
During the last decade, Knowledge Discovery and Management (KDM or, in French, EGC for Extraction et Gestion des connaissances) has been an intensive and fruitful research topic in the French-speaking scientific community. In 2003, this enthusiasm for KDM led to the foundation of a specific French-speaking association, called EGC, dedicated to supp...
Conference Paper
Full-text available
Analyzing the social roles inside on-line communities has become a major challenge. The on-line communities formed around exchange platforms (e.g., forums) are a growing source of data for analyzing user behavior. This paper proposes an exploratory analysis of communities in a news website based on its sub-communities. Actually, we assume...
Conference Paper
Full-text available
Web forums are a huge data source. They allow people to interact with unknown individuals. Studying forums shows that interaction is apparent not only through the structure but also through the content of the posts. Taking this observation into account, we extract a social network with different kinds of relationships, i.e., the structural relation...
Conference Paper
Full-text available
Forums on the Internet are an overwhelming source of knowledge considering the number of topics treated and the users who participate in these discussions. This volume of data is difficult for a person to comprehend given the large number of posts. Our work proposes a new formal framework for synthesizing information contained in these forum...
Chapter
BELMANDT is the collective pseudonym under which a group of mathematicians and computer scientists have decided to publish their common works on Pretopology, as a nod to BOURBAKI. These works would never have been realized without the dynamic leadership of Professor Marcel Brissaud, now retired, who is of course associated with the publica...
Article
Full-text available
The choice of a proximity measure between objects has a direct impact on the results of any operation of supervised or unsupervised classification, comparison, evaluation or structuring a set of objects. For a given problem, the user is prompted to choose one among the many existing proximity measures. However, according to the notion of topologica...
Conference Paper
Full-text available
Ensembles of randomized trees such as Random Forests are among the most popular tools used in machine learning and data mining. Such algorithms work by introducing randomness in the induction of several decision trees before employing a voting scheme to give a prediction for unseen instances. In this paper, randomized trees ensembles are studied in...
Chapter
Full-text available
Many machine learning algorithms use an entropy measure as an optimization criterion. Among the widely used entropy measures, Shannon’s is one of the most popular. In some real-world applications, the use of such entropy measures without precautions could lead to inconsistent results. Indeed, the measures of entropy are built upon some assumptions...
Book
Full-text available
During the last decade, the French-speaking scientific community developed a very strong research activity in the field of Knowledge Discovery and Management (KDM or EGC for “Extraction et Gestion des Connaissances” in French), which is concerned with, among others, Data Mining, Knowledge Discovery, Business Intelligence, Knowledge Engineering and...
Article
This is an edited book, not an article! See https://www.researchgate.net/publication/231315510_Advances_in_Knowledge_Discovery_and_Management
Conference Paper
Decision trees generate classifiers from training data through a process of recursively splitting the data space. In the case of training on continuous-valued data, the associated attributes must be discretized into several intervals using a set of crisp cut points. One drawback of decision trees is their instability, i.e., small data deviations ma...
Conference Paper
We extend the framework of spatial autocorrelation analysis on Reproducing Kernel Hilbert Space (RKHS). Our results are based on the fact that some geometrical neighborhood structures vary when samples are mapped into a RKHS, while other neighborhood structures do not. These results allow us to design a new measure for measuring the goodness of a k...
Conference Paper
Many supervised induction algorithms require discrete data; however, real data often come in both discrete and continuous formats. Quality discretization of continuous attributes is an important problem that has effects on accuracy, complexity, variance and understandability of the induction model. Usually, discretization and other types of statist...
Conference Paper
Most real data come in a mixed format (discrete or continuous); however, many supervised induction algorithms require discrete data. Quality discretization of continuous attributes is an important problem that has effects on accuracy, complexity, variance and understandability of the induction models. Most of the existing discretizatio...
Article
Full-text available
Many supervised induction algorithms require discrete data, even though real data often come in both discrete and continuous formats. Quality discretization of continuous attributes is an important problem that has effects on speed, accuracy and understandability of the induction models. Usually, discretization and other types of statistical processes...
Article
The healthcare industry produces a constant flow of data, creating a need for deep analysis of databases through data mining tools and techniques resulting in expanded medical research, diagnosis, and treatment. Data Mining and Medical Knowledge Management: Cases and Applications presents case studies on applications of various modern data mining m...
Article
A multimedia index makes it possible to group data according to similarity criteria. Traditional index structures are based on trees and use the k-Nearest Neighbors (k-NN) approach to retrieve databases. Due to some disadvantages of such an approach, the use of neighborhood graphs was proposed. This approach is interesting, but it has some disadvan...
Conference Paper
Ontology learning from text is considered an appealing and challenging approach to address the shortcomings of hand-crafted ontologies. In this paper, we present OLEA, a new framework for ontology learning from text. The proposal is a hybrid approach combining the pattern-based and the distributional approaches. It addresses key issues in...
Chapter
The goal of any clustering algorithm producing flat partitions of data is to find both the optimal clustering solution and the optimal number of clusters. One natural way to reach this goal without the need for parameters is to involve a validity index in a clustering process, which can lead to an objective selection of the optimal number of clus...
Conference Paper
The goal of any clustering algorithm producing flat partitions of data is to find the optimal clustering solution and the optimal number of clusters. One natural way to reach this goal without the need for parameters is to involve a validity index in the clustering process, which can lead to an objective selection of the optimal number of clusters...
Conference Paper
Full-text available
We propose to evaluate the quality of decision trees grown on imbalanced datasets with a splitting criterion based on an asymmetric entropy measure. To deal with the class imbalance problem in machine learning, especially with decision trees, different authors proposed such asymmetric splitting criteria. After the tree is grown a decision rule has...
Conference Paper
The goal of any clustering algorithm producing flat partitions of data is to find both the optimal clustering solution and the optimal number of clusters. One natural way to reach this goal without the need for parameters is to involve a validity index in a clustering process, which can lead to an objective selection of the optimal number of clus...
Conference Paper
Full-text available
The goal of any clustering algorithm is to find the optimal clustering solution with the optimal number of clusters. In order to evaluate a clustering solution, a number of validity indices are used during or at the end of a clustering process. They can be internal, external or relative. In this paper, we provide two main contributions: First, we pres...
Conference Paper
Decision tree induction has been widely used to generate classifiers from training data through a process of recursively splitting the data space. In the case of training on continuous-valued data, the associated attributes must be discretized in advance or during the learning process. We generate discretization points by performing resampling on...
Chapter
Full-text available
Implicative statistics criteria have proven to be valuable interestingness measures for association rules. Here we highlight their interest for classification trees. We start by showing how Gras' implication index may be defined for rules derived from an induced decision tree. This index is especially helpful when the aim is not classification itse...
Book
This book constitutes the refereed proceedings of the Third International Workshop on Mining Complex Data, MCD 2007, held in Warsaw, Poland, in September 2007, co-located with ECML and PKDD 2007. The 20 revised full papers presented were carefully reviewed and selected; they present original results on knowledge discovery from complex data. In cont...
Conference Paper
This paper makes two contributions to the semantic retrieval of heterogeneous information (documents composed of text and images): (1) a new context-based semantic distance measure for textual data, and (2) an IR system providing a conceptual and an automatic indexing of documents by considering their heterogeneous content using a doma...
Conference Paper
A major shortcoming of existing semantic similarity methods is that none takes into account the context or the considered domain. However, two concepts similar in one context may appear completely unrelated in another context. In this paper, our first-level approach is context-dependent. We present a new method that computes semantic similarity in t...
Conference Paper
Full-text available
Having a reliable semantic similarity measure between words/concepts can have a major effect in many fields like information retrieval and information integration. A major shortcoming of existing semantic similarity measures is that none takes into account the actual context or the considered domain. However, two concepts similar in one context may ap...
Conference Paper
Full-text available
Entropy measures, the best known of which is Shannon's, were proposed in the context of coding and information transmission. Nevertheless, since the mid-1960s, they have been used in other fields such as machine learning, and more particularly to build induction graphs and decision trees...
Conference Paper
In this paper, we address two problems in classical semantic similarity measures. The first is the context-dependency problem in knowledge-based measures, since none takes into account the context of the target domain. To address it, a multisource context-dependent approach is presented. The second is the coverage problem with these measures, since similarities can...
Chapter
Induction graphs, which are a generalization of decision trees, have a special place among the methods of Data Mining. Indeed, they generate lattice graphs instead of trees. They perform well, are capable of handling data in large volumes, are relatively easy for a non-specialist to interpret, and are applicable without restriction on data of any t...
Article
Full-text available
This paper explains how text mining was used within the context of a research project on social dialogue regimes, jointly undertaken by the University of Geneva, the University of Lyon 2 and the International Institute of Labour Studies of the International Labour Organisation (ILO). The research project, which was made possible through the generou...
Conference Paper
Full-text available
This paper highlights the interest of implicative statistics for classification trees. We start by showing how Gras' implication index may be defined for the rules derived from an induced decision tree. Then, we show that residuals used in the modeling of contingency tables provide interesting alternatives to Gras' index. We then consider two main...
Article
Retrieving hidden information in image databases is a difficult task because of their complex structure and the subjectivity related to their interpretation. In this situation, the use of an index is essential. We propose an effective method for locally updating the neighborhood graphs that constitute our index. This method is based on an intelli...
Conference Paper
Full-text available
Text mining is a major field of automatic data processing. A wide variety of conferences, such as TREC, are devoted to it. In this study, we focus on mining legal texts, with the aim of automatically classifying these texts. We use...
Conference Paper
Discovering hidden information in multimedia databases is a difficult task because of their complex structure and the subjectivity involved in their interpretation. In this situation, the use of an index is essential. A multimedia index makes it possible to group the data...
Conference Paper
This paper presents a step in a long process of analyzing, structuring, and retrieving multimedia databases. Indeed, we propose to bring an improvement to an existing content based image retrieval approach. We propose an effective method for locally updating neighborhood graphs which constitute our multimedia index. This method is based on an...
Article
Full-text available
In this paper we present a new entropy measure to grow decision trees. This measure has the property of being asymmetric, allowing users to grow trees that better correspond to their expectations in terms of recall and precision on each class. Then we propose decision rules adapted to such trees. Experiments have been carried out on real medical...
Article
Full-text available
We propose a new statistical approach for characterizing the class separability degree in ℝ^p. This approach is based on a nonparametric statistic called 'the Cut Edge Weight'. We show in this paper the principle and the experimental applications of this statistic. First, we build a geometrical connected graph like Toussaint's Relative Ne...
Article
Decision tree methods generally suppose that the number of categories of the attribute to be predicted is fixed. Breiman et al., with their Twoing criterion in CART, considered gathering the categories of the predicted attribute into two supermodalities. In this paper, we propose an extension of this method. We try to merge the categories in an...
Article
Full-text available
Decision tree methods generally suppose that the number of categories of the attribute to be predicted is fixed. Breiman et al., with their Twoing criterion in CART, considered gathering the categories of the predicted attribute into two supermodalities. In this article, we propose an extension of this method. We try to merge the categories in an o...
Article
We propose a new statistical approach for characterizing the class separability degree in ℝ^p. This approach is based on a non-parametric statistic called ‘the cut edge weight’. We show in this paper the principle and the experimental applications of this statistic. First, we build a geometrical connected graph like Toussaint's Relative Neighbo...
Article
Search algorithms in image databases usually return k nearest neighbours (kNN) of an image according to a similarity measure. This approach presents some anomalies and is based on assumptions that are not always satisfied. We have examined the causes of these anomalies and we have concluded that image query models have to exploit topological pr...
Conference Paper
Neighborhood graphs are an effective and very widespread technique in several fields. However, despite the usefulness of neighborhood graphs, their construction algorithms suffer from very high complexity, which prevents their use in applications that process large volumes of data. This high complexity also affects the update task...
Conference Paper
Full-text available
This paper is concerned with the neighbourhood-based supervised learning of a continuous class. It deals with identifying and handling outliers. We first explain why and how to use the neighbourhood graph derived from the predictors in the prediction of a continuous class. Global quality of the representation is evaluated by a neighbourhood autocorrelat...
Article
The purpose of this study was to determine whether reading performance is equivalent between the initial mammogram on a viewbox and the digitalized screen display. We randomly selected forty-nine mammograms revealing cancer and the same number of normal or benign mammograms. The benign diagnosis was confirmed after a two-year follow-up and a second...
Article
In supervised learning, all the instances of the learning sample must be preclassified. In some cases, the labelling is done subjectively by an expert. For instance, in text categorisation, each text is assigned to one or more labels or categories. In such cases, the labelling is inconsistent because it may differ from one expert to another an...
Article
Full-text available
Data mining and knowledge discovery aim at producing useful and reliable models from the data. Unfortunately some databases contain noisy data which perturb the generalization of the models. An important source of noise consists of mislabelled training instances. We offer a new approach which deals with improving classification accuracies by using...
Conference Paper
In the context of complex data mining, we propose in this article an image database representation using topological graphs. Each image is represented as a point in a multidimensional space R^p, using numerical features automatically extracted from the image. These points are gathered in a topological graph. Graph exploration may be compared to database...
Conference Paper
In the context of complex data mining, we propose in this article the representation of an image database using topological graphs. Each image is represented as a point in the multidimensional space R^p using numerical features automatically extracted from...
Article
This article discusses the possibility of measuring the goodness-of-fit of induction trees to data, as is classically done for statistical models. We show how to adapt chi-square statistics to induction trees, in particular the likelihood-ratio statistic used in the modeling of...
Conference Paper
In this paper we propose a topological model for image database query using neighborhood graphs. A related neighborhood graph is built from automatically extracted low-level features, which represent images as points of space. Graph exploration corresponds to database browsing; the neighbors of a node represent similar images. In order to perform...
Conference Paper
Full-text available
This paper is concerned with the goodness-of-fit of induced decision trees. Namely, we explore the possibility to measure the goodness-of-fit as it is classically done in statistical modeling. We show how Chi-square statistics and especially the Log-likelihood Ratio statistic that is abundantly used in the modeling of cross tables, can be adapted f...
Conference Paper
Full-text available
This paper is concerned with the determination, in a crosstable, of the simultaneous merging of rows and columns that maximizes the association between the row and column variables. We present a heuristic, first introduced in [21], and discuss its complexity and reliability. The heuristic drastically reduces the complexity of the exhaustive sc...
Conference Paper
Full-text available
Decision tree methods generally suppose that the number of categories of the attribute to be predicted is fixed. Breiman et al., with their Twoing criterion in CART, considered gathering the categories of the predicted attribute into two superclasses. In this paper, we propose an extension of this method. We try to merge the categories in an optima...
Conference Paper
Full-text available
This article is devoted to the statistical evaluation of the descriptions of contingency tables provided by induction trees. We restrict ourselves to the particular case of categorical data. Three aspects are addressed in turn. (i) The nature of fit in supervised learning, where we emphasize the distinction...
Conference Paper
In this paper, we propose a technique for detection and segmentation of skin color areas. This method is based on data mining and image analysis techniques for skin model definition. This model is able to classify skin-color and non-skin-color pixels using different color spaces. Our method uses data mining techniques in order to produce classi...
Conference Paper
In this paper, we build predictors which are able to detect executive curricula vitae (CVs). The corpus used is composed of executive and non-executive CVs and is very unbalanced. Indeed, more than 90% of it consists of non-executive CVs. Low structure, scattered information, strongly symbolic representation are some of the characteristics that...
Conference Paper
Full-text available
Our research deals with representation quality and outlier detection in supervised learning. The prediction of a continuous value is referred to as regression learning. In this case, once the neighbourhood graph resulting from the predictors has been constructed, we proposed in a recent work to evaluate the representation quality using neig...
Conference Paper
Interactive data exploration and knowledge visualization are too often neglected in data mining tools. Ongoing work presented in this paper aims at filling this gap. Making the hypothesis that too much data kills data, we propose to build graphically displayed interactive contingency tables. Manipulation primitives, inspired by dynamic queries and OLAP,...