Abdel Ennaji, Université de Rouen Normandie | UR
78
Publications
15,794
Reads
946
Citations
Publications (78)
The study of contemporary tweet-based Entity Linking (EL) systems reveals a lack of a standard definition and a consensus on the task. Specifically, identifying what should be annotated in texts remains a recurring question. This prevents proper design and fair evaluation of EL systems. To tackle this issue, the present paper introduces a set of ru...
Codebook-based writer characterization is an effective technique that has been investigated in a number of recent studies on identification and verification of writers. These methods divide a set of writing samples into small units (fragments or graphemes) and cluster these patterns to produce a codebook. The writer of a handwritten sample is then char...
Gender classification from handwriting is still considered challenging because male and female handwritten documents look visually similar. This paper presents a new method based on the Cloud of Line Distribution (COLD) and Hinge features for determining gender from handwriting. The SVM classifier combination decides the assigned clas...
Text indexing aims to take full advantage of textual data to help intelligent programs make relevant decisions. To explore a large number of textual documents and to disclose semantic information hidden in unstructured documents such as texts, an effective indexing system is required. In this paper, we propose a new approach for in...
Indexing unstructured documents aims to build a list of words, or concepts, which will simplify their exploration later on. The most widely used model for text modeling is the Vector Space Model. In spite of the simplicity of this model in its implementation and its wide use in various studies in the field of text mining and inform...
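As a rough illustration of the Vector Space Model mentioned in the abstract above, the sketch below represents each text as a bag-of-words term-frequency vector and compares vectors by cosine similarity. The whitespace tokenizer and the function names `term_vector` and `cosine_similarity` are assumptions made for this example; they are not taken from the paper, whose own indexing method is not shown here.

```python
import math
from collections import Counter

def term_vector(text):
    """Bag-of-words term-frequency vector (naive whitespace tokenization, an assumption)."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)

doc = term_vector("text mining and text indexing")
query = term_vector("text indexing")
unrelated = term_vector("handwriting recognition")

print(cosine_similarity(doc, query))      # shares two terms with the document
print(cosine_similarity(doc, unrelated))  # shares no terms with the document
```

Texts that share no terms get a similarity of zero, which is the kind of limitation that semantic extensions of the model try to address.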
In this paper, we improve the classical fuzzy classification model by implementing a new approach based on radial basis functions for Arabic document classification. This approach takes into account the concept of semantic vicinity by calculating the degree of similarity between terms in relation to the documents. We c...
Automatic keyphrase extraction aims to extract a set of phrases that are related to the main topics discussed in a document. Keyphrases have served in several areas of text mining, such as information retrieval and classification of large text collections, where they have proved their effectiveness. Due to its importance, automatic keyphrase extr...
The increase in textual information published in Arabic on the internet and in public libraries and administrations requires effective techniques for extracting relevant information contained in large text corpora. The purpose of indexing is to create a document representation that makes it easy to find and identify the relevant inf...
Extracting knowledge from text data and taking full advantage of it has been an important way to reduce computation and accelerate processing, especially for large amounts of data. Thus, different approaches and methodologies for modeling and representing textual data have been proposed. In this paper, a graph-based approach for automatic index...
This competition is aimed at classification of writer demographics from offline handwritten documents using the QUWI database. QUWI is a bilingual database comprising writing samples of the same individuals in Arabic and English. This allows evaluating the performance of different systems in a more challenging multi-script environment. This paper prese...
Document indexing is the main step in a conventional document classification or information retrieval framework. This study aims to highlight the influence of feature type on the efficiency of a classification system. Empirical results on an Arabic dataset reveal that the choice of extracted feature type has a significant impact on conserving sem...
Document indexing is a crucial phase in the text mining process. It represents documents by the descriptors most relevant to their content. Several approaches have been proposed in the literature, notably for English, but they cannot be applied to documents in...
Libraries contain huge amounts of Arabic printed historical documents which cannot be available on-line because they do not have a searchable index. The word spotting idea has previously been suggested as a solution to create indexes for such a collection of documents by matching word images. In this paper we present a word spotting method for Arab...
Biometric identification of persons has mainly been based on fingerprints, face, iris and other similar attributes. We propose a handwriting-based biometric identification system using a large database of Arabic handwritten documents. The system first extracts, from each handwritten sample, a set of features including run lengths, edge-hinge and ed...
In this paper, we propose a hybrid system for contextual and semantic indexing of Arabic documents, bringing an improvement to classical models based on n-grams and the TFIDF model. This new approach takes into account the concept of the semantic vicinity of terms. We proceed by calculating the similarity between words using a hybridiza...
Page segmentation into text and non-text elements is an essential preprocessing step before optical character recognition (OCR). With poor segmentation, an OCR classification engine produces garbage characters due to the presence of non-text elements. This paper presents a method to separate the textual and non-textual components in...
Codebook-based representations have been effectively employed for writer identification. Most of the codebook-based methods generate a codebook by clustering a set of patterns extracted from an independent data set. The probability of occurrence of the codebook patterns in a given writing is then used to characterize its author. This study investig...
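The idea of characterizing a writing by the probability of occurrence of codebook patterns can be sketched as follows. This is only a minimal illustration under stated assumptions: fragments and codewords are hypothetical numeric feature vectors, assignment uses the nearest codeword in Euclidean distance, and the names `nearest_codeword` and `codebook_histogram` are invented for the example. The study itself works on patterns extracted from handwriting images and may differ in every one of these choices.

```python
import math

def nearest_codeword(fragment, codebook):
    """Index of the codeword closest to the fragment (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(fragment, codebook[i]))

def codebook_histogram(fragments, codebook):
    """Normalized occurrence frequency of each codeword in one writing sample."""
    counts = [0] * len(codebook)
    for fragment in fragments:
        counts[nearest_codeword(fragment, codebook)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)  # no fragments: empty descriptor
    return [c / total for c in counts]

# Hypothetical 2-D feature vectors standing in for writing fragments.
codebook = [(0.0, 0.0), (10.0, 10.0)]
fragments = [(1.0, 0.0), (0.0, 1.0), (9.0, 9.0), (1.0, 1.0)]
print(codebook_histogram(fragments, codebook))  # [0.75, 0.25]
```

Two writing samples can then be compared through a distance between their histograms, with the choice of distance left open here.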
This paper presents a method for multilingual artificial text detection and extraction from still images. The proposed detection scheme relies on a cascade of spatial transforms followed by a box counting based fractal dimension approach to exploit the self-similar redundancy of patterns in the shapes of characters in the text. The detected text re...
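The box-counting estimate of fractal dimension named in the abstract above can be sketched in a few lines. This is a generic textbook version, not the paper's detection scheme: the cascade of spatial transforms is omitted, the shape is assumed to be given as a set of foreground pixel coordinates, and the box sizes and the function names `box_count` and `box_counting_dimension` are choices made for this example.

```python
import math

def box_count(pixels, size):
    """Number of size x size grid boxes that contain at least one foreground pixel."""
    return len({(r // size, c // size) for r, c in pixels})

def box_counting_dimension(pixels, sizes=(1, 2, 4, 8, 16)):
    """Least-squares slope of log N(s) against log(1/s) over the given box sizes."""
    if not pixels:
        raise ValueError("at least one foreground pixel is required")
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(pixels, s)) for s in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Sanity checks on shapes whose dimension is known.
square = {(r, c) for r in range(64) for c in range(64)}  # filled region
line = {(0, c) for c in range(64)}                       # straight stroke
print(box_counting_dimension(square))  # close to 2
print(box_counting_dimension(line))    # close to 1
```

Shapes with self-similar detail would be expected to fall between these two reference values.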
Libraries contain huge amounts of Arabic printed historical documents which cannot be available on-line because they do not have a searchable index. The word spotting idea has previously been suggested as a solution to create indexes for such a collection of documents by matching word images. In this paper we present a word spotting method for Arab...
In this paper, we propose a system for contextual and semantic classification of Arabic documents that improves the standard fuzzy model. It promotes semantically neighboring terms, which appear to be absent from this model, by using radial basis modeling. To identify the documents relevant to the query, this approach calculates the similarity betwee...
In this paper, a new approach based on dynamic selection of classifier ensembles is discussed to improve a handwriting recognition system. For pattern classification, dynamic ensemble learning methods explore the use of different classifiers for different samples and may therefore achieve better generalization ability than static ensemble learning metho...
Identifying the writer of a handwritten document has been an active research area over the last few years with applications in biometrics, forensics, smart meeting rooms and historical document analysis. In this paper, we present a new writer identification system based on a retrieval mechanism. Texture based edge-hinge and run-length features are...
Recognizing the writer of a handwritten document has been an active research area over the last few years and is at the heart of many applications in biometrics, forensics and historical document analysis. In this paper, we present a novel approach for text-independent writer recognition from Arabic handwritten documents. To characterize the handwr...
In the field of document image processing, the text/graphic separation is a major step that conditions the performance of the recognition and indexing systems. That involves identifying and separating the graphical and textual components of a document image. In this context, it is important to implement approaches that effectively address these pro...
The separation of text and image is a major step in the processing of document images. It consists of separating the document into two classes: text and image. In this context, it is important to implement approaches that can handle such documents. This paper presents a new method for separating text and image in a document image. The method develope...
In this paper, we extend the vector space model of Salton [9], [11], [12], [14] by combining the TF-IDF parameter with the Okapi formula for index term extraction and evaluation, in order to identify the relevant concepts which represent a document. Indeed, we have proposed a new measure, TFIDF-ABR, which takes into considera...
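For orientation, the two standard ingredients named in this abstract, TF-IDF and the Okapi BM25 term weight, can be written down as below. These are the common textbook formulas only. The TFIDF-ABR measure itself is not described in the truncated abstract and is not reproduced here; the parameter defaults `k1=1.2` and `b=0.75` and the smoothed IDF variant are assumptions of this sketch.

```python
import math

def tf_idf(tf, df, n_docs):
    """Classic TF-IDF weight: term frequency times log inverse document frequency."""
    return tf * math.log(n_docs / df)

def bm25_weight(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Okapi BM25 weight of one term in one document (smoothed, non-negative IDF)."""
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    # Term frequency saturates as tf grows and is normalized by document length.
    norm = tf + k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / norm

# A term occurring in every document carries no TF-IDF weight.
print(tf_idf(tf=3, df=10, n_docs=10))  # 0.0
# BM25 grows with term frequency but levels off instead of growing linearly.
print(bm25_weight(tf=1, df=5, n_docs=100, doc_len=50, avg_doc_len=50))
print(bm25_weight(tf=5, df=5, n_docs=100, doc_len=50, avg_doc_len=50))
```

The saturation in the BM25 term is the main practical difference from plain TF-IDF, where the weight keeps growing linearly with term frequency.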
A first look at Arabic manuscripts reveals the complexity of the task, especially for the classifier ensemble used. One of the most important steps in the design of a multi-classifier system (MCS) is the choice of its components (classifiers). This step is very important to the overall MCS performance since the combination of a set...
In this paper, we present a new approach to restoring the temporal order of off-line handwriting. After the preprocessing steps on the word image, a suitable algorithm segments its skeleton into three types of strokes. We then developed a genetic algorithm (GA) to find the best trajectory of these segment...
In this paper we present an off-line handwriting recognition system. Our recognition system is based on temporal order restoration of the off-line trajectory. For this task we use a genetic algorithm (GA) to optimize the sequences of handwritten strokes. To benefit from dynamic information, we perform a sampling operation by the consideration of...
In this paper, we describe an off-line unconstrained handwritten Arabic word recognition system based on a segmentation-free approach and semi-continuous hidden Markov models (SCHMMs) with explicit state duration. Character durations play a significant part in the recognition of cursive handwriting. The duration information is still mostly disregarde...
We describe an offline unconstrained Arabic handwritten word recognition system based on a segmentation-free approach and discrete hidden Markov models (HMMs) with explicit state duration. Character durations play a significant part in the recognition of cursive handwriting. The duration information is still mostly disregarded in HMM-based automatic...
This paper describes an off-line segmentation-free handwritten Arabic word recognition system. The described system uses discrete HMMs with explicit state duration of various kinds (Gauss, Poisson and gamma) for word classification. After preprocessing, the word image is analyzed from right to left in order to extract from it a sequenc...
Abstract: In this article we propose a combined approach for the off-line recognition of handwritten Arabic words in a limited vocabulary. This approach is based on a sequential combination of a holistic approach with an analytical approach. The holistic approach (used to filter the lexicon entries) models each...
This article deals with the use of genetic algorithms to optimize the architecture of a neural network. After a brief recall of our original neural network (named Yprel network), we show that a simulated-annealing-like technique has been advantageously replaced by genetic operators. Indeed, tests on character recognition (NIST handwritten database)...
This paper presents a multi-classifier system design controlled by the topology of the learning data. Our work also introduces a training algorithm for an incremental self-organizing map (SOM). This SOM is used to distribute classification tasks to a set of classifiers. Thus, the useful classifiers are activated when new data arrives. Comparative r...
This paper introduces a new scheme for the general problem of classification task-solving by designing a multi-classifier system. The distribution process respects the data topology in the feature space in order to reach reliable decisions. To this end we use a self-organizing network which gives a graph that represents the data topology. During th...
Summary form only given. In this paper, we present a system for restoring the temporal order of offline Arabic handwritten traces. The word image, captured in gray levels from a scanner at a resolution of 300 dpi, passes through four stages of preprocessing: binarization, filtering, smoothing and elimination of the diacritical signs. A first al...
An incremental and growing network model is introduced which is able to learn the topological relations in a given set of input vectors by means of a simple Hebb-like learning rule. We propose a new algorithm for a SOM which can learn new input data (plasticity) without degrading the previously trained network and forgetting the old input data (sta...
The work presented in this article attempts to contribute to the design of an incremental learning system. The approach adopted is to set up a multiple classification system in which a set of base classifiers is driven by a self-organizing neural map. This map makes it possible to...
This article describes an approach to designing a distributed and modular neural classifier. This approach introduces a new hierarchical clustering that enables one to determine reliable regions in the representation space by exploiting supervised information. A multilayer perceptron is then associated with each of these detected clusters and charg...
Asthma is a common chronic disease which, despite the availability of effective treatments, remains insufficiently controlled. In this context, we propose to develop a decision support system for asthma management. Our first contribution lies in the choice of case-based reasoning (CBR) as...
Asthma is a distressing disease, affecting up to 7% of the French population and causing considerable morbidity and mortality. A medical decision support system can help physicians to control this chronic disease. Thanks to the health care network (RESALIS) of Fedialis Médica (disease management branch of GlaxoSmithKline), asthma consultatio...
Asthma is a distressing disease, affecting up to 7% of the French population and causing considerable morbidity and mortality. A medical decision support system can help physicians to control this chronic disease. Thanks to the health care network (RESALIS) of Alliance Médica (disease management branch of GlaxoSmithKline), asthma consultatio...
A method for Arabic and Latin text block differentiation for printed and handwritten scripts is proposed. This method is based on a morphological analysis for each script at the text block level and a geometrical analysis at the line and the connected component level. In this paper, we present a brief survey of existing methods used for scripts di...
An Arabic text analysis system called AABATAS (affixal approach-based Arabic text analysis system) is proposed. AABATAS recognizes and categorizes the words while identifying their morphological and grammatical characteristics. It is based on a new approach for Arabic word recognition called affixal approach. This affixal approach is guided by the...
This paper focuses on the problem of cluster analysis when data present high variations of density. The proposed method is based upon a hierarchical clustering and enables one to determine the clusters without any assumption on their number or their statistical distribution. This method is used to design an efficient distributed neural classifier...
Describes an automatic method for building distributed neural classifiers for pattern recognition. The methodology is based on the detection of reliable regions in the representation space, i.e. clusters exclusively composed of patterns from the same class. This detection is performed using a hierarchical clustering method associated with the super...
This article describes recent improvements to an original neural network building method which could be applied in the particular case of 2 input neurones. After a brief recall of the main building principles of a neural net, the authors introduce the capability for a neurone to receive more than 2 inputs. Two problems then arise: how to choose the inpu...
This paper provides a guide to evolving-architecture neural networks for a beginner in multi-layer perceptrons. All the quoted methods aim at automatically fitting a neural network architecture to a particular classification task. Several kinds of evolving architectures are presented. Some neural networks start small and become bigger and bigger duri...
This article describes a new approach to the automated construction of a distributed neural classifier. The methodology is based upon supervised hierarchical clustering which enables one to determine reliable regions in the representation space. The proposed methodology proceeds by associating each of these regions with a Multi-Layer Perceptron (ML...
In this paper we present a scheme of classification based on a particular processing element (neuron) called yprel. The main characteristics of the approach are: (1) a yprel classifier is a set of yprel networks, each network being associated with a particular class; (2) the learning is supervised and conducted class by class; (3) the structure o...
In this paper we present a scheme of classification based on a particular processing element (neuron) called Yprel. The main characteristics of the approach are: (i) a Yprel classifier is a set of Yprel networks, each network being associated with a particular class; (ii) the learning is supervised and conducted class by class; (iii) the structur...
In this paper we present a scheme of classification based on a particular processing element (“neuron”) called Yprel. The main characteristics of the approach are: (1) a Yprel classifier is a set of Yprel nets, each net being associated with a particular class; (2) the learning is supervised and conducted class by class; (3) the structure of the net...
We present two connectionist modular approaches which are potentially able to deal with real applications as their size does not increase drastically with the size of the problem. The first model relies on a very simple cooperation of modular MLP networks specially designed for some sub-tasks. The second is based on a new methodology using a partic...
This article describes a new algorithm to treat time incremental data by a hierarchical clustering. Although hierarchical clustering techniques enable one to automatically determine the number of clusters in a data set, they are rarely used in industrial applications, because a large amount of memory is required when treating more than 10,000 eleme...
This article describes an automatic method for building distributed neural classifiers for pattern recognition. The methodology is based upon the detection of reliable regions in the representation space, i.e. clusters exclusively composed of patterns from the same class. This detection is performed using a hierarchical clustering method associa...
Asthma is a distressing disease, affecting up to 7% of the French population and causing considerable morbidity and mortality. A medical decision support system can help physicians to control this chronic disease. Thanks to the health care network (RESALIS) of Alliance Médica Society, asthma consultation data were collected. We chose Case-Based R...
Abstract: In this article we propose an original method for determining the number and composition of the clusters (i.e. classes in the unsupervised sense) present in a database from the analysis of an indexed hierarchy. Our method is based on the principle of a multi-level cut in the hierarchy, which makes it possible to adapt...
This paper follows on from the work begun in [Kanoun02] on the recognition of printed Arabic text. The prototype proposed here likewise relies on a morpho-syntactic verification engine for the Arabic language to filter the hypotheses generated by an analytical recognition of each word. The arc...