Karim Benzineb’s research while affiliated with Shanghai University of International Business and Economics and other places


Publications (6)


Automated Patent Classification
  • Article

January 2011 · 160 Reads · 38 Citations

Karim Benzineb
Patent classifications are built to impose some order on the growing number and diversity of inventions, and to facilitate patent information searches. The need to automate classification tasks appeared when the growth in the number of patent applications and of classification categories accelerated dramatically. Automated patent classification systems use various elements of patents’ content, which they analyze to find the information most typical of each category. Several algorithms from the field of Artificial Intelligence may be used to perform this task, each with its own strengths and weaknesses. Their accuracy is generally evaluated by statistical means. Automated patent classification systems may be used for various purposes, from keeping a classification well organized and consistent to facilitating specialized tasks such as prior art search. However, many challenges remain in the years to come to build systems that are more accurate and can classify documents in more languages.
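
The abstract notes that accuracy is generally evaluated by statistical means; below is a minimal sketch of one such measure, top-k accuracy over gold classification codes. The function name, the IPC-like subclasses and the data are illustrative and not taken from the paper.

# Hedged sketch: score a patent classifier by top-k accuracy over gold codes.
# All names and data below are invented for illustration.

def top_k_accuracy(predictions, gold_labels, k=3):
    """predictions: list of ranked category lists; gold_labels: list of true codes."""
    hits = sum(1 for ranked, gold in zip(predictions, gold_labels) if gold in ranked[:k])
    return hits / len(gold_labels)

# Example with made-up IPC-style subclasses.
ranked_predictions = [["H04L", "G06F", "H04W"], ["A61K", "C07D", "A61P"]]
gold = ["G06F", "A61K"]
print(top_k_accuracy(ranked_predictions, gold, k=1))  # 0.5
print(top_k_accuracy(ranked_predictions, gold, k=3))  # 1.0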


myClass: A Mature Tool for Patent Classification
  • Conference Paper
  • Full-text available

October 2010 · 171 Reads · 19 Citations

In this task 2,000 patents in three languages (English, French and German) were to be classified among approximately 600 categories. We used a classifier based on neural networks of the Winnow type. This classifier is already used for similar tasks in professional applications. We tested three different approaches to improve the classification accuracy: the first one aimed at solving the issue of poorly-documented categories, the second one was meant to enrich the overall training corpus and the third one was based on the processing of the corpus' collocations. Although we ranked first in this competition, none of the three approaches mentioned above provided a clear improvement in classification accuracy; our results were essentially due to the implementation of the classification algorithm itself.
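
As a rough illustration of the Winnow-type classifier the abstract refers to, here is a textbook one-vs-rest Winnow sketch over binary bag-of-words features. It is a generic implementation under those assumptions, not the myClass code; the vocabulary size, parameters and training data are invented.

# Hedged sketch of a Winnow-type linear classifier (multiplicative updates),
# trained one-vs-rest per patent category. Illustrative only.

class Winnow:
    def __init__(self, n_features, threshold=None, alpha=2.0):
        self.weights = [1.0] * n_features
        self.threshold = threshold if threshold is not None else float(n_features)
        self.alpha = alpha

    def score(self, x):
        # x is a binary feature vector (e.g., presence of a vocabulary term).
        return sum(w for w, xi in zip(self.weights, x) if xi)

    def predict(self, x):
        return self.score(x) >= self.threshold

    def update(self, x, label):
        pred = self.predict(x)
        if pred == label:
            return
        # Promote on false negatives, demote on false positives;
        # only the weights of active features change.
        factor = self.alpha if label else 1.0 / self.alpha
        for i, xi in enumerate(x):
            if xi:
                self.weights[i] *= factor

# Toy usage: 4-term vocabulary, one binary classifier for one category.
clf = Winnow(n_features=4)
training = [([1, 1, 0, 0], True), ([0, 0, 1, 1], False)]
for _ in range(10):
    for x, y in training:
        clf.update(x, y)
print(clf.predict([1, 1, 0, 0]), clf.predict([0, 0, 1, 1]))  # True False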


UniGE Experiments on Prior Art Search in the Field of Patents

September 2009 · 115 Reads · 2 Citations

Lecture Notes in Computer Science

In this experiment conducted at the University of Geneva (UniGE), we evaluated several similarity measures as well as the relevance of using automated classification to filter out search results. The patent field is particularly well suited to classification-based filtering because each patent is already classified. Our results show that such a filtering approach does not improve search performance, but it does not have a negative impact on recall either. This last observation suggests that classification can be used to reduce the search space without reducing the quality of search results.
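
A minimal sketch of the classification-based filtering idea described above: keep only the retrieved patents that share at least one classification code with the query patent, then check that recall is preserved. Document IDs, codes and the relevance set are invented for illustration.

# Hedged sketch: filter a ranked prior-art list by shared classification codes.

def filter_by_shared_class(ranked_ids, doc_classes, query_classes):
    """Drop candidates that have no classification code in common with the query."""
    return [d for d in ranked_ids if doc_classes.get(d, set()) & query_classes]

def recall(retrieved, relevant):
    return len(set(retrieved) & relevant) / len(relevant) if relevant else 0.0

doc_classes = {
    "EP100": {"G06F17"}, "EP101": {"H04L29"},
    "EP102": {"G06F17", "H04L29"}, "EP103": {"A61K31"},
}
query_classes = {"G06F17"}
ranked = ["EP101", "EP100", "EP103", "EP102"]   # output of a similarity search
relevant = {"EP100", "EP102"}                   # gold prior-art set

filtered = filter_by_shared_class(ranked, doc_classes, query_classes)
print(filtered)                                              # ['EP100', 'EP102'] -- smaller search space
print(recall(ranked, relevant), recall(filtered, relevant))  # 1.0 1.0 -- recall preserved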


Fig. 1. Example of WordNet-based document annotation
Table 1. Baseline results in terms of MAP
Table 2. WSD-based run results in terms of MAP
Table 3. Official results in terms of MAP for the monolingual task
Analysis of Word Sense Disambiguation-Based Information Retrieval

September 2008 · 138 Reads · 10 Citations

Lecture Notes in Computer Science

Several studies have tried to improve retrieval performance using automatic Word Sense Disambiguation techniques. So far, most attempts have failed. In this paper, we give a detailed analysis of the reasons behind these failures. During our participation in the Robust WSD task at CLEF 2008, we performed experiments on monolingual (English) and bilingual (Spanish to English) collections. Our official results and a detailed analysis are described below, along with our conclusions and perspectives.
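
Since the runs in this paper are compared in terms of MAP (see the table captions above), the following sketch shows the standard Mean Average Precision computation over two hypothetical runs. Query IDs, document IDs and relevance judgments are made up; this is not the paper's evaluation code.

# Hedged sketch: Mean Average Precision (MAP) over ranked result lists.

def average_precision(ranked_docs, relevant):
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(runs, qrels):
    """runs: {query_id: ranked doc list}; qrels: {query_id: set of relevant docs}."""
    return sum(average_precision(runs[q], qrels[q]) for q in qrels) / len(qrels)

qrels = {"q1": {"d1", "d3"}, "q2": {"d2"}}
baseline_run = {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d2"]}
wsd_run = {"q1": ["d1", "d3", "d2"], "q2": ["d2", "d4"]}
print(mean_average_precision(baseline_run, qrels))  # ~0.667
print(mean_average_precision(wsd_run, qrels))       # 1.0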


UNIGE Experiments on Robust Word Sense Disambiguation

January 2008 · 22 Reads · 2 Citations

This task was meant to compare the results of two different retrieval techniques: the first one was based on the words found in documents and query texts; the second one was based on the senses (concepts) obtained by disambiguating the words in documents and queries. The underlying goal was to come up with more precise knowledge about the possible improvements brought by word sense disambiguation (WSD) to the information retrieval process. The proposed task structure was interesting in that it drew a clear separation between the actors (humans or computers): those who provide the corpus, those who disambiguate it, and those who query it. Thus it was possible to test the universality and the interoperability of the methods and algorithms involved.
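
As a toy illustration of the two retrieval variants the task compares, the sketch below builds one inverted index from surface words and one from sense identifiers produced by a stand-in disambiguation step. The sense mapping and documents are invented; a real system would use an actual WSD component and WordNet synset IDs.

# Hedged sketch: word-based vs. sense-based indexing of the same documents.

from collections import defaultdict

def build_index(docs, tokenize):
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def word_tokens(text):
    return text.lower().split()

# Stand-in for a real WSD system: maps words to a synset-like sense ID.
TOY_SENSES = {"bank": "bank%1:14:00", "institution": "bank%1:14:00"}

def sense_tokens(text):
    return [TOY_SENSES.get(w, w) for w in word_tokens(text)]

docs = {"d1": "the bank approved the loan", "d2": "a financial institution"}
word_index = build_index(docs, word_tokens)
sense_index = build_index(docs, sense_tokens)
print(word_index.get("bank"))            # {'d1'} -- surface form only
print(sense_index.get("bank%1:14:00"))   # {'d1', 'd2'} -- sense ID conflates synonyms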


Figure 1. Example of bi-text search (on the MetaRead site)
Figure 2. The three basic managers in the two main modes of use.
Figure 3. Structures associated with managing the term dictionary. We use linear probing to handle collisions. Linear displacement allows better cache reuse than quadratic displacement. The load factor of the table is always below 0.5. In summary, the implemented hashing structure favors a compact table with random access determined by the hash function. Handling collisions with a linear method ensures proximity from the first access.
Figure 8. Parallelization of the index segment manager
Construire un moteur d'indexation

August 2006 · 1,179 Reads · 5 Citations

Ingénierie des systèmes d'information

We present here an indexing engine which is covered by a technology transfer agreement between the University and the private sector. This engine is currently included in various applications used by international organizations. The document collections which are indexed are large and multilingual. The particular elements of the technical specifications are the starting point of our analysis; then we look at the design and technology choices made to meet the performance and volume constraints. The optimal use of memory, calculation and storage resources is discussed. The serialization and parallelization of processes are analyzed.

KEYWORDS: indexing, document, performance, architecture.
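
A minimal sketch of the term-dictionary structure described in the Figure 3 caption above: an open-addressing hash table with linear probing for collisions and a load factor kept below 0.5. It is a generic illustration under those assumptions, not the engine's actual implementation.

# Hedged sketch: compact open-addressing hash table with linear probing.

class TermDictionary:
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.size = 0
        self.slots = [None] * capacity  # each slot holds (term, term_id) or None

    def _probe(self, term):
        # Linear probing: consecutive slots keep collision chains cache-friendly.
        i = hash(term) % self.capacity
        while self.slots[i] is not None and self.slots[i][0] != term:
            i = (i + 1) % self.capacity
        return i

    def _insert(self, term, term_id):
        i = self._probe(term)
        if self.slots[i] is None:
            self.size += 1
        self.slots[i] = (term, term_id)

    def _resize(self):
        old = [s for s in self.slots if s is not None]
        self.capacity *= 2
        self.slots = [None] * self.capacity
        self.size = 0
        for term, term_id in old:
            self._insert(term, term_id)

    def add(self, term, term_id):
        # Keep the load factor strictly below 0.5, as in the Figure 3 description.
        if (self.size + 1) / self.capacity >= 0.5:
            self._resize()
        self._insert(term, term_id)

    def lookup(self, term):
        slot = self.slots[self._probe(term)]
        return slot[1] if slot else None

d = TermDictionary()
for tid, term in enumerate(["patent", "claim", "invention", "prior", "art"]):
    d.add(term, tid)
print(d.lookup("claim"), d.lookup("missing"))  # 1 None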

Citations (6)


... The second best scoring team in MAP was UCM, which did attain MAP and GMAP improvements using WSD (from 38.34 MAP – 15.28 GMAP in their best non-WSD run to 39.57 MAP – 16.17 GMAP in their best WSD run) [36]. The third best scoring team in MAP was GENEVA who achieved lower scores on both MAP and GMAP using WSD information [22]. The fourth best team, IXA, obtained better MAP results using WSD information (from 38.10 to 38.99 MAP), but lower GMAP (from 15.72 to 15.52) [34]. ...

Reference:

CLEF 2008: Ad hoc track overview
UNIGE Experiments on Robust Word Sense Disambiguation
  • Citing Article
  • January 2008

... In various articles, methods such as kNN, SVM, naive Bayes, decision trees and logistic regression have been used [14,21,22]. Later, classification studies were carried out using artificial neural network methods [23,24] and heuristic methods [25,26]. As scientific work came to concentrate on big data, studies using algorithms such as CNN [27,28], RNN and BiLSTM took their place in the literature. With pre-trained word embedding systems, analyses were performed using ready-made datasets without spending much time on training the datasets, and a preprocessing step was again carried out in these analyses [29]. ...

Automated Patent Classification
  • Citing Article
  • January 2011

... In the last decades, there is a large number of works focusing on building machine learning models for patent classification. Earlier works used traditional machine learning methods such as k-Nearest neighbors (k-NN) [6], support vector machine (SVM) [6,26,5], Naive Bayes (NB) [6,5], k-means clustering [10] and artificial neural networks [24,8]. ...

myClass: A Mature Tool for Patent Classification

... In supervised methodology [6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22][23][24][25], sense disambiguation of words is performed with the help of previously created learning sets. These learning sets contain related sentences for a particular sense of an ambiguous word. ...

Analysis of Word Sense Disambiguation-Based Information Retrieval

Lecture Notes in Computer Science

... Patent classification requires to index very large training corpora (in this experience, the size of the raw corpus was 85 Gb). Thus we linked our classifier to a well-performing home-made indexer [3] and search engine. Both of those tools have functions which are specifically adapted to support the classifier, in particular through various linguistic processing such as pattern or collocation detection. ...

Construire un moteur d'indexation

Ingénierie des systèmes d information