Project

ALP: Arabic Linguistic Tool

Goal: ALP is an Arabic linguistic pipeline that performs the following tasks:
* Arabic word tokenization (segmentation);
* Arabic word POS tagging;
* Arabic lemmatization;
* Arabic named entity recognition;
* Arabic chunking.

The tool is free for research and personal use.

The tool web page:

http://arabicnlp.pro


Project log

Abed Alhakim Freihat
added a research item
This paper presents ALP, an entirely new linguistic pipeline for natural language processing of text in Modern Standard Arabic. Contrary to the conventional pipeline architecture, we solve the common NLP operations of word segmentation, POS tagging, and named entity recognition as a single sequence labeling task. Based on this single component, we also introduce a new lemmatizer tool that combines machine-learning-based and dictionary-based approaches, the latter providing increased accuracy, robustness, and flexibility to the former. The presented pipeline configuration results in faster operation and provides a solution to the challenges of processing Modern Standard Arabic, such as its rich morphology, agglutinative aspects, and lexical ambiguity due to the absence of short vowels.
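The idea of one label per token carrying segmentation, POS, and named-entity information can be illustrated with a minimal sketch. The composite label format used here (`POS:length` pairs joined by `+`, then `|` and a named-entity tag) is a hypothetical encoding invented for this example and is not ALP's actual tag set; the function name `decode_label` is likewise an assumption.

```python
from typing import List, Tuple


def decode_label(token: str, label: str) -> Tuple[List[Tuple[str, str]], str]:
    """Recover (segment, POS) pairs and the NE tag from one composite label.

    Hypothetical label format: "POS:len+POS:len+...|NE".
    """
    morph, ne_tag = label.split("|")
    segments = []
    start = 0
    for part in morph.split("+"):
        pos, length = part.split(":")
        end = start + int(length)
        segments.append((token[start:end], pos))
        start = end
    if start != len(token):
        raise ValueError("segment lengths do not cover the whole token")
    return segments, ne_tag


# "وبالكتاب" ("and with the book") = conjunction + preposition + article + noun
segments, ne_tag = decode_label("وبالكتاب", "CONJ:1+PREP:1+DET:2+NOUN:4|O")
```

Because a single predicted label fixes the segment boundaries and their tags together, no separate segmenter has to run before the tagger in this kind of setup.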
Abed Alhakim Freihat
added an update
The tool now includes an Arabic chunking module, based on the OpenNLP machine-learning chunker. Chunking is the analysis of a sentence that groups its constituent parts (nouns, verbs, adjectives, etc.) into higher-order units that have discrete grammatical meanings (noun groups or phrases, verb groups, etc.).
You can test the new Arabic chunking tool at:
Your feedback is appreciated.
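As a rough illustration of what chunking produces, the sketch below groups tokens into phrases from per-token chunk tags in the common B-/I-/O style. The sentence, the tags, and the helper name `group_chunks` are illustrative assumptions, not output of the ALP chunker, and whole words are used instead of ALP's segmented tokens to keep the example short.

```python
from typing import List, Tuple


def group_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, List[str]]]:
    """Group tokens into (chunk_type, tokens) units from B-/I-/O tags."""
    chunks: List[Tuple[str, List[str]]] = []
    open_type = None  # type of the chunk currently being extended, if any
    for token, tag in zip(tokens, tags):
        if tag == "O":
            open_type = None  # token is outside any chunk
            continue
        prefix, chunk_type = tag.split("-", 1)
        if prefix == "I" and chunk_type == open_type:
            chunks[-1][1].append(token)  # continue the open chunk
        else:
            chunks.append((chunk_type, [token]))  # start a new chunk
            open_type = chunk_type
    return chunks


# "The new student read the book" (illustrative tags)
tokens = ["قرأ", "الطالب", "الجديد", "الكتاب"]
tags = ["B-VP", "B-NP", "I-NP", "B-NP"]
chunks = group_chunks(tokens, tags)
```

Here the noun and its adjective end up in one noun group, while the verb and the object form their own units.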
 
Abed Alhakim Freihat
added a research item
Lemmatization—computing the canonical forms of words in running text—is an important component in any NLP system and a key preprocessing step for most applications that rely on natural language understanding. In the case of Arabic, lemmatization is a complex task because of the rich morphology, agglutinative aspects, and lexical ambiguity due to the absence of short vowels in writing. In this presentation, we introduce a new lemmatizer tool that combines a machine-learning-based approach with a lemmatization dictionary, the latter providing increased accuracy, robustness, and flexibility to the former.
Abed Alhakim Freihat
added a research item
Lemmatization—computing the canonical forms of words in running text—is an important component in any NLP system and a key preprocessing step for most applications that rely on natural language understanding. In the case of Arabic, lemmatization is a complex task because of the rich morphology, agglutinative aspects, and lexical ambiguity due to the absence of short vowels in writing. In this paper, we introduce a new lemmatizer tool that combines a machine-learning-based approach with a lemmatization dictionary, the latter providing increased accuracy, robustness, and flexibility to the former. Our evaluations yield a performance of over 98% for the entire lemmatization pipeline. The lemmatizer tools are freely downloadable for private and research purposes.
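One plausible way to combine a lemmatization dictionary with a learned model is a dictionary-first lookup with a model fallback. The abstract does not say how the two parts are actually combined, so the sketch below is an assumption; `LEMMA_DICT`, `toy_model`, and the POS tag names are hypothetical, and `toy_model` is only a stand-in for a trained lemmatizer.

```python
from typing import Callable, Dict, Tuple

# Hypothetical dictionary keyed by (surface form, POS tag).
LEMMA_DICT: Dict[Tuple[str, str], str] = {
    ("كتب", "NOUN_PL"): "كتاب",  # "books" -> "book"
}


def toy_model(word: str, pos: str) -> str:
    """Stand-in for a trained model: returns the word unchanged."""
    return word


def lemmatize(word: str, pos: str,
              model: Callable[[str, str], str] = toy_model) -> str:
    """Prefer the dictionary entry; fall back to the model prediction."""
    key = (word, pos)
    if key in LEMMA_DICT:
        return LEMMA_DICT[key]
    return model(word, pos)


noun_lemma = lemmatize("كتب", "NOUN_PL")  # found in the dictionary
verb_lemma = lemmatize("كتب", "VERB")     # not in the dictionary, model decides
```

Keying the dictionary on the POS tag as well as the surface form lets the same unvocalized string map to different lemmas, which is where the absence of short vowels causes ambiguity.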
Abed Alhakim Freihat
added a research item
Lemmatization (computing the canonical forms of words in running text) is an important component in any NLP system and a key preprocessing step for most applications that rely on natural language understanding. In the case of Arabic, lemmatization is a complex task because of the rich morphology, agglutinative aspects, and lexical ambiguity due to the absence of short vowels in writing. In this paper, we introduce a new lemmatizer tool that combines a machine-learning-based approach with a lemmatization dictionary, the latter providing increased accuracy, robustness, and flexibility to the former. Our evaluations yield a performance of over 98% for the entire lemmatization pipeline. The lemmatizer tools are freely downloadable for private and research purposes.
Abed Alhakim Freihat
added an update
The tool now includes an Arabic lemmatization module (extracting the dictionary forms of words), based on the OpenNLP machine-learning lemmatizer.
You can test the Arabic lemmatization tool at:
The current lemmatizer does not handle verb ambiguities in Arabic. Some verb forms are ambiguous: for example, "يسيل" may be the surface form of any of the verb lemmas {"سال", "أسال", "سيل"}. The current version returns only one lemma. In the next upgrade, the lemmatizer will be able to return all possible lemmas of such ambiguous verb forms. The next release is expected by the end of July.
In any case, the accuracy of the current lemmatizer, measured on 600 sentences (11,000 tokens), is 99%.
Again: the tool is free for research.
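The planned behaviour for ambiguous verb forms can be pictured as a lookup that returns every candidate lemma instead of one. This is only a sketch: the table `CANDIDATE_LEMMAS`, the function `lemmatize_all`, and the fallback of returning the surface form itself for unknown words are assumptions; only the example "يسيل" and its three lemmas come from the update above.

```python
from typing import Dict, List

# Hypothetical table of ambiguous surface forms and their candidate lemmas.
CANDIDATE_LEMMAS: Dict[str, List[str]] = {
    "يسيل": ["سال", "أسال", "سيل"],
}


def lemmatize_all(surface: str) -> List[str]:
    """Return all candidate lemmas; unknown forms fall back to themselves."""
    return list(CANDIDATE_LEMMAS.get(surface, [surface]))


candidates = lemmatize_all("يسيل")
```

A caller that needs a single lemma would still have to pick one of the candidates, for instance from context.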
 
Abed Alhakim Freihat
added 2 research items
Presentation of the paper "A Single-Model Approach for Arabic Segmentation, POS-Tagging and Named Entity Recognition", given at the 2nd International Conference on Natural Language and Speech Processing (ICNLSP 2018), 25-26 April 2018, Algiers, Algeria.
Abed Alhakim Freihat
added a research item
This paper presents an entirely new, one-million-word annotated corpus for comprehensive, machine-learning-based preprocessing of text in Modern Standard Arabic. Contrary to the conventional pipeline architecture, we solve the NLP tasks of word segmentation, POS tagging, and named entity recognition as a single sequence labeling task. This single-component configuration results in faster operation and provides state-of-the-art precision and recall according to our evaluations. The fine-grained tag set output by our annotator greatly simplifies downstream tasks such as lemmatization. Provided as a trained OpenNLP component, the annotator is freely available for research purposes.
Abed Alhakim Freihat
added an update
A demo interface for the Arabic Linguistic Pipeline (ALP) is here:
 
Abed Alhakim Freihat
added a project goal
ALP is an Arabic linguistic pipeline that performs the following tasks:
* Arabic word tokenization (segmentation);
* Arabic word POS tagging;
* Arabic lemmatization;
* Arabic named entity recognition;
* Arabic chunking.
The tool is free for research and personal use.
The tool web page: