Supervised collaboration for syntactic annotation of Quranic Arabic

Language Resources and Evaluation (Impact Factor: 0.66). 01/2013; 47(1):1-30. DOI: 10.1007/s10579-011-9167-7

ABSTRACT The Quranic Arabic Corpus is a collaboratively constructed linguistic resource initiated at the University of Leeds, with multiple layers of annotation
including part-of-speech tagging, morphological segmentation (Dukes and Habash 2010) and syntactic analysis using dependency grammar (Dukes and Buckwalter 2010). The motivation behind this work is to produce a resource that enables further analysis of the Quran, the 1,400-year-old
central religious text of Islam. This project contrasts with other Arabic treebanks by providing a deep linguistic model based
on the historical traditional grammar known as i′rāb (إعراب). By adapting this well-known canon of Quranic grammar into a familiar tagset, it is possible to encourage online
annotation by Arabic linguists and Quranic experts. This article presents a new approach to linguistic annotation of an Arabic
corpus: online supervised collaboration using a multi-stage approach. The different stages include automatic rule-based tagging,
initial manual verification, and online supervised collaborative proofreading. A popular website attracting thousands of visitors
per day, the Quranic Arabic Corpus has approximately 100 unpaid volunteer annotators each suggesting corrections to existing
linguistic tagging. To ensure a high-quality resource, a small number of expert annotators are promoted to a supervisory role,
allowing them to review or veto suggestions made by other collaborators. The Quran also benefits from a large body of existing
historical grammatical analysis, which may be leveraged during this review. In this paper we evaluate and report on the effectiveness
of the chosen annotation methodology. We also discuss the unique challenges of annotating Quranic Arabic online and describe
the custom linguistic software used to aid collaborative annotation.
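
The supervised review step described above can be illustrated with a minimal sketch. This is not the corpus's actual software: the class names, the `suggest` and `review` functions, and the tags "N" and "V" are all hypothetical, chosen only to show how volunteer suggestions might stay pending until a supervisor accepts or vetoes them.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    VETOED = "vetoed"


@dataclass
class Suggestion:
    author: str
    proposed_tag: str
    status: Status = Status.PENDING


@dataclass
class Token:
    form: str
    tag: str  # current tag, assumed to come from the automatic tagging stage
    suggestions: list = field(default_factory=list)


def suggest(token, author, proposed_tag):
    """Any volunteer may propose a correction; the tag itself is not changed yet."""
    suggestion = Suggestion(author, proposed_tag)
    token.suggestions.append(suggestion)
    return suggestion


def review(token, suggestion, reviewer, supervisors, accept):
    """Only supervisors may accept or veto a pending suggestion (hypothetical rule)."""
    if reviewer not in supervisors:
        raise PermissionError(f"{reviewer} is not a supervisor")
    if suggestion.status is not Status.PENDING:
        raise ValueError("suggestion already reviewed")
    if accept:
        token.tag = suggestion.proposed_tag  # the accepted correction replaces the tag
        suggestion.status = Status.ACCEPTED
    else:
        suggestion.status = Status.VETOED  # the existing tag is kept
    return suggestion.status


if __name__ == "__main__":
    supervisors = {"supervisor_1"}
    token = Token(form="example", tag="V")
    s = suggest(token, "volunteer_1", "N")
    review(token, s, "supervisor_1", supervisors, accept=True)
    print(token.tag, s.status.name)
```

In this sketch a suggestion never changes the stored tag directly; only a supervisor's decision does, which mirrors the review-or-veto role the abstract describes.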

Keywords: Collaborative annotation, Arabic, Treebank, Quran, Corpus

  • Source
    Edited by Hardie, A., Atwell, E.; 01/2013
  • Source
    ABSTRACT: Muslims consider the Quran the greatest of the heavenly Books; the most complete, perfect and free of any alterations or variations. The message of the Qur'an is couched in various literary structures, which are widely believed to be the most perfect example of the Arabic language. By general consensus of Muslim rhetoricians, the Qur'anic idiom is regarded as sublime. Due to its grand linguistic mechanism and selective usage of words, a perfect translation is an extremely difficult endeavor. Even native Arabic speakers of the contemporary age find it difficult to comprehend the exact meaning of some words, idioms or statements mentioned in this noble scripture. This paper has two main objectives: first, to consolidate the challenges of Quranic translation from different researchers' points of view; second, to propose a conceptual model to overcome the challenges identified in this study. The conceptual model is intended towards building a knowledge-based Quran translation that can easily and emphatically provide the exact meaning, synonyms, derivation, root and cause of particular words, idioms, statements or verses mentioned in the Quran.
    Proceedings of the International Conference on Advances in Information Technology for the Holy Quran and Its Sciences, Taibah University, Madinah, Saudi Arabia; 12/2013
  • Source
    ABSTRACT: The Noble Qur'an is considered to be the central religious text of Islam. Any linguistic or literary research on this text using computational technologies benefits billions of people around the world. It has been observed that this research approach, Qur'anic Computation, has firmly established itself in both research and application. This study reviews the evolution of computational effort on the noble book of Quran, from both the research and the application point of view. The review was carried out through an exploratory study of research literature and application documentation. The study notes that Quranic Computation, as developed through various research efforts and applications, shares the common goal of making the Quran easier to understand, but that these efforts have chosen distinct, complementary methodologies and techniques to achieve it. A snowball technique of collection, classification and categorization of articles and documents from 1997 to 2011 was adopted in the review.
    Proceedings of the International Conference on Advances in Information Technology for the Holy Quran and Its Sciences, Taibah University, Madinah, Saudi Arabia; 12/2013
