
Eric Atwell · University of Leeds · School of Computing
PhD Leeds
Teaching and research in Artificial Intelligence for language
About
418
Publications
409,142
Reads
4,786
Citations
Introduction
Additional affiliations
October 1984 - present
Publications (418)
The immense volume of online information has made verifying claims’ credibility more complex, increasing interest in automatic fact-checking models that classify evidence into binary or multi-class verdicts. However, there are few studies on predicting textual verdicts to explain claims’ credibility. This field focuses on generating a textual verdi...
This study focuses on the detection of bias in news articles from a British research-intensive university, given the substantial significance of higher education institutions as information sources and their considerable influence in shaping public opinion. While prior research has underscored the existence of bias in news content, there has been l...
Social media's fast-growing popularity and convenient approval of unknown accounts have promoted an environment where unidentified users can act maliciously, for instance, by spreading fake news. Even though these social networks have been motivating researchers to deter such occurrences, they have not overcome this dilemma due to the immense volum...
Many research efforts have managed to segment Hadith into Isnad and Matan without studying whether the Hadith contains its two main parts or not. This paper sought to classify Arabic Hadith into three different classes: Isnad, Matan and Full Hadith. This is considered to be the first step towards segmenting Hadith. In addition, this paper aimed to...
The rapidly increasing popularity of social networking sites and the widespread acceptance of anonymous users have encouraged an environment where unidentified accounts can act maliciously and propagate fake news. The motivation behind that could either be to begin hype or to gain individuals' attention and negatively impact society. Several studie...
Nowadays, the younger generation is much more exposed to technology than previous generations were. The recent advances in artificial intelligence (AI), and particularly in natural language processing (NLP) and understanding (NLU), make it possible to reinforce and widen the adoption of AI chatbots in education not only to help students in their adm...
The fifth edition of the "CheckThat! Lab" is one of the 2022 Conference and Labs of the Evaluation Forum (CLEF) and aims to evaluate advances supporting three factuality-related tasks, covering several languages. Our team (SCUoL) participated in task 3A, which concentrates on multi-class fake news detection of English news articles. This paper desc...
Question answering is a specialized area in the field of NLP that aims to extract the answer to a user question from a given text. Most studies in this area focus on the English language, while work on other languages, such as Arabic, is still at an early stage. Recently, research has tended to develop question answering systems for Arabic Islamic texts, whi...
As a consequence of the recent advances in artificial intelligence and educational technologies, the education sector is witnessing significant changes and transformations through massive use of intelligent systems with the goal to assist students in their learning experience and teachers in delivering academic knowledge in a better way while reduc...
The text of the Qur’an has been analysed, segmented and annotated by linguists and religious scholars, using a range of representations and formats. Quranic resources in different scopes and formats can be difficult to link due to their complexity. Qur’an segmentation and annotation can be represented in a heterogeneous structure (e.g., CSV, JSON,...
Mature students transitioning into their first year of higher education face many difficulties that affect their motivation, participation and success. Their feelings of being disconnected from their peers and from their institutions are among the key barriers to the successful completion of their courses. Encouraging online student engagement amon...
Due to the increasing numbers of Hadith forgeries, it has become necessary to use artificial intelligence to assist those looking for authentic Hadiths. This paper presents detailed research on ways to automatically detect Hadith authenticity in Arabic Hadith texts. It examines the utilization of deep learning-based and prediction by partial matchi...
Many people nowadays tend to explore social media to obtain news and find information about various events and activities. However, an abundance of misleading and false information is spreading every day for many purposes, dramatically impacting societies. Therefore, it is vitally important to identify false information on social media to help indi...
Semantic similarity analysis in natural language texts is getting great attention recently. Semantic analysis of the Quran is especially challenging because it is not simply factual but encodes subtle religious meanings. Investigating similarity and relatedness between the Quranic verses is a hot topic and can promote the acquisition of the underly...
Predefined web surveys are often used to collect course evaluations from students in higher education institutions. These institutions use the evaluations to adjust their courses’ pedagogical standards and lecture style to cope with an increasingly uncertain and complex world. Many limitations to using web surveys have been reported such as low res...
In this paper we explore the use of Prediction by Partial Matching (PPM) compression to segment Hadith into its two main components (Isnad and Matan). The experiments utilized the PPMD variant of PPM, showing that PPMD is effective in Hadith segmentation. It was also tested on Hadith corpora of different structures. In the first experimen...
Predefined web surveys are often used to collect course evaluations from students in higher education institutions. These institutions use the evaluations to adjust their courses’ pedagogical standards and lecture style to cope with an increasingly uncertain and complex world. Many limitations to using web surveys have been reported such as low r...
In recent years, research in Natural Language Processing (NLP) on Arabic has garnered significant attention. This includes research about classification of Arabic dialect texts, but due to the lack of Arabic dialect text corpora this research has not achieved a high accuracy. Arabic dialects text classification is becoming important due to the inc...
The primary religious text of Islam is the Quran. The Hadith—the second source—refers to any action, saying, order or silent approval of the holy prophet Muhammad that has been delivered through a chain of narrators. Each Hadith has an Isnad—the chain of narrators—and a Matan—the act of the Prophet Muhammad. In contrast to the Quran, some Hadiths,...
The Quran is known for its linguistic and spiritual value. It comprises knowledge and topics that govern different aspects of people’s life. Acquiring and encoding this knowledge is not a trivial task due to the overlapping of meanings over its documents and passages. Analysing a text like the Quran requires learning approaches that go beyond word...
This article describes the process of gathering and constructing a bilingual parallel corpus of Islamic Hadith, which is the set of narratives reporting different aspects of the prophet Muhammad's life. The corpus data is gathered from the six canonical Hadith collections using a custom segmentation tool that automatically segments and annotates th...
The occurrence of code-switching in online communication, when a writer switches among multiple languages, presents a challenge for natural language processing tools, since they are designed for texts written in a single language. To answer the challenge, this paper presents detailed research on ways to detect code-switching in Arabic text automati...
This study aims to construct a corpus-informed list of Arabic Formulaic Sequences (ArFSs) for use in language pedagogy (LP) and Natural Language Processing (NLP) applications. A hybrid mixed methods model was adopted for extracting ArFSs from a corpus, that combined automatic and manual extracting methods, based on well-established quantitative and...
The Quran, as the central religious text of Islam, is widely regarded as the finest work in classical Arabic literature and plays an important role in the Islamic world. This paper studied and analyzed Chinese and English Quran data, and built a Chinese and English Quran word semantic knowledge base in which the grammar and semantic information of the...
Our Artificial Intelligence research group at the University of Leeds has collected, analysed and annotated Classical Arabic corpus resources: the Quranic Arabic Corpus with several layers of linguistic annotation; the QurAna Quran pronoun anaphoric co-reference corpus; the QurSim Quran verse similarity corpus; the Qurany Quran corpus annotated wit...
Over the past two decades, since around 2000, Arabic NLP researchers have investigated a variety of approaches to PoS-tagging and morphological analysis. In this paper, we present this research as a timeline or a list of events in chronological order, to illustrate the evolutionary development of the field. We present a timeline of 24 different appr...
Network security professionals learning network intrusion should be able to see attack signatures and learn the different techniques to detect them. Wireshark is an open source cross-platform protocol analyzer with a user-friendly interface. Wireshark has a protocol dissector that supports over 2000 protocols. In the paper we assume that Network Intr...
This paper presents two sets of lexical items automatically extracted from the Arabic Quran, and denoting two different notions of linguistic salience: keyness and prosodic prominence. Our novel hypothesis investigates a possible correlation between them. Our novel findings discover distributionally significant keywords that also occur strategicall...
Modern Standard Arabic is the written standard across the Arab world; but there is an increasing use of Arabic dialects in social media, so this is appropriate as a source of a corpus for research on classifying Arabic dialect texts using machine learning algorithms. An important first step is annotation of the text corpus with correct dialect tags...
Ontology alignment is a necessary step for enabling interoperability between ontology entities and for avoiding redundancy and variation that may occur when integrating them. The automation of bilingual ontology alignment is challenging due to the variation an entity can be expressed in, in different ontologies and languages. The goal of this paper...
This paper introduces a novel resource for Arabic Qur'anic textual annotations: AQD, Arabic Qur'anic Database, providing an annotation-level search that draws on a number of available resources in a single query. In addition, it allows implementing a set of queries as rewrite rules, which is performed in a recursive way. The experiments show that o...
The main aim of developing a Quranic ontology is to facilitate the retrieval of knowledge from Al-Quran. Additionally, Quranic ontologies will enrich the raw Arabic and English Quran text with Islamic semantic tags. However, current Quran ontologies have different scopes, formats, and entity names for the same concepts. Additionally, a single Qura...
We present a robust and accurate diacritization method of highly cited texts by automatically “borrowing” diacritization from similar contexts. This method of diacritization has been tested on diacritizing one book: “Riyad As-Salheen”, for the purpose of morphological annotation of the Sunnah Arabic Corpus. The original source of Riyad is about 48....
The practice and denotation of tense and aspect differ in Arabic and English, so there is a challenge when translating between the two languages, particularly when the appropriate translation depends on a range of linguistic contexts, comprising also the context of use. In this paper, the Qur'anic Arabic corpus of verbs is used in Arabic with th...
There is so far only limited research that applies a corpus-based approach to the study of the Arabic language. The primary purpose of this paper is therefore to explore the verb systems of Arabic and English using the Quranic Arabic Corpus, focussing on their similarities and differences in tense and aspect as expressed by verb structures and thei...
We present Wasim, a web-based tool for semi-automatic morphosyntactic annotation of inflectional languages resources. The tool features high flexibility in segmenting tokens, editing, diacritizing, and labelling tokens and segments. Text annotation of highly inflectional languages (including Arabic) requires key functionality which we could not see...
Focusing on Classical Arabic, this paper in its first part evaluates morphological analysers and POS taggers that are available freely for research purposes, are designed for Modern Standard Arabic (MSA) or Classical Arabic (CA), are able to analyse all forms of words, and have academic credibility. We list and compare supported features of each to...
A successful computational treatment of multiword expressions (MWEs) in natural languages leads to a robust NLP system which considers the long-standing problem of language ambiguity caused primarily by this complex linguistic phenomenon. The first step in addressing this challenge is building an extensive reliable MWEs language resource LR with co...
The identification of domain-specific terms is a crucial step in many natural language processing applications. Term extraction is a process of obtaining a set of terms that represent the domain of a given text. The majority of term extraction research projects conducted for the Quran have used translated text instead of the original Classical Arab...
The Quranic Arabic Corpus is an important computational resource for research in Arabic. The main purpose of this paper is to provide some details of morphological and syntactic structures of Arabic and English verbs through computing studies of their use in the Quran. The paper will also highlight some investigations into the use of a sub-verb cor...
The aim of this study is to examine the challenges of handling verb tense and aspect in Arabic to English machine translation. A small corpus of selected Arabic sentences was submitted to Google Translate for a contrastive analysis of Arabic and English verb tense use. The main purpose of this study is to provide an understanding of morphology and...
This paper presents the compilation of a corpus of question-answer pairs for the holy Quran. The corpus has been manually collected from a wide range of sources, and designed to represent the Quran Arabic-English Question and Answer Corpus (QAEQ&AC). QAEQ&AC is a written, bilingual corpus, which comprises Arabic and English text. First, question-an...
Social media sites are the major source of user-generated information on politics, products, ideas and services. Recently, social media has become a valuable resource for mining public sentiment and opinions, provided the data is extracted from it reliably. In this study, a new framework is presented that uses social media network (Twitter) stream dat...
Information Retrieval (IR) plays an important role in retrieving information related to the user’s query. IR relies on finding relevant data from a set of knowledge resources, such as the Quran. Finding information from the Quran can be based on metadata, indexing, or other content-based methods. The Quran is the most widely read book in the world...
Natural Language Processing Working Together with Arabic and Islamic Studies is a 2-year project funded by the UK Engineering and Physical Sciences Research Council (EPSRC) to study prosodic-syntactic mark-up in the Quran (Atwell et al 2013). Tajwīd or correct Quranic recitation is very important in Islam. The original insight informing this projec...
Given the lack of Arabic dialect text corpora in comparison with what is available for dialects of English and other languages, there is a need to create dialect text corpora for use in Arabic natural language processing. What is more, there is an increasing use of Arabic dialects in social media, so this text is now considered quite appropriate as...
In the field of Information Retrieval (IR), it may be difficult to answer a question posed by the user, because the search engine retrieves a ranked list of documents that may contain the answer inside the documents, but this needs extra effort from the user to search for the answer inside the documents, and there may be no answer. The alternative...
This paper reviews search tools constructed for Information Retrieval from the Holy Quran. This paper evaluates these different search tools against 13 criteria depending on: search features, output features, precision of the retrieved verses, recall, database size and types of database contents. Based on this survey, we conclude that most of the ex...
A question answering system is an information retrieval system that retrieves relevant short answers that match the question, instead of retrieving relevant full documents as in a standard information retrieval system. In this study, we use three prototypes that use different resources for answers: an MS Access database in prototype1, text files in prototype...
In the field of information retrieval, it is very difficult to answer the question entered by the user, because the search engine retrieves ranked documents that contain any keyword or phrase; extra effort is then needed to search for the answer inside the documents, and there may be no answer. The alternative of search engine...
The identification of relevant domain terms is a crucial step in numerous natural language processing applications. Term Extraction is a process of obtaining a set of terms that represent the domain of a given text. The majority of Term Extraction research projects conducted for the Qur’an have used translated text instead of the original text of t...
This paper reviews most of the search tools constructed for the Holy Quran. It then evaluates these different search tools against 13 criteria depending on search features, output features, the precision of the retrieved verses, recall, database size and types of database contents. Based on this comparison, most of the Quranic search tools stil...
This paper describes an Arabic dialect identification system which we developed for the Discriminating Similar Languages (DSL) 2016 shared task. We classified Arabic dialects by using Waikato Environment for Knowledge Analysis (WEKA) data analytic tool which contains many alternative filters and classifiers for machine learning. We experimented wit...
This paper presents the QAEQAS Quranic Arabic/English Question Answering System, which relies on a specialized search dataset corpus, and data redundancy. Our corpus is composed of questions along with their answers. The questions are phrased in many different ways in differing contexts to optimize Question Answering (QA) performance. As a complete...
This paper reviews and classifies most of the common types of search techniques that have been applied to the Holy Quran. Then, it addresses the limitations of these methods. Additionally, this paper surveys most existing Quranic ontologies and their deficiencies. Finally, it explains a new search tool called: a semantic search tool for Al...