About
Publications: 82
Reads: 90,129
Citations: 1,771
Introduction
Aqil Azmi is a professor in the Department of Computer Science at King Saud University. His research interests include Arabic natural language processing, computational biology, bioinformatics, and the use of computers for critical analysis of religious texts.
Current institution
King Saud University, Department of Computer Science
Education
September 1991 - December 1998
January 1986 - August 1987
August 1981 - December 1982
Publications (82)
Document classification is a classical problem in information retrieval, and plays an important role in a variety of applications. Automatic document classification can be defined as content-based assignment of one or more predefined categories to documents. Many algorithms have been proposed and implemented to solve this problem in general, howeve...
The end-case analysis of Arabic sentences is one of the keys to their meaning. This process, called i‘raab, is a daunting task for students. The outcome of the analysis is twofold: (a) placing the proper diacritical marking on the end cases of individual words, and (b) providing a logical justification. Our objective is to generate a full i‘raab o...
Coronary heart disease (CHD) is a leading cause of death globally, with over 382,000 deaths in the USA alone in 2020. The early detection of CHD is critical in reducing mortality rates. Artificial intelligence (AI) is a constantly evolving field of computer science that employs computational models to extract insights from past data and provide rap...
Since the early 1980s, the legal domain has shown a growing interest in Artificial Intelligence approaches to tackle the increasing number of cases worldwide. TaSbeeb is a deep learning (DL)-based judicial decision support system (JDSS) designed to assist legal professionals in Saudi courts by retrieving judicial reasoning, Qur'anic verses, and hadiths f...
This study harnesses the linguistic diversity of Arabic dialects to create two expansive corpora from X (formerly Twitter). The Gulf Arabic Corpus (GAC-6) includes around 1.7 million tweets from six Gulf countries—Saudi Arabia, UAE, Qatar, Oman, Kuwait, and Bahrain—capturing a wide range of linguistic variations. The Saudi Dialect Corpus (SDC-5) co...
In the domain of automated story generation, the intricacies of the Arabic language pose distinct challenges. This study introduces a novel methodology that moves away from conventional event-driven narrative frameworks, emphasizing the restructuring of narrative constructs through sophisticated language models. Utilizing mBERT, our approach begins...
In the field of natural language processing, the task of generating story endings (SEG) requires not only a deep understanding of the narrative context but also the ability to formulate coherent conclusions. This study delves into the use of cross-lingual transfer learning to address the challenges posed by the scarcity of Arabic data in SEG, propos...
Answering non-factoid questions, especially why-questions, poses a challenge for traditional question answering systems (QASs) that are predominantly designed for fact-based queries. Recent advancements in QASs have incorporated attention and memory mechanisms to better capture the intricate relationship between context and query, resulting in more...
A dataset of 104 Arabic documents and their human-generated summaries at 50% and 30%. This dataset is used in the following publication:
AM Al-Numai and AM Azmi, "Arabic text abstractive summarizer using ant colony system," under peer review.
Morphologically rich languages are characterized by complex inflections and derivations, which can present challenges for natural language processing tasks such as summarization. Abstractive text summarization aims to generate a summary by understanding the meaning of the text, rather than relying solely on the words used in the original source. Howe...
The subfield of natural language processing (NLP) known as question answering (QA) involves providing answers to questions posed in natural language. Answering “why” questions has long been a challenging task for QA systems, given the complexity of the reasoning involved. In this paper, we propose a deep learning model for answering “why” questions...
The Qur’an is a fourteen-centuries-old divine book in the Arabic language that is read and followed by almost two billion Muslims globally as their sacred religious text. With the rise of Islam, the Arabic language gained popularity and became the lingua franca for large swaths of the old world. Devout Muslims read the Qur’an daily seeking guidance and...
String matching is a classical computer science problem in which we search for all occurrences of a string of size m, typically called the pattern, in a text of size n, where both strings are drawn from the same alphabet. It is an essential task for many applications such as data mining, web search engines, bioinformatics, and natural language...
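To make the problem statement above concrete, here is a textbook sketch that reports every occurrence of a pattern of size m in a text of size n in O(n + m) time. It uses the Knuth-Morris-Pratt algorithm, chosen here purely for illustration; the truncated abstract does not say which method the paper proposes, and the function name `kmp_search` is hypothetical.

```python
def kmp_search(text: str, pattern: str) -> list[int]:
    """Return the 0-based start index of every occurrence of pattern in text."""
    m = len(pattern)
    if m == 0:
        return list(range(len(text) + 1))  # empty pattern matches everywhere
    # failure[i] = length of the longest proper prefix of pattern[:i+1]
    # that is also a suffix of pattern[:i+1].
    failure = [0] * m
    k = 0
    for i in range(1, m):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    matches = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == m:
            matches.append(i - m + 1)
            k = failure[k - 1]  # fall back so overlapping matches are found
    return matches

print(kmp_search("abababa", "aba"))  # [0, 2, 4]
```

The failure table lets the scan never move backwards in the text, which is what gives the linear bound; overlapping occurrences are reported because the matcher falls back via the table instead of restarting.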
Abstract
Background
One of the worst pandemics in recent memory, COVID-19, severely impacted the public. In particular, students were physically and mentally affected by the lockdown and the shift from in-person classrooms to virtual learning (online classes). This increased the prevalence of psychological stress, anxiety, and depre...
Diacritic restoration (also known as diacritization or vowelization) is the process of inserting the correct diacritical markings into a text. Modern Arabic is typically written without diacritics, e.g., in newspapers. This lack of diacritical markings often causes ambiguity, and though native speakers are adept at resolving it, there are times when they fail. Diac...
Every day the world produces an enormous amount of textual data. This unstructured text is of little use unless it is labeled using a combination of categories, keywords, and tags. Humans can never annotate such massive data, and with a growing divide between the data produced daily and the data annotated, the only alternative is to mechanize the process. Automati...
With the rapid growth in the number of tweets published daily on Twitter, automated classification of tweets becomes necessary for broad, diverse applications (e.g., information retrieval, topic labeling, sentiment analysis, rumor detection) to better understand what these tweets are about and what users are expressing on this social platform. Text...
Background and Aims
University students are commonly identified as a susceptible group, suffering from higher levels of anxiety, stress, and depression than the general population. During the coronavirus disease (COVID) pandemic, education shifted to the virtual learning environment. Students' uncertainty regarding academic accomplishment, imminent careers, chang...
Question answering is a subfield of information retrieval. It is the task of answering a question posed in natural language. A question answering system (QAS) may be considered a good alternative to search engines, which return a set of related documents. A QAS is composed of three main modules: question analysis, passage retrieval, and ans...
We are living in the age of big data, a majority of which is stream data. The real-time processing of this data requires careful consideration from different perspectives. Concept drift is a change in the data’s underlying distribution, a significant issue, especially when learning from data streams. It requires learners to be adaptive to dynamic c...
Computational generation of stories is a subfield of computational creativity where artificial intelligence and psychology intersect to teach computers how to mimic humans’ creativity. It helps generate many stories with minimum effort and customize the stories for the users’ education and entertainment needs. Although the automatic generation of s...
Traditional information retrieval systems return a ranked list of results to a user’s query. This list is often long, and the user cannot explore all the results retrieved. It is also ineffective for a highly ambiguous language such as Arabic. The modern writing style of Arabic omits the diacritical markings, without which Arabic words become amb...
Arabic Natural Language Processing for Qur’anic Research: A Systematic Review
Text simplification (TS) reduces the complexity of a text in order to improve its readability and understandability, while possibly retaining its original information content. Over time, TS has become an essential tool in helping those with low literacy levels, non-native learners, and those struggling with various types of reading comprehension...
Stance detection is a relatively new concept in data mining that aims to assign a stance label (favor, against, or none) to a social media post towards a specific predetermined target. These targets may not be referred to in the post, and may not be the target of opinion in the post. In this paper, we propose a novel enhanced method for identifying...
This paper describes Faheem (adj. of understand), our submission to the NADI (Nuanced Arabic Dialect Identification) shared task. With so many Arabic dialects understudied due to the scarcity of resources, the objective is to identify the Arabic dialect used in a tweet at the country level. We propose a machine learning approach where we u...
Smartphone-based periocular recognition (SPR) has gained significant attention because of the limitations of face and iris biometric modalities. For this problem, most of the existing methods employ hand-crafted features. On the other hand, deep convolutional neural networks (CNN), which learn features automatically, have shown outstanding performa...
As the number of electronic text documents increases, so does the need for an automatic text summarizer. The summary can be extractive, compressive, or abstractive. In the first, the more important sentences are retained, more or less in their original structure, while the second involves reducing the length of each sentence. For the last,...
Hadith is one of the most celebrated resources of Classical Arabic text. The hadiths, or Prophetic traditions (tradition for short), are narrations originating from the sayings and conduct of Prophet Muhammad. For Muslims, hadiths are the second most important source of Islamic jurisprudence after the Holy Qur’an. Each hadith consists of two parts,...
With the continued development of social networks, information spreads faster than ever. Consequently, this has raised a problem with the reliability of information, since any user can publish whatever they want. Automated systems capable of detecting fake content with the same striking speed as the information being...
Assessing students' essay writing and providing thoughtful feedback is a truly labor-intensive and time-consuming task. With human instructors already overwhelmed, the alternative is to consider computer-based grading. Recent advances have generated renewed interest in automatic evaluation of essays (AEE). The AEE's instantaneous feedback and more c...
The existing automatic fingerprint verification methods are designed to work under the assumption that the same sensor is installed for enrollment and authentication (regular matching). There is a remarkable decrease in efficiency when one type of contact-based sensor is employed for enrollment and another type of contact-based sensor is used for au...
Real-word (also known as semantic or context-sensitive) spelling errors are a class of error that escapes the typical spell checker, which relies on dictionary look-up. This kind of error occurs when a user mistakenly types a correctly spelled word when another is intended, e.g., “I want a peace (piece) of cake.” Further, these errors commonly arise i...
Social media opens up numerous possibilities to study human interaction and collective behavior on an unprecedented scale. It has opened a whole new avenue for research under the name “social computing”. Researchers are interested in profiling individuals (e.g., gender, age group), groups, communities, and networks. We are interested in studying the col...
Identification of regulatory elements is essential for understanding the mechanism behind regulating gene expression. These regulatory elements, located in or near genes, bind to proteins called transcription factors to initiate the transcription process. Their occurrences are influenced by the GC-content or nucleotide composition. For generating synt...
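The GC-content mentioned above is the fraction of G and C nucleotides in a sequence. The sketch below computes it and draws a synthetic sequence with a chosen GC-content. It assumes an i.i.d. nucleotide model (each base drawn independently), which is an assumption for illustration only; the abstract is truncated before it describes how the paper generates its synthetic data, and the helper names are hypothetical.

```python
import random

def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    if not seq:
        return 0.0
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)

def synthetic_sequence(length: int, gc: float, rng: random.Random) -> str:
    """Draw each base independently (assumed i.i.d. model):
    G and C each with probability gc/2, A and T each with (1 - gc)/2."""
    weights = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]  # order: A, C, G, T
    return "".join(rng.choices("ACGT", weights=weights, k=length))

rng = random.Random(42)  # seeded so the sequence is reproducible
seq = synthetic_sequence(10_000, 0.6, rng)
print(round(gc_content(seq), 2))
```

With 10,000 bases the measured GC-content should land close to the 0.6 target; a more realistic background model (e.g., a Markov chain over dinucleotides) would be a drop-in replacement for `synthetic_sequence`.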
The pioneering work of M.M. Al-Azami (1930-2017) in the field of Qur'an and Hadith
In this conference paper, I talked about the contributions of Sheikh M.M. Al-Azami and how a single incident brought him to the computational study of hadith and the Holy Qur'an. The conference took place on the first anniversary of his passing. The lecture is available on YouTube, https://youtu.be/s9qlj9MfXZ4, starting at minute 44:00.
Automated summaries help tackle the ever-growing volume of information floating around. There are two broad categories: extractive and abstractive. In the former we retain the more important sentences, more or less in their original structure, while the latter requires a fusion of multiple sentences and/or paraphrasing. This is a more challenging task tha...
Recently published article. It can be viewed (for a limited time) via the link:
https://rdcu.be/7BIo
Most Arabs can read text written in Modern Standard Arabic (MSA). However, to express themselves, they may find it easier to switch to informal (colloquial) Arabic. The web is open for anyone to express themselves freely, and people are expressing themselves through many social media platforms, such as blogs and forums increasingly in their...
ensuring the reliability of the information. Misinformation spreads especially readily in the context of breaking news, where information is released gradually, often starting as unverified information. Automatically identifying rumors on online social media, especially micro-blogging websites, is an important research problem. Recent resear...
The most effective technique for improving students' writing skills is for them to get immediate feedback from the instructor, as often as possible. This, however, significantly expands the instructors' workload. There is a growing need for automated systems to help students draft essays. Automated essay evaluation is increasingly popular in the f...
Modern Standard Arabic (MSA) is typically written without short vowels, even though these vowels help clarify the sense and meaning of words. The short vowels are omitted since experienced Arabic readers can infer the meaning from the context. But there are cases that even native Arabic speakers cannot resolve. The process of restoring the diacritic...
Question answering systems retrieve information from documents in response to queries. Most of the questions are who- and what-type questions that deal with named entities. A less common and more challenging question to deal with is the why-question. In this paper, we introduce Lemaza (Arabic for why), a system for automatically answering why-q...
Identification of transcription factor binding sites or biological motifs is an important step in deciphering the mechanisms of gene regulation. It is a classic problem that has eluded a satisfactory and efficient solution. In this paper, we devise a three-phase algorithm to mine for biologically significant motifs. In the first phase, we generate...
A Question Answering (QA) system automatically answers questions posed by humans in natural language. Compared to other languages, little effort has been directed towards QA systems for Arabic. Due to the difficulty of handling why-questions, most Arabic QA systems tend to ignore them. In this article, we specific...
One of the basic tasks in genomic research is the analysis of a sequence. An absent word in a sequence is a substring that does not occur in the given sequence. Many studies looked into finding the shortest absent words, with some recent studies noting that longer absent words are also of interest. A simple extension of the shortest ones is impract...
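As an illustration of the definition above (not the paper's method), the brute-force sketch below finds the shortest absent words of a sequence: the shortest strings over the alphabet that never occur as a substring. It enumerates all |alphabet|^k candidates for k = 1, 2, ..., so it is exponential in word length and practical only for tiny inputs; the function name and the `max_len` cap are hypothetical.

```python
from itertools import product

def shortest_absent_words(seq: str, alphabet: str = "ACGT", max_len: int = 12) -> list[str]:
    """Return every absent word of the smallest length k <= max_len.

    Brute force: checks all len(alphabet)**k candidates for k = 1, 2, ...
    """
    for k in range(1, max_len + 1):
        # All length-k substrings that actually occur in seq.
        present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        absent = []
        for letters in product(alphabet, repeat=k):
            word = "".join(letters)
            if word not in present:
                absent.append(word)
        if absent:
            return absent
    return []  # no absent word found up to max_len

print(shortest_absent_words("ACGTACGTAA"))
# ['AG', 'AT', 'CA', 'CC', 'CT', 'GA', 'GC', 'GG', 'TC', 'TG', 'TT']
```

Every single base occurs in the example, so the shortest absent words have length 2: of the 16 possible dinucleotides, only AA, AC, CG, GT, and TA appear. Longer absent words, which the abstract notes are also of interest, need index structures such as suffix trees or arrays rather than this enumeration.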
The fact that people freely express their opinions and ideas in no more than 140 characters makes Twitter one of the most prevalent social networking websites in the world. As Twitter is popular in Saudi Arabia, we believe that tweets are a good source for capturing the public’s sentiment, especially since the country is in a fractious region. Going over the...
The widespread use of colloquial dialects among the younger generation of Arabs is depriving many of them of the fruits of information freedom. Although most Arabs have no problem with reading text in formal Arabic, widely known as Modern Standard Arabic (MSA), the younger generation is more adept at colloquial Arabic, mainly owing to the widespread u...
DNA motifs are short recurring patterns which are assumed to have some biological function. Most of the algorithms that solve this problem are computationally prohibitive. In this paper we extend a recent work that discovered identical string motifs. In the first phase of our three-phase algorithm we report all the string motifs of all sizes. In th...
Motifs are short recurring patterns that are of much interest as they help us understand the mechanism behind regulating gene expression. This paper presents an efficient deterministic algorithm that exhaustively discovers all identical string motifs of all sizes that appear in all or most of the input strings. The input is a set of n different st...
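The paper presents an efficient deterministic algorithm; the brute-force sketch below only illustrates the problem it solves, namely finding identical string motifs of all sizes that occur in all or most of the input strings. The `quorum` parameter (the minimum number of input strings a motif must occur in) and the `min_len` cutoff are assumptions introduced for illustration, as is the function name.

```python
from collections import Counter

def common_string_motifs(strings: list[str], quorum: int, min_len: int = 2) -> dict[str, int]:
    """Return every substring of length >= min_len that occurs in at least
    `quorum` of the input strings, mapped to the number of strings containing it."""
    counts = Counter()
    for s in strings:
        # A set, so each motif is counted at most once per input string.
        substrings = {s[i:j] for i in range(len(s))
                      for j in range(i + min_len, len(s) + 1)}
        counts.update(substrings)
    return {motif: c for motif, c in counts.items() if c >= quorum}

seqs = ["TTACGTA", "GACGTC", "ACGAAA"]
print(sorted(common_string_motifs(seqs, quorum=3)))  # ['AC', 'ACG', 'CG']
```

Enumerating every substring costs quadratic space per string, which is exactly why exhaustive motif discovery calls for a cleverer algorithm on real genomic inputs; lowering `quorum` to 2 here also admits motifs such as GA and ACGT that occur in only two of the three strings.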
The Holy Qur'an ordered the Muslims to follow the example of the Prophet Muhammad (PBUH) and so from the very beginning the Companions concerned themselves with following the Sunna (conduct or custom) of the Prophet, which was embodied in Hadiths narrating his words and deeds. The actual text of the Hadith is known as the matn which records the Pro...
Most existing Hadith retrieval software performs simple keyword search. The same is true for Hadith applications intended for mobile devices. As mobile devices are powered by faster processors, we can develop smarter and more powerful Hadith applications for these tiny devices. In this paper we report on a Hadith application that we developed for Android...
The widespread usage of social media has attracted a new group of researchers seeking information on who, what, and where the users are. Some information retrieval researchers are interested in identifying the gender, age group, and educational level of users. The objective of this work is to identify the gender in the Arabic posts i...
Studies have shown a correlation between reading comprehension and the visual appearance of the displayed text. One of the factors that affect the visual look of a text is its alignment. The purpose of this paper is to develop and implement a sophisticated algorithm to output a properly justified Arabic text. Most of the tools geared for e-document...
Aara’ is a system for mining opinion polarity through the pool of comments that readers write anonymously at the online edition of Saudi newspapers. We use a naïve Bayes classifier with a revised n-gram approach to extract the public opinion polarity, which is expressed in Arabic, classifying it into four categories. For training we manually marked...
A major task in computational biology is the discovery of short recurring string patterns known as motifs. Most of the schemes to discover motifs are either stochastic or combinatorial in nature. Stochastic approaches do not guarantee finding the correct motifs, while the combinatorial schemes tend to have an exponential time complexity with respec...
In Modern Standard Arabic, texts are typically written without diacritical markings. The diacritics are important to clarify the sense and meaning of words. Lack of these markings may lead to ambiguity even for native speakers. Often native speakers successfully disambiguate the meaning through the context; however, many Arabic applications, such as machine...
Automatic text summarization is an essential tool in this era of information overload. In this paper we present an automatic extractive Arabic text summarization system where the user can cap the size of the final summary. It is a direct system where no machine learning is involved. We use a two-pass algorithm where in pass one, we produce a pri...
Hadiths are narrations originating from the words and deeds of Prophet Muhammad. Each hadith starts with a list of narrators involved in transmitting it. A hadith scholar judges a hadith based on the narration chain along with the individual narrators in the chain. In this chapter, we report on a method that automatically extracts the transmission...
Hadiths are narrations originating from the words and deeds of Prophet Muhammad. Each hadith starts with a list of narrators involved in transmitting it. A hadith scholar judges a hadith based on the narration chain along with the individual narrators in the chain. In this chapter, the authors report on a method that automatically extracts the tran...
For Muslims, Hadiths are the second source of Islamic jurisprudence after the Holy Qur’an. Hadiths are narrations originating from the words and deeds of Prophet Muhammad. There are two main components in each Hadith, the narration chain and the narrative text. A hadith scholar judges a Hadith based on the narration chain and the individuals involv...
The two fundamental sources of Islamic legislation are the Qur'an and the Hadith. The Hadiths, or Prophetic Traditions, are narrations originating from the sayings and conduct of Prophet Muhammad. Each Hadith starts with a list of narrators involved in transmitting it, followed by the transmitted text. The Hadith corpus is extremely large and runs into...
We present a global optimization algorithm and demonstrate its effectiveness in solving the protein structure prediction problem for a 70 amino-acid helical protein, the A-chain of uteroglobin. This is a larger protein than solved previously by our global optimization method or most other optimization-based protein structure prediction methods. O...
Automatic text summarization is an active research field. The rapid growth of the Web, and the associated information overloading, has injected new life into this research area. In certain languages there has been plenty of research in automatic text summarization. Arabic is not one of them. In this paper we present an automatic extractive Arabic t...
To help solve difficult global optimization problems such as those arising in molecular chemistry, smoothing the objective function has been used with some efficacy. In this paper we propose a new approach to smoothing. First, we propose a simple algebraic way to smooth the Lennard-Jones and the electrostatic energy functions. These two terms are t...
We present a global optimization algorithm and demonstrate its effectiveness in solving the protein structure prediction problem for a 70 amino-acid helical protein, the A-chain of uteroglobin. This is a larger protein than solved previously by our global optimization method or most other optimization-based protein structure prediction methods. Our...
R. Morris (see IEEE Trans. Comput., vol. TC-20, pp. 1578-1579, 1971) suggested adding an extra field to the fixed floating-point system so that exponents can be stored more efficiently. The exponents are stored in the smallest possible space, passing the extra bits to the mantissa. The extra field is used to monitor the current length of the exponent....
A new computer simulation for a typist is described. This is unique in the sense that it is based on decisions made by the typist during the look-ahead process, takes care of hand overlapping during typing, and handles simultaneously the shifted and unshifted characters. The simulation is applied to several layouts (two of which are suggested by th...
In this paper we report on an inexpensive device that helps a person with physical disability to interact with the Web. The human computer interface is made through a head tracking pointer. The head tracking pointer consists of the Wii controller (Wiimote), a low cost and readily available game controller, and infrared (IR) LED. The Wiimote is used...
Thesis (Ph. D.)--University of Colorado, 1998. Includes bibliographical references (leaves [140]-144).
Thesis (M.S.)--University of Colorado, 1987. Includes bibliographical references (leaves [43]-45).