Aqil Azmi

  • PhD
  • Professor (Full) at King Saud University

About

Publications: 82
Reads: 90,129
Citations: 1,771
Introduction
Aqil Azmi currently works as a professor at the Department of Computer Science, King Saud University. His research interests include Arabic natural language processing, computational biology, bioinformatics, and using computers for critical analysis of religious texts.
Current institution
King Saud University
Current position
  • Professor (Full)
Education
September 1991 - December 1998
University of Colorado Boulder
Field of study
  • Computer Science
January 1986 - August 1987
University of Colorado Boulder
Field of study
  • Electrical Engineering
August 1981 - December 1982
University of Michigan
Field of study
  • Electrical and Computer Engineering

Publications

Article
Full-text available
Document classification is a classical problem in information retrieval, and plays an important role in a variety of applications. Automatic document classification can be defined as content-based assignment of one or more predefined categories to documents. Many algorithms have been proposed and implemented to solve this problem in general, howeve...
Article
Full-text available
The end-case analysis of Arabic sentences is one of the keys to their meaning. This process is called i‘raab, a daunting task for the students. The outcome of the analysis is two-fold: (a) placing a proper diacritical marking on the endcases of individual words, and (b) providing a logical justification. Our objective is to generate a full i‘raab o...
Article
Full-text available
Coronary heart disease (CHD) is a leading cause of death globally, with over 382,000 deaths in the USA alone in 2020. The early detection of CHD is critical in reducing mortality rates. Artificial intelligence (AI) is a constantly evolving field of computer science that employs computational models to extract insights from past data and provide rap...
Article
Full-text available
Since the early 1980s, the legal domain has shown a growing interest in Artificial Intelligence approaches to tackle the increasing number of cases worldwide. TaSbeeb is a deep learning (DL)-based judicial decision support system (JDSS) designed for legal professionals in Saudi courts by retrieving judicial reasoning, Qur'anic verses, and hadiths f...
Article
Full-text available
This study harnesses the linguistic diversity of Arabic dialects to create two expansive corpora from X (formerly Twitter). The Gulf Arabic Corpus (GAC-6) includes around 1.7 million tweets from six Gulf countries—Saudi Arabia, UAE, Qatar, Oman, Kuwait, and Bahrain—capturing a wide range of linguistic variations. The Saudi Dialect Corpus (SDC-5) co...
Article
Full-text available
In the domain of automated story generation, the intricacies of the Arabic language pose distinct challenges. This study introduces a novel methodology that moves away from conventional event-driven narrative frameworks, emphasizing the restructuring of narrative constructs through sophisticated language models. Utilizing mBERT, our approach begins...
Article
Full-text available
In the field of natural language processing, the task of generating story endings (SEG) requires not only a deep understanding of the narrative context but also the ability to formulate coherent conclusions. This study delves into the use of crosslingual transfer learning to address the challenges posed by the scarcity of Arabic data in SEG, propos...
Article
Full-text available
Answering non-factoid questions, especially why-questions, poses a challenge for traditional question answering systems (QASs) that are predominantly designed for fact-based queries. Recent advancements in QASs have incorporated attention and memory mechanisms to better capture the intricate relationship between context and query, resulting in more...
Data
The dataset of 104 Arabic documents and their human-generated summaries at 50% and 30%. This dataset is used in the publication, AM Al-Numai and AM Azmi, "Arabic text abstractive summarizer using ant colony system", under peer review
Article
Full-text available
High morphological languages are characterized by complex inflections and derivations, which can present challenges for natural language processing tasks such as summarization. Abstractive text summarization aims to generate a summary by understanding the meaning of the text, rather than solely relying on the words used in the original source. Howe...
Article
Full-text available
The subfield of natural language processing (NLP) known as question answering (QA) involves providing answers to questions posed in natural language. Answering “why” questions has long been a challenging task for QA systems, given the complexity of the reasoning involved. In this paper, we propose a deep learning model for answering “why” questions...
Article
Full-text available
The Qur’an is a fourteen-century-old divine book in the Arabic language that is read and followed by almost two billion Muslims globally as their sacred religious text. With the rise of Islam, the Arabic language gained popularity and became the lingua franca for large swaths of the old world. Devout Muslims read the Qur’an daily seeking guidance and...
Article
Full-text available
String matching is a classical computer science problem where we search for all the occurrences of a text string of size m, typically called pattern, in a string of size n, where both strings are drawn from the same alphabet. It is an essential task for many applications such as data mining, web search engines, bioinformatics, and natural language...
Article
Full-text available
Background: One of the worst pandemics in recent memory, COVID-19, severely impacted the public. In particular, students were physically and mentally affected by the lockdown and the shift from in-person classrooms to virtual learning (online classes). This increased the prevalence of psychological stress, anxiety, and depre...
Article
Full-text available
Diacritic restoration (also known as diacritization or vowelization) is the process of inserting the correct diacritical markings into a text. Modern Arabic, as found in newspapers for example, is typically written without diacritics. This lack of diacritical markings often causes ambiguity, and though native speakers are adept at resolving it, there are times they may fail. Diac...
Article
Full-text available
Every day the world produces an enormous amount of textual data. This unstructured text is of little use unless it is labeled using a combination of categories, keywords, and tags. Humans can never annotate such massive data, and with a growing divide between the data produced daily and the data annotated, the only alternative is to mechanize it. Automati...
Article
Full-text available
With the rapid growth in the number of tweets published daily on Twitter, automated classification of tweets becomes necessary for a broad range of applications (e.g., information retrieval, topic labeling, sentiment analysis, rumor detection) to better understand what these tweets are about and what users are expressing on this social platform. Text...
Article
Full-text available
Background and Aims: University students are commonly identified as susceptible, suffering from higher anxiety, stress, and depression than the overall population. During the coronavirus disease (COVID) pandemic, education was shifted to the virtual learning environment. Students' ambiguity regarding academic accomplishment, imminent careers, chang...
Article
Full-text available
Question answering is a subfield of information retrieval. It is the task of answering a question posed in natural language. A question answering system (QAS) may be considered a good alternative to search engines that return a set of related documents. A QAS is composed of three main modules: question analysis, passage retrieval, and ans...
Article
Full-text available
We are living in the age of big data, a majority of which is stream data. The real-time processing of this data requires careful consideration from different perspectives. Concept drift is a change in the data’s underlying distribution, a significant issue, especially when learning from data streams. It requires learners to be adaptive to dynamic c...
Article
Full-text available
Computational generation of stories is a subfield of computational creativity where artificial intelligence and psychology intersect to teach computers how to mimic humans’ creativity. It helps generate many stories with minimum effort and customize the stories for the users’ education and entertainment needs. Although the automatic generation of s...
Article
Full-text available
Traditional information retrieval systems return a ranked list of results to a user’s query. This list is often long, and the user cannot explore all the results retrieved. It is also ineffective for a highly ambiguous language such as Arabic. The modern writing style of Arabic excludes the diacritical marking, without which Arabic words become amb...
Preprint
Full-text available
Arabic Natural Language Processing for Qur’anic Research: A Systematic Review
Article
Full-text available
Text simplification (TS) reduces the complexity of a text in order to improve its readability and understandability, while retaining as much of its original information content as possible. Over time, TS has become an essential tool in helping those with low literacy levels, non-native learners, and those struggling with various types of reading comprehension...
Article
Full-text available
Stance detection is a relatively new concept in data mining that aims to assign a stance label (favor, against, or none) to a social media post towards a specific predetermined target. These targets may not be referred to in the post, and may not be the target of opinion in the post. In this paper, we propose a novel enhanced method for identifying...
Preprint
Full-text available
This paper describes Faheem (adj. of understand), our submission to NADI (Nuanced Arabic Dialect Identification) shared task. With so many Arabic dialects being understudied due to the scarcity of the resources, the objective is to identify the Arabic dialect used in the tweet, at the country-level. We propose a machine learning approach where we u...
Article
Smartphone-based periocular recognition (SPR) has gained significant attention because of the limitations of face and iris biometric modalities. For this problem, most of the existing methods employ hand-crafted features. On the other hand, deep convolutional neural networks (CNN), which learn features automatically, have shown outstanding performa...
Chapter
As the number of electronic text documents is increasing so is the need for an automatic text summarizer. The summary can be extractive, compression, or abstractive. In the former, the more important sentences are retained, more or less in their original structure, while the second one involves reducing the length of each sentence. For the latter,...
Article
Full-text available
Hadith is one of the most celebrated resources of Classical Arabic text. The hadiths, or Prophetic traditions (tradition for short), are narrations originating from the sayings and conduct of Prophet Muhammad. For Muslims, hadiths are the second most important source of Islamic jurisprudence after the Holy Qur’an. Each hadith consists of two parts,...
Article
Full-text available
With the continued development of social networks, the spreading of information has become faster than ever. Consequently, this has resulted in a problem with the reliability of the information, where any user can publish whatever he/she wants. Automated systems capable of detecting fake contents with similar striking speed as the information being...
Article
Full-text available
Assessing students' essay writing and providing thoughtful feedback is a truly labor-intensive and time-consuming task. With human instructors already overwhelmed, the alternative is to consider computer-based grading. Recent advances have generated renewed interest in automatic evaluation of essays (AEE). AEE's instantaneous feedback and more c...
Article
Full-text available
The existing automatic fingerprint verification methods are designed to work under the assumption that the same sensor is installed for enrollment and authentication (regular matching). There is a remarkable decrease in efficiency when one type of contact-based sensor is employed for enrolment and another type of contact-based sensor is used for au...
Article
Full-text available
Real-word (also known as semantic or context-sensitive) spelling error is a class of error that escapes the typical spell checker which relies on dictionary look-up. This kind of error occurs when a user types a correctly spelled word–by mistake–when another is intended, e.g., “I want a peace (piece) of cake.” Further, these errors commonly arise i...
Preprint
Full-text available
The existing automatic fingerprint verification methods are designed to work under the assumption that the same sensor is installed for enrollment and authentication (regular matching). There is a remarkable decrease in efficiency when one type of contact-based sensor is employed for enrolment and another type of contact-based sensor is used for au...
Article
Full-text available
Social media opens up numerous possibilities to study human interaction and collective behavior on an unprecedented scale. It has opened a whole new avenue for research under the name “social computing”. Researchers are interested in profiling individuals (e.g., gender, age group), groups, communities, and networks. We are interested in studying the col...
Article
Full-text available
Identification of regulatory elements is essential for understanding the mechanism behind regulating gene expression. These regulatory elements, located in or near genes, bind to proteins called transcription factors to initiate the transcription process. Their occurrences are influenced by the GC-content or nucleotide composition. For generating synt...
Conference Paper
Full-text available
The pioneering work of M.M. Al-Azami (1930-2017) in the field of Qur'an and Hadith
Conference Paper
Full-text available
In this conference paper, I talked about the contributions of Sheikh M.M. Al-Azami and how a single incident brought him to the computational study of hadith and the Holy Qur'an. The conference took place on the first anniversary of his passing. The lecture is available on YouTube, https://youtu.be/s9qlj9MfXZ4, starting at minute 44:00.
Article
Full-text available
Automated summaries help tackle the ever growing volume of information floating around. There are two broad categories: extract and abstract. In the former we retain the more important sentences more or less in their original structure, while the latter requires a fusion of multiple sentences and/or paraphrasing. This is a more challenging task tha...
Article
Full-text available
Just published article. It can be viewed (limited time) by clicking the link https://rdcu.be/7BIo
Article
Full-text available
Most Arabs can read text written in Modern Standard Arabic (MSA). However, to easily express themselves, they may find it easier to switch to informal (colloquial) Arabic. The web is open for anyone to express him/herself freely, and people are expressing themselves through many social media platforms, such as blogs and forums increasingly in their...
Article
Full-text available
ensuring the reliability of the information. The spread of misinformation is especially pronounced in the context of breaking news, where information is released gradually, often starting as unverified information. Automatically identifying rumors on online social media, especially micro-blogging websites, is an important research problem. Recent resear...
Article
Full-text available
The most effective technique for improving students' writing skills is for them to get immediate feedback from the instructor, as often as possible. This, however, significantly expands the workload of the instructors. There is a growing need for automated systems to help students draft essays. Automated essay evaluation is increasingly popular in the f...
Article
Full-text available
Modern Standard Arabic (MSA) is typically written without short vowels, which help clarify the sense and meaning of a word. The short vowels are omitted since experienced Arabic readers can infer the meaning through the context. But there are cases where even native Arabic speakers cannot resolve the ambiguity. The process of restoring the diacritic...
Article
Full-text available
Question answering systems retrieve information from documents in response to queries. Most of the questions are who- and what-type questions that deal with named entities. A less common and more challenging question to deal with is the why-question. In this paper, we introduce Lemaza (Arabic for why), a system for automatically answering why-q...
Article
Identification of transcription factor binding sites or biological motifs is an important step in deciphering the mechanisms of gene regulation. It is a classic problem that has eluded a satisfactory and efficient solution. In this paper, we devise a three-phase algorithm to mine for biologically significant motifs. In the first phase, we generate...
Article
Full-text available
A Question Answering (QA) system automatically answers questions posed by humans in a natural language. Compared to other languages, little effort has been directed towards QA systems for Arabic. Due to the difficulty of handling why-questions, most Arabic QA systems tend to ignore them. In this article, we specific...
Article
Full-text available
One of the basic tasks in genomic research is the analysis of a sequence. An absent word in a sequence is a substring that does not occur in the given sequence. Many studies looked into finding the shortest absent words, with some recent studies noting that longer absent words are also of interest. A simple extension of the shortest ones is impract...
Article
Full-text available
The fact that people freely express their opinions and ideas in no more than 140 characters makes Twitter one of the most prevalent social networking websites in the world. As Twitter is popular in Saudi Arabia, we believe that tweets are a good source to capture the public’s sentiment, especially since the country is in a fractious region. Going over the...
Article
Full-text available
The widespread use of colloquial dialects among the younger generation of Arabs is depriving many of them of the fruits of information freedom. Although most Arabs have no problem with reading text in formal Arabic, widely known as Modern Standard Arabic (MSA), the younger generation is more adept at colloquial Arabic, mainly owing to the widespread u...
Conference Paper
Full-text available
DNA motifs are short recurring patterns which are assumed to have some biological function. Most of the algorithms that solve this problem are computationally prohibitive. In this paper we extend a recent work that discovered identical string motifs. In the first phase of our three-phase algorithm we report all the string motifs of all sizes. In th...
Article
Full-text available
Motifs are short recurring patterns that are of much interest as they help us understand the mechanism behind regulating gene expression. This paper presents an efficient deterministic algorithm that exhaustively discovers all identical string motifs of all sizes that appear in all or most of the input strings. The input is a set of n different st...
Conference Paper
Full-text available
The Holy Qur'an ordered the Muslims to follow the example of the Prophet Muhammad (PBUH) and so from the very beginning the Companions concerned themselves with following the Sunna (conduct or custom) of the Prophet, which was embodied in Hadiths narrating his words and deeds. The actual text of the Hadith is known as the matn which records the Pro...
Conference Paper
Full-text available
Most existing Hadith retrieval software performs simple keyword search. The same is true for Hadith applications intended for mobile devices. As mobiles are being powered by faster processors, we can develop smarter and more powerful Hadith applications for these tiny devices. In this paper we report on a Hadith application that we developed for Android...
Article
The widespread usage of social media has attracted a new group of researchers seeking information on who, what, and where the users are. Some information retrieval researchers are interested in identifying the gender, age group, and educational level of the users. The objective of this work is to identify the gender in the Arabic posts i...
Article
Full-text available
Studies have shown a correlation between reading comprehension and the visual appearance of the displayed text. One of the factors that affect the visual look of a text is its alignment. The purpose of this paper is to develop and implement a sophisticated algorithm to output a properly justified Arabic text. Most of the tools geared for e-document...
Article
Full-text available
Aara’ is a system for mining opinion polarity from the pool of comments that readers write anonymously on the online editions of Saudi newspapers. We use a naive Bayes classifier with a revised n-gram approach to extract the public opinion polarity, which is expressed in Arabic, classifying it into four categories. For training we manually marked...
Article
Full-text available
A major task in computational biology is the discovery of short recurring string patterns known as motifs. Most of the schemes to discover motifs are either stochastic or combinatorial in nature. Stochastic approaches do not guarantee finding the correct motifs, while the combinatorial schemes tend to have an exponential time complexity with respec...
Article
Full-text available
Modern Standard Arabic texts are typically written without diacritical markings. The diacritics are important to clarify the sense and meaning of words. Lack of these markings may lead to ambiguity even for native speakers. Often native speakers successfully disambiguate the meaning through the context; however, many Arabic applications, such as machine...
Article
Full-text available
Automatic text summarization is an essential tool in this era of information overload. In this paper we present an automatic extractive Arabic text summarization system where the user can cap the size of the final summary. It is a direct system where no machine learning is involved. We use a two-pass algorithm where in pass one, we produce a pri...
Article
Full-text available
Hadiths are narrations originating from the words and deeds of Prophet Muhammad. Each hadith starts with a list of narrators involved in transmitting it. A hadith scholar judges a hadith based on the narration chain along with the individual narrators in the chain. In this chapter, we report on a method that automatically extracts the transmission...
Article
Hadiths are narrations originating from the words and deeds of Prophet Muhammad. Each hadith starts with a list of narrators involved in transmitting it. A hadith scholar judges a hadith based on the narration chain along with the individual narrators in the chain. In this chapter, the authors report on a method that automatically extracts the tran...
Article
Full-text available
For Muslims, Hadiths are the second source of Islamic jurisprudence after the Holy Qur’an. Hadiths are narrations originating from the words and deeds of Prophet Muhammad. There are two main components in each Hadith, the narration chain and the narrative text. A hadith scholar judges a Hadith based on the narration chain and the individuals involv...
Conference Paper
Full-text available
The two fundamental sources of Islamic legislation are the Qur'an and the Hadith. The Hadiths, or Prophetic Traditions, are narrations originating from the sayings and conduct of Prophet Muhammad. Each Hadith starts with a list of narrators involved in transmitting it followed by the transmitted text. The Hadith corpus is huge and runs into...
Article
Full-text available
We present a global optimization algorithm and demonstrate its effectiveness in solving the protein structure prediction problem for a 70 amino-acid helical protein, the A-chain of uteroglobin. This is a larger protein than solved previously by our global optimization method or most other optimization-based protein structure prediction methods. O...
Article
Full-text available
Automatic text summarization is an active research field. The rapid growth of the Web, and the associated information overloading, has injected new life into this research area. In certain languages there has been plenty of research in automatic text summarization. Arabic is not one of them. In this paper we present an automatic extractive Arabic t...
Chapter
Full-text available
To help solve difficult global optimization problems such as those arising in molecular chemistry, smoothing the objective function has been used with some efficacy. In this paper we propose a new approach to smoothing. First, we propose a simple algebraic way to smooth the Lennard-Jones and the electrostatic energy functions. These two terms are t...
Article
Full-text available
We present a global optimization algorithm and demonstrate its effectiveness in solving the protein structure prediction problem for a 70 amino-acid helical protein, the A-chain of uteroglobin. This is a larger protein than solved previously by our global optimization method or most other optimization-based protein structure prediction methods. Our...
Conference Paper
Full-text available
R. Morris (see IEEE Trans. Comput., vol. TC-20, p. 1578-9, 1971) suggested adding an extra field to the fixed floating point system, so that exponents can be stored more efficiently. The exponents are stored in the smallest possible space, passing the extra bits to the mantissa. The extra field is used to monitor the current length of the exponent....
Article
A new computer simulation for a typist is described. This is unique in the sense that it is based on decisions made by the typist during the look-ahead process, takes care of hand overlapping during typing, and handles simultaneously the shifted and unshifted characters. The simulation is applied to several layouts (two of which are suggested by th...
Article
Full-text available
In this paper we report on an inexpensive device that helps a person with physical disability to interact with the Web. The human computer interface is made through a head tracking pointer. The head tracking pointer consists of the Wii controller (Wiimote), a low cost and readily available game controller, and infrared (IR) LED. The Wiimote is used...
Article
Full-text available
Thesis (Ph. D.)--University of Colorado, 1998. Includes bibliographical references (leaves [140]-144).
Article
Full-text available
Thesis (M.S.)--University of Colorado, 1987. Includes bibliographical references (leaves [43]-45).
