Table 19 - uploaded by Karim Ouda
Similar publications
Semantic text similarity (STS), which measures the semantic similarity of sentences, is an important task in the field of NLP. It has a wide range of applications, such as machine translation (MT), semantic search, and summarization. In recent years, with the development of deep neural networks, the existing semantic similarity measurement has made...
Video data is well on its way to becoming the most important source of information on the World Wide Web. Already today, users upload more than 100 hours of video material per minute to video platforms such as YouTube. Given this enormous amount of unstructured multimedia data, the targeted search for information is also becoming ever...
This work proposes a personality-based recommender system to implement semantic searches on Internet Vehicles Sales Portals. The system is based on a typical recommender system architecture that has been extended to combine a hybrid recommendation approach with a machine learning classifier technique (k-NN). It proposes a combination of the Five Fa...
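The k-NN step described in this abstract can be sketched as follows. This is a hypothetical illustration only: the Five-Factor feature names, toy profiles, and vehicle segments are invented, not taken from the paper.

```python
import math
from collections import Counter

def euclidean(a, b):
    # Straight-line distance between two equal-length feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(query, labelled, k=3):
    """Majority vote among the k nearest labelled profiles.

    labelled: list of (feature_vector, label) pairs.
    """
    nearest = sorted(labelled, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy personality profiles: (openness, conscientiousness, extraversion)
# mapped to an invented preferred vehicle segment.
profiles = [
    ((0.9, 0.2, 0.8), "sports-car"),
    ((0.8, 0.3, 0.7), "sports-car"),
    ((0.2, 0.9, 0.3), "family-suv"),
    ((0.1, 0.8, 0.2), "family-suv"),
]
print(knn_classify((0.85, 0.25, 0.75), profiles, k=3))  # → sports-car
```

The recommender would feed the classifier's predicted segment into the hybrid recommendation pipeline; here only the classification step is shown.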
The lack of formal technical knowledge has been identified as one of the constraints to research and development on a group of crops collectively referred to as underutilized crops. Some information about these crops is available in informal sources on the web, for example on Wikipedia. However, this knowledge is not entirely authoritative, it may...
Citations
... Karim [9] embarked on creating the first Semantic Search and Intelligence System for the Quran, enabling advanced semantic searches, comprehensive Quranic text analysis, and effective data visualization for users and scholars. This ambitious project also aimed to consolidate prior research efforts from Leeds University and establish an open-source framework for Quranic analysis, fostering innovation in this field. ...
Semantic search is the process of retrieving relevant information from a large corpus of texts based on the meaning and context of the query. This paper explores the use of large language models for semantic search of Quranic texts. The Quran, the central religious text of Islam, contains rich and complex linguistic and semantic features that pose challenges for traditional keyword-based search methods. This study investigates a semantic search approach utilizing Large Language Model (LLM) embeddings and assesses their performance against a baseline embedding-based search method, using a set of queries that represent different semantic search levels. The study also discusses the limitations and implications of using large language models for semantic search of Quranic texts and suggests directions for future research. A significant finding is the consistent effectiveness of the LLM embeddings across varying semantic complexities, suggesting that LLM embeddings can capture deep semantic connections effectively. A second finding is that the state-of-the-art transformer AraT5 outperforms LLM embeddings in low-level semantic searches, indicating potential for further LLM fine-tuning on Arabic text corpora.
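The embedding-based retrieval loop such a study describes can be sketched in a few lines: embed the query and each passage, then rank passages by cosine similarity. The embed() function below is a deliberate stand-in (bag-of-words counts); a real system would call an LLM or AraT5 encoder instead, and the example passages are invented.

```python
import math

def embed(text, vocab):
    # Stand-in embedding: bag-of-words count vector over a shared vocabulary.
    tokens = text.lower().split()
    return [tokens.count(w) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def semantic_search(query, passages):
    # Rank passages by similarity of their embeddings to the query embedding.
    vocab = sorted({w for t in passages + [query] for w in t.lower().split()})
    qv = embed(query, vocab)
    return sorted(passages, key=lambda p: cosine(qv, embed(p, vocab)), reverse=True)

passages = [
    "charity and kindness to orphans",
    "patience in hardship",
    "kindness to parents",
]
print(semantic_search("kindness to orphans", passages)[0])
```

Swapping embed() for a neural encoder changes the vectors but not the ranking loop, which is why the baseline and LLM systems can be compared on the same queries.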
... Various Muslim researchers, seeking to contribute to this field, have proposed semantic search tools built upon Quran ontologies. However, most of them perform only information retrieval; for example, the Quran ontology proposed by [21] is used in our research. ...
... As the literature review implies, no ontological model of the Urdu-language Quran exists yet. Thus, to provide a baseline for our question answering system (SQASUQ), an ontology (QuranAnalysis) created by [21] is being translated in ...
In this digital epoch, essentially no consideration is given to the Urdu language in terms of Quran semantics. No comprehensive semantically structured knowledge base currently exists for the Urdu translation of the Holy Quran. Existing methods primarily rely on syntax-based searches and retrieve a collection of documents, which may not fully meet the needs of the inquirer’s quest. To fill this gap, this research project localizes and extends a Quran ontology and proposes a framework for a semantic question answering system for the Urdu language. Query categorization enables this system to deal with factoid questions of the following types: where (کہاں), who (کون), whose (کے ،کس کی، کس کا، کس), how many (کتنا ،کتنی، کتنے), to whom (کو، کس), when (کب), what (کیا). In comparison to existing Urdu information retrieval systems, our proposed system demonstrates a precision of 0.6, a recall of 0.25, and an F1 score of 0.441, indicating its ability to generate answers that are highly relevant to the input queries. The development of this ontology-based question answering system contributes significantly to bridging the gap in Urdu language resources, particularly in the context of religious texts such as the Holy Quran. This system harmonizes innovation with wisdom, establishing its prowess as a tool bridging modernity and tradition. As we transform seekers’ engagement with the sacred text, we invite all on a journey that transcends syntax and limitations.
... The questions were made available but without their answers. Hamdelsayed and Atwell [58], Shmeisani, Tartir, Al-Na'ssaan, et al. [123], Ouda [102], and Hamoud and Atwell [61] also adopted a similar evaluation approach. This overview implies that evaluation of Arabic QA research based on Qur'an experts' judgement of systems' returned answers does not warrant fair performance comparisons due to the use of different sets of questions. ...
... In this section, we review existing keyword-based [61], and semantic-based [6], [55], [102], [123] Arabic QA systems on the Holy Qur'an. We conclude this section with some perceptions towards semantic ontology-based approaches, in addition to some prospects towards enhancing our proposed QA system. ...
... Ouda [102] developed a multi-purpose system (QuranAnalysis) which includes a question answering module that accepts a question in Arabic or English (the case of English question is not covered in this review). QuranAnalysis also adopts a semantic ontology-based approach. ...
In this dissertation, we address the need for an intelligent machine reading at scale (MRS) Question Answering (QA) system on the Holy Qur’an, given the permanent interest of inquisitors and knowledge seekers in this sacred and fertile knowledge resource. We adopt a pipelined Retriever-Reader architecture for our system to constitute (to the best of our knowledge) the first extractive MRS QA system on the Holy Qur’an. We also construct QRCD as the first extractive Qur’anic Reading Comprehension Dataset, composed of 1,337 question-passage-answer triplets for 1,093 question-passage pairs that comprise single-answer and multi-answer questions in modern standard Arabic (MSA). We then develop a sparse bag-of-words passage retriever over an index of Qur’anic passages expanded with Qur’an-related MSA resources to help in bridging the gap between questions posed in MSA and their answers in Qur’anic Classical Arabic (CA). Next, we introduce CLassical AraBERT (CL-AraBERT for short), a new AraBERT-based pre-trained model that is further pre-trained on about 1.05B-word Classical Arabic dataset (after being initially pre-trained on MSA datasets), to make it a better fit for NLP tasks on CA text such as the Holy Qur’an. We leverage cross-lingual transfer learning from MSA to CA, and fine-tune CL-AraBERT as a reader using a couple of MSA-based MRC datasets followed by fine-tuning it on our QRCD dataset, to bridge the above MSA-to-CA gap, and circumvent the lack of MRC datasets in CA. Finally, we integrate the retriever and reader components of the end-to-end QA system such that the top k retrieved answer-bearing passages to a given question are fed to the fine-tuned CL-AraBERT reader for answer extraction. We first evaluate the retriever and the reader components independently, before evaluating the end-to-end QA system using Partial Average Precision (pAP). 
We introduce pAP as an adapted version of the traditional rank-based Average Precision measure, which integrates partial matching in the evaluation over multi-answer and single-answer questions. Our experiments show that a passage retriever over a BM25 index of Qur’anic passages expanded with two MSA resources significantly outperformed a baseline retriever over an index of Qur’anic passages only. Moreover, we empirically show that the fine-tuned CL-AraBERT reader model significantly outperformed the similarly fine-tuned AraBERT model, which is the baseline. In general, the CL-AraBERT reader performed better on single-answer questions than on multi-answer questions. Moreover, it also outperformed the baseline over both types of questions. Furthermore, despite the integral contribution of fine-tuning with the MSA datasets in enhancing the performance of the readers, relying exclusively on those datasets (without MRC datasets in CA, e.g., QRCD) may not be sufficient for our reader models. This finding demonstrates the relatively high impact of the QRCD dataset (despite its modest size). As for the QA system, it consistently performed better on single-answer questions than on multi-answer questions. However, our experiments provide enough evidence to suggest that a native BERT-based model architecture fine-tuned on the MRC task may not be intrinsically optimal for multi-answer questions.
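The sparse BM25 retriever mentioned in this dissertation abstract can be sketched with the standard BM25 scoring formula. This is a minimal illustration, not the dissertation's implementation: k1 and b use common defaults, the toy passages are invented, and the index expansion with MSA resources is omitted.

```python
import math
from collections import Counter

def bm25_scores(query, passages, k1=1.5, b=0.75):
    """Score each passage against the query with the Okapi BM25 formula."""
    docs = [p.lower().split() for p in passages]
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N  # average document length
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for doc in docs if term in doc)  # document frequency
            if df == 0:
                continue
            idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
            f = tf[term]
            # Term frequency saturation (k1) and length normalization (b).
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores

passages = [
    "stories of the prophets",
    "patience of the prophet Ayyub in hardship",
    "charity to the poor",
]
scores = bm25_scores("patience hardship", passages)
print(max(range(len(scores)), key=scores.__getitem__))  # index of best passage
```

In the full pipeline, the top-k passages by this score would then be fed to the fine-tuned CL-AraBERT reader for answer extraction.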
... The questions were made available but without their answers. Hamdelsayed and Atwell [10], Shmeisani et al. [24], Ouda [19], and Hamoud and Atwell [13] also adopted a similar evaluation approach, although Hamoud and Atwell could have used part of their developed QA database for testing. This overview implies that evaluation of Arabic QA research based on Qur'an experts' judgment of systems' returned answers does not warrant fair performance comparisons due to the use of different sets of questions. ...
The absence of publicly available reusable test collections for Arabic question answering on the Holy Qur’an has impeded the possibility of fairly comparing the performance of systems in that domain. In this article, we introduce AyaTEC , a reusable test collection for verse-based question answering on the Holy Qur’an, which serves as a common experimental testbed for this task. AyaTEC includes 207 questions (with their corresponding 1,762 answers) covering 11 topic categories of the Holy Qur’an that target the information needs of both curious and skeptical users. To the best of our effort, the answers to the questions (each represented as a sequence of verses) in AyaTEC were exhaustive—that is, all qur’anic verses that directly answered the questions were exhaustively extracted and annotated. To facilitate the use of AyaTEC in evaluating the systems designed for that task, we propose several evaluation measures to support the different types of questions and the nature of verse-based answers while integrating the concept of partial matching of answers in the evaluation.
... Ta'a et al. (2017) study the relation between the Quran and information technology, in terms of searching for classifications of the Quran. Ouda (2015) addresses the same related issue, building an "Intelligence System" and "Semantic Search" for the Quran. ...
... For example, the authors in [3] contrasted their proposed Azhary, an Arabic-language lexical ontology, against Arabic WordNet (AWN) in terms of words' semantic meanings and relations. Meanwhile, the author in [25] compared his QuranAnalysis (QA) ontology with twelve other ontologies according to the nine criteria proposed in [37]. Application-based evaluation measures the functionality of the ontology within an actual software program or a use-case scenario (application). ...
... Both [7,33] implemented their ontologies using Protégé to represent the knowledge and manage the class hierarchy and relationships. The author in [25] integrated his QA ontology into the QA website to add more functionality and intelligence, and [30] used the PROMPT system to compare the results obtained. Apart from that, there is also user-based evaluation, where human experts assess how well the ontology meets a set of predefined criteria, standards, and requirements. ...
Holy Quran ontology models are gaining popularity among researchers due to people's demand for understanding this divine book. Accordingly, many studies have been conducted in this area to facilitate people's understanding of the Quran. Quran knowledge is represented conforming to an ontology within a system framework, including various concepts that are interrelated with one another. From the literature, however, the existing Quranic ontology models do not cover all concepts in the Quran, limiting them to domains such as place nouns, themes, pronouns, antonyms, and Islamic knowledge in the Quran. Thus, this research aims to identify relevant research works from various electronic data sources using the systematic literature review (SLR) method to provide a comprehensive review of this area. This paper presents a systematic review of the literature on existing ontology models, leading to the dissemination of correct knowledge of the Quran using semantic technologies.
... This dataset includes 244 scientific articles, of which 144 are for training and 100 for testing. (2) Quran English translation by Yousaf Ali [50,51]. (3) The 500N-KPCrowd dataset [52], which is composed of news stories. ...
Automatic key concept extraction from text is a challenging task in information extraction, information retrieval, digital libraries, ontology learning, and text analysis. Statistical frequency and topical graph-based ranking are two potentially powerful and leading kinds of unsupervised approaches in this area, devised to address the problem. To utilize the potential of these approaches and improve key concept identification, a comprehensive performance analysis of these approaches on datasets from different domains is needed. The objective of the study presented in this paper is to perform a comprehensive empirical analysis of selected frequency-based and topical graph-based algorithms for key concept extraction on three different datasets, to identify the major sources of error in these approaches. For experimental analysis, we have selected TF-IDF, KP-Miner, and TopicRank. Three major sources of error, i.e., frequency errors, syntactic errors, and semantic errors, and the factors that contribute to them are identified. Analysis of the results reveals that the performance of the selected approaches is significantly degraded by these errors. These findings can help develop an intelligent solution for key concept extraction in the future.
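The frequency-based baseline analysed in that study, TF-IDF, can be sketched as follows: a term is weighted highly when it is frequent in one document but rare across the corpus. The toy documents below are invented for illustration; real systems add phrase extraction and normalization on top of this core weighting.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return, per document, a dict mapping each term to its TF-IDF weight."""
    tokenized = [d.lower().split() for d in docs]
    N = len(tokenized)
    df = Counter()  # document frequency: in how many docs each term occurs
    for toks in tokenized:
        df.update(set(toks))
    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        # Term frequency (normalized by doc length) times inverse document frequency.
        weights.append({t: (tf[t] / len(toks)) * math.log(N / df[t]) for t in tf})
    return weights

docs = [
    "ontology learning from text corpora",
    "key concept extraction from text",
    "graph based ranking of key concepts",
]
w = tfidf(docs)
# Terms shared across documents ("from", "text") get lower weights than
# terms unique to one document ("ontology", "corpora").
print(max(w[0], key=w[0].get))
```

The frequency errors identified in the study arise exactly here: a genuinely important concept that happens to occur rarely receives a low TF weight and is missed.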
The purpose of this research is to build an ontology of the masters appearing in the Babylonian Talmud (BT). The ontology built so far has been shared as a Linked Open Data and it will be linked to existing vocabularies. This work has been developed in the context of the Babylonian Talmud Translation Project, where more than eighty Talmudists are working together, since 2012, at the translation (comprehensive of explicative notes and glossaries) of the Talmud into Italian. The construction of the resource has involved the application of tools leveraging on computational linguistics approaches. The ontology, already describing more than 500 masters, constitutes the first portion of a more comprehensive Talmudic Knowledge Base where the text itself, the terminology, the entities, and the concepts constituting the BT will be formalized and linked to each other.
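Publishing such an ontology as Linked Open Data amounts to expressing each entity as RDF triples. The sketch below serializes a couple of masters in N-Triples form using only the standard library; the base URI, class name, and use of the FOAF name property are illustrative assumptions, not the project's actual vocabulary.

```python
# Illustrative base URI and vocabulary terms (not the project's real ones).
BASE = "http://example.org/talmud/"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def to_ntriples(masters):
    """Serialize (id, name) pairs as N-Triples: one subject, two triples each."""
    lines = []
    for master_id, name in masters:
        subj = f"<{BASE}{master_id}>"
        lines.append(f"{subj} <{RDF_TYPE}> <{BASE}Master> .")
        lines.append(f'{subj} <{FOAF_NAME}> "{name}" .')
    return "\n".join(lines)

print(to_ntriples([("hillel", "Hillel"), ("shammai", "Shammai")]))
```

Linking to existing vocabularies, as the project plans, means reusing shared property URIs (like the FOAF one above) so that other Linked Data consumers can interpret the triples without custom mappings.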
This research attempts to build a new model of semantic analysis of the Qur’anic text, called encyclopedic semantics. It contributes to the improvement of Izutsu’s Qur’anic semantic analysis and serves as an alternative model. The article employs a qualitative method and research and development (R&D) with descriptive analysis of the data gathered. Analysis of the data consists of several steps, namely analyzing Izutsu’s model of Qur’anic semantics, identifying and verifying several limitations of Izutsu’s model, building a refined new design of Qur’anic semantics, and lastly demonstrating the new model in semantic analysis applied to the Qur’anic text. Based on the laxity found in Izutsu’s model, this research creates a new model of semantic analysis of the Qur’an, called encyclopedic semantics, as an alternative to Izutsu’s existing model. The encyclopedic semantics model refines Izutsu’s model with significant differences: it aims at rendering the meanings of the Qur’an from the global to the particular, whereas Izutsu’s model derives meanings from the particular to the global. Besides, the encyclopedic model of semantics is part of the mawdlu’iy (thematic) method of interpreting the Qur’an and gains its legitimacy from the Islamic tradition. The sample of this model is shown in the application of the Qur’anic word maṭar.