About
76 Publications
47,498 Reads
828 Citations
Additional affiliations
March 2016 - May 2016
Publications (76)
Automatic aggregation of similar words into semantically related groups (or clusters) is of interest to many natural language processing (NLP) applications. Extracting semantically related words and quasi-synonyms from text is a relatively new research area for the under-resourced Arabic language. Previous attempts addressed the problem of single-w...
This article reports on designing and implementing a multiclass sentiment classification approach to handle the imbalanced class distribution of Arabic documents. The proposed approach, sentiment classification of Arabic documents (SCArD), combines the advantages of a clustering-based undersampling (CBUS) method and an ensemble learning model to ai...
Among the different types of cancer, breast cancer is the most common cancer affecting females in Jordan. Recurrent breast cancer after treatment is a significant concern for patients and oncologists. Developing countries like Jordan suffer from a lack of quality data on computational medicine (CM). This paper discusses the design, construction, an...
Cyberbullying (CB) is classified as one of the severe misconducts on social media. Many CB detection systems have been developed for many natural languages to face this phenomenon. However, Arabic is one of the under-resourced languages suffering from the lack of quality datasets in many computational research areas. This paper discusses the design...
Click fraud is a serious problem facing the online advertising business. Malicious clicks on online ads, whether committed by humans or by non-humans, force financial losses on advertisers who use pay-per-click advertising. Non-human traffic is usually designed to inflate web traffic for fraudulent purposes. In this paper, we demonstrate...
Social networks facilitate communication between people from all over the world. Unfortunately, the excessive use of social networks leads to the rise of antisocial behaviors such as the spread of online offensive language, cyberbullying (CB), and hate speech (HS). Therefore, abusive/offensive and hate detection has become a crucial part of cyberharass...
Being the main spiritual source and reference for Muslims, The Holy Qur’ān can be recited in ten recitations (Qiraat). Each recitation (Qiraah) possesses certain features and characteristics that can be discriminated using Tajweed rules, which can best be defined as the elocution rules for reciting The Holy Qur’ān. This paper describes our efforts...
Advancement in information technology has resulted in massive textual material that is open to appropriation. Due to researchers’ misconduct, a plethora of plagiarism detection (PD) systems have been developed. However, most PD systems on the market do not support the Arabic language. In this paper, we discuss the design and construction o...
The problem of Aspect-Based Sentiment Analysis (ABSA) has been studied extensively for the English language, and many approaches have been suggested for it. However, other languages have not received enough attention in this field. One of these is Arabic, which is spoken across a large area of the world. ABSA contains several tasks....
Arabic text recognition is a challenging task because of the cursive nature of the Arabic writing system, its joint writing scheme, the large number of ligatures, and many other challenges. Deep learning (DL) models have achieved significant progress in numerous domains, including computer vision and sequence modelling. This paper presents a model that can reco...
Conventional textual documents clustering algorithms suffer from several shortcomings, such as the slow convergence of the immense high-dimensional data, the sensitivity to the initial value, and the understandability of the description of the resulted clusters. Although many clustering algorithms have been developed for English and other languages...
Nowadays, cyber hate speech is increasingly growing, which forms a serious problem worldwide by threatening the cohesion of civil societies. Hate speech relates to using expressions or phrases that are violent, offensive or insulting for a person or a minority of people. In particular, in the Arab region, the number of Arab social media users is gr...
Social media has played a significant role in marketing and advertising. Monitoring the attitude of customers and analyzing their written sentiments to evaluate their opinions toward a particular product, topic or situation becomes essential to improve the product quality and customer services. Due to the importance of sentiment analysis (SA), a pl...
An advertisement (ad) click fraud occurs when a user or a bot clicks on an ad with a malicious intent where advertisers need to pay for those fake clicks. Click-fraud is a serious problem for the online advertising industry. Our study demonstrates a hybrid approach using a two-level fingerprint to detect the illegitimate bots targeting ad click fra...
Event detection is essential for decision makers to understand the events surrounding their real world. Social media microblogging platforms play a significant role in our life. One of these platforms is Twitter, which has an extremely high exchange rate and accordingly has become a valuable and relevant source for many political and social events. E...
Drug-drug interactions are generally harmful. This is usually manifested when the patient suffers from more than one disease for which drugs are prescribed and/or more than one drug is needed to be prescribed. The problem is made worse by the wide range of available drugs and the complexity which characterizes the variety of possible interactions o...
Thalassemia is considered one of the most common genetic blood disorders that has received excessive attention in the medical research fields worldwide. Under this context, one of the greatest challenges for healthcare professionals is to correctly differentiate normal individuals from asymptomatic thalassemia carriers. Usually, thalassemia diagnos...
Information extraction from text in Arabic, as well as in other languages, is commonly implemented over restricted text domains. Approaching open text domains is challenging because of the syntactic, semantic, and pragmatic ambiguities and variations in text. For the purpose of approaching more relaxed versions of Arabic text domains, Fasha et al. (Fasha...
This study reports on the construction of a one million word English-Arabic Political Parallel Corpus (EAPPC), which will be a useful resource for research in translation studies, language learning and teaching, bilingual lexicography, contrastive studies, political science studies, and cross-language information retrieval. It describes the phases...
Databases of mobile ad hoc network (MANET) are stored in MANET nodes. The communication between mobile servers and clients is influenced by mobility and constrained battery energy of nodes. To enhance data availability, MANET databases create various replicas of every data object and allocate them on different mobile hosts. There are various data r...
Breast cancer is the second most frequent human neoplasm that accounts for one quarter of all cancers in females. Among the other types of cancers, it is considered to be the main cause of death in women in most countries. An efficient classifier for accurately helping physicians to predict this chronic disease is in high demand. One approach for s...
The application of various data mining (DM) tools and techniques to extract useful information that is potentially valuable and significant is a trend in the research community that affects the education decision maker. Big data in education is also an important type of data to highlight because of the increasing number of online learning syste...
This paper presents a proposed model for extracting information from Arabic-based controlled text domains. We define controlled text domains as the text domains that are not restricted in terms of their linguistic features or their knowledge types, yet are not totally undetermined in these respects. A two-phase Information Extraction (IE) sch...
In this paper, an efficient pattern matching approach, based on a multithreading sliding window technique, is proposed to improve the efficiency of the common sequential exact pattern matching algorithms, including (i) Brute Force, (ii) Knuth-Morris-Pratt, and (iii) Boyer-Moore. The idea is to divide the text under search into blocks, each block is...
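As a rough illustration of the block-splitting idea in the abstract above, the sketch below searches each block of the text in its own thread with a brute-force matcher. It is only one possible reading of the truncated description, not the authors' implementation: the function names `find_in_block` and `parallel_search`, the `block_size` and `workers` parameters, and the choice to let each block read up to `len(pattern) - 1` characters past its end (so that matches straddling a block boundary are not missed) are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def find_in_block(text, pattern, start, end):
    """Brute-force search for matches that BEGIN inside text[start:end].

    A match may extend past `end`; this is what lets matches that straddle
    a block boundary be reported exactly once (by the block they start in).
    """
    m = len(pattern)
    hits = []
    # A match starting at i needs i + m <= len(text).
    last = min(end, len(text) - m + 1)
    for i in range(start, last):
        if text[i:i + m] == pattern:
            hits.append(i)
    return hits


def parallel_search(text, pattern, block_size=1024, workers=4):
    """Return all start offsets of `pattern` in `text`, in ascending order."""
    if not pattern or len(pattern) > len(text):
        return []
    starts = range(0, len(text), block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(find_in_block, text, pattern, s, s + block_size)
            for s in starts
        ]
        # Collect in submission order so offsets stay sorted.
        results = [f.result() for f in futures]
    return [i for block in results for i in block]
```

For example, `parallel_search("abababab", "aba", block_size=3, workers=2)` returns `[0, 2, 4]`. Note that in CPython, threads running pure-Python code share the global interpreter lock, so this sketch shows the decomposition rather than a real speedup.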
Association Classification (AC) technique is a predictive approach that has been investigated widely in the last two decades. Many researchers attempted to use AC in real-world applications such as: text classification, medical diagnoses, fraud detection and website phishing. However, there are a few concerns about using this technique and they are...
We set out to discover whether or not the summaries produced by our Arabic text summarization software were potentially useful to a wide range of people. 1200 students at the University of Jordan were each given a copy of a newspaper article and a system-generated summary and asked to classify the summary as Rejected (R), Not-Related (N), Satisfact...
Text visualization has become a significant tool that facilitates knowledge discovery and insightful presentation of large amounts of data. This paper presents a visualization system for exploring Arabic text called ViStA. We report about the design, the implementation and some of the experiments we conducted on the system. The development of such...
Advancement in technology turns the big world into one small village. Regardless of what country you are living in, what language you are speaking or understanding, you should be able to benefit from the accumulated knowledge available on the Internet. Unfortunately, this is not the case with English being the de facto language of most programming...
This paper presents a historical Arabic corpus named HAC. At this early embryonic stage of the project, we report about the design, the architecture and some of the experiments which we have conducted on HAC. The corpus, and accordingly the search results, will be represented using a primary XML exchange format. This will serve as an intermediate e...
In this paper, we present an Arabic dialogue system (also referred to as a conversational agent) intended to interact with hotel customers and generate responses about reserving a hotel room and other services. The system uses text-based natural language dialogue to navigate customers to the desired answers. We describe the two main modules used in...
The research focus in our paper is twofold: (a) to examine the extent to which simple Arabic sentence structures comply with the Government and Binding Theory (GB), and (b) to implement a simple Arabic Context Free Grammar (CFG) parser to analyze input sentence structures to improve some Arabic Natural Language Processing (ANLP) Applications. Here...
In this paper, we employ the Government and Binding theory (GB) to present a system that analyzes the syntactic structure of some simple Arabic sentences structures. We consider different word orders in Arabic and show how they are derived. We include an analysis of Subject-Verb-Object (SVO), Verb-Object-Subject (VOS), Verb-Subject-Object (VSO), no...
The amount of unstructured textual data on the Internet has increased dramatically. Text visualization has become a significant tool that facilitates knowledge discovery and insightful presentation of large amounts of data. In this paper, we present a technique for the visual exploration of Arabic text documents. We apply Latent Semantic Indexing (...
In this paper, we present a simple mining technique named the Quran Mining Technique (QMT) in an attempt to automatically classify the Suras (i.e., chapters) of the Quran based on a predefined set of 10 themes. QMT is composed mainly of two phases: a preprocessing phase and a classification phase. In the first phase, we manually label a set of represe...
We present a hybrid approach to the problem of Arabic text summarization. Our approach focuses on segment extraction and ranking using heuristic methods that assign weighted scores to segments of text. Also, we use a text categorization system and the Arabic WordNet to identify the thematic structure of the input text in order to select the most re...
In this chapter, we present the findings of a study carried out at the University of Jordan to evaluate ICTE, an accredited, one-year high diploma training program in ICT Education. The ICTE program aims at preparing Jordanian teachers to use information and communication technology in education. In this study, we attempt to examine the program in...
Problem statement: The issue of having robust and fragile watermarking is still a main focus for various researchers worldwide. The performance of a watermarking technique depends on how complex and how feasible it is to implement. These issues are tested using various kinds of attacks, including geometry and transformation. Watermarking techniques in co...
The majority of Arabic text available on the web is written without short vowels (diacritics). Diacritics are commonly used in religious scripts such as the holy Quran (the book of Islam), Al-Hadith (the teachings of Prophet Mohammad (PBUH)), children’s literature, and in some words where ambiguity of articulation might arise. Internet Arabic users...
In this paper, we present and analyze the results of the application of Arabic query-based text summarization system - AQBTSS - in an attempt to produce a query-oriented summary for a single Arabic document. For this task, we adapted the traditional vector space model (VSM) and the cosine similarity measure to find the most relevant passages extrac...
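The vector space model and cosine similarity mentioned in the abstract above can be sketched as follows. This is a generic illustration with raw term-frequency vectors and whitespace tokenization, not the AQBTSS system itself; the function names `cosine` and `rank_passages` and the tokenization are assumptions, and a real Arabic pipeline would need proper normalization and stemming.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_passages(query, passages):
    """Return (score, passage) pairs sorted from most to least similar to the query."""
    q_vec = Counter(query.lower().split())
    scored = [(cosine(q_vec, Counter(p.lower().split())), p) for p in passages]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

A query-oriented summary could then be assembled from the top-ranked passages, for instance by taking the first few entries of `rank_passages(query, passages)`.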
One of the most popular clustering techniques is the k-means clustering algorithm. However, the utilization of the k-means is severely limited by its high computational complexity. In this study, we propose a new strategy to accelerate the k-means clustering algorithm through the Partial Distance (PD) logic. The proposed strategy avoids many unnecessa...
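The general partial-distance idea referred to in the abstract above can be sketched as below: while accumulating the squared distance from a point to a candidate centroid one dimension at a time, the computation is abandoned as soon as the running sum reaches the best distance found so far. This is a minimal sketch of the textbook technique applied to the k-means assignment step, not necessarily the exact strategy of the paper; the function name `nearest_centroid_pd` is hypothetical.

```python
def nearest_centroid_pd(point, centroids):
    """Return (index, squared_distance) of the closest centroid.

    Uses partial-distance early exit: a candidate is dropped as soon as
    its accumulated squared distance can no longer beat the current best.
    Assumes non-empty points and centroids of equal dimensionality.
    """
    best_idx = -1
    best_dist = float("inf")
    for idx, centroid in enumerate(centroids):
        partial = 0.0
        for p, q in zip(point, centroid):
            partial += (p - q) ** 2
            if partial >= best_dist:
                break  # remaining dimensions can only increase the distance
        else:
            # Loop finished without a break: this centroid is strictly closer.
            best_idx, best_dist = idx, partial
    return best_idx, best_dist
```

For the point `(0.9, 0.9)` and centroids `[(0, 0), (10, 10), (1, 1)]`, the second centroid is rejected after its first dimension, and the function returns index `2`. The result is identical to a full distance computation; only the number of arithmetic operations changes.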
In this paper we discuss the enhancement of Arabic passage retrieval for both diacritisized and nondiacritisized text. Most previous work suggested that retrieval start with pre-processing the Arabic text to remove the diacritical marks (short vowels) to unify the text. In most cases, this process causes considerable ambiguity at the word level in...
In this paper, we present an automated support environment to reduce the time and effort required to produce and maintain a reusable specification document. Our proposed model has two operation modes: the first one is the forward mode, in which our model automatically converts English natural language requirements into UML class diagram models. Whi...
Requirement engineering is a fundamental step in the production of high quality software. Many attempts have been conducted to automate some aspects of the requirements engineering process. In this paper, we present a framework that provides the requirements engineers with an environment, which accepts English natural language requirements as input...
Modern Arabic text is written without diacritical marks (short vowels), which causes considerable ambiguity at the word level in the absence of context. An exception is the Holy Quran, which is written with short vowels and other marks to preserve the pronunciation and, hence, the correct sense of its words. Searching for a word in v...
The World Wide Web (WWW) today is so vast that it has become more and more difficult to find answers to questions using standard search engines. Current search engines can return ranked lists of documents, but they do not deliver direct answers to the user. The goal of Open Domain Question Answering (QA) systems is to take a natural language questi...
We describe the design and implementation of a question answering (QA) system called QARAB. It is a system that takes natural language questions expressed in the Arabic language and attempts to provide short answers. The system's primary source of knowledge is a collection of Arabic newspaper text extracted from Al-Raya, a newspaper published in Qa...