
Mohsen A. Rashwan
Cairo University | CU · Department of Electronics and Communication Engineering
PhD
About
152
Publications
59,372
Reads
1,049
Citations
Introduction
Additional affiliations
Education
September 1985 - May 1987
Publications (152)
Semantic Textual Similarity (STS) is the task of identifying the semantic correlation between two sentences of the same or different languages. STS is an important task in natural language processing because it has many applications in different domains such as information retrieval, machine translation, plagiarism detection, document categorizatio...
The problem of region of interest (RoI) in document layout analysis and document recognition has recently become an essential topic in OCR systems. Arabic manuscript layout analysis and OCR using language detection, document category, and region of interest (RoI) with Keras and TensorFlow are terms of the state-of-the-art that sh...
The way in which people speak reveals a lot about where they are from, where they were raised, and also where they have recently lived. When communicating in a foreign language or second language, accents from one’s first language are likely to emerge, giving an individual a ‘strange’ accent. This is a great and challenging problem. Not particularl...
Any natural language may have dozens of accents. Even when a word has the same phonemic form across accents, speakers produce audio signals that are distinct from one another. Among the most common issues in speech processing are discrepancies in pronunciation, accent, and enunciation. This research...
In this paper, an automatic phoneme-level segmentation system was built using the Kaldi toolkit for a Quran verses data set comprising an 80-hour speech corpus and its corresponding text corpus, with 1100 recorded Quran verses from 100 non-Arab reciters. Initiated with the extraction of Mel Frequency Cepstral Coefficients MF...
The recent surge of social media networks has provided a channel to gather and publish vital medical and health information. The focal role of these networks has become more prominent in periods of crisis, such as the recent pandemic of COVID-19. These social networks have been the leading platform for broadcasting health news updates, precaution i...
Due to the rapid developments in technology and the sudden expansion of social media use, Dialect Arabic has become an important source of data that needs to be addressed when building Arabic corpora. In this paper, thirty-three Arabic corpora are surveyed to show that despite all of the developments in the literature, Saudi dialect (SD) corpora st...
In this paper, we describe the Automatic Speech Recognition (ASR) system developed by the RDI team in the framework of the 2019 Multi-Genre Broadcast (MGB-5) challenge in the Arabic language. This year's challenge is the task of building a system for transcribing Moroccan Dialect Arabic speech, using a big audio corpus of primarily M...
Different approaches have been used to estimate language models from a given corpus. Recently, researchers have used different neural network architectures to estimate language models, exploiting the unsupervised learning capabilities of neural networks. Generally, neural networks have demonstrated success compared to conventional n-gram...
In this paper, we describe a detailed approach to develop a botnet detection system using machine learning (ML) techniques. Detecting botnet member hosts, or identifying botnet traffic has been the main subject of many research efforts. This research aims to overcome two serious limitations of current botnet detection systems: First, the need for D...
Modern Arabic texts are commonly written without diacritics. Restoring them is critical for other Arabic processing tasks such as word sense disambiguation, automatic speech recognition, and text to speech, where word meaning or pronunciation is decided based on the diacritic signs assigned to each letter.
This paper presents a novel approach for aut...
This paper presents a system for improving the quality of pronunciation error detection and correction for Qur'an recitation by non-Arabic speakers. Most classical speech recognition systems are built using the Hidden Markov Model (HMM) with a Gaussian Mixture Model (GMM). This paper attempts to enhance the GMM-HMM model's performance by...
Analytical-based approaches in Optical Character Recognition (OCR) systems can suffer from a significant number of segmentation errors, especially when dealing with cursive languages such as Arabic, where characters frequently overlap. Holistic-based approaches that consider whole words as single units were introduced as an effectiv...
Document layout analysis is a key step in the process of converting document images into text. Arabic script is cursive and written in different styles, which poses challenges for the analysis of Arabic text documents. In this paper, we introduce an approach for Arabic document layout analysis. In that approach, the document is segment...
This paper proposes a new optical camouflage system that uses RGB-D cameras for acquiring a point cloud of the background scene and for tracking the observer's eyes. This system enables a user to conceal an object located behind a display that is surrounded by 3D objects. If we consider the tracked point of the observer's eyes to be a light source, the system will...
In this paper, we introduce an enhancement for speech recognition systems using an unsupervised speaker clustering technique. The proposed technique is mainly based on I-vectors and a Self-Organizing Map (SOM) neural network. The input to the proposed algorithm is a set of speech utterances. For each utterance, we extract a 100-dimensional I-vector and...
Automated segmentation of speech signals has been under research for over 30 years. Many speech processing systems require segmentation of the speech waveform into principal acoustic units. Segmentation is the process of breaking down a speech signal into smaller units. It is the first step in any voice-activated system like speech rec...
This issue includes the following articles; P1121606475, Author="M. AbdelDayem and H. Hemeda and A. Sarhan", Title="Enhanced User Authentication through Keystroke Biometrics for Short-Text and Long-Text Inputs" P1121602462, Author="Eslam Mahmoud and Ahmed M. Elmogy and Amany Sarhan", Title="Enhancing Grid Local Outlier Factor Algorithm for better O...
In this paper, a system is proposed to prepare a digital or scanned Quran version for a verification process. The system handles skew errors in the scanned image, text extraction from ornamentation, line segmentation for Arabic scripts, verse pattern detection for different versions, and a powerful diacritics classifier. The propos...
Gaussian Mixture Models (GMM) have been the most commonly used models in pronunciation verification systems. The recently introduced Deep Neural Networks (DNN) have proved to provide significantly better discriminative models of the acoustic space. In this paper, we introduce our efforts to upgrade the models of a Computer Aided Language Learner (CAPL)...
Zone segmentation and classification is an important step in document layout analysis. It decomposes a given scanned document into zones. Zones need to be classified into text and non-text, so that only text zones are provided to a recognition engine. This eliminates garbage output resulting from sending non-text zones to the engine. This paper pro...
Ranking is an important task in the field of information retrieval. Ranking may be used in different modules in natural language processing such as search engines. In this paper, we introduce a competitive ranking system which combines three different modules. The system participated in SemEval 2016 question ranking task for the Arabic language. Th...
Vector-based approaches have proved their validity during the past few years as promising techniques for word and sentence representation. Automatic short answer grading is a challenging problem in natural language processing that can save a lot of human effort; accordingly, research has focused on exploiting several vector representations to sol...
Arabic spelling errors occur in different types of documents, such as documents handwritten by inexperienced users, optical character recognition (OCR) documents, and machine-translated documents. Many researchers have tried to solve this problem, but so far there is no radical solution.
This paper proposes a hybrid system based on the confusion matrix an...
Many researchers have been investigating the task of plagiarism detection lately. In this paper we present the RDI system for intrinsic plagiarism detection (RDI_RID). The RDI_RID system was the only system that participated in the intrinsic track of the Arabic language plagiarism detection competition. The RDI_RID system achieved a PlagDet (Plagiarism Detection s...
Extrinsic plagiarism detection has attracted the attention of many researchers lately. Plagiarism has become more and more difficult to detect due to the appearance of sophisticated approaches other than direct copy and paste (phrase rephrasing, word shuffling, semantic substitution, etc…). In this paper, we present RD...
In this work we propose a fully automatic pre-processing technique to enhance images captured by digital cameras and rectify the different known types of distortion to improve the performance of OCR applications. Our proposed approach depends on the features of the text lines and letters and doesn't need any special equipment. Experimental results...
Polysemous words acquire different senses and meanings from their contexts. Representing words in vector space as a function of their contexts captures some semantic and syntactic features for words and introduces new useful relations between them. In this paper, we exploit different vectorized representations for words to solve the problem of Cros...
A lot of work has been done to give the individual words of a certain language adequate representations in vector space so that these representations capture semantic and syntactic properties of the language. In this paper, we compare different techniques to build vectorized space representations for Arabic, and test these models via intrinsic and...
In this paper, we aim to move ontology-based Arabic NLP forward by experimenting with the generation of a comprehensive Arabic lexical ontology using multiple language resources. We recommend a combination of MUHIT, WordNet and SUMO and use a simple method to link them, which results in the generation of an Arabic-lexicalized version of the SUMO on...
Emotion conversion using a small speech corpus is very important for expressive text to speech systems. Applying the unit selection paradigm for intonation conversion has been widely used for different languages using different intonation units. In this paper, an emotion conversion system is proposed for expressive Arabic speech. This system combin...
This paper describes a speech-enabled Computer Aided Pronunciation Learning (CAPL) system. This system was developed for teaching Arabic pronunciations to non-native speakers. A challenging application of that system is teaching the correct recitation of the Holy Qur'an. This system uses a state of the art speech recognizer to detect errors in the...
The Arabic language belongs to a group of languages that require diacritization over their characters. Modern Standard Arabic (MSA) transcripts omit the diacritics, which are essential for many machine learning tasks like Text-To-Speech (TTS) systems. In this work Arabic diacritics restoration is tackled under a deep learning framework that include...
This paper presents an optical character/text recognition (OCR) system for cursive scripts like those of Arabic, Urdu, Persian, Kurdish, etc. This OCR system is a large-scale one in the sense of architecture, training data size, and state-of-the-art performance. The paper introduces the theoretical derivation and experimental assessment of our two...
Although datasets represent a critical part of research and development activities, botnet research suffers from a serious shortage of reliable and representative datasets. In this paper, we explain a new approach to build a botnet experimentation platform completely from off-the-shelf open sources. This work aims to fill the gap in botnet research...
Traditional keyword-based search has some limitations, such as word sense ambiguity and query intent ambiguity, which can hurt precision. Semantic search uses the contextual meaning of terms in addition to semantic matching techniques in order to overcome these limitations. This paper introduces a query expansion approach u...
In this paper, the Arabic diacritics restoration problem is tackled under the deep learning framework, presenting the Confused Subset Resolution (CSR) method to improve the classification accuracy, in addition to an Arabic Part-of-Speech (PoS) tagging framework using deep neural nets. Special focus is given to syntactic diacritization, which still suffers low a...
Most opinion mining work needs lexical resources that recognize the polarity of words (positive/negative) regardless of their contexts, which is called prior polarity. A word's prior polarity may change when it is considered in its context; for example, positive words may be used in phrases expressing negative sentiments, or vice ver...
In this paper, a phonetic editor system for learning spoken English is introduced. The methods and architecture of the systems used to edit new lessons into the proposed dictionary are discussed, taking pronunciation effects into consideration. The Speak Correct system is presented, which uses state of the art automatic speech recognition (ASR) and...
Literacy and adult education are essential objectives for realizing development and increasing production in any country. Egypt is one of the countries that still has a high rate of illiteracy, around 30% of the adult population (age range 15-45). In Saudi Arabia, the distant regions face a similar challenge. Traditional literacy classes proved...
In this paper we introduce the SpeakCorrect system, a Computer Aided Pronunciation Training (CAPT) system for native Arabic students of English. The system is designed with optimized performance for the target user group. It is an L1-dependent system, and only the frequent pronunciation errors of native Arabic speakers are examined. Several ad...
The aim of this paper is to introduce a new technique that enhances online translation from English to Arabic for a specific domain. This enhancement is achieved by training a new "Arabic online engine translation" to "Arabic manual translation" model that corrects common errors in the online translation. This paper focuses on two popular online tr...
Language resources are an important factor in any NLP application. However, the language resource support for Arabic is poor because the existing Arabic language resources are scattered, inconsistent, or incomplete. In this paper we discuss the notion of having an integrated Arabic resource leveraging various pre-existing ones. We present a...
Large amounts of ground truth data are vital for building, testing, analyzing and improving the performance of character recognizers, especially those using segmentation-based routines. Ground truth information, the annotation, can be associated with the document images at the paragraph level, the sentence level, the word level, and up until the char...
In this paper we propose a segmentation system for unconstrained Arabic online handwriting, an essential problem addressed by analytical-based word recognition systems. The system is composed of two stages: the first is a specially designed hidden Markov model (HMM), and the second is a rule-based stage. In our system, handwritten words are b...
Email has become an essential communication tool in modern life, creating the need to manage the huge information generated. Email classification is a desirable feature in an email client to manage the email messages and categorize them into semantic groups. Statistical artificial intelligence and machine learning is a typical approach to solve suc...
Recognizing old documents is highly desirable since the demand for quickly searching millions of archived documents has recently increased. Using Hidden Markov Models (HMMs) has been proven to be a good solution to tackle the main problems of recognizing typewritten Arabic characters. These attempts however achieved a remarkable success for omn...
Adaptation is the ability of an intelligent machine to update its knowledge according to the actual situation. Self-learning machines (SLM), as defined in this paper, are those that learn by observation under limited supervision and continuously adapt by observing the surrounding environment. The aim is to mimic the behavior of the human brain learning from surr...