Available via license: CC BY 4.0

Received January 4, 2021, accepted January 19, 2021, date of publication January 22, 2021, date of current version February 1, 2021.

Digital Object Identifier 10.1109/ACCESS.2021.3053917

Sentence-Level Classification Using Parallel

Fuzzy Deep Learning Classifier

FATIMA ES-SABERY 1, ABDELLATIF HAIR1, JUNAID QADIR 2, BEATRIZ SAINZ-DE-ABAJO 3,

BEGOÑA GARCÍA-ZAPIRAIN 4, (Member, IEEE), AND ISABEL DE LA TORRE-DÍEZ3

1Department of Computer Science, Faculty of Sciences and Technology, Sultan Moulay Slimane University, Beni Mellal 23000, Morocco

2Department of Electronics, Quaid-i-Azam University, Islamabad 45320, Pakistan

3Department of Signal Theory, Communications and Telematics Engineering, University of Valladolid, 47011 Valladolid, Spain

4eVIDA Research Group, University of Deusto, 48007 Bilbao, Spain

Corresponding authors: Beatriz Sainz-De-Abajo (beasai@tel.uva.es) and Fatima Es-Sabery (fatima.essabery@gmail.com)

This work was supported by the eVida Research Group, University of Deusto, Bilbao, Spain, under Grant IT 905-16.

ABSTRACT At present, with the growing number of Web 2.0 platforms such as Instagram, Facebook, and Twitter, users openly communicate their opinions and ideas about events, services, and products. Owing to this rise in the number of social platforms and their extensive use, enormous amounts of data are produced hourly. Sentiment analysis, or opinion mining, is considered a useful tool that aims to extract the emotions and attitudes from user-posted data on social media platforms by applying various computational and Natural Language Processing (NLP) methods to linguistic terms. Enhancing the accuracy of text sentiment classification has therefore become a feasible and interesting research area for many researchers. In this study, a new Fuzzy Deep Learning Classifier (FDLC) is suggested for improving the performance of sentiment classification. Our proposed FDLC integrates a Convolutional Neural Network (CNN), to build an effective automatic process for extracting features from the collected unstructured data, and a Feedforward Neural Network (FFNN), to compute both positive and negative sentimental scores. We then use the Mamdani Fuzzy System (MFS) as a fuzzy classifier to classify the outcomes of the two deep learning models (CNN+FFNN) into three classes: Neutral, Negative, and Positive. In addition, to avoid the long execution time taken by our hybrid FDLC, we have implemented our proposal on a Hadoop cluster. An experimental comparative study between our FDLC and other approaches from the literature is performed to demonstrate our classifier's effectiveness. The empirical results show that our FDLC performs better than the other classifiers in terms of true positive rate, true negative rate, false positive rate, false negative rate, error rate, precision, classification rate, kappa statistic, F1-score, time consumption, complexity, convergence, and stability.

INDEX TERMS Deep learning, convolutional neural network (CNN), sentiment analysis, feedforward neural network (FFNN), fuzzy logic, Hadoop framework, MapReduce, Hadoop Distributed File System (HDFS).

I. INTRODUCTION

Social network platforms such as Instagram, Twitter, YouTube, LinkedIn, and Facebook have become an indispensable part of our daily activities. Every day, billions of social media users share billions of personal or professional posts [1]. For example, marketers use social media to publish professional posts that present, promote, advertise, and market their products, services, events, and brand names. Customers, in turn, interact with the marketers' posts by expressing their feelings, opinions, ideas, and attitudes about the presented products or services [2]. The marketers then gather this customer feedback and analyze it using sentiment analysis tools. Their main objective is to improve the quality of their products and services, enhance their offerings by adding other privileges, and improve their brand performance [1], [2].

The associate editor coordinating the review of this manuscript and approving it for publication was Jing Bi.

Sentiment Analysis (SA) plays a significant role in Business Intelligence (BI). In BI, it is used to answer questions such as ‘Why are product sales so low?’, ‘Are our customers' needs fully satisfied by our products?’,

VOLUME 9, 2021 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ 17943

F. Es-Sabery et al.: Sentence-Level Classification Using Parallel FDLC

‘What did they like the most about our services?’, ‘What did they dislike the most?’, and ‘Are our customers pleased with our products/services, or do they require more?’. We employ sentiment mining tools and techniques to find the relevant answers to these questions [3]. These include NLP methods for data pre-processing, such as word stemming, word lemmatization, and negation handling; Machine Learning (ML) methods for sentiment classification, such as Support Vector Machines (SVM) [4], K-Nearest Neighbors (KNN) [5], Random Forest (RF) [6], Logistic Regression (LR) [7], Naive Bayes (NB) [8], or the Decision Tree (DT) classifier [9]; Deep Learning (DL) methods, such as Convolutional Neural Network (CNN) [10], Feedforward Neural Network (FFNN) [11], Long Short-Term Memory (LSTM) [12], Gated Recurrent Unit (GRU) [13], or Recurrent Neural Network (RNN) [14], which automatically extract features from the given text and perform the sentiment classification; and vectorization methods, such as word embeddings (word2vec, GloVe) [15], tf-idf [16], bag-of-words [17], fastText, n-grams, or character-level representations [18], which represent the given text as numerical values.
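To make the vectorization step concrete, the following minimal Python sketch builds bag-of-words counts and tf-idf weights for a toy corpus. The corpus, the whitespace tokenizer, and the particular tf-idf variant (raw term count multiplied by log(N/df)) are illustrative assumptions and are not the configuration used in this work.

```python
import math
from collections import Counter

def tokenize(sentence):
    # Assumption: lowercasing and whitespace splitting stand in for real preprocessing.
    return sentence.lower().split()

def bag_of_words(corpus):
    """Return the sorted vocabulary and one count vector per sentence."""
    docs = [Counter(tokenize(s)) for s in corpus]
    vocab = sorted(set(w for d in docs for w in d))
    vectors = [[d[w] for w in vocab] for d in docs]
    return vocab, vectors

def tf_idf(corpus):
    """Weight each count by log(N / df), where df is the document frequency."""
    vocab, counts = bag_of_words(corpus)
    n_docs = len(corpus)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    weights = [[row[j] * math.log(n_docs / df[j]) for j in range(len(vocab))]
               for row in counts]
    return vocab, weights

# Hypothetical three-sentence corpus.
corpus = ["good product", "bad product", "good good service"]
vocab, counts = bag_of_words(corpus)
_, weights = tf_idf(corpus)
```

Each sentence becomes a fixed-length numerical vector indexed by the vocabulary; words that occur in many sentences receive a smaller tf-idf weight than rarer words with the same count.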

SA, also termed opinion mining, seeks to recognize people's moods or emotions toward an entity such as an event, product, individual, service, or topic. At present, SA is mainly treated as a sentiment classification task in the context of ML; that is, each sentiment expressed in the given text is classified as positive, neutral, or negative. Data-driven techniques, involving ML and DL methods, are considered an accurate and efficient solution for the sentiment classification task. ML methods have proved to be a powerful tool for classifying the sentiments expressed in a given text. In particular, SVM, NB, DT, RF, and LR are used extensively and with high accuracy in a wide range of application fields that include sentiment analysis, such as cyberhate detection [19], movie and product reviews [20], [21], abusive language detection [22], cyberbullying identification [23], and social media [24]. In addition to the classical ML algorithms presented earlier, DL algorithms such as CNN, FFNN, LSTM, GRU, and RNN are presently preferred for sentiment classification.

Over the years, several methods have been proposed to supply users with an effective process for classifying sentiments. These methods have evolved from dictionary-based approaches to ML techniques and presently to DL models. According to past comparative studies of SA using both ML and DL, DL is more effective than ML on sentiment classification problems that involve massive data [25]. Fig. 1 shows that conventional ML algorithms are more accurate on smaller datasets. As the volume of data rises beyond a certain point, the accuracy of conventional ML algorithms plateaus. In contrast, the accuracy of DL algorithms continues to rise as the volume of data increases.

A principal difference between ML approaches and DL

models is in the manner that the features are extracted.

FIGURE 1. The accuracy of ML approaches compared to the accuracy of

DL models with respect to data size.

It is well known that the accuracy of a DL or ML approach depends heavily on a good feature extraction process. Classical ML techniques rely on hand-crafted features obtained with various feature extraction methods, and then apply the learning methods. This process is time-consuming and can yield incomplete features. In contrast, DL extracts features automatically, which is the main strength of DL models over ML techniques. Given the fast-growing size of data in this era and the good performance of DL models on massive data, we decided to combine CNN and FFNN in order to build an effective automatic process for extracting features from the given social media dataset. However, even with the most advanced DL models, there is ingrained ambiguity in natural language that needs further solutions. Based on several studies [26], [27], fuzzy logic theory is deemed the most effective solution for dealing with vague data. Beer [26] argues that the strength of fuzzy logic theory is its resemblance to human reasoning and natural language. One substantial property of fuzzy logic theory is its technique for computing with words. This technique transforms linguistic terms into numerical values for reasoning, inference, and computing. Fuzzy logic thus provides a suitable way to handle linguistic issues in fuzzy data. Zadeh et al. [27] argue in their work that no other technique resolves these linguistic issues. Accordingly, given the advantages of fuzzy logic theory for fuzzy data introduced in [27], we decided to employ this theory in our work to deal with the inherent uncertainty in social media data.

Fuzzy logic theory is regarded as an extended version of deterministic logic theory: in fuzzy logic, a linguistic variable takes a real value between 0 and 1, whereas in deterministic logic each variable takes either 0 or 1. The principal purpose of fuzzy logic theory is to transform a black-and-white issue into a grey one [28].


In set-theoretic terms, deterministic (conventional) logic treats the collection of linguistic variables as a crisp set, which signifies that each linguistic variable's membership degree in the crisp set is equal to 1. In other words, the linguistic variable belongs entirely to the crisp set. In contrast, fuzzy logic treats the collection of linguistic variables as a fuzzy set, which indicates that each linguistic variable's membership value in the fuzzy set varies from 0 to 1. In other words, each linguistic variable may belong only partially to the fuzzy set. The membership degree is measured by a particular Membership Function (MF), such as a triangular, Gaussian, or trapezoidal MF [2], [3].
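The contrast between crisp and fuzzy membership can be sketched in a few lines of Python. The function names, parameterizations, and breakpoints below are illustrative assumptions (common textbook forms of the three MFs named above), not the specific MFs configured in this work.

```python
import math

def crisp_membership(x, low, high):
    """Crisp set: an element either belongs (1.0) or does not (0.0)."""
    return 1.0 if low <= x <= high else 0.0

def triangular_mf(x, a, b, c):
    """Triangular MF with feet at a and c and peak at b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def trapezoidal_mf(x, a, b, c, d):
    """Trapezoidal MF: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

def gaussian_mf(x, mean, sigma):
    """Gaussian MF centred at `mean` with spread `sigma`."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)
```

For instance, with a hypothetical "neutral" set centred at 0.5, a score of 0.25 has triangular membership 0.5 under `triangular_mf(0.25, 0.0, 0.5, 1.0)`, whereas a crisp set with bounds 0.4 and 0.6 would simply exclude it.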

In order to enhance the classification rate of sentence-level classification and to overcome the previously discussed shortcomings as far as possible, we introduce a new Fuzzy Deep Learning Classifier (FDLC) to recognize the polarity (negative, positive, neutral) of sentiment sentences. Our methodology mainly contains five parts. The first part is data preprocessing, which reduces noisy data and enhances data quality. The second part is the application of word embedding methods to convert the text-based data into numerical data. The third part is the proposed hybrid DL model, which combines CNN and FFNN in order to build an effective automatic process for extracting features from the collected unstructured data and to compute both the Positive Sentimental Score (PSS) and the Negative Sentimental Score (NSS). The fourth part is the MFS, which is used as a fuzzy classifier to classify the outcomes of the two deep learning models (CNN+FFNN) into three classes: Neutral, Negative, and Positive. Finally, the fifth part uses the Hadoop framework to parallelize our FDLC in order to overcome the long execution time problem. In addition, to assess the performance of our suggested FDLC, an experimental comparative study between our model and other models from the literature is carried out. The results show that our FDLC performs better than the other models in terms of true positive rate, true negative rate, false positive rate, false negative rate, error rate, precision, classification rate, kappa statistic, F1-score, time consumption, complexity, convergence, and stability. The contribution of our work is essentially embodied in six aspects.

1) Data preprocessing techniques are employed to enhance the quality of the text-based data in the given dataset by removing noisy data.

2) Word embedding methods such as word2vec, GloVe, and fastText are applied to the given dataset to transform texts into numerical data.

3) The DL methods FFNN and CNN, with various parameters, are employed to compute the NSS and PSS.

4) Fuzzy logic theory (MFS) is applied as the fuzzy classifier to the outcomes of the previous step (NSS and PSS) in order to classify the sentences of the dataset into three labels: negative, neutral, and positive. High accuracy is achieved because the suggested FDLC automatically extracts more accurate features and various entities from the given dataset.

5) The Hadoop framework is used to parallelize our FDLC in order to avoid the long execution time issue.

6) Multiple experiments are carried out to assess the effectiveness of our FDLC, which is compared with several classifiers selected from the literature. The experimental results indicate that the suggested model achieves a better classification rate than the other classifiers on the given dataset.

The rest of this work is arranged as follows: Section II reviews related works selected from the literature. Section III briefly explains the basic concepts of MFS, FFNN, and CNN. Section IV describes our suggested FDLC. Section V presents the experimental outcomes and the results of the comparative study to demonstrate the classification effectiveness of the proposed classifier, and Section VI presents the conclusions and future work.

II. LITERATURE REVIEW

The objective of the SA process is to extract the emotional polarity contained in a given sentence. For this reason, many researchers have proposed sentiment classification approaches based on ML, DL, or hybrid methods. This section presents recently published models from the literature.

The authors of [29] proposed a new approach that combines multi-scale CNN and LSTM in order to raise the classification rate and decrease the error rate of SA. Their approach handles the review texts of different commodities separately while preserving the features shared across the review data, and it integrates the global and local features extracted from the review data to enhance the effectiveness of the classification process. The suggested classifier is based on two principal phases. First, a multi-task training system is applied to detect private and shared features from the review data of different kinds of commodities. The second phase is sentence representation using word embedding methods [30]. The authors carried out a comparative study against other approaches, namely LR, RF, NB, KNN, SVM, XGBoost, and Gradient Boosting DT. The experimental results indicate that their approach is more accurate than the selected methods from the literature, with an accuracy of 86.25%.

Lan et al. [31] introduced a new deep learning system called SRCLA, which combines a cross-layer attention mechanism with a stacked residual RNN in order to extract richer linguistic features and use them for the SA task. The primary idea of SRCLA is to construct a stacked RNN that identifies and filters various kinds of semantic features, and then to use the newly designed cross-layer attention mechanism to improve the filtering. With SRCLA, more linguistic features can be identified, and hence the effectiveness of sentiment classification can be enhanced. The authors of this work applied both Stacked-BiLSTM


and their SRCLA model to four datasets: TREC, SST-1, MR, and SST-2. The experimental outcomes showed that, compared with Stacked-BiLSTM, SRCLA attains an improvement of 3.0% on the SST-1 dataset, 2.0% on the SST-2 dataset, 2.5% on the MR dataset, and 1.5% on the TREC dataset.

Lin et al. [32] developed a new model in order to enhance the effectiveness of SA. The proposed classifier combines the capability of Bi-LSTM to capture local sequences of features with the power of multi-head attention to identify comprehensive features. They use a comparative study to refine the proposed model and enhance its performance.

Liu et al. [33] proposed a hybrid approach that combines DL models with ML methods to achieve better sentiment classification. The proposed model is a bilingual sentiment classification model applied to Turkish and Chinese language datasets. It combines RNN, LSTM, NB, SVM, and word embedding. Their experimental results showed that the suggested approach can attain an accuracy of 89% and performs better than any of the individual ML or DL approaches.

In Jain et al. [34], tweets about renewable energy were classified as positive, neutral, or negative using five different ML algorithms: SVM, KNN, NB, AdaBoost, and Bagging. The authors chose the information gain function and CfsSubsetEvaluation [34] to extract the features from the given datasets. They implemented their approach using the WEKA tool and RStudio. From the experimental outcomes, they deduced that the SVM algorithm outperforms the other ML algorithms when integrated with the CfsSubsetEvaluation function.

Alec et al. [35] developed a novel deep learning classifier for sentiment classification that combines LSTM with CNN at the kernel level. Their combination of LSTM and CNN produced a new hybrid deep learning model with high accuracy when applied to the Internet Movie Database. In addition, the authors present multiple variations of their suggested model to demonstrate their attempts to raise accuracy while decreasing overfitting. They performed many experiments with various regularization methods, kernel sizes, and network architectures, and designed five high-performing models for comparison with other approaches in the literature. These models achieved 89% accuracy when used to predict the polarity of reviews from the Internet Movie Database.

Xing et al. [36] applied a novel neural network architecture to SA of a market review dataset. They combined the evolving clustering method with an LSTM deep learning model. Experimental results on sentences from StockTwits show that the suggested framework achieves good accuracy compared with other existing methods from the literature.

In [37], the authors propose a novel methodology for SA that combines CNN, word embedding, RNN, and polarity lexicons. The proposed system is composed of three CNNs that extract high-level features from noisy word embedding representations. The outputs of these three CNNs are aggregated and used as the input of a fully connected multilayer perceptron. The experimental results achieved by their classifier were competitive in several subtasks.

Wang et al. [38] proposed a growing deep belief network with transfer learning (TL-GDBN), an improved version of the traditional deep belief network (DBN). The objective of their model is to overcome one shortcoming of DBN, namely that it is difficult to determine the optimal DBN structure quickly. Their contribution has four aspects. First, a simple DBN architecture consisting of one hidden layer is built and pre-trained, and the weight parameters obtained from pre-training are kept. Second, transfer learning (TL) is applied to transfer the knowledge in the kept weight parameters to newly attached units and hidden layers. The learning process is repeated until the stopping criterion is met; hence a growing DBN architecture is constructed. Third, the weight parameters obtained after pre-training the proposed TL-GDBN are fine-tuned by applying layer-by-layer partial least squares regression from top to bottom. Finally, the experimental results show that their proposed model performs well compared to other models in the literature.

In [39], the authors developed a new hybrid model for binary SA. They combine a global pooling mechanism with a single bidirectional long short-term memory (BiLSTM) layer. They evaluate their work on three widely used datasets of public opinions about movies. Their proposed model achieved an accuracy of 80.500%, 85.780%, and 90.585% on the MR, SST2, and IMDb datasets, respectively.

Fuzzy logic was proposed to handle uncertain and vague data. Its strength is its resemblance to human reasoning and natural language. The use of fuzzy logic in SA has given good classification accuracy in several works from the literature. Its effectiveness in sentiment classification stems from the way popular fuzzy logic systems approach a classification problem: they treat it as a ‘degree of grey’ issue rather than a ‘black and white’ issue [40]. This matches the sentiment analysis problem, which is itself considered a ‘degree of grey’ problem. For that reason, various works in the literature use fuzzy logic systems to solve sentiment classification problems.

Wu et al. [41] proposed a fuzzy logic-based method for SA. Their model consists of five stages: data collection; manual data labelling and examination; text preprocessing; extraction of the essential features; and application of the fuzzy classifier. The fuzzy classifier consists of three parts: the fuzzification process, the IF-THEN fuzzy rules, and the defuzzification process. The authors of this work have compared their proposed


approach with the keyword search approach. The experimental results demonstrate that their fuzzy logic-based method achieved good performance compared to the keyword search technique in terms of correctness rate and number of extracted tweets.
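The three parts of a fuzzy classifier named above (fuzzification, IF-THEN rules, defuzzification) can be sketched generically as follows. This is a minimal Mamdani-style illustration under stated assumptions, not the classifier of [41] nor the MFS configuration of this work: the two inputs (a positive and a negative score in [0, 1]), the linear low/high memberships, the four rules, the triangular output sets, and the name `mamdani_polarity` are all hypothetical.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical output fuzzy sets on a [0, 1] polarity axis.
OUTPUT_SETS = {
    "negative": (-0.5, 0.0, 0.5),
    "neutral": (0.0, 0.5, 1.0),
    "positive": (0.5, 1.0, 1.5),
}

def mamdani_polarity(pos_score, neg_score, steps=100):
    """Map two scores in [0, 1] to a crisp polarity value in [0, 1]."""
    # 1) Fuzzification: linear "low"/"high" memberships for each input.
    pos_high, pos_low = pos_score, 1.0 - pos_score
    neg_high, neg_low = neg_score, 1.0 - neg_score

    # 2) IF-THEN rules: AND is min; rules sharing a consequent are merged by max.
    strength = {
        "positive": min(pos_high, neg_low),
        "negative": min(pos_low, neg_high),
        "neutral": max(min(pos_high, neg_high), min(pos_low, neg_low)),
    }

    # 3) Defuzzification: centroid of the clipped, max-aggregated output sets.
    num = den = 0.0
    for i in range(steps + 1):
        z = i / steps
        mu = max(min(strength[name], tri(z, *abc))
                 for name, abc in OUTPUT_SETS.items())
        num += z * mu
        den += mu
    return num / den

mostly_positive = mamdani_polarity(0.9, 0.1)
mostly_negative = mamdani_polarity(0.1, 0.9)
mixed = mamdani_polarity(0.5, 0.5)
```

Under these assumed rules, a sentence with a high positive and low negative score lands above 0.5 on the polarity axis, the mirrored case lands below 0.5, and balanced scores land at the neutral midpoint; a final class label would require thresholds that this sketch deliberately leaves out.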

Biltawi et al. [3] suggested a lexicon-based method that computes the emotional score of Arabic text using fuzzy logic theory. The suggested method is composed of two primary parts. First, Arabic sentences are assigned weight values. Second, the fuzzy logic classifier is used to determine the sentiment score of the Arabic review being classified. The proposed model was evaluated on a large sentiment analysis dataset of over 63000 Arabic book reviews (8224 negative reviews, 12201 neutral reviews, and 42831 positive reviews). The experimental results were 84.04%, 94.87%, 89.13%, and 80.59% for precision, recall, F1-measure, and accuracy, respectively.

In [42], the authors developed a novel fuzzy rule-based model for SA problems. The novelty of their contribution lies in two points: i) the suggested method is unsupervised and can be applied to any dataset and any dictionary; ii) nine fuzzy rules are created to classify the sentiment sentences. They implemented their approach with three different dictionaries: AFINN, VADER, and SentiWordNet. The proposed method was tested on nine Twitter datasets. The experimental results showed that the variant using the VADER dictionary takes less execution time than the variant using the SentiWordNet dictionary. For the recall and precision measures, the variants using the VADER or AFINN dictionary performed better than the one using the SentiWordNet dictionary.

Abdul-Jaleel et al. [43] introduced a new model for the SA problem that combines a genetic algorithm with fuzzy logic theory. The inputs to the suggested classifier are a collection of features extracted from a sentiment sentence, and its output is the classification decision for that sentence. They compared their classification system with the keyword search method by computing both accuracy and incremental rates. In terms of accuracy, the introduced model performed better than the keyword search method: 98.75% versus 95.7%. In terms of incremental rate, the suggested approach is able to extract more sentiment sentences than the keyword search approach.

Wang et al. [44] developed a new model to improve the training speed, modelling efficiency, and robustness of the deep belief network. Their proposed model combines a Fuzzy Neural Network with a Sparse Deep Belief Network. The Fuzzy Neural Network is used for supervised modelling in order to reduce the gradient spread. The Sparse Deep Belief Network serves as a pre-training model that initializes the weight parameters quickly and provides feature vectors. The empirical outcomes reveal that their proposed model performs well compared to other existing techniques in the literature.

Motivated by the strengths of DL and fuzzy logic techniques in the SA field, in the present work we develop a new hybrid fuzzy deep learning approach that integrates the CNN and FFNN deep learning networks with the MFS fuzzy logic system. In the next section, we briefly explain the concepts of CNN, FFNN, and MFS.

III. BASIC CONCEPTS

In this section, we first introduce the basic notion of the CNN, in particular the CNN with one convolution layer [45], which is regarded as a simplified version. Subsequently, the architecture of the FFNN is described in detail [46]. Finally, a general overview of the MFS [47] is given in brief as the basis for more accurate SA.

A. CONVOLUTIONAL NEURAL NETWORK

The CNN structure, first suggested in 1988 by Fukushima [48], is one of the most common and effective deep learning networks. Its architecture is closely related to the conventional LeNet approach. In the past, CNNs were used only for handwriting recognition and image recognition. At present, this network architecture is also used for text classification tasks, including sentiment analysis. According to multiple works from the literature, CNNs have achieved good results in terms of classification rate and execution time. The secret behind these successes is the structure of the CNN, which is designed to resemble the cat's visual cortex. Indeed, the cat's visual cortex is composed of a complex arrangement of neurons. Each of these neurons is responsible for covering a small sub-area of the visual field, named the receptive field. The receptive fields are then tiled to cover the overall visual field [49]. Hence, receptive fields play the role of filters in the CNN deep learning model. In summary, the main aim behind CNNs is to reduce the total number of parameters and to construct a deeper neural network with fewer parameters. Fig. 2 depicts the overall structure of CNNs, which comprises three fundamental layers: convolution layer, pooling layer, and fully connected layer.

FIGURE 2. Overall structure of the CNN.

As stated previously and as shown in Fig. 2, the CNN architecture, unlike classical artificial neural networks, consists of three principal layers. These layers are the Convolution


layer, Pooling layer, and Fully connected layer as described

below:

1) CONVOLUTION LAYER

The convolution layer is the essential building block of a CNN and is always the first layer in its overall structure. Its main purpose is to detect and capture features from the matrix obtained by applying one of the word embedding methods to the given input sentence. The convolution layer slides a filter over the embedding matrix and produces a convolved feature. Multiple filters are applied over the embedding matrix to obtain multiple feature maps. These feature maps are then activated (that is, transformed from linear to non-linear feature maps) using the Rectified Linear Unit (ReLU) activation function in the intermediate step that links the convolution layer and the pooling layer. Finally, the resulting non-linear feature maps are passed to the pooling layer. ReLU is the most popular activation function used with the convolution layer. It is computed using formula (1):

$$f(y) = \max(0, y) \tag{1}$$

In essence, the ReLU activation function outputs 0 if it receives a negative value as input, and it outputs the input value unchanged if that value is positive [50], as shown in Fig. 3.

FIGURE 3. Graphic representation of the ReLU activation function.

The advantages of the ReLU function are that it mitigates the vanishing gradient issue, converges faster thanks to its simple formula, and has a relatively short execution time compared with other activation functions such as tanh or sigmoid.
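As an illustration of the convolution step followed by the ReLU activation of formula (1), the sketch below slides one filter over a toy embedding matrix. The embedding values, the filter weights, and the function names are made-up assumptions for illustration only; a real model would learn the filter weights and use many filters.

```python
def relu(y):
    """Formula (1): f(y) = max(0, y)."""
    return max(0.0, y)

def convolve_sentence(embedding, kernel, bias=0.0):
    """Slide a kernel spanning h consecutive word vectors over the sentence.

    embedding: list of n word vectors, each of length d
    kernel:    list of h rows, each of length d
    Returns one feature map with n - h + 1 ReLU-activated values.
    """
    n, h = len(embedding), len(kernel)
    feature_map = []
    for start in range(n - h + 1):
        total = bias
        for i in range(h):
            for w, x in zip(kernel[i], embedding[start + i]):
                total += w * x
        feature_map.append(relu(total))
    return feature_map

# Toy 4-word sentence with 2-dimensional embeddings (made-up numbers).
embedding = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
kernel = [[1.0, 0.0], [0.0, 1.0]]  # one hypothetical filter covering 2 words
feature_map = convolve_sentence(embedding, kernel)
```

The filter visits three word windows, so the feature map has three entries; any window whose weighted sum is negative is set to zero by the ReLU.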

2) POOLING LAYER

After the embedding matrix has been convolved with multiple filters in the first stage (convolution layer), the second stage applies the pooling layer to reduce the dimensionality of the feature maps obtained in the first stage. As a result, the total number of CNN parameters is reduced, the computational cost is decreased, and overfitting is restrained. Two popular pooling functions are average pooling and max pooling. The average-pooling method takes the average of all values in the convolved feature map as the pooled feature. The max-pooling method selects the maximum element in the convolved feature map as the pooled feature and discards the rest. Both operations are illustrated in Fig. 4. In this work, we have applied the max-pooling method. Generally, the pooling layer transforms the convolved feature maps into a single column, which is then passed to a fully connected layer [51].

FIGURE 4. Illustration of average-pooling and max-pooling functions.
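The two pooling operations can be sketched as follows, assuming each feature map is pooled over all of its values so that every filter contributes one entry to the single column. The feature-map values and function names are hypothetical.

```python
def max_pool(feature_map):
    """Keep only the largest value of a feature map."""
    return max(feature_map)

def average_pool(feature_map):
    """Replace a feature map by the mean of its values."""
    return sum(feature_map) / len(feature_map)

def pool_all(feature_maps, pool=max_pool):
    """Collapse each feature map to one value, yielding a single column."""
    return [pool(fm) for fm in feature_maps]

# Two made-up feature maps, as if produced by two different filters.
feature_maps = [[2.0, 0.0, 0.0], [0.5, 1.5, 1.0]]
max_column = pool_all(feature_maps)
avg_column = pool_all(feature_maps, pool=average_pool)
```

With max pooling, the column keeps the strongest response of each filter regardless of where in the sentence it occurred, which is one reason this choice is common for text.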

3) FULLY CONNECTED LAYER

The fully connected layer, also termed a dense layer, is used in this work to calculate the sentimental scores (PSS and NSS) of each input sentence from the single column obtained from the pooling layer in the previous phase. Its functionality can be summarized as a linear process in which each input is linked to every output by a different weight [51]. The fully connected layer calculates the sentimental values using equation (2) as follows:

Sv =f(Wm ∗Cp +Bi) (2)

where Sv is the calculated sentimental value, fis softmax

activation method, Wm is the used weight matrix, Cp is the

single column obtained from the pooling layer, and Bi is the

bias.

FIGURE 5. Graphic representation of the softmax activation function.

The softmax activation function (or normalized exponential function) is applied between the fully connected and output layers. It converts the numerical values received from the fully connected layer into probabilities, which lie in the interval [0,1] and sum to 1, as represented in Fig. 5.


In this research work, we applied the softmax function to the vector of z real values received from the last hidden layer of the FFNN to calculate two values: the positive and negative sentimental scores. The softmax activation function is given in equation (3):

$$f(y_i) = \frac{e^{y_i}}{\sum_{k=1}^{K} e^{y_k}} \qquad (3)$$

where $f$ is the softmax activation function, $y_i$ is the $i$-th input value, $e^{y_i}$ is the standard exponential function of that input value, and $K$ is the number of classes in the given dataset.
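Equations (2) and (3) can be illustrated together with the short sketch below. The weights, bias, and pooled column are hypothetical values chosen only for illustration; subtracting the maximum before exponentiating is a common numerical-stability step that does not change the result and is not described in the paper.

```python
import math


def softmax(values):
    """Equation (3): exponentiate each value and normalize so the outputs sum to 1."""
    shift = max(values)  # stability shift; leaves the result unchanged
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense_softmax(weights, pooled, bias):
    """Equation (2): S_v = f(W_m * C_p + B_i), with f = softmax."""
    logits = [
        sum(w * c for w, c in zip(row, pooled)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


pooled = [0.7, 0.9, 0.5]        # hypothetical pooled column C_p
weights = [[0.4, -0.2, 0.1],    # hypothetical row producing the positive score
           [-0.3, 0.5, 0.2]]    # hypothetical row producing the negative score
bias = [0.0, 0.1]               # hypothetical bias B_i

pss, nss = dense_softmax(weights, pooled, bias)
```

Both scores lie in [0,1] and sum to 1, which is the property the softmax layer is used for here.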

B. FEEDFORWARD NEURAL NETWORK

FFNN is an artificial intelligence model that is widely used in several applications owing to its capability to act like the human brain. The simplest version of FFNN includes three layers: an input layer, a hidden layer, and an output layer. Other versions differ in the number of hidden layers. An FFNN with two hidden layers is illustrated in Fig. 6. In this artificial neural network, data is transferred in one direction only, from the input neurons to the hidden neurons and from the hidden neurons to the output neurons. An FFNN can decrease the error for non-linear input nodes and can discover the relationship between the input nodes and the output nodes without using complex mathematical theories. The ease of application and flexibility of FFNNs are outstanding advantages compared to other neural networks.

FIGURE 6. The overall architecture of the feedforward neural network.

As Fig. 6 illustrates, the FFNN contains four layers: the input layer, two hidden layers, and the output layer. The first layer of the FFNN is the input layer, which provides the input text/image data to the FFNN. The input layer is followed by the two hidden layers, which add non-linearity and, through the applied activation function, transform the representation of the data received from the input layer for good generalization. The most widely applied activation function on the hidden layers is ReLU. The last hidden layer is followed by the output layer, the final layer of the FFNN, which outputs the class label predictions. In our work, this layer produces the positive and negative sentimental scores of the input sentence. The activation function applied in the output layer depends on the problem. For a binary classification problem, the sigmoid activation function is used because the layer output must indicate one of two classes (0 or 1). For a multi-class classification problem, the softmax activation function is used. In our work, we want the outputs of the output layer to lie in the interval [0,1]; thus, we applied the softmax activation function. Generally, each hidden layer in the FFNN contains many nodes called neurons. Each neuron is connected to the input layer and the output layer through connectors, and every connection carries an adjustable weight. The operations carried out at each neuron are described by the schematic diagram shown in Fig. 7.

FIGURE 7. Illustration of a neuron, depicting the inputs (y1−yn), which are the neurons of the previous layer, their corresponding weights (w1−wn), a bias (b), and f, the activation function applied to the weighted sum of the inputs.
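The neuron of Fig. 7 and the one-directional flow through two hidden layers can be sketched as below. All weights, biases, and inputs are hypothetical, and the helper names `neuron` and `layer` are assumptions made for illustration; the output layer with softmax would follow the second hidden layer.

```python
def relu(x):
    """ReLU activation, as used on the hidden layers."""
    return max(0.0, x)


def neuron(inputs, weights, bias, activation):
    """Fig. 7: activation applied to the weighted sum of the inputs plus the bias."""
    return activation(sum(w * y for w, y in zip(weights, inputs)) + bias)


def layer(inputs, weight_rows, biases, activation):
    """One layer: every neuron sees all outputs of the previous layer."""
    return [neuron(inputs, w, b, activation) for w, b in zip(weight_rows, biases)]


x = [1.0, 2.0]                                             # hypothetical input vector
h1 = layer(x, [[0.5, 0.25], [-1.0, 0.25]], [0.0, 0.0], relu)   # first hidden layer
h2 = layer(h1, [[2.0, 1.0], [-1.0, 3.0]], [0.5, 0.0], relu)    # second hidden layer
print(h1, h2)  # [1.0, 0.0] [2.5, 0.0]
```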

C. MAMDANI FUZZY SYSTEM

MFS is one of the most popular Fuzzy Inference Systems (FISs), which are also called fuzzy rule-based systems. The essential objective of such systems is decision making. FISs are considered a novel version of Classical Rule-Based Models (CRBMs). In a FIS, a variable takes a numerical value between 0 and 1, whereas in a CRBM it takes either '0' or '1'. FISs therefore convert a black-and-white decision-making problem into a grey one. An MFS consists of a fuzzification process, a knowledge base (rule base + database) unit, a decision-making unit, and finally a defuzzification process.

1) FUZZIFICATION PROCESS

The fuzzification process converts the crisp set into a fuzzy set using a triangular, trapezoidal, or Gaussian membership function (MF). In other words, each value in the crisp set is transformed into a linguistic value.

2) KNOWLEDGE BASE UNIT

The knowledge base consists of two parts: a rule base and a database. The rule base includes a collection of IF-THEN fuzzy rules, and the database includes the parameters of the employed membership function and a simple definition of the fuzzy set.

3) DECISION-MAKING UNIT OR INFERENCE ENGINE

The inference engine carries out the inference steps on the IF-THEN fuzzy rules stored in the rule base and obtains a fuzzy output.

4) DEFUZZIFICATION PROCESS

The defuzzification process transforms the fuzzy outputs of the inference mechanism into a crisp output using one of the following methods: max-membership, weighted average, centroid, mean-max membership, first or last of maxima, centre of largest area, or centre of sums. The overall architecture of a FIS is illustrated in Fig. 8.

FIGURE 8. The components of the fuzzy inference systems.

In the literature, there are three well-known FISs: Mamdani, Sugeno, and Tsukamoto. The Tsukamoto and Sugeno systems are applied to regression problems, whereas the Mamdani system is used for classification problems [28]. The major distinction between the MFS and the Sugeno fuzzy system lies in how each defines the consequent of its fuzzy If-Then rules: the Mamdani model employs fuzzy sets as the consequent, whereas the Sugeno model employs a linear equation. Because the primary goal of our work is to solve a sentiment classification problem, we used the Mamdani fuzzy method as a fuzzy classifier.

In an MFS, crisp outputs are deduced from crisp inputs using a collection of fuzzy If-Then rules stored in the fuzzy rule base, passing through the fuzzification and defuzzification processes. Therefore, in this work, to calculate the crisp output (the class label of the sentence) of the MFS from the crisp inputs, we followed the six steps described below:

1. Defining a set of If-Then fuzzy rules.

2. Fuzzifying the crisp input variables by applying one of the membership functions: triangular, trapezoidal, or Gaussian.

3. Combining the fuzzified input variables according to the fuzzy If-Then rules in order to obtain a rule strength for each rule.

4. Determining the consequence of each rule by combining the output membership function with the rule strength obtained in the previous step.

5. Combining all consequences obtained in step 4 to acquire an output distribution.

6. Applying the defuzzification function to the output distribution to get the crisp output.
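The six steps above can be sketched as a small Mamdani-style classifier. This is a minimal illustration under stated assumptions, not the rule base or parameters used in this work: the four rules, the triangular membership parameters, the use of min for AND and for implication, max for aggregation, centroid defuzzification on a discretized [0,1] universe, and the 1/3 and 2/3 label thresholds are all hypothetical choices.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a and c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def mamdani_classify(pss, nss, steps=100):
    # Step 2: fuzzify the crisp inputs into "low" and "high" degrees.
    p_low, p_high = tri(pss, -1, 0, 1), tri(pss, 0, 1, 2)
    n_low, n_high = tri(nss, -1, 0, 1), tri(nss, 0, 1, 2)

    # Steps 1 and 3: hypothetical rule base, AND realized as min.
    strength = {
        "negative": min(p_low, n_high),                           # low PSS, high NSS
        "neutral": max(min(p_low, n_low), min(p_high, n_high)),   # both low or both high
        "positive": min(p_high, n_low),                           # high PSS, low NSS
    }

    # Steps 4 and 5: clip each output set by its rule strength, aggregate with max.
    num = den = 0.0
    for i in range(steps + 1):
        z = i / steps
        mu = max(
            min(strength["negative"], tri(z, -0.5, 0.0, 0.5)),
            min(strength["neutral"], tri(z, 0.0, 0.5, 1.0)),
            min(strength["positive"], tri(z, 0.5, 1.0, 1.5)),
        )
        num += z * mu
        den += mu

    # Step 6: centroid defuzzification, then an assumed mapping to a class label.
    crisp = num / den if den else 0.5
    if crisp < 1 / 3:
        label = "negative"
    elif crisp > 2 / 3:
        label = "positive"
    else:
        label = "neutral"
    return crisp, label


print(mamdani_classify(0.9, 0.1)[1])  # positive
```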

D. PERFORMANCE METRICS

To evaluate the text classification process, we calculate ten performance metrics: True Positive Rate (TPR) or Recall, True Negative Rate (TNR) or Specificity, False Positive Rate (FPR), False Negative Rate (FNR), Error Rate (ER), Precision (PR), Classification Rate or Accuracy (AC), Kappa Statistic (KS), F1-score (FS), and Time Consumption (TC). These performance metrics are calculated using the confusion matrix for binary or multi-class classification, as given in Figs. 9 and 10.

FIGURE 9. Confusion matrix for a binary classification issue.

The abbreviations False Negative (FN), True Positive (TP), True Negative (TN), and False Positive (FP) in the confusion matrix for binary classification in Fig. 9 are defined as follows [9], [28]:

• True Positive (TP): number of instances that are actually positive and predicted to be positive.

• False Negative (FN): number of instances that are actually positive and predicted to be negative.

• True Negative (TN): number of instances that are actually negative and predicted to be negative.

• False Positive (FP): number of instances that are actually negative and predicted to be positive.

The Recall, Specificity, False Positive Rate, False Negative Rate, Error Rate, Precision, Accuracy, Kappa Statistic, and F1-score metrics are calculated for binary classification using the confusion matrix in Fig. 9 as follows.

Recall measures the ability of a classifier to identify the instances that have a positive class label. It is calculated using (4), where tp is the number of correctly predicted positive instances and tp + fn is the total number of positive instances in the dataset.

$$\text{Recall} = \frac{tp}{tp + fn} \qquad (4)$$

Specificity measures how effectively a classifier identifies the instances that have a negative class label. It is computed using (5), where tn is the number of correctly predicted negative instances and tn + fp is the total number of negative instances in the dataset.

$$\text{Specificity} = \frac{tn}{tn + fp} \qquad (5)$$

FIGURE 10. Confusion matrix for a multi-class classification issue.

False Positive Rate quantifies the ineffectiveness of a classifier by measuring the proportion of instances that are actually negative but predicted to be positive. It is computed using (6), where fp is the number of instances that are actually negative but classified as positive, and fp + tn is the total number of negative instances.

$$\text{FalsePositiveRate} = \frac{fp}{fp + tn} \qquad (6)$$

False Negative Rate quantifies the ineffectiveness of a classifier by measuring the proportion of instances that are actually positive but predicted to be negative. It is calculated using (7), where fn is the number of instances that are actually positive but classified as negative, and fn + tp is the total number of positive instances.

$$\text{FalseNegativeRate} = \frac{fn}{fn + tp} \qquad (7)$$

The error rate measures the misclassification rate, that is, the number of misclassified instances over all instances in the dataset. Its objective is to measure the classifier's ability to restrain false classification. The error rate is defined in (8), where fp + fn is the total number of incorrectly classified instances and tp + fn + tn + fp is the total number of instances in the given dataset.

$$\text{Error} = \frac{fp + fn}{tp + fn + tn + fp} \qquad (8)$$

Precision measures how many of the samples retrieved as positive are, in fact, positive. The precision rate is useful for assessing brittle classifiers, which are applied to classify all instances of the dataset. It is determined by (9), where tp is the number of instances that are actually positive and classified as positive, and tp + fp is the total number of instances predicted as positive.

$$\text{Precision} = \frac{tp}{tp + fp} \qquad (9)$$

Accuracy is an overall metric for estimating the effectiveness and correctness of learning classifiers. It is computed on a test set that is separate from the training set, using equation (10), where tp + tn is the number of correctly classified instances and tp + fn + tn + fp is the total number of instances in the dataset.

$$\text{Accuracy} = \frac{tp + tn}{tp + fn + tn + fp} \qquad (10)$$

The F1-score, or F-measure, is the harmonic mean of the precision and recall rates and summarizes both in a single value. The F1-score ranges from 0 to 1 and measures how accurate and robust the classifier is: the higher the F1-score, the better the performance of the classifier. In other words, a highly accurate classifier requires both high precision and high recall. This metric is calculated using (11), where Precision is computed using formula (9) and Recall using formula (4).

$$\text{F1-score} = \frac{2 * \text{Precision} * \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (11)$$

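The kappa statistic is a performance criterion that compares an observed accuracy with an expected accuracy (random chance). It is used not only to evaluate a single classifier but also to compare classifiers with one another. The kappa statistic is computed using (12).

$$\text{Kappa-Statistic} = \frac{P_0 - P_e}{1 - P_e} \qquad (12)$$

where, assuming the confusion-matrix entries are expressed as percentages of the dataset (so that they sum to 100):

$$P_0 = \frac{tp + tn}{100}, \qquad P_e = \left[\frac{tp + fn}{100} * \frac{tp + fp}{100}\right] + \left[\frac{fp + tn}{100} * \frac{fn + tn}{100}\right]$$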
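The binary metrics of equations (4) to (12) can be computed from the four confusion-matrix counts as sketched below. One assumption differs from the text: the kappa terms divide by the total number of instances rather than by 100, which gives the same result when the counts sum to 100 and generalizes to other dataset sizes. The example counts are hypothetical.

```python
def binary_metrics(tp, fn, tn, fp):
    """Compute equations (4)-(12) from the counts of a binary confusion matrix."""
    total = tp + fn + tn + fp
    recall = tp / (tp + fn)                  # (4)
    specificity = tn / (tn + fp)             # (5)
    fpr = fp / (fp + tn)                     # (6)
    fnr = fn / (fn + tp)                     # (7)
    error = (fp + fn) / total                # (8)
    precision = tp / (tp + fp)               # (9)
    accuracy = (tp + tn) / total             # (10)
    f1 = 2 * precision * recall / (precision + recall)   # (11)
    # (12): observed vs. expected agreement; total replaces the constant 100.
    p0 = (tp + tn) / total
    pe = ((tp + fn) / total) * ((tp + fp) / total) \
        + ((fp + tn) / total) * ((fn + tn) / total)
    kappa = (p0 - pe) / (1 - pe)
    return {
        "recall": recall, "specificity": specificity, "fpr": fpr, "fnr": fnr,
        "error": error, "precision": precision, "accuracy": accuracy,
        "f1": f1, "kappa": kappa,
    }


# Hypothetical counts that sum to 100.
metrics = binary_metrics(tp=40, fn=10, tn=45, fp=5)
```

Note that recall and FNR sum to 1, as do specificity and FPR, and accuracy and error rate.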

For multi-class classification, the Recall, Specificity, False Positive Rate, False Negative Rate, Error Rate, Precision, Accuracy, Kappa Statistic, and F1-score metrics are computed using the confusion matrix illustrated in Fig. 10. The first step is to compute the measurements TN, FN, TP, and FP, as defined for Fig. 9, for each class in the multi-class confusion matrix. For example, for the Positive class these measurements are TP=5; TN=(4+1+6+8)=19; FN=(7+3)=10; FP=(2+9)=11. For the Negative class they are TP=4; TN=(5+9+3+8)=25; FN=(2+6)=8; FP=(1+7)=8. For the Neutral class, following the same convention, they are TP=8; TN=(5+2+4+7)=18; FN=(1+9)=10; FP=(3+6)=9. After calculating these measurements, we compute the evaluation metrics introduced above as follows.
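The per-class counts can be derived mechanically from a confusion matrix, as in the sketch below. The 3×3 matrix is inferred from the sums quoted for the Positive and Negative classes, since Fig. 10 is not reproduced here; the orientation (rows are actual classes, columns are predicted classes) and the class order Positive, Negative, Neutral are assumptions.

```python
def per_class_counts(matrix, k):
    """Return (TP, TN, FN, FP) for class k; rows = actual, columns = predicted."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                  # actual k, predicted as another class
    fp = sum(row[k] for row in matrix) - tp   # predicted k, actually another class
    tn = total - tp - fn - fp                 # everything not involving class k
    return tp, tn, fn, fp


# Matrix inferred from the worked example (order: Positive, Negative, Neutral).
matrix = [
    [5, 7, 3],
    [2, 4, 6],
    [9, 1, 8],
]
print(per_class_counts(matrix, 0))  # (5, 19, 10, 11)
```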


Recall for multi-class classification is computed according to (13), where $tp_i$ is the number of instances correctly predicted as class $i$, $i$ is the index of the class, $l$ is the total number of class labels, and $tp_i + fn_i$ is the total number of instances labeled class $i$ in the given dataset.

$$\text{Recall} = \frac{\sum_{i=1}^{l} \frac{tp_i}{tp_i + fn_i}}{l} \qquad (13)$$

Specificity for multi-class classification is measured using (14), where $tn_i$ is the number of instances correctly predicted as not belonging to class $i$, $i$ is the index of the class, $l$ is the total number of class labels, and $tn_i + fp_i$ is the total number of instances not labeled class $i$ in the given dataset.

$$\text{Specificity} = \frac{\sum_{i=1}^{l} \frac{tn_i}{tn_i + fp_i}}{l} \qquad (14)$$

The false positive rate is calculated as described in (15), where $fp_i$ is the number of instances that are not actually labeled class $i$ but are predicted as class $i$, $i$ is the index of the class, $l$ is the total number of class labels, and $fp_i + tn_i$ is the total number of instances not labeled class $i$ in the dataset.

$$\text{FalsePositiveRate} = \frac{\sum_{i=1}^{l} \frac{fp_i}{fp_i + tn_i}}{l} \qquad (15)$$

The false negative rate is computed using (16), where $fn_i$ is the number of instances that are actually labeled class $i$ but are predicted as another class, $i$ is the index of the class, $l$ is the total number of class labels, and $fn_i + tp_i$ is the total number of instances actually labeled class $i$ in the dataset.

$$\text{FalseNegativeRate} = \frac{\sum_{i=1}^{l} \frac{fn_i}{fn_i + tp_i}}{l} \qquad (16)$$

The error rate for multi-class classification is measured using (17), where $fp_i + fn_i$ is the number of instances predicted incorrectly with respect to class $i$, $i$ is the index of the class, $l$ is the total number of class labels, and $tp_i + fn_i + tn_i + fp_i$ is the total number of instances in the given dataset.

$$\text{Error} = \frac{\sum_{i=1}^{l} \frac{fp_i + fn_i}{tp_i + fn_i + tn_i + fp_i}}{l} \qquad (17)$$

Precision for multi-class classification is calculated as shown in (18), where $tp_i$ is the number of instances that are actually labeled class $i$ and correctly predicted by the classifier, $i$ is the index of the class, $l$ is the total number of class labels, and $tp_i + fp_i$ is the total number of instances predicted as class $i$.

$$\text{Precision} = \frac{\sum_{i=1}^{l} \frac{tp_i}{tp_i + fp_i}}{l} \qquad (18)$$

Accuracy for multi-class classification is calculated according to (19), where $tp_i + tn_i$ is the number of instances predicted correctly with respect to class $i$, $i$ is the index of the class, $l$ is the total number of class labels, and $tp_i + fn_i + tn_i + fp_i$ is the total number of instances in the given dataset.

$$\text{Accuracy} = \frac{\sum_{i=1}^{l} \frac{tp_i + tn_i}{tp_i + fn_i + tn_i + fp_i}}{l} \qquad (19)$$

The F1-score combines the precision and recall rates into a single measure. As noted previously, it ranges from 0 to 1, and it takes the value 1 if the evaluated classifier classifies all instances correctly. For multi-class classification, the F1-score is computed by applying (20), where $\text{Precision}_i$ and $\text{Recall}_i$ are the per-class terms of formulas (18) and (13), respectively.

$$\text{F1-score} = \frac{\sum_{i=1}^{l} \frac{2 * \text{Precision}_i * \text{Recall}_i}{\text{Precision}_i + \text{Recall}_i}}{l} \qquad (20)$$

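The kappa statistic for multi-class classification is computed using equation (21).

$$\text{Kappa-Statistic} = \frac{\sum_{i=1}^{l} \frac{P_{0i} - P_{ei}}{1 - P_{ei}}}{l} \qquad (21)$$

where, for each class $i$ and assuming the confusion-matrix entries are expressed as percentages of the dataset:

$$P_{0i} = \frac{tp_i + tn_i}{100}, \qquad P_{ei} = \left[\frac{tp_i + fn_i}{100} * \frac{tp_i + fp_i}{100}\right] + \left[\frac{fp_i + tn_i}{100} * \frac{fn_i + tn_i}{100}\right]$$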
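The class-averaged forms of equations (13), (14), (18), (19), and (20) can be sketched as follows. The sketch assumes rows of the confusion matrix are actual classes and columns are predicted classes, and that every class occurs and is predicted at least once, so no denominator is zero. The example matrix is hypothetical.

```python
def macro_metrics(matrix):
    """Average per-class recall, specificity, precision, accuracy, and F1 over l classes."""
    l = len(matrix)
    total = sum(sum(row) for row in matrix)
    sums = {"recall": 0.0, "specificity": 0.0, "precision": 0.0,
            "accuracy": 0.0, "f1": 0.0}
    for i in range(l):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp
        fp = sum(row[i] for row in matrix) - tp
        tn = total - tp - fn - fp
        recall_i = tp / (tp + fn)
        precision_i = tp / (tp + fp)
        sums["recall"] += recall_i                       # term of (13)
        sums["specificity"] += tn / (tn + fp)            # term of (14)
        sums["precision"] += precision_i                 # term of (18)
        sums["accuracy"] += (tp + tn) / total            # term of (19)
        sums["f1"] += 2 * precision_i * recall_i / (precision_i + recall_i)  # term of (20)
    return {name: value / l for name, value in sums.items()}


# Hypothetical 3-class confusion matrix (rows = actual, columns = predicted).
scores = macro_metrics([[5, 7, 3], [2, 4, 6], [9, 1, 8]])
```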

IV. METHODOLOGY OF OUR PROPOSED APPROACH

In the following subsections, we discuss the motivations that led us to develop this work. The basic architecture of our proposed hybrid model is composed of a data collection phase, text pre-processing steps for reducing noisy data, word embedding methods for transforming text-based data into numerical data, a CNN for extracting features automatically, an FFNN for calculating the PSS and NSS values, and an MFS for classifying its input into the negative, positive, or neutral class.

A. MOTIVATION

As mentioned in the introduction, our proposal aims to improve the effectiveness of SA. The primary goal of this contribution is to classify each sentence in the dataset into a positive, negative, or neutral class with high performance and efficiency in terms of the ten evaluation criteria presented in the previous section, and also in terms of convergence, stability, and complexity.

In the literature, multiple categories of approaches have been applied to sentiment classification, including DL models, ML methods, and dictionary-based procedures. On very large datasets, ML and dictionary-based approaches perform worse than DL models. Studies show that classical ML methods and dictionary-based techniques are better suited to smaller datasets: as the volume of data rises beyond a certain point, the accuracy of classical ML algorithms plateaus, whereas the accuracy of DL algorithms continues to rise with data size. This difference in performance stems from how features are extracted from the dataset. Conventional ML techniques and dictionary-based approaches rely on hand-crafted features obtained with feature extractors before the learning algorithms are applied, which yields incomplete features and takes a long time to produce the final result. DL models, in contrast, extract features automatically.

Given these studies showing the strengths of DL techniques on massive datasets, in this work we applied the CNN deep learning model as an automatic feature extractor, because a CNN is better able to detect and extract relevant features at different local levels, in a manner similar to the human brain, than conventional learning models that cannot perform feature learning. Another advantage of CNN is weight sharing, which makes it more accurate and more efficient in terms of complexity and memory use than traditional neural networks. CNN is also characterized by a structure optimized for handling text/image data, the ability to extract abstract features, the absorption of shape variations through the pooling layer, a smaller number of parameters, and a lower rate of gradient diminishing than conventional neural networks.

We also used an FFNN, which takes the outputs of the CNN and calculates two sentimental values: the PSS and NSS scores. The PSS represents the percentage of positive opinion words in the given sentence, and the NSS represents the percentage of negative opinion words in the same sentence. FFNN is one of the most effective neural networks currently preferred for regression and classification tasks. It is composed of a fully connected input layer, one or more hidden layers, and an output layer. Increasing the number of hidden layers helps the network model increasingly complicated functions. As presented previously in the basic concepts section, this type of network can process data by itself and generate outputs that are not restricted to the input variables provided to it.

Furthermore, it can carry out multiple operations in parallel without any negative influence on model performance. Dropout regularization is also applied at the fully connected layer to overcome overfitting and reduce the generalization error. In summary, we incorporated CNN and FFNN as the third step of our work to handle the unstructured data collected from social media networks and to compute the PSS and NSS values as outputs of our deep learning model (CNN+FFNN).

Because social-media data contain considerable noise and unpredictable vagueness, the notions of ambiguity and uncertainty in data have attracted the attention of many researchers. Such vagueness poses a major challenge to representing and classifying social-media data. First, the ability to represent input social-media data is restricted because variables behave uncertainly. Second, CNN and FFNN deep learning models are not always robust when the training data are corrupted by noise. Fuzzy logic theory has been applied to overcome these shortcomings of deep learning and to enhance sentiment classification performance. Compared with classical logic representations, a fuzzy logic representation builds a set of IF-THEN fuzzy rules for reducing the uncertainties in social-media data and achieves higher accuracy in data representation and greater robustness to noisy data. Motivated by these strengths of fuzzy logic theory, in the fourth step of our work we used an MFS as a fuzzy classifier. The input variables of the MFS are the PSS and NSS values, and the output variable is the class label (Positive, Neutral, Negative).

In short, the essence of this work is to increase the classification effectiveness of sentiment analysis by combining the power of the MFS to deal with uncertain and vague data with the power of the CNN and FFNN deep learning models to detect and capture features automatically from the given dataset. As depicted in Fig. 11, the overall structure of our hybrid model consists of six phases: data collection, data pre-processing, word embeddings, CNN, FFNN, and MFS.

B. PHASE I: DATA COLLECTION

In this paper, we chose two datasets to evaluate the performance of our developed FDLC approach. The first is the sentiment140 dataset, which was extracted using the Twitter application programming interface (API). It consists of 1,600,000 tweets from which the emoticons were removed. The tweets are labeled into two classes, negative and positive (0 = negative, 4 = positive). The dataset includes the following six attributes:

• Target: the sentimental score of the tweet (4 = positive, 0 = negative)

• Ids: the identifier of the tweet (1467110309)

• Date: the date when the tweet was posted (thr Mar 06 21:18:55 PDT 2007)

• Flag: the query (text of the query). If there is no query, this attribute takes the value 'NO_QUERY'.

• User: the username that tweeted (LionsLamb)

• Text: the text posted by the user named LionsLamb (He's the reason for the teardrops on my guitar, the only one who has enough of me to break my heart)

In this work, we are interested in sentiment analysis, that is, in extracting the sentiment expressed by the author in the tweeted text. The other attributes therefore have no influence on the learning process, so we removed the Ids, User, Flag, and Date attributes from the dataset and kept the Text and Target attributes. The Target distribution in this dataset is balanced: 50% of the tweets are labeled negative (indices 0 to 799,999), and the other 50% are labeled positive (indices 800,000 to 1,599,999). The dataset is split into training and testing subsets, which we used to compare the classification performance of our FDLC with other methods selected from the literature. Fig. 12 shows the number of tweets in each subset: 1,440,000 tweets were used for training and 160,000 tweets for testing.

FIGURE 11. Architecture of our proposed approach.
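A minimal sketch of this preparation step is given below: it keeps only the Target and Text columns and holds out 10% of the rows for testing, matching the 1,440,000/160,000 proportion. The in-memory sample rows are made up for illustration, and the random shuffle with a fixed seed is an assumption; the paper does not state how the split was drawn.

```python
import csv
import io
import random


def load_text_and_target(handle):
    """Keep only Target (column 0) and Text (column 5); drop Ids, Date, Flag, User."""
    rows = []
    for target, _ids, _date, _flag, _user, text in csv.reader(handle):
        rows.append((text, "positive" if target == "4" else "negative"))
    return rows


def split(rows, test_fraction=0.1, seed=0):
    """Shuffle (assumption) and hold out test_fraction of the rows for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Ten made-up rows in the six-column sentiment140 layout.
sample = io.StringIO("".join(
    f'{0 if i < 5 else 4},{i},some date,NO_QUERY,user{i},"tweet number {i}"\n'
    for i in range(10)
))
rows = load_text_and_target(sample)
train, test = split(rows)
print(len(train), len(test))  # 9 1
```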

The second dataset, called COVID-19_Sentiments, was also extracted using the Twitter API. It consists of 637,978 tweets labeled into three classes: negative [−1, 0), neutral = 0, and positive (0, 1] [52]. It contains the following attributes:

• Target: the sentimental score of the tweet (negative [−1, 0), neutral = 0, and positive (0, 1]) [52]

• Ids: the identifier of the tweet (1241032866567350000)

• Date: the date when the tweet was posted (Sun May 31 04:52:40 +0000 2020)

• Location: the location where the tweet was posted (Ahmadabad City, India)

• Text: the text posted by the user

The important attributes in our work are the text and sentimental score attributes, so we removed all other attributes. The Target distribution in this dataset is unbalanced, with 259,458 neutral tweets, 120,646 negative tweets, and 257,874 positive tweets. Fig. 13 depicts the number of tweets in the training and testing subsets: 574,182 tweets were used for training and 63,796 tweets for testing. In other words, the testing subset represents 10% of the overall dataset.

C. PHASE II: DATA PRE-PROCESSING

Pre-processing is the first phase of a text classification task, and choosing the right and effective pre-processing techniques can enhance classification performance. The primary goal of text pre-processing is to prepare and normalize the dataset to be classified and to remove noisy data from it. Noisy data are data that carry no valuable information for sentiment classification. Pre-processing transforms the noisy data from high-dimensional attributes into low-dimensional features in order to obtain as much accurate and useful data as possible from the given dataset. The text pre-processing phase can consist of multiple techniques, depending on the text classification problem and the situation.

FIGURE 12. Number of negative and positive tweets for the sentiment140 dataset.

FIGURE 13. Number of negative, neutral, and positive tweets for the COVID-19_Sentiments dataset.

In this work, our classification problem is the sentiment classification of data collected from Twitter. Twitter restricts its users to messages of 140 characters. Because of this restriction, Twitter users employ slang, abbreviations, exclamation marks, links, repetitions, and punctuation signs to express their attitudes and emotions in a short tweet. In addition, Twitter users are prone to spelling and typographical mistakes. It is not necessary to include every expression of a tweet in the learning process, and many of them should be deleted, normalized, cleaned, or replaced with others. Hence the need to apply pre-processing techniques to the given data: freeing the data from noise is a crucial factor in increasing the effectiveness of sentiment classification. The pre-processing methods followed in this work are described below:

Remove numbers, URLs, hashtags, and usernames: It is common practice to eliminate numbers, URLs, hashtags, and usernames from the sentence being pre-processed because they do not carry any emotion.

Eliminate punctuation, white-spaces, and special characters: The first step removes the superfluous white-spaces found in the tweet, followed by the three punctuation marks: the full stop, the question mark, and the exclamation mark. All special characters are then removed because they have no positive or negative impact on the expressed sentiment. After these pre-processing steps, only lowercase and uppercase letters are kept.

Lower-casing: After the previous steps, all characters other than letters have been deleted. The next step is lower-casing: all letters kept in the tweet are transformed to lower case, which reduces the dimensionality of the vocabulary.

Replace elongated words: This operation removes letters repeated three or more times in a row in an elongated word such as "haaaaaaappy". After applying this operation, the word becomes "haappy", normalized to at most two consecutive occurrences of the letter.

Remove stop-words: Stop-words are words that occur very frequently in the posted tweets. They are removed because they do not carry any emotion, so handling them is needless. In our work, all stop-words found in a tweet are removed based on the stop-word list provided by the NLTK package in Python.

Correct contractions: Another tactic employed in the pre-processing procedure is the correction of contractions. For example, 'isn't' is corrected to 'is not', and 'weren't' to 'were not'.

Handle the effect of negations: This approach replaces a word preceded by NOT with its antonym, i.e., the word with the opposite meaning. It searches each tweet for a word preceded by NOT and checks whether this word has an antonym in the WordNet dictionary; if so, it replaces the negated word with its unambiguous antonym. For example, it replaces ‘‘not uglify’’ with ‘‘beautify’’.

Stemming: This operation reduces the number of distinct word forms by merging several inflected words into one. It strips the endings of words to obtain their stem. In this work, we used the Porter Stemmer of the NLTK package in Python.

Lemmatization: This has the same role as stemming: it is another approach for determining the root forms of words. The difference between the two procedures lies in the process followed to find the stem or the lemma.

Tokenization: This process splits sentences into words called tokens. Larger paragraphs of the analyzed data can first be split into sentences, and the resulting sentences can then be split into tokens. In this work, we used the tokenizer provided by the NLTK package in Python.
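Several of the cleaning steps listed above can be sketched with regular expressions. The following is a minimal illustration rather than the authors' pipeline: the function name `preprocess`, the two-entry `CONTRACTIONS` table, and the order of the steps are assumptions made for this example, and the NLTK-based steps (stop-words, stemming, lemmatization) and the WordNet-based negation handling are left out.

```python
import re

# Tiny illustrative subset; a real contraction table would be much larger.
CONTRACTIONS = {"isn't": "is not", "weren't": "were not"}

def preprocess(tweet):
    """Apply a few of the cleaning steps described above to one tweet."""
    text = tweet.lower()
    # Expand contractions before apostrophes are stripped as special characters.
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    # Remove URLs, usernames, hashtags, and numbers.
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    text = re.sub(r"[@#]\w+", " ", text)
    text = re.sub(r"\d+", " ", text)
    # Keep letters only, then collapse repeated white-space.
    text = re.sub(r"[^a-z\s]", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    # Shorten runs of three or more identical letters to two.
    text = re.sub(r"([a-z])\1{2,}", r"\1\1", text)
    # Naive white-space tokenization (the paper uses the NLTK tokenizer).
    return text.split()

print(preprocess("@bob I am sooooo haaaaaaappy!!! It isn't 2020 anymore http://t.co/x #joy"))
# ['i', 'am', 'soo', 'haappy', 'it', 'is', 'not', 'anymore']
```

Contractions are expanded first in this sketch because removing special characters would otherwise split ‘‘isn’t’’ at the apostrophe.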


All of the techniques presented above are applied to the datasets used in this work. In addition, we constructed a lookup table of 3000 words and phrases, covering abbreviations and slang, to replace the abbreviations and slang words in the tweet being processed with their standard forms. For example, the slang and abbreviation words ‘‘ab/abt’’, ‘‘B’’, and ‘‘B4’’ are replaced by ‘‘about’’, ‘‘be’’, and ‘‘before’’, respectively. On the Twitter platform, users are prone to spelling and typographical mistakes that can make the learning procedure harder. Therefore, to improve the effectiveness of the learning process, we used Norvig’s spelling corrector, which corrects such mistakes automatically.
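The lookup-table replacement can be illustrated with a small dictionary. This is only a sketch: the table below holds the three examples quoted above (reading ‘‘ab/abt’’ as two variants is an assumption), whereas the actual table contains 3000 entries, and the name `replace_slang` is hypothetical.

```python
# Illustrative subset of the slang/abbreviation lookup table.
SLANG = {"ab": "about", "abt": "about", "b": "be", "b4": "before"}

def replace_slang(tokens, table=SLANG):
    """Replace each token found in the lookup table by its standard form."""
    return [table.get(tok.lower(), tok) for tok in tokens]

print(replace_slang(["see", "you", "B4", "lunch"]))
# ['see', 'you', 'before', 'lunch']
```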

We carried out multiple experiments to demonstrate the effectiveness of the applied pre-processing techniques on our datasets. From the experimental results shown in Tables 1 and 2, we deduce that the pre-processing techniques decrease the error rate: it decreases from 35.59% to 5.98% on the Sentiment140 dataset and from 29.04% to 3.61% on the COVID-19 Sentiments dataset. Therefore, the use of pre-processing techniques is recommended.

TABLE 1. Error Rate (ER%) without pre-processing techniques.

TABLE 2. Error Rate (ER%) with pre-processing techniques.

After the pre-processing step, which removes noisy data from the dataset, the next step is word embedding, as described in the following subsection. In other words, the data resulting from the application of all pre-processing techniques are the input of the word embedding methods.

D. PHASE III: WORD EMBEDDINGS

The CNN deep learning model can only process numerical data. Therefore, for our proposal to handle the text-based data obtained after the pre-processing phase, these data must be transformed into numerical data. This operation is called vectorization, and it is one of the critical issues in NLP. Approaches such as Word2vec, GloVe [15], tf-idf [16], bag-of-words [17], Fast-text, n-gram, and character-level [18] are the major word vectorization techniques. The most effective methods for larger datasets are GloVe, Word2vec, and Fast-text, which were introduced by Stanford, Google, and Facebook, respectively [53], [54]. Therefore, in this work, the stage following the pre-processing phase is word embedding using the Word2vec, GloVe, and Fast-text techniques. This section presents these three word embedding methods in detail.

1) GloVe

Global Vectors for Word Representation (GloVe) was introduced by Jeffrey Pennington et al. [54] at Stanford University. This learning model is an unsupervised algorithm whose objective is to compute distributed vector representations of words. It captures the semantic similarity between words through a word-word co-occurrence count matrix, i.e., how frequently words appear together in the corpus. For this reason, GloVe is called a count-based model. The word embeddings of this model are obtained from the aggregated co-occurrence counts of a corpus, and the resulting embeddings exhibit important linear substructures of the word vector space. In summary, this model combines two families of methods: the local context window model and the global matrix factorization method. Experimentally, GloVe gives good results on word similarity, named entity recognition, and word analogy tasks compared to Word2vec and Fast-text.

2) Word2Vec

Word2Vec was proposed by Tomas Mikolov et al. at Google [55]. It employs an FFNN with one hidden layer to extract word embedding vectors from the input text. This method comprises two models: the Skip-Gram model, which predicts the surrounding context words from the target word, and the Continuous Bag-of-Words (CBOW) model, which predicts the target word from the surrounding context words. It is commonly described as a two-layer neural network. Its objective is to enhance the predictive ability of word vectorization. Word2vec takes a large text corpus as input and generates a matrix as output, where each row of the matrix is a vector with several hundred dimensions. This vector is the embedding of the one-hot vector of the input token (word or character). In summary, Word2Vec is a simple FFNN with one hidden layer. During the learning process, its goal is to adjust its weights so as to minimize the loss function. The hidden-layer weights are then used as the word embeddings. Word2vec performs well in sentiment polarity prediction, and its performance is better on massive datasets.

3) FAST-TEXT

Fast-text is another word embedding technique, created by the Facebook AI Research team for efficient learning of word representations. This method is regarded as an extension of Word2vec: instead of training on whole tokens (words) directly as in Word2vec, Fast-text represents each token as a set of character n-grams. For example, the representation of the word ‘‘fuzzy’’ using Fast-text with n = 2 is (<f, fu, uz, zz, zy, y>), where the angle brackets denote the beginning and end of the word. This helps to capture the meaning of shorter words and allows the embeddings to learn the suffixes and prefixes of words. Once the input token has been divided into character n-grams, either Skip-Gram or CBOW is used to learn the embeddings. Fast-text works well with unseen, out-of-vocabulary words: even if a word was not seen during training, the method breaks it down into character n-grams to compute its embedding. Word2vec and GloVe both fail to produce a vector for unseen words, unlike Fast-text. This is the strong point of this method compared to the other techniques.
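The character n-gram decomposition used in the ‘‘fuzzy’’ example can be reproduced with a few lines of Python. This is a minimal sketch of the splitting step only, not of the Fast-text training procedure; the function name `char_ngrams` is an assumption made for this illustration.

```python
def char_ngrams(word, n=2):
    """Return the character n-grams of `word`, with '<' and '>' marking its boundaries."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

print(char_ngrams("fuzzy", n=2))
# ['<f', 'fu', 'uz', 'zz', 'zy', 'y>']
```

Because an unseen word still decomposes into n-grams that may have been seen during training, an embedding can be assembled for it from those n-grams.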

As stated previously, the step that follows data pre-processing is data vectorization using the Word2vec, GloVe, or Fast-text technique. Each input sentence is split into a set of words (Word2vec and GloVe) or character n-grams (Fast-text). The word embedding methods then take as input the one-hot vectors that represent the words or character n-grams of the sentence. We denote a sentence as $S = [Wv_1; Wv_2; \ldots; Wv_n]$, where $Wv_i$ is the one-hot vector that represents the $i$-th word or character. The dimension of each one-hot vector is equal to the number of characters or words in the pre-processed sentence, in this case $N$. A pre-trained method applies its weight matrix $W_m$ to the matrix $S$ and obtains the low-dimensional matrix representation $M_r = [x_1; x_2; \ldots; x_n]$ with $x_i \in \mathbb{R}^m$. This operation can be written as follows:

$$M_r = W_m \cdot S \tag{22}$$

where $W_m \in \mathbb{R}^{m \times n}$ is the weight matrix, $M_r \in \mathbb{R}^{m \times n}$ is the low-dimensional matrix representation of a sentence, and $S$ is the one-hot matrix.
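The product in (22) amounts to selecting, for each token, one column of the weight matrix. The sketch below illustrates this with plain Python lists; the toy vocabulary, the sentence, and the values of `W_m` are made up for the example and are not taken from the paper.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

vocab = ["fuzzy", "deep", "learning"]      # hypothetical toy vocabulary
sentence = ["deep", "fuzzy", "learning"]

# S: one column per token; each column is a one-hot vector over the vocabulary.
S = [[1.0 if vocab[row] == tok else 0.0 for tok in sentence]
     for row in range(len(vocab))]

# W_m: hypothetical pre-trained weights with m = 2 rows, one column per vocabulary entry.
W_m = [[0.1, 0.4, 0.7],
       [0.2, 0.5, 0.8]]

M_r = matmul(W_m, S)   # column j of M_r is the embedding of token j
print(M_r)
# [[0.4, 0.1, 0.7], [0.5, 0.2, 0.8]]
```

In practice the multiplication is usually replaced by a direct index lookup, which gives the same result without building the one-hot matrix.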

The word embedding phase thus transforms the text-based data into numerical data: it takes the pre-processed text as input and outputs an embedding matrix. The next stage is the application of the CNN, as described in the following subsection.

E. PHASE IV: CONVOLUTIONAL NEURAL NETWORK

After the word embedding phase, our proposed system is trained using the CNN deep learning model, which is formed by four layers.

The first layer is called the Embedding layer or input layer. It takes word embeddings as input, i.e., a set of vector representations of the sentence as explained in the previous section, where each vector $v_w \in \mathbb{R}^{1 \times d_i}$ represents either a character or a word, depending on the word embedding method used. Here $d_i$ is the vector dimension, which must be smaller than the size of the vocabulary in the embedding dictionary. In our work, Word2vec, GloVe, and Fast-text have been used; they are able to capture the semantic and syntactic properties of the characters and words in the dataset. In a preliminary experiment carried out with these word embedding methods on the Hadoop framework, the parallelized word embedding methods were pre-trained on 90% of the dataset. This yields a pre-trained model that maps each word or character to its own vector representation. We then computed the error rate of these word embedding methods using (17). The computed error rate indicates that Fast-text is the most efficient word embedding. Based on these experimental results, we use the Fast-text word embedding method in the rest of this work. The set of vectors is computed for every character n-gram from the softmax probability given by (3). The dimension of the resulting vector representation is equal to the number of neurons in the Skip-Gram hidden layer, which has been set to 200. Each tweet is padded with zero vectors so that all tweets in the dataset have the same dimension. The resulting vector representations form the rows of the embedding matrix $E_m$, which covers all character n-grams in the dictionary $D$. These character n-grams are mapped to indices $1, \ldots, |D|$ so that the vector representation of a character n-gram can be looked up quickly in $E_m$. Then, for each tweet $t$ containing $|t|$ character n-grams, an embedding matrix $M = [V_{nc_1}; V_{nc_2}; \ldots; V_{nc_i}; \ldots; V_{nc_{|t|}}]$ is built, where $V_{nc_i}$ is the vector representation of the $i$-th character n-gram. Finally, $M$ is passed to the convolutional layer.

The second layer is called the Convolutional layer, which is applied to the word embedding matrix $M$ obtained in the previous layer. Each convolution operation involves one filter matrix ($F$), which is applied to every window of character n-grams ($CW$) in the word embedding matrix $M$, and one feature map is generated as the outcome. The convolutional layer consists of many such convolution operations: several filters with varying window sizes are applied to $M$, and a set of feature maps is created. Given $CW = [x_1; x_2; \ldots; x_n]$ with $x_i \in \mathbb{R}^m$, a feature $k_i$ is produced from the window $X_{i:i+l-1}$ using the following formula:

$$k_i = \mathrm{ReLU}(F_i \cdot X_{i:i+l-1} + b) \tag{23}$$

where ReLU is a non-linear activation function as described in (1), $b \in \mathbb{R}$ is the bias, and $l$ is the length of the filter $F$. A feature map $FM = [k_0, k_1, \ldots, k_{i+l-1}]$ is therefore produced by applying (23) to all possible windows $CW$ of the word embedding matrix $M$. Multiple filters $F_{i:1 \to h}$ are applied to produce multiple feature maps $FM_{j:1 \to h}$. Fig. 14 illustrates a simple example of applying a filter $f$ to the character n-grams of the word ‘‘Fuzzy’’ to compute the feature map based on (23).
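The sliding-window computation of (23) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the embedding matrix `M`, the filter `F`, and the zero bias are invented values, and the function name `conv_feature_map` is hypothetical.

```python
def relu(x):
    """Rectified linear unit."""
    return max(0.0, x)

def conv_feature_map(M, F, b=0.0):
    """Slide filter F (l rows x m columns) over the rows of M (n rows x m columns).

    Each feature is ReLU(sum of F * window + b), giving n - l + 1 features.
    """
    l = len(F)
    features = []
    for i in range(len(M) - l + 1):
        window = M[i:i + l]
        total = sum(F[r][c] * window[r][c]
                    for r in range(l) for c in range(len(F[0])))
        features.append(relu(total + b))
    return features

# Hypothetical sentence of 4 tokens with 2-dimensional embeddings.
M = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0],
     [-1.0, -2.0]]
F = [[1.0, -1.0],
     [0.5, 0.5]]          # filter spanning l = 2 consecutive tokens

print(conv_feature_map(M, F, b=0.0))
# [1.5, 0.0, 0.0]
```

Applying several filters of different lengths to the same `M` would produce one such feature map per filter.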

FIGURE 14. Example of the application of one filter in the n-gram character of the word fuzzy.

In summary, the convolutional layer takes an embedding matrix $M$ as input and produces a set of feature maps as output. It is known for its efficiency and its capability to extract local features automatically. The third layer is the max-pooling layer, which is applied to every feature map in $FM_{j:1 \to h}$ and extracts the maximum value of the feature map, $pv = \max[k_i]$. In this layer, the number of outputs ($L$) is equal to the number of input feature maps ($L$). The max-pooling layer reduces the size of each dimension of the input feature maps, and its output is a set of columns, where the number of columns is equal to the number of input feature maps. The reduction applied by the max-pooling operation depends on the size of the max-pooling kernel. Fig. 15 presents an example of a max-pooling operation with a 1 × 3 max-pooling kernel.

FIGURE 15. Example of the application of pooling layer.
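The max-pooling step can be illustrated on toy feature maps. In this sketch the feature-map values are invented, the stride is assumed to equal the kernel size, and both function names are hypothetical; `global_max_pool` corresponds to keeping $pv = \max[k_i]$ for each feature map.

```python
def max_pool_1d(feature_map, kernel=3, stride=None):
    """Return the maximum of each window of `kernel` consecutive values."""
    stride = stride or kernel   # assumption: non-overlapping windows by default
    return [max(feature_map[i:i + kernel])
            for i in range(0, len(feature_map) - kernel + 1, stride)]

def global_max_pool(feature_maps):
    """Keep a single value per feature map: its maximum."""
    return [max(fm) for fm in feature_maps]

feature_maps = [[1.5, 0.0, 0.0, 2.0, 0.3, 0.1],
                [0.2, 0.9, 0.4, 0.0, 0.0, 0.7]]

print([max_pool_1d(fm, kernel=3) for fm in feature_maps])
# [[1.5, 2.0], [0.9, 0.7]]
print(global_max_pool(feature_maps))
# [2.0, 0.9]
```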

From Fig. 15, we note that a single column of values is obtained as the result of a single max-pooling operation. This operation therefore aims to find and keep the most salient feature by aggregating the data and reducing the representation size. Finally, the fourth layer is the fully connected layer, which is the common point between the CNN and the FFNN: it serves both as the output of the CNN and as the input of the FFNN. The value of each fully connected neuron is calculated in the CNN phase using (24):

$$V_n = f(W_{connector} * C_{pooling} + B) \tag{24}$$

where $V_n$ is the computed neuron value, $f$ is the ReLU activation function, $W_{connector}$ denotes the weights of the connections that link the pooling layer to the fully connected layer, $C_{pooling}$ is the set of pooled columns (or pooled feature maps) from the preceding pooling layer, and $B$ is the bias. Algorithm 1 summarizes all the steps of the CNN.
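Equation (24) can be read as a weighted sum of the pooled features followed by ReLU. The sketch below assumes one weight row and one bias per fully connected neuron; the numeric values and the function name `fully_connected` are made up for illustration.

```python
def fully_connected(pooled, weights, biases):
    """Compute ReLU(sum_j W[n][j] * pooled[j] + B[n]) for every neuron n."""
    return [max(0.0, sum(w * c for w, c in zip(row, pooled)) + b)
            for row, b in zip(weights, biases)]

pooled = [2.0, 1.0]                      # hypothetical pooled features
weights = [[0.5, -1.0], [-1.0, 0.5]]     # hypothetical connector weights
biases = [0.5, 0.0]

print(fully_connected(pooled, weights, biases))
# [0.5, 0.0]
```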

The success of the CNN deep learning model is due to three factors: sparse connectivity, weight sharing, and equivariant representation. The CNN differs from classical neural networks, in which every connection between an input and an output neuron requires multiplying the neuron value by the connection weight and adding the bias value, which creates a computational burden. The CNN avoids this burden through sparse connectivity, i.e., the size of the kernels is made smaller than the dimension of the inputs by using the pooling layer. These kernels are then treated, in the rest of the CNN learning process, as representing the whole input text/image. Weight sharing raises learning efficiency by reducing the number of weight parameters to be learned. The main idea is that, instead of learning a separate set of weight parameters at each neuron as in a classical neural network, the CNN learns only one set, which yields good performance in terms of classification rate and run time. Weight sharing also gives the CNN a further property called equivariant representation, i.e., if the input changes, the output changes in the same way. Thanks to these three factors, the CNN requires fewer weight parameters than other neural network models, which reduces memory usage and improves the efficiency of the CNN.

Overall, our proposed deep learning model (CNN+FFNN) is divided into two parts. The first part applies the CNN to the word embedding matrix $M$ obtained in the previous word embedding phase to capture and extract the most important features. The second part uses an FFNN to calculate PSS and NSS: the FFNN receives the outputs of the CNN as its inputs and generates the two values PSS and NSS. The following subsection introduces in detail the FFNN applied in this work.


Algorithm 1: Our Convolutional Neural Network

Input: A word embedding matrix M with R rows and C columns, and SF, the set of filters of varying size.
Output: Set of features.

// Computing operations of the convolutional layer
for a ← 1 to R do
    for b ← 1 to C do
        for c ← 1 to k do
            for d ← 1 to k do
                sum = 0
                for v ← 1 to f do
                    for w ← 1 to f do
                        sum = sum + F[v][w] * M[ss*(c−1)+v][ss*(d−1)+w]    // ss is the shifting stride
                    end for
                end for
                FM[a][c][d] = FM[a][c][d] + sum    // FM is the feature map matrix
                if b == C then
                    FM[a][c][d] = f(FM[a][c][d] + B) = max(0, FM[a][c][d] + B)    // B is the bias; f is the ReLU activation function
                end if
            end for
        end for
    end for
end for

// Computing operations of the pooling layer
max-value = 0
average-value = 0
for a ← 1 to R do
    for c ← 1 to C do
        y = 0
        for d ← 1 to k do
            x = 0
            max-value = max(max-value, FM[a][c][d])        // when max-pooling is used
            average-value = average-value + FM[a][c][d]    // when average-pooling is used
        end for
        Cpooling[a][y][x] = max-value                // when max-pooling is used
        Cpooling[a][y][x] = average-value / (r*c)    // when average-pooling is used
        x++
    end for
    y++
end for

F. PHASE V: FEEDFORWARD NEURAL NETWORK

After the CNN phase, the next step is the FFNN. The main goal of this phase is to take the set of features obtained in the CNN phase and compute the two values described previously: the negative and positive sentimental scores (NSS and PSS).

Algorithm 1 (continued)

// Computing operations of the fully connected layer
for a ← 1 to R do
    vartemp = 0
    for c ← 1 to C do
        for d ← 1 to k do
            vartemp = vartemp + Wconnector[a][c][d] * Cpooling[a][y][x]    // Cpooling: pooled feature maps; Wconnector: connector weights
        end for
    end for
    Y[a][c][d] = vartemp
end for
return Set of features Y[a]

Our simple FFNN consists of four layers: the input fully connected layer, two hidden sigmoid layers, and the softmax output layer. The input fully connected layer is the same as the fully connected layer of the CNN deep learning model, so it consists of multiple neurons that represent the features extracted in the previous CNN phase. All neurons in the fully connected layer are linked to all neurons in the first hidden layer via connections with different, adjustable weights. The value of a hidden neuron is computed using (25):

$$X_h = \sigma\left(\sum_{i=1}^{m} W_i * X_i\right) = \frac{1}{1 + e^{-\sum_{i=1}^{m} W_i * X_i}} \tag{25}$$

where $X_h$ is the value of the hidden neuron, $W_i$ is the weight of connector $i$, $X_i$ is the value of the $i$-th neuron in the fully connected layer, and $\sigma$ is the sigmoid activation function, which is calculated using the following equation:

$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{26}$$

where $\sigma$ is the sigmoid function and $e^{-x}$ is the standard exponential function evaluated at the negated input value $x$.
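The hidden-neuron computation can be sketched directly from the definition of the sigmoid. The code below uses the standard logistic form $1/(1+e^{-x})$; the weights and inputs are invented for the example, and the function names are hypothetical.

```python
import math

def sigmoid(x):
    """Standard logistic function, mapping any real x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def hidden_neuron(weights, inputs):
    """Value of one hidden neuron: sigmoid of the weighted sum of its inputs."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)))

print(sigmoid(0.0))
# 0.5
print(hidden_neuron([0.5, -0.5], [2.0, 2.0]))
# 0.5, because the weighted sum is 0
```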

The sigmoid function is used in neural networks as an activation function and is also known as a squashing function, i.e., it ensures that the output of the neuron lies in the interval [0,1]. The sigmoid function used in the hidden layers is plotted in Fig. 16.

FIGURE 16. Graphic representation of the sigmoid activation function.

The second hidden layer in this work has the same function as the first hidden layer. It also uses the sigmoid activation function to compute the values of its hidden neurons. The difference between the two hidden layers lies in the number of hidden neurons: the second hidden layer has fewer hidden neurons than the first. In general, the hidden layers of a neural network are situated between its input and output layers. Each hidden layer applies a weighted sum to its inputs and passes the result through an activation function to produce its output, i.e., the hidden layers carry out a nonlinear transformation of the inputs that enter the neural network. Hidden layers differ from one neural network to another, depending on the function of each network, and their behavior also depends on their associated weights. In this work, we use both hidden layers as squashing functions because the intended outputs of this model are probability degrees, that is to say, the output values lie in the interval [0,1].

The last layer of our FFNN is the softmax output layer. The inputs of this layer are the outputs of the second hidden layer multiplied by the connection weights, and the result is passed through the softmax activation function at the neurons of the output layer. This output layer produces two values, PSS and NSS, each of which lies strictly between 0 and 1. Algorithm 2 describes all the steps of the FFNN.
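The softmax activation of the output layer can be sketched as follows. The two raw scores passed in are placeholders for the weighted sums arriving from the second hidden layer; subtracting the maximum before exponentiating is a common numerical-stability measure added here and not something stated in the paper.

```python
import math

def softmax(logits):
    """Convert raw scores into values in (0, 1) that sum to 1."""
    shift = max(logits)                       # stability trick, assumption of this sketch
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Two output neurons: the first is read as NSS and the second as PSS.
nss, pss = softmax([0.0, 0.0])
print(nss, pss)
# 0.5 0.5
```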

In this work, we also used dropout, which consists of dropping a certain set of hidden and visible neurons in our FFNN in order to avoid overfitting. These randomly dropped neurons are not considered during the forward or backward pass of the training phase, as shown in Fig. 17. At each round of the training phase, every neuron is either inactive (dropped out of the FFNN architecture) with probability 1 − p or active with probability p. Why shut down certain sets of neurons in the network layers? The principal aim of dropout is to prevent overfitting resulting from the co-dependency that neurons develop amongst each other during training. In short, dropout is a regularization method for the FFNN that reduces interdependent learning amongst the nodes.
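The keep-or-drop decision can be sketched with the standard library random generator. This is a minimal illustration of the masking step only: the activation values and the function name `dropout` are invented, and the fixed seed is there just to make the sketch reproducible.

```python
import random

def dropout(activations, p_keep, rng):
    """Keep each activation with probability p_keep, otherwise set it to zero."""
    return [a if rng.random() < p_keep else 0.0 for a in activations]

rng = random.Random(0)           # fixed seed, assumption of this sketch
hidden = [0.2, 0.9, 0.4, 0.7]    # hypothetical hidden-layer activations
print(dropout(hidden, p_keep=0.5, rng=rng))
```

Many implementations additionally rescale the kept activations by 1/p during training (inverted dropout) so that no adjustment is needed at test time; the text above does not say whether this was done.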

G. PHASE VI: MAMDANI FUZZY SYSTEM

After our deep learning phase (CNN+FFNN), the next stage

is the classiﬁcation using the fuzzy classiﬁer. Both outputs

NSS and PSS of the CNN+FFNN deep learning model

will be the inputs of our fuzzy classiﬁer. The sentiments

Algorithm 2: Our FeedForward Neural Network

Input: A set of pooled feature maps Cpooling with R rows and C columns; B = 1 is the bias.
Output: Both values PSS and NSS.

Randomly initialize the weights of the neural network using
    $W_i^k = U_d\left[-\frac{1}{\sqrt{n_{k-1}}}, \frac{1}{\sqrt{n_{k-1}}}\right]$
where $U_d$ is the continuous uniform distribution, $n_{k-1}$ is the number of neurons in the (k−1)th layer, and i denotes the ith connector.
do
    // Computing operations of the input fully connected layer
    for a ← 1 to R do
        vartemp = 0
        for b ← 1 to C do
            for c ← 1 to k do
                vartemp = vartemp + Wconnector[a][b][c] * Cpooling[a][b][c]    // Cpooling: pooled feature maps; Wconnector: connector weights
            end for
        end for
        Y[a][b][c] = vartemp
    end for
    // Computing operations of the first hidden layer
    for a ← 1 to R do
        vartemp = 0
        for b ← 1 to C do
            for c ← 1 to k do
                vartemp = vartemp + Wconnector[a][b][c] * Y[a][b][c]    // Y: features extracted by the CNN in the preceding phase
            end for
        end for
        fh[a][b][c] = σ(vartemp + B) = 1 / (1 + e^{−(vartemp + B)})    // σ is the sigmoid activation function
    end for
    // Computing operations of the second hidden layer
    for a ← 1 to R do
        vartemp = 0
        for b ← 1 to C do
            for c ← 1 to k do
                vartemp = vartemp + Wconnector[a][b][c] * fh[a][b][c]    // fh: output of the first hidden layer
            end for
        end for
        sh[a][b][c] = σ(vartemp + B) = 1 / (1 + e^{−(vartemp + B)})
    end for
until lf < 0.000001

expressed by humans are vague and imprecise, so it is difficult to decide whether their opinions about a particular topic are negative, neutral, or positive. Our deep learning


FIGURE 17. Feedforward neural network without and with dropout.

Algorithm 2 (continued)

    // Computing operations of the softmax output layer
    for a ← 1 to R do
        vartemp = 0
        for b ← 1 to C do
            for c ← 1 to k do
                vartemp = vartemp + Wconnector[a][b][c] * sh[a][b][c]    // sh: output of the second hidden layer; Wconnector: connector weights
            end for
        end for
        output[a][b][c] = f(vartemp + B) = $\frac{e^{vartemp+B}}{\sum_{i=1}^{l} e^{vartemp+B}}$
    end for
    oo0 = output[0][0][0]
    oo1 = output[0][0][0]
    The adaptation of the neural network is performed by reducing (optimizing) its loss function lf, which is given by
        $lf(w(i)) = \frac{1}{N_c} \sum_{i=1}^{N_c} \sum_{j=1}^{N_{on}} (ro - oo_j)^2$
    where lf(w(i)) is the error rate at the ith round, w(i) denotes the actual weights of the connectors at the ith round, ro is the required value of the output neuron, $oo_j$ is the obtained value of the jth output neuron, $N_{on}$ is the number of output neurons, and $N_c$ is the number of connectors.
NSS = oo0
PSS = oo1
return Both values NSS and PSS

model (CNN+FFNN) is powerful at extracting features from the given dataset but cannot handle vague and ambiguous data. To make our proposal more accurate and more efficient, we applied fuzzy set theory to the outputs of the suggested deep learning model (CNN+FFNN); specifically, we used the MFS as a fuzzy classifier.

The MFS is built on the fuzzy set theory introduced by Zadeh [27]. The major aim of this theory is to handle imprecise and vague concepts in the way the human brain does. According to multiple works in the literature, this theory has proved its efficiency in dealing with ambiguous data and its ability to treat data like the human brain. Hence, it is increasingly applied to real-life problems that cannot be solved with classical set theory. Multiple fuzzy systems have been proposed based on fuzzy set theory, among them the Mamdani, Tsukamoto, and Sugeno fuzzy systems [28]. The latter two systems are applied to regression problems, whereas the Mamdani system is used for classification. Since our work addresses a classification problem, the suitable system is Mamdani. The MFS consists of three major phases: the fuzzification process, the inference process, and the defuzzification process, as illustrated in Fig. 18.

FIGURE 18. The components of the MFS.

The first stage, before the fuzzification procedure, is the definition of the input and output linguistic variables and of the linguistic terms of each variable. In this work, we applied the Mamdani system as a fuzzy classifier to the two outputs NSS and PSS of our deep learning model (CNN+FFNN). Hence, the input linguistic variables are NSS and PSS, and each takes five linguistic terms: very low (between 0.0 and 0.25), low (between 0.0 and 0.50), moderate (between 0.25 and 0.75), high (between 0.50 and 1), and very high (between 0.75 and 1). The output variable is the decision of classification (DC), which has three linguistic terms: neutral (between 0.0 and 0.35), negative (between 0.35 and 0.65), and positive (between 0.65 and 1.0). In short, the inputs of the MFS are NSS and PSS, and the output is the decision of classification, as depicted in Table 3. Once the linguistic variables and linguistic terms of our fuzzy system have been defined, the next phase is the fuzzification process, which is a substantial step in the MFS.


TABLE 3. Input and output parameters of the used fuzzy system.

1) FUZZIFICATION PROCESS

After the definition of the linguistic variables and linguistic terms, the next phase is the fuzzification process. Fuzzification is the operation that transforms a crisp input set into a fuzzy input set by computing membership degrees using one of the most popular MFs. The input variables of the MFS are represented on fuzzy sets by means of MFs such as the triangular, trapezoidal, or Gaussian MF; linguistic terms such as very low, low, moderate, high, and very high; and the linguistic variables PSS, NSS, and DC. Linguistic variables and terms are words or complete phrases of natural language, so when defining them we make sure that no numerical data are used. The two important elements of this phase are the MFs used and the fuzzy sets defined, because they are used to obtain the fuzzified values. In other words, the degree of membership of each input variable in each linguistic term (fuzzy set) is computed using a particular MF. The MFs found in the literature include the trapezoidal, triangular, Gaussian, 2-D, left-right, sigmoid, and generalized bell membership functions. In this contribution, we applied the triangular, trapezoidal, and Gaussian MFs, which are the most popular membership functions in the literature. These functions are described below.

a: TRIANGULAR FUNCTION

is determined by three parameters $ll$, $v$, and $ul$, where $ll$ is the lower limit, $ul$ is the upper limit, and $v$ is a value such that $ll < v < ul$, as shown in Fig. 19.

$$\mu_A(x) = \begin{cases} 0 & \text{if } x \le ll \\ \dfrac{x - ll}{v - ll} & \text{if } ll \le x \le v \\ \dfrac{ul - x}{ul - v} & \text{if } v \le x \le ul \\ 0 & \text{if } ul \le x \end{cases} \tag{27}$$

where $\mu_A(x)$ is the triangular MF of the input value $x$, $ll$ is the lower limit, $ul$ is the upper limit, and $v$ is the median value.

FIGURE 19. Representation graphic of Triangular function.

An alternative mathematical expression is obtained by applying the min and max functions to the previous equation:

$$\mu_A(x; ll, v, ul) = \max\left(\min\left(\frac{x - ll}{v - ll}, \frac{ul - x}{ul - v}\right), 0\right) \tag{28}$$

where $\mu_A(x; ll, v, ul)$ is the triangular MF of the input value $x$, $ll$ is the lower limit, $ul$ is the upper limit, and $v$ is the median value.
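The max-min form of the triangular MF translates directly into code. In the sketch below, the limits 0.25 and 0.75 are those given for the term ‘‘moderate’’ in the text, while the median value 0.5 is an assumption made for the example, as is the function name `triangular`.

```python
def triangular(x, ll, v, ul):
    """Triangular membership degree of x, using the max-min form."""
    return max(min((x - ll) / (v - ll), (ul - x) / (ul - v)), 0.0)

# "moderate" spans 0.25 to 0.75; the peak at 0.5 is assumed for illustration.
print(triangular(0.5, 0.25, 0.5, 0.75))
# 1.0
print(triangular(0.375, 0.25, 0.5, 0.75))
# 0.5
print(triangular(0.9, 0.25, 0.5, 0.75))
# 0.0
```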

b: TRAPEZOIDAL FUNCTION

is determined by four parameters $ll$, $lsl$, $usl$, and $ul$, where $ll$ is the lower limit, $lsl$ is the lower support limit, $usl$ is the upper support limit, $ul$ is the upper limit, and $ll < lsl < usl < ul$, as illustrated in Fig. 20.

$$\mu_A(x) = \begin{cases} 0 & \text{if } (x < ll) \text{ or } (x > ul) \\ \dfrac{x - ll}{lsl - ll} & \text{if } ll \le x \le lsl \\ 1 & \text{if } lsl \le x \le usl \\ \dfrac{ul - x}{ul - usl} & \text{if } usl \le x \le ul \end{cases} \tag{29}$$

FIGURE 20. Representation graphic of Trapezoidal function.

An alternative mathematical expression is obtained by applying the min and max functions to the preceding equation:

$$\mu_A(x; ll, lsl, usl, ul) = \max\left(\min\left(\frac{x - ll}{lsl - ll}, 1, \frac{ul - x}{ul - usl}\right), 0\right) \tag{30}$$

where $\mu_A(x; ll, lsl, usl, ul)$ is the trapezoidal MF of the input value $x$, $ll$ is the lower limit, $lsl$ is the lower support limit, $usl$ is the upper support limit, and $ul$ is the upper limit.
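The trapezoidal MF can be sketched in the same max-min style. The four parameter values used below are hypothetical and chosen only so that the results are easy to check by hand; the function name `trapezoidal` is likewise an assumption.

```python
def trapezoidal(x, ll, lsl, usl, ul):
    """Trapezoidal membership degree of x, using the max-min form."""
    return max(min((x - ll) / (lsl - ll), 1.0, (ul - x) / (ul - usl)), 0.0)

# Hypothetical parameters: rising edge on [0, 0.25], plateau on [0.25, 0.75], falling edge on [0.75, 1].
print(trapezoidal(0.5, 0.0, 0.25, 0.75, 1.0))
# 1.0
print(trapezoidal(0.125, 0.0, 0.25, 0.75, 1.0))
# 0.5
print(trapezoidal(1.5, 0.0, 0.25, 0.75, 1.0))
# 0.0
```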

c: GAUSSIAN FUNCTION

is determined by two parameters $c$ and $s$, where $c$ is the central value, $s$ is the standard deviation, and $s > 0$, as depicted in Fig. 21.

$$\mu_A(x) = e^{-\frac{(x - c)^2}{2 s^2}} \tag{31}$$

FIGURE 21. Representation graphic of Gaussian function.
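The Gaussian MF of (31) can be evaluated as follows. The center 0.5 and standard deviation 0.1 are illustrative values, not parameters reported in the paper, and the function name `gaussian` is hypothetical.

```python
import math

def gaussian(x, c, s):
    """Gaussian membership degree of x with center c and standard deviation s > 0."""
    return math.exp(-((x - c) ** 2) / (2.0 * s ** 2))

print(gaussian(0.5, c=0.5, s=0.1))   # 1.0 at the center
print(gaussian(0.6, c=0.5, s=0.1))   # about 0.61, one standard deviation from the center
```

Unlike the triangular and trapezoidal MFs, the Gaussian MF never reaches exactly zero, so every input has a small non-zero membership degree in every term.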

2) DEFINING THE FUZZY RULES

After the fuzzification step, the next step is the definition of the fuzzy IF-THEN rules. For the MFS, defining the rules is regarded as the most important phase. Fuzzy IF-THEN rules are formulated using linguistic terms rather than numerical values and are expressed as vague conditional statements. Each rule consists of two parts: an antecedent block, which contains the input linguistic variables and their linguistic terms, and a consequent block, which represents the classification decision. Every fuzzy IF-THEN rule whose antecedent block has a non-zero degree of truth fires and contributes to the conclusion. Each rule fires to a degree that reflects how well its antecedent block matches the input. This partial matching provides a basis for interpolating between possible input linguistic terms and reduces the total number of fuzzy IF-THEN rules needed to describe the relationship between the input and the output. Fuzzy IF-THEN rules are effective for many real-world problems because they resemble human knowledge and reasoning, which are often expressed in IF-THEN form. At this phase, we employ empirical expert knowledge to produce the set of fuzzy IF-THEN rules. As listed below, 25 rules are defined for the proposed fuzzy classifier.

Rule1: IF NSS is veryLow AND PSS is veryLow THEN

DC is neutral

Rule2: IF NSS is veryLow AND PSS is low THEN DC is

neutral

Rule3: IF NSS is veryLow AND PSS is moderate THEN

DC is positive

Rule4: IF NSS is veryLow AND PSS is high THEN DC

is positive

Rule5: IF NSS is veryLow AND PSS is veryHigh THEN

DC is positive

Rule6: IF NSS is low AND PSS is veryLow THEN DC is

neutral

Rule7: IF NSS is low AND PSS is low THEN DC is

neutral

Rule8: IF NSS is low AND PSS is moderate THEN DC is

positive

Rule9: IF NSS is low AND PSS is high THEN DC is

positive

Rule10: IF NSS is low AND PSS is veryHigh THEN DC

is positive

Rule11: IF NSS is moderate AND PSS is veryLow THEN DC is negative

Rule12: IF NSS is moderate AND PSS is low THEN DC is negative

Rule13: IF NSS is moderate AND PSS is moderate THEN DC is neutral

Rule14: IF NSS is moderate AND PSS is high THEN DC is positive

Rule15: IF NSS is moderate AND PSS is veryHigh THEN

DC is positive

Rule16: IF NSS is high AND PSS is veryLow THEN DC

is negative

Rule17: IF NSS is high AND PSS is low THEN DC is

negative

Rule18: IF NSS is high AND PSS is moderate THEN DC

is negative

Rule19: IF NSS is high AND PSS is high THEN DC is

neutral

Rule20: IF NSS is high AND PSS is veryHigh THEN DC

is neutral

Rule21: IF NSS is veryHigh AND PSS is veryLow THEN DC is negative

Rule22: IF NSS is veryHigh AND PSS is low THEN DC is negative

Rule23: IF NSS is veryHigh AND PSS is moderate THEN

DC is negative

Rule24: IF NSS is veryHigh AND PSS is high THEN DC

is neutral

Rule25: IF NSS is veryHigh AND PSS is veryHigh THEN

DC is neutral
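The 25 rules can be read as a 5×5 lookup table indexed by the linguistic terms of NSS (rows) and PSS (columns). The sketch below encodes the rule base this way; the names `TERMS`, `RULES`, and `rule_number` are hypothetical helpers introduced for illustration.

```python
TERMS = ("veryLow", "low", "moderate", "high", "veryHigh")

# Rows: NSS term, columns: PSS term, both in the order given by TERMS.
# Each cell is the decision DC of the corresponding rule (Rule1 to Rule25).
_GRID = (
    ("neutral",  "neutral",  "positive", "positive", "positive"),  # NSS veryLow
    ("neutral",  "neutral",  "positive", "positive", "positive"),  # NSS low
    ("negative", "negative", "neutral",  "positive", "positive"),  # NSS moderate
    ("negative", "negative", "negative", "neutral",  "neutral"),   # NSS high
    ("negative", "negative", "negative", "neutral",  "neutral"),   # NSS veryHigh
)

# Dictionary form: (NSS term, PSS term) -> decision.
RULES = {
    (nss, pss): _GRID[i][j]
    for i, nss in enumerate(TERMS)
    for j, pss in enumerate(TERMS)
}


def rule_number(nss_term: str, pss_term: str) -> int:
    """Return the 1-based rule index used in the list above."""
    return 5 * TERMS.index(nss_term) + TERMS.index(pss_term) + 1
```

For instance, `RULES[("moderate", "high")]` returns `"positive"`, which corresponds to Rule14.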

H. INFERENCE ENGINE

After fuzzifying the crisp inputs into fuzzy sets and defining the fuzzy IF-THEN rules, the next phase is the inference engine. The fuzzy inference engine combines the previously described fuzzy sets according to the predefined fuzzy IF-THEN rules and their associated fuzzy regions. In general, the inference process consists of three phases: application, implication, and aggregation. In the application phase, the inference engine applies the Min-Max inference technique to compute the rule conclusions from the fuzzification results and the defined fuzzy IF-THEN rules. The outcome of this operation is known as the fuzzy conclusion. In the Mamdani inference engine, the truth value of each fuzzy IF-THEN rule is computed by combining the antecedent blocks of the rule. For the logical connective ‘‘AND’’, the conjunction is represented by the minimum t-norm, i.e., the smallest membership degree among the antecedent blocks is taken as the truth value of the fuzzy IF-THEN rule. This operation is expressed by equation (32):

$$
\mu_A = \mu_i(PSS)\ \text{AND}\ \mu_{ii}(NSS) = \min\big(\mu_i(PSS),\ \mu_{ii}(NSS)\big)
\tag{32}
$$

where µA is the membership degree obtained after the application phase, µi(PSS) is the membership degree of the variable PSS, and µii(NSS) is the membership degree of the variable NSS.

For the logical connective ‘‘OR’’, the disjunction is represented by the maximum t-conorm, i.e., the largest membership degree among the antecedent blocks is taken as the truth value of the fuzzy IF-THEN rule. This operation is expressed by equation (33):

$$
\mu_A = \mu_i(PSS)\ \text{OR}\ \mu_{ii}(NSS) = \max\big(\mu_i(PSS),\ \mu_{ii}(NSS)\big)
\tag{33}
$$

where µA is the membership degree obtained after the application phase, µi(PSS) is the membership degree of the variable PSS, and µii(NSS) is the membership degree of the variable NSS.

In the application phase, the main goal is to extract the firing strength of each activated rule by applying the conjunction to the two membership degrees computed in the fuzzification step for the numerical variables PSS and NSS.

For every activated fuzzy IF-THEN rule, an implication operation I is applied between the fuzzy outcome obtained from the application stage and the classification decision of the rule. The minimum is the operator most commonly used for Mamdani implication. Equation (34) describes this implication phase:

$$
\mu_I(DC_t) = \min\big(\mu_A,\ \mu_i(DC_t) = 1\big)
\tag{34}
$$

where µI(DCt) is the membership degree obtained after the implication operation, µA is the membership degree obtained after the application phase, and µi(DCt) = 1 is the membership degree of the decision classification attribute.

In the implication phase, the firing strength of an IF-THEN rule obtained in the application phase is used to define the membership degree of the decision classification attribute 'DC' in each linguistic term ('Negative', 'Neutral', or 'Positive'), based on the consequent block of the fuzzy IF-THEN rule.

The final phase of the inference engine is the aggregation of the outcomes obtained from the implication stage, i.e., all rules that have the same classification decision are aggregated. There are multiple aggregation operators, such as the geometric mean, the arithmetic mean, Max, and Min. A commonly used operator is Max, which is given by equation (35):

$$
\mu_{Ag}(DC_t) = \max\big(\mu_{I_1}(DC_t),\ \mu_{I_2}(DC_t),\ \ldots,\ \mu_{I_n}(DC_t)\big)
\tag{35}
$$

where µAg(DCt) is the membership degree obtained after the aggregation operation, and µIi(DCt) is the membership degree of the decision classification attribute DCt obtained from the implication of the i-th rule.

In the aggregation phase, the membership degree of the value of the decision classification attribute 'VDC' obtained from each fuzzy IF-THEN rule is computed for the same linguistic term (Positive, Neutral, or Negative), and the maximum membership degree among them is retained.
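The three inference phases of Equations (32) to (35) can be sketched as small Python functions. The function names and the membership degrees used in the example are assumptions made for illustration only.

```python
from typing import Dict, Iterable, Tuple


def firing_strength(mu_pss: float, mu_nss: float, connective: str = "AND") -> float:
    """Application phase: min for AND (Eq. 32), max for OR (Eq. 33)."""
    if connective == "AND":
        return min(mu_pss, mu_nss)
    if connective == "OR":
        return max(mu_pss, mu_nss)
    raise ValueError("connective must be 'AND' or 'OR'")


def implication(strength: float, consequent_degree: float = 1.0) -> float:
    """Implication phase: Mamdani minimum (Eq. 34)."""
    return min(strength, consequent_degree)


def aggregate(implied: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Aggregation phase: maximum per decision label (Eq. 35)."""
    result: Dict[str, float] = {}
    for label, degree in implied:
        result[label] = max(result.get(label, 0.0), degree)
    return result


# Hypothetical membership degrees for three fired rules: (label, mu_PSS, mu_NSS).
fired = [("positive", 0.7, 0.3), ("neutral", 0.7, 0.7), ("positive", 0.3, 0.3)]
implied = [(label, implication(firing_strength(p, n))) for label, p, n in fired]
aggregated = aggregate(implied)
```

Because the consequent degree is fixed at 1, the implication step leaves the firing strength unchanged; it is kept as a separate function only to mirror the three phases described above.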

I. DEFUZZIFICATION

After the inference engine process, the next phase is defuzzification, which converts the final fuzzy set obtained in the aggregation step into a real number. In other words, defuzzification produces a quantifiable crisp result from the fuzzy sets and their corresponding membership degrees. Several defuzzification methods are commonly used, such as the Center of Gravity Method (CGM), the Bisector of Area Approach (BAA), the First of Maximum Procedure (FMP), the Last of Maximum Technique (LMT), the Mean of Maximum Approach (MMA), the Weighted Average Procedure (WAP), and the Center of Sums (COS) method. In this work, we applied four defuzzification methods: CGM, BAA, WAP, and COS. These methods are described below:

1) CENTER OF GRAVITY METHOD

This approach converts the fuzzy output into a crisp value by computing the centre of gravity of the fuzzy set. The total area under the membership function distribution is split into a number of sub-areas. The area and the centroid (centre of gravity) of every sub-area are computed; for a continuous fuzzy set, the defuzzified value is obtained by integrating over all these sub-areas, whereas for a discrete fuzzy set it is obtained by summing over them. In this work, the fuzzy sets take discrete values. Therefore, the defuzzified value dv is calculated using summation instead of integration, as described by equation (36):

$$
dv = \frac{\sum_{i=1}^{n} z_i \, \mu(z_i)}{\sum_{i=1}^{n} \mu(z_i)}
\tag{36}
$$


where zi denotes the i-th element of the instance, µ(zi) is the membership degree of the element zi, and n is the number of elements in the instance.
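A minimal Python sketch of the discrete centre of gravity in Equation (36) is given below. The function name and the sample points are assumptions used for illustration.

```python
from typing import Sequence


def centre_of_gravity(z: Sequence[float], mu: Sequence[float]) -> float:
    """Discrete centre of gravity, Eq. (36): sum(z_i * mu(z_i)) / sum(mu(z_i))."""
    if len(z) != len(mu):
        raise ValueError("z and mu must have the same length")
    total = sum(mu)
    if total == 0:
        raise ValueError("all membership degrees are zero; dv is undefined")
    return sum(zi * mi for zi, mi in zip(z, mu)) / total


# Hypothetical sampled fuzzy set, symmetric around 0.5.
z_points = [0.0, 0.25, 0.5, 0.75, 1.0]
degrees = [0.0, 0.2, 1.0, 0.2, 0.0]
dv = centre_of_gravity(z_points, degrees)
```

For a set that is symmetric around a point, as in this example, the centre of gravity coincides with that point up to floating-point rounding.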

2) BISECTOR OF AREA APPROACH

The bisector of area defuzzification approach computes the abscissa of the vertical line that splits the area under the resulting membership function into two sub-areas of equal size. In other words, this method finds the position under the curve at which the areas on both sides are equal; this position is the crisp defuzzified value. It is one of the widely applied approaches. The defuzzified value dv is computed using equation (37):

$$
\int_{\alpha}^{z_{BOA}} \mu_i(z)\, dz = \int_{z_{BOA}}^{\beta} \mu_i(z)\, dz
\tag{37}
$$

where α = min{z; z ∈ Z}, β = max{z; z ∈ Z}, z = zBOA is the vertical line that divides the region bounded by z = α, z = β, v = 0, and v = µi(z) into two regions of equal area, µi(z) is the membership degree of the element z, and dz is the differential of z. The defuzzified value is then dv = zBOA.
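Equation (37) rarely has a closed form for an aggregated fuzzy set, so a numerical approximation is a natural choice. The sketch below is one possible approach, assuming the membership function is sampled on an increasing grid; it uses the trapezoidal rule for the areas and linear interpolation inside the interval where the half-area is reached. The function name and sample data are hypothetical.

```python
from typing import Sequence


def bisector_of_area(z: Sequence[float], mu: Sequence[float]) -> float:
    """Approximate z_BOA of Eq. (37) for a membership function sampled at z."""
    if len(z) != len(mu) or len(z) < 2:
        raise ValueError("need at least two samples of equal length")
    # Area of each slice between consecutive samples (trapezoidal rule).
    areas = [(z[i + 1] - z[i]) * (mu[i] + mu[i + 1]) / 2.0 for i in range(len(z) - 1)]
    total = sum(areas)
    if total == 0:
        raise ValueError("zero area under the membership function")
    half = total / 2.0
    running = 0.0
    for i, area in enumerate(areas):
        if running + area >= half:
            # Linear interpolation inside the slice (an approximation).
            fraction = (half - running) / area if area > 0 else 0.0
            return z[i] + fraction * (z[i + 1] - z[i])
        running += area
    return z[-1]


# Hypothetical symmetric triangular set on [0, 1].
z_boa = bisector_of_area([0.0, 0.25, 0.5, 0.75, 1.0], [0.0, 0.5, 1.0, 0.5, 0.0])
```

A finer sampling grid reduces the interpolation error; for a symmetric set the bisector coincides with the axis of symmetry.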

3) WEIGHTED AVERAGE PROCEDURE

This approach is suitable for fuzzy sets with identical output MFs and generates results very close to those of the centre of area approach. It requires fewer computational resources. Each output membership function is weighted by its maximum membership degree. The defuzzified value dv is determined by equation (38):

$$
dv = \frac{\sum \mu_i(z) \, z}{\sum \mu_i(z)}
\tag{38}
$$

where Σ denotes the algebraic summation, z is the element that has the maximum membership degree, and µi(z) is the membership degree of the element z, where i is the linguistic term.

4) CENTER OF SUMS (COS) METHOD

The Center of Sums is a widely applied defuzzification method. In this approach, the overlapping area is counted twice. It is faster than many other defuzzification methods. It applies the algebraic sum to all output fuzzy sets. It is similar to the weighted average approach; however, in this approach the weights are the areas of the output fuzzy sets rather than the membership degrees used in the weighted average approach. The defuzzified value dv is calculated using equation (39):

$$
dv = \frac{\sum_{ii=1}^{n} z_{ii} \sum_{j=1}^{k} \mu_{ij}(z_{ii})}{\sum_{ii=1}^{n} \sum_{j=1}^{k} \mu_{ij}(z_{ii})}
\tag{39}
$$

where n is the total number of fuzzy sets used, k is the total number of fuzzy linguistic variables, and µij is the membership degree for the j-th fuzzy set.
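The following Python sketch shows one way to read Equation (39). It assumes that the outer sum runs over sample points z and the inner sum over the output fuzzy sets evaluated at each point, so that overlapping regions contribute once per set. This reading, the function name, and the sample data are assumptions.

```python
from typing import Sequence


def centre_of_sums(z: Sequence[float], output_sets: Sequence[Sequence[float]]) -> float:
    """Centre of sums: memberships of all output sets are added before averaging."""
    if any(len(s) != len(z) for s in output_sets):
        raise ValueError("every output set must be sampled at the same points as z")
    # Algebraic sum of the membership degrees at each sample point.
    summed = [sum(column) for column in zip(*output_sets)]
    total = sum(summed)
    if total == 0:
        raise ValueError("all membership degrees are zero; dv is undefined")
    return sum(zi * si for zi, si in zip(z, summed)) / total


# Two hypothetical clipped output sets that overlap at z = 0.5.
z_points = [0.0, 0.25, 0.5, 0.75, 1.0]
set_a = [0.0, 0.3, 0.3, 0.0, 0.0]
set_b = [0.0, 0.0, 0.7, 0.7, 0.0]
dv_cos = centre_of_sums(z_points, [set_a, set_b])
```

At z = 0.5 both sets contribute (0.3 + 0.7), whereas a max-based aggregation would keep only 0.7; this is the double counting of the overlapping area mentioned above.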

a: OUTPUT OF THE DEFUZZIFICATION APPROACH

After one of the defuzzification methods is applied to the aggregated fuzzy value obtained in the aggregation phase of the fuzzy inference engine, the fuzzy aggregated input is transformed into a crisp output. This defuzzified value determines the classification decision of the output variable. We therefore employ defuzzification rules to define the relationship between the defuzzified value and the classification decision.

b: DEFUZZIFICATION RULES

Here are all possible defuzzification rules, where dv denotes the defuzzified value and dc denotes the classification decision. These rules are used to determine the final crisp output, which is either negative, neutral, or positive.

if (0.0 ≤ dv ≤ 0.35), then dc = Negative

if (0.35 < dv ≤ 0.65), then dc = Neutral

if (0.65 < dv ≤ 1.0), then dc = Positive

V. PARALLELIZATION OF OUR PROPOSAL USING

HADOOP FRAMEWORK

One of the most serious shortcomings of our deep neural networks (CNN+FFNN) is their long execution time. This time consumption prevents the trained deep learning models from quickly producing accurate results and performing the required tasks. To curb this problem, we applied the Hadoop framework [56] to our proposed approach in order to improve the effectiveness and scalability of our fuzzy deep learning model. The Hadoop platform parallelizes our FDLC across multiple computing nodes. It uses the Hadoop Distributed File System (HDFS) to store both the social media datasets used in this work (Sentiment140 and COVID-19 Sentiments) and the resulting classification decisions, and the MapReduce programming model to process our fuzzy deep learning tasks in parallel using multiple mappers and reducers, as illustrated in Fig. 22. Implementing our FDLC on the MapReduce programming model consists mainly of three stages: the Map phase, the combining (group by keys) stage, and the Reduce phase, which are briefly introduced as follows.

•Map phase: The Map phase consists of four mappers; each mapper reads one or more data chunks from HDFS as input key-value pairs. The mapper applies text pre-processing to each chunk, transforms it into numerical data in the word embedding phase, passes it through our deep learning model (CNN+FFNN), and finally applies the fuzzy classifier to the processed chunk. After all data chunks have been processed, the outputs of our FDLC are converted into a set of intermediate key-value pairs and written to the local disk.


Algorithm 3: Our Fuzzy Classifier Based on the Mamdani Fuzzy System

Input: The two sentiment scores NSS and PSS obtained in the deep learning phase. Each variable takes five linguistic terms: veryLow (between 0.0 and 0.25), low (between 0.0 and 0.50), moderate (between 0.25 and 0.75), high (between 0.50 and 1), and veryHigh (between 0.75 and 1).

Output: Classification decision, which takes one of three labels: neutral, negative, or positive.

Phase 1: Define the input and output linguistic variables and the linguistic terms of every linguistic variable.

Phase 2: Fuzzification process

2.1 Select the Gaussian, triangular, or trapezoidal membership function.

2.2 Calculate the membership degree of each linguistic term using the selected membership function.

2.3 Transform every crisp set into a fuzzy set.

Phase 3: Generate the fuzzy IF-THEN rules based on our expert knowledge.

Phase 4: Inference engine process

4.1 Application phase

4.2 Implication phase

4.3 Aggregation phase

Phase 5: Defuzzification process

5.1 Select the CGM, BAA, WAP, or COS defuzzification method.

5.2 Transform the aggregated fuzzy value obtained in the aggregation phase into a crisp (real) value by applying the selected defuzzification method.

5.3 From the resulting crisp value, determine the classification decision:

if (0.0 ≤ dv ≤ 0.35), then dc = Negative

if (0.35 < dv ≤ 0.65), then dc = Neutral

if (0.65 < dv ≤ 1.0), then dc = Positive

where dv is the defuzzified value and dc is the classification decision.

return dc

•Group by keys: The MapReduce programming model carries out this operation. Its main goal is to aggregate all intermediate values produced by the mappers that share the same intermediate key into a list of values and to pass this list to the reducer.

•Reduce phase: In our work, the Reduce phase consists of four reducers; each reducer receives the intermediate lists of values from all mappers. The reducer processes one key at a time and aggregates the list of values associated with that key into a smaller set [56]. Finally, the outputs of all reducers are merged into one output, which is written to HDFS as output key-value pairs, as depicted in Fig. 22.
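The three stages above can be mimicked on a single machine with a short Python sketch. It does not use Hadoop; it only illustrates the data flow. The choice of the class label as the intermediate key, the counting reducer, and `toy_classifier` (a keyword-based stand-in for the full FDLC pipeline) are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def toy_classifier(sentence: str) -> str:
    """Hypothetical stand-in for the FDLC pipeline (not the authors' model)."""
    words = set(sentence.lower().split())
    if words & {"good", "great"}:
        return "Positive"
    if words & {"bad", "awful"}:
        return "Negative"
    return "Neutral"


def map_phase(chunk: List[str], classify: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Mapper: emit one (decision, sentence) pair per sentence in the chunk."""
    return [(classify(sentence), sentence) for sentence in chunk]


def group_by_key(pairs: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Shuffle step: collect all values that share the same intermediate key."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[str]) -> Tuple[str, int]:
    """Reducer: condense the list of values for one key (here, a simple count)."""
    return key, len(values)


# Two hypothetical data chunks, as if read from a distributed file system.
chunks = [["great phone", "bad battery"], ["good service", "it arrived today"]]
intermediate = [pair for chunk in chunks for pair in map_phase(chunk, toy_classifier)]
grouped = group_by_key(intermediate)
results = dict(reduce_phase(key, values) for key, values in grouped.items())
```

In an actual Hadoop deployment, each call to `map_phase` would run on a different node and the grouping would be performed by the framework's shuffle stage rather than by user code.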

An advantage of Hadoop is its ability to tolerate server failures by storing information redundantly on several computing servers, which backs up the data automatically, i.e., the same piece of information is recorded on multiple computing servers. If one of those servers fails, the data is still available on another server. The MapReduce programming system is software that offers a scalable and reliable environment for processing and implementing distributed applications. More precisely, this programming framework automatically breaks the computation into multiple parallel tasks. If one task fails to complete its processing, it can be restarted without any negative influence on the other running tasks. MapReduce mitigates network bottlenecks by moving the computation close to the stored data and avoiding copying data across the network, which also leads to data and computational load balancing. The MapReduce model also offers its users a very straightforward programming model that hides the complexity of its internal computing tasks.

In this contribution, we used the Hadoop framework to minimize the execution time and improve our FDLC. Our proposed method involves two massive datasets (Sentiment140 and COVID-19 Sentiments). In the first step, we employed HDFS to store the datasets and distribute them across all the computing servers in the Hadoop cluster. After storing the datasets in HDFS, the next step is to apply our FDLC. In this second step, we used the MapReduce programming model to parallelize our approach across all computing nodes of the Hadoop cluster. The input in every round of the MapReduce algorithm is a sentence to be classified, and the outcome is the sentence together with its classification decision. The classification outcome of each sentence is also stored in HDFS. All these steps are described in Fig. 22, and Algorithm 4 presents the MapReduce algorithm applied in our work to classify the sentences using our fuzzy deep learning classifier.

VI. EXAMPLE OF THE APPLICATION

This section illustrates our FDLC approach with an example, i.e., we explain the steps followed to classify each sentence S according to three class labels (Negative, Neutral, or Positive). The first step of our algorithm is to check that all words in each sentence S are written in English. If this is not the case, we use a translation function to translate the words written in other languages into English. For example, let S = ‘‘The deep neural networks are very efficienccccccccy to process data; but they are inefficiency with the ingrained ambiguity in NL que demande d’autre solutions.’’ The part ‘‘que demande d’autre solutions’’ of the sentence S is translated to ‘‘that needs more solutions.’’ Then, after applying the pre-processing techniques, we obtain the sentence S as a set of tokens (T), as described in Table 4.


FIGURE 22. Parallel architecture of our proposal using MapReduce.

TABLE 4. Set of tokens representing the sentence to be classified.

Algorithm 4: Our MapReduce Programming Model

Input: Data chunks

Output: Classification decision

if (a word in the sentence is not in English) then

Translate(word into English)

T = Text-preprocessing(Data chunks)

W = Word-embedding(T)

C = CNN(W)

NSS = FFNN(C)

PSS = FFNN(C)

µ(NSS) = Fuzzification(NSS)

µ(PSS) = Fuzzification(PSS)

A = Application(µ(NSS), µ(PSS), IF-THEN rules)

I = Implication(A, µ(DCt) = 1)

Ag = Aggregation(I)

dv = Defuzzification(Ag)

if (0.0 ≤ dv ≤ 0.35), then dc = Negative

if (0.35 < dv ≤ 0.65), then dc = Neutral

if (0.65 < dv ≤ 1.0), then dc = Positive

where dv is the defuzzified value and dc is the classification decision.

return dc

After obtaining the set of tokens, we applied the Fast-text word embedding approach to transform the text-based data into numerical data, as shown in Fig. 23.

FIGURE 23. Word embedding matrix of sentence S.

As illustrated in Fig. 23, the Fast-text word embedding method splits each word in the sentence S into character n-grams with n = 2. For example, the character 2-grams of the word ‘‘deep’’ are (d, de, ee, ep, p). Fast-text then uses either CBOW or Skip-Gram to compute the embedding of each n-gram and generates the word embedding matrix of the sentence S. After the embedding matrix is obtained, the CNN automatically extracts the essential features from this matrix. Fig. 24 describes the CNN's steps.
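The character n-gram extraction can be sketched in Python as follows. The sketch assumes the usual Fast-text convention of wrapping each word in the boundary symbols `<` and `>` before slicing; the first and last items of the listing above then correspond to `<d` and `p>`. The function name `char_ngrams` is hypothetical.

```python
from typing import List


def char_ngrams(word: str, n: int = 2, boundary: bool = True) -> List[str]:
    """Return the character n-grams of a word, optionally with < > boundary marks."""
    token = f"<{word}>" if boundary else word
    return [token[i:i + n] for i in range(len(token) - n + 1)]


with_boundaries = char_ngrams("deep")                     # ['<d', 'de', 'ee', 'ep', 'p>']
without_boundaries = char_ngrams("deep", boundary=False)  # ['de', 'ee', 'ep']
```

The boundary symbols let the model distinguish a prefix or suffix from the same characters occurring inside a word.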


FIGURE 24. Application of CNN’s steps.

FIGURE 25. Application of FFNN’s stages.

As seen in Fig. 24, the first step of the CNN is the convolution operation, which applies multiple filters to each n-gram character window CW. To keep the illustration clear, Fig. 24 shows only the application of one filter F to one n-gram character window CW, which yields the matrix F.CW. Then, we add the bias b, equal to 1, and obtain the matrix F.CW+1. Finally, we apply the ReLU activation function to the matrix F.CW+1 and obtain the first feature map. After the convolution operation, the next stage is the pooling operation, in which we reduce the dimensionality of the feature map. In this example, we use max-pooling with a stride of 2. To do so, the feature map must first be zero-padded so that its dimensions are divisible by 2. We then apply max-pooling to the padded matrix and obtain the pooling column. Finally, we pass the pooling column to the fully connected layer. After extracting the essential features using the CNN, the next phase is the application of the FFNN to compute both the NSS and PSS values. Fig. 25 presents the FFNN's steps.

As presented in Fig. 25, the first layer of the simplified version of our FFNN is the fully connected layer, followed by two hidden layers and the output layer. We used the sigmoid activation function in both hidden layers and the softmax activation function at each neuron of the output layer. The input of this network is the fully connected layer obtained in the CNN phase, and the outputs are PSS and NSS. The value of each hidden neuron Hnv is computed by multiplying each input value by the weight of its connection to this hidden neuron, adding the bias b, and then applying the sigmoid activation function. The following is an example of calculating the value of the first hidden neuron in the first hidden layer, in which the bias b = 1 is added to each weighted term:

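$$
\begin{aligned}
H_{nv} &= f\big((w_{i1} v_1 + b) + (w_{i2} v_2 + b) + (w_{i3} v_3 + b) + (w_{i4} v_4 + b)\big) \\
&= f\big((1.72 \cdot 0.02 + 1) + (2.34 \cdot 0.15 + 1) + (1.27 \cdot 0.39 + 1) + (1.29 \cdot 0.75 + 1)\big) \\
&= f(0.0344 + 1 + 0.351 + 1 + 0.4953 + 1 + 0.9675 + 1) \\
&= f(5.85) = \frac{1}{1 + e^{-5.85}} = \frac{1}{1 + 0.00288} = \frac{1}{1.00288} \approx 0.9971
\end{aligned}
$$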

The value of each neuron in the output layer is calculated in the same manner. The only difference is that, instead of the sigmoid function used for the hidden neurons, we employ the softmax activation function to compute the value of each neuron in the output layer. The following example explains how we calculate the values of both neurons in the output layer:

$$
\begin{aligned}
ON_{v1} &= (w_{i1} v_1 + b) + (w_{i2} v_2 + b) \\
&= (0.56 \cdot 0.25 + 1) + (0.96 \cdot 0.47 + 1) \\
&= 0.14 + 1 + 0.4512 + 1 \approx 2.60 \\
ON_{v2} &= (w_{i3} v_1 + b) + (w_{i4} v_2 + b) \\
&= (0.56 \cdot 0.36 + 1) + (0.96 \cdot 0.10 + 1) \\
&= 0.2016 + 1 + 0.096 + 1 \approx 2.30
\end{aligned}
$$

FIGURE 26. Membership degree of each linguistic term.

Both the PSS and NSS values are then calculated by applying the softmax activation function to the values ONv1 and ONv2 as follows:

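$$
\begin{aligned}
PSS &= \frac{e^{2.60}}{e^{2.60} + e^{2.30}} = \frac{13.46}{13.46 + 9.97} = \frac{13.46}{23.43} \approx 0.575 \\
NSS &= \frac{e^{2.30}}{e^{2.60} + e^{2.30}} = \frac{9.97}{13.46 + 9.97} = \frac{9.97}{23.43} \approx 0.425
\end{aligned}
$$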
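The softmax step can be reproduced with a short Python sketch. The function below is a generic, numerically stable softmax and is not taken from the authors' code. Applied to the two output values 2.60 and 2.30, it yields roughly 0.574 and 0.426, which agree with the values above up to rounding in the third decimal.

```python
import math
from typing import List, Sequence


def softmax(values: Sequence[float]) -> List[float]:
    """Softmax with the maximum subtracted first for numerical stability."""
    highest = max(values)
    exps = [math.exp(v - highest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Output-layer values from the worked example: ONv1 = 2.60, ONv2 = 2.30.
pss, nss = softmax([2.60, 2.30])
```

Because the two outputs always sum to 1, NSS can equivalently be obtained as 1 − PSS.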

After computing PSS = 0.575 and NSS = 0.425 using our proposed FDLC, the next phase is the application of the MFS. The first stage of the MFS is the fuzzification of the crisp NSS and PSS values, i.e., we use the triangular (or trapezoidal, or Gaussian) membership function to compute the membership degrees of NSS and PSS in the veryLow, low, moderate, high, and veryHigh fuzzy sets. In this example, we used the triangular MF represented by Equation (27) and Fig. 19. The calculation proceeds as follows, based on Fig. 26.

For the linguistic term veryLow, the optimal scalar parameters are ll = 0, v = 0.125, and ul = 0.25. We used these parameters to calculate the membership degrees of both linguistic variables NSS and PSS in the fuzzy set veryLow. The results are as follows:

•We have PSS = 0.575 ≥ ul = 0.25; therefore, µveryLow(PSS) = 0

•We have NSS = 0.425 ≥ ul = 0.25; therefore, µveryLow(NSS) = 0

The parameter values of each membership function were determined experimentally, and we retained the optimal values, i.e., those that provide the best classification performance.

For the linguistic term low, the optimal scalar parameters are ll = 0, v = 0.25, and ul = 0.5. We used these parameters to calculate the membership degrees of both linguistic variables NSS and PSS in the fuzzy set low. The results are as follows:

•We have PSS = 0.575 ≥ ul = 0.5; therefore, µlow(PSS) = 0

•We have v = 0.25 ≤ NSS = 0.425 ≤ ul = 0.5; therefore, µlow(NSS) = (ul − NSS)/(ul − v) = (0.5 − 0.425)/(0.5 − 0.25) = 0.3

For the linguistic term moderate, the optimal scalar parameters are ll = 0.25, v = 0.5, and ul = 0.75. We used these parameters to calculate the membership degrees of both linguistic variables NSS and PSS in the fuzzy set moderate. The results are as follows:

•We have v = 0.5 ≤ PSS = 0.575 ≤ ul = 0.75; therefore, µmoderate(PSS) = (ul − PSS)/(ul − v) = (0.75 − 0.575)/(0.75 − 0.5) = 0.7

•We have ll = 0.25 ≤ NSS = 0.425 ≤ v = 0.5; therefore, µmoderate(NSS) = (NSS − ll)/(v − ll) = (0.425 − 0.25)/(0.5 − 0.25) = 0.7

For the linguistic term high, the optimal scalar parameters are ll = 0.5, v = 0.75, and ul = 1. We used these parameters to calculate the membership degrees of both linguistic variables NSS and PSS in the fuzzy set high. The results are as follows:

•We have ll = 0.5 ≤ PSS = 0.575 ≤ v = 0.75; therefore, µhigh(PSS) = (PSS − ll)/(v − ll) = (0.575 − 0.50)/(0.75 − 0.50) = 0.3

•We have NSS = 0.425 ≤ ll = 0.5; therefore, µhigh(NSS) = 0

For the linguistic term veryHigh, the optimal scalar parameters are ll = 0.75, v = 0.875, and ul = 1. We used these parameters to calculate the membership degrees of both linguistic variables NSS and PSS in the fuzzy set veryHigh. The results are as follows:

•We have PSS = 0.575 ≤ ll = 0.75; therefore, µveryHigh(PSS) = 0

•We have NSS = 0.425 ≤ ll = 0.75; therefore, µveryHigh(NSS) = 0
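The fuzzification of the two scores can be checked with a small Python sketch that applies the triangular MF with the parameters listed above for each linguistic term. The names `PARAMS`, `triangular`, and `fuzzify` are hypothetical helpers introduced for this sketch.

```python
from typing import Dict

# (ll, v, ul) of the triangular MF for each linguistic term, as given in the text.
PARAMS = {
    "veryLow": (0.0, 0.125, 0.25),
    "low": (0.0, 0.25, 0.5),
    "moderate": (0.25, 0.5, 0.75),
    "high": (0.5, 0.75, 1.0),
    "veryHigh": (0.75, 0.875, 1.0),
}


def triangular(x: float, ll: float, v: float, ul: float) -> float:
    """Triangular MF in the min/max form of Eq. (28)."""
    return max(min((x - ll) / (v - ll), (ul - x) / (ul - v)), 0.0)


def fuzzify(x: float) -> Dict[str, float]:
    """Membership degree of a crisp score in each of the five fuzzy sets."""
    return {term: triangular(x, *params) for term, params in PARAMS.items()}


mu_pss = fuzzify(0.575)  # moderate ~ 0.7, high ~ 0.3, all other terms 0
mu_nss = fuzzify(0.425)  # low ~ 0.3, moderate ~ 0.7, all other terms 0
```

The non-zero degrees match the hand computation up to floating-point rounding, and each score belongs to exactly two neighbouring fuzzy sets.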

As stated earlier, the next step after the fuzzification process is the application of the 25 generated fuzzy IF-THEN rules. The primary objective of this operation is to determine the firing strength of each activated fuzzy IF-THEN rule by applying the conjunction to the membership degrees of the numerical variables PSS and NSS computed in the fuzzification phase. The computing steps of this process are presented as follows:

•Rule1: IF (NSS is veryLow) = 0 AND (PSS is veryLow) = 0 THEN (DC is neutral) = min(0, 0) = 0

•Rule2: IF (NSS is veryLow) = 0 AND (PSS is low) = 0 THEN (DC is neutral) = min(0, 0) = 0

•Rule3: IF (NSS is veryLow) = 0 AND (PSS is moderate) = 0.7 THEN (DC is positive) = min(0, 0.7) = 0

•Rule4: IF (NSS is veryLow) = 0 AND (PSS is high) = 0.3 THEN (DC is positive) = min(0, 0.3) = 0

•Rule5: IF (NSS is veryLow) = 0 AND (PSS is veryHigh) = 0 THEN (DC is positive) = min(0, 0) = 0

•Rule6: IF (NSS is low) = 0.3 AND (PSS is veryLow) = 0 THEN (DC is neutral) = min(0.3, 0) = 0

•Rule7: IF (NSS is low) = 0.3 AND (PSS is low) = 0 THEN (DC is neutral) = min(0.3, 0) = 0

•Rule8: IF (NSS is low) = 0.3 AND (PSS is moderate) = 0.7 THEN (DC is positive) = min(0.3, 0.7) = 0.3

•Rule9: IF (NSS is low) = 0.3 AND (PSS is high) = 0.3 THEN (DC is positive) = min(0.3, 0.3) = 0.3

•Rule10: IF (NSS is low) = 0.3 AND (PSS is veryHigh) = 0 THEN (DC is positive) = min(0.3, 0) = 0

•Rule11: IF (NSS is moderate) = 0.7 AND (PSS is veryLow) = 0 THEN (DC is negative) = min(0.7, 0) = 0

•Rule12: IF (NSS is moderate) = 0.7 AND (PSS is low) = 0 THEN (DC is negative) = min(0.7, 0) = 0

•Rule13: IF (NSS is moderate) = 0.7 AND (PSS is moderate) = 0.7 THEN (DC is neutral) = min(0.7, 0.7) = 0.7

•Rule14: IF (NSS is moderate) = 0.7 AND (PSS is high) = 0.3 THEN (DC is positive) = min(0.7, 0.3) = 0.3

•Rule15: IF (NSS is moderate) = 0.7 AND (PSS is veryHigh) = 0 THEN (DC is positive) = min(0.7, 0) = 0

•Rule16: IF (NSS is high) = 0 AND (PSS is veryLow) = 0 THEN (DC is negative) = min(0, 0) = 0

•Rule17: IF (NSS is high) = 0 AND (PSS is low) = 0 THEN (DC is negative) = min(0, 0) = 0

•Rule18: IF (NSS is high) = 0 AND (PSS is moderate) = 0.7 THEN (DC is negative) = min(0, 0.7) = 0

•Rule19: IF (NSS is high) = 0 AND (PSS is high) = 0.3 THEN (DC is neutral) = min(0, 0.3) = 0

•Rule20: IF (NSS is high) = 0 AND (PSS is veryHigh) = 0 THEN (DC is neutral) = min(0, 0) = 0

•Rule21: IF (NSS is veryHigh) = 0 AND (PSS is veryLow) = 0 THEN (DC is negative) = min(0, 0) = 0

•Rule22: IF (NSS is veryHigh) = 0 AND (PSS is low) = 0 THEN (DC is negative) = min(0, 0) = 0

•Rule23: IF (NSS is veryHigh) = 0 AND (PSS is moderate) = 0.7 THEN (DC is negative) = min(0, 0.7) = 0

•Rule24: IF (NSS is veryHigh) = 0 AND (PSS is high) = 0.3 THEN (DC is neutral) = min(0, 0.3) = 0

•Rule25: IF (NSS is veryHigh) = 0 AND (PSS is veryHigh) = 0 THEN (DC is neutral) = min(0, 0) = 0

The next stage is the implication process. As presented previously, the main goal of the implication phase is to calculate the membership degree of the decision classification attribute 'DC' in each linguistic term ('Negative', 'Neutral', or 'Positive'), based on the firing strength of each IF-THEN rule obtained in the application phase and on the consequent block of the rule. The computing steps are presented below:

•Rule1:µ1(neutral)=min(0,1)=0

•Rule2:µ2(neutral)=min(0,1)=0

•Rule3:µ3(positive)=min(0,1)=0

•Rule4:µ4(positive)=min(0,1)=0

•Rule5:µ5(positive)=min(0,1)=0

•Rule6:µ6(neutral)=min(0,1)=0

•Rule7:µ7(neutral)=min(0,1)=0

•Rule8:µ8(positive)=min(0.3,1)=0.3

•Rule9:µ9(positive)=min(0.3,1)=0.3

•Rule10:µ10(positive)=min(0,1)=0

•Rule11:µ11(negative)=min(0,1)=0

•Rule12:µ12(negative)=min(0,1)=0

•Rule13:µ13(neutral)=min(0.7,1)=0.7

•Rule14:µ14(positive)=min(0.3,1)=0.3

•Rule15:µ15(positive)=min(0,1)=0

•Rule16:µ16(negative)=min(0,1)=0

•Rule17:µ17(negative)=min(0,1)=0

•Rule18:µ18(negative)=min(0,1)=0

•Rule19:µ19(neutral)=min(0,1)=0

•Rule20:µ20(neutral)=min(0,1)=0

•Rule21:µ21(negative)=min(0,1)=0

•Rule22:µ22(negative)=min(0,1)=0

•Rule23:µ23(negative)=min(0,1)=0

•Rule24:µ24(neutral)=min(0,1)=0

•Rule25:µ25(neutral)=min(0,1)=0

After the implication process, the next step is the aggregation process, which groups the implication results by class label (Positive, Neutral, or Negative) and takes the maximum membership degree within each group. The computing steps are carried out as follows:

•µ(positive)=µ3(positive) ∨µ4(positive) ∨µ5(positive)

∨µ8(positive) ∨µ9(positive) ∨µ10(positive) ∨

µ14(positive) ∨µ15(positive) =max(0,0,0,0.3,0.3,0,

0.3,0) =0.3

•µ(negative)=µ11(negative) ∨µ12 (negative) ∨

µ16(negative) ∨µ17(negative) ∨µ18(negative) ∨

µ21(negative) ∨µ22(negative) ∨µ23(negative) =

max(0,0,0,0,0,0,0,0) =0

•µ(neutral)=µ1(neutral) ∨µ2(neutral) ∨µ6(neutral)

∨µ7(neutral) ∨µ13(neutral) ∨µ19(neutral) ∨

µ20(neutral) ∨µ24(neutral) ∨µ25(neutral) =max(0,0,

0,0,0.7,0,0,0,0) =0.7
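The implication and aggregation steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the dictionary `RULES` and the function `aggregate` are names introduced here, and only the rules with a non-zero output in the worked example are listed.

```python
# Rule outputs that are non-zero in the worked example above;
# every other rule yields 0 and is omitted (illustrative data structure).
RULES = {
    8: ("positive", 0.3),
    9: ("positive", 0.3),
    13: ("neutral", 0.7),
    14: ("positive", 0.3),
}


def aggregate(rules, labels=("negative", "neutral", "positive")):
    """Apply min-implication per rule, then max-aggregation per class label."""
    mu = {label: 0.0 for label in labels}
    for label, strength in rules.values():
        implied = min(strength, 1.0)         # implication: min(firing strength, 1)
        mu[label] = max(mu[label], implied)  # aggregation: fuzzy OR (max)
    return mu


print(aggregate(RULES))
```

The result reproduces the three aggregated membership degrees used in the defuzzification step.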

Finally, after the aggregation process, we use the Centroid of Area defuzzification method to convert the aggregated fuzzy outputs into a single crisp result that indicates the class label. This process is performed in the following steps:

Step 1: According to Fig. 27, we compute the centroid of the negative label, which is equal to (0.0+0.35)/2 = 0.175, the centroid of the neutral label, which is equal to (0.35+0.65)/2 = 0.5, and the centroid of the positive label, which is equal to (0.65+1)/2 = 0.825.

FIGURE 27. Membership degree of each class label.

FIGURE 28. Defuzzified value computed using rules review in Matlab.

Step 2: According to the Centroid of Area defuzzification method introduced in equation (36), the defuzzified value dv is calculated from the centroids computed in the first step and the membership degree of each label as follows:

$$dv = \frac{0 \times 0.175 + 0.7 \times 0.5 + 0.3 \times 0.825}{0 + 0.7 + 0.3} = 0.5975 \approx 0.60$$

Step 3: From step 2, the defuzzified value dv = 0.60 lies between 0.35 and 0.65. Therefore, the sentence S is classified as neutral.

According to Fig. 28, the defuzzified value is equal to 0.60 for PSS = 0.575 and NSS = 0.425. This value lies between 0.35 and 0.65, so the sentence S is classified as neutral. We therefore reach the same conclusion as the manual computation.
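The three defuzzification steps can be checked numerically with the short sketch below. It assumes the label intervals read from Fig. 27 ([0, 0.35], [0.35, 0.65], [0.65, 1]); the function names and the treatment of values exactly on an interval boundary are choices made here for illustration.

```python
# Label intervals as used in Step 1 (read from Fig. 27).
INTERVALS = {
    "negative": (0.0, 0.35),
    "neutral": (0.35, 0.65),
    "positive": (0.65, 1.0),
}


def defuzzify(mu, intervals=INTERVALS):
    """Weighted average of interval centroids, weighted by membership degree."""
    centroids = {k: (lo + hi) / 2 for k, (lo, hi) in intervals.items()}
    denominator = sum(mu.values())
    if denominator == 0:
        raise ValueError("all membership degrees are zero")
    return sum(mu[k] * centroids[k] for k in mu) / denominator


def classify(dv):
    """Map the crisp value to a label (boundary handling is an assumption)."""
    if dv < 0.35:
        return "negative"
    if dv <= 0.65:
        return "neutral"
    return "positive"


mu = {"negative": 0.0, "neutral": 0.7, "positive": 0.3}
dv = defuzzify(mu)
print(round(dv, 4), classify(dv))
```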

VII. THE EXPERIMENT AND THE RESULTS

This section describes the experimental results of our fuzzy deep learning classifier (CNN + FFNN + MFS). These results are obtained by applying our classifier and other methods from the literature to both datasets presented in the data-collection subsection. In this study, we split each dataset into a training set, which represents 90% of the data, and a testing set, which represents the remaining 10%. We then store both sets (training and testing) in HDFS. Once the storage is finished, we apply multiple text pre-processing techniques to the training and testing sets to reduce and remove noisy data. Then we apply the most efficient word embedding approach to transform the text-based data into numerical data. Next, we apply our proposed fuzzy deep learning classifier to the testing set and store the classified data in HDFS. We carry out the classification in parallel using the Hadoop framework with its HDFS and its MapReduce programming framework. In this work, the Hadoop cluster consists of five computing nodes: one master node and four slave nodes. To assess the experimental results, we calculated the ten evaluation metrics described in the performance metrics subsection: TPR, TNR, FPR, FNR, ER, PR, AC, KS, FS, and TC.
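For orientation, the sketch below computes nine of these metrics from a binary confusion matrix. The expansions used here (KS as Cohen's kappa statistic, FS as the F1 score, PR as precision) follow common usage and are assumptions, since the paper's own definitions are given in its performance metrics subsection; TC (time consumption) is a measured quantity and is not computed.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard confusion-matrix metrics; assumes every denominator is non-zero."""
    total = tp + fn + fp + tn
    tpr = tp / (tp + fn)   # true positive rate (recall)
    tnr = tn / (tn + fp)   # true negative rate
    fpr = fp / (fp + tn)   # false positive rate
    fnr = fn / (fn + tp)   # false negative rate
    pr = tp / (tp + fp)    # precision
    ac = (tp + tn) / total # accuracy
    er = 1 - ac            # error rate
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total**2
    ks = (ac - p_expected) / (1 - p_expected)
    fs = 2 * pr * tpr / (pr + tpr)  # F1 score
    return {"TPR": tpr, "TNR": tnr, "FPR": fpr, "FNR": fnr,
            "PR": pr, "AC": ac, "ER": er, "KS": ks, "FS": fs}


# Hypothetical counts, chosen only to exercise the function.
print(binary_metrics(tp=45, fn=5, fp=10, tn=40))
```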

In this work, we performed the experiments in five steps. First, we kept the parameters of our suggested fuzzy deep learning classifier fixed and switched the word embedding approach (Word2vec, GloVe, Fast-text) to identify the most effective approach for transforming the text-based data into numerical data. Once we obtained the best-performing word embedding procedure in terms of accuracy, we used it in the rest of this work. Second, we applied the word embedding technique selected in the preceding step and preserved the parameters of our proposed deep learning model (CNN+FFNN). Then, we changed the fuzzification approach and the defuzzification method. The main objective of this experiment is to determine the most effective fuzzification approach and the most efficient defuzzification method in terms of accuracy. Third, after determining the most efficient word embedding technique, fuzzification method, and defuzzification approach, we formed several fuzzy deep learning classifiers (FDLCs) using different parameters for each layer. The primary goal of this experiment is to find the set of parameters that makes our FDLC most accurate. Fourth, we parallelized our proposed FDLC using the Hadoop framework with its HDFS and MapReduce programming framework. Finally, we compared the most effective model of our proposed FDLC with similar approaches from the literature to demonstrate its performance.

TABLE 5. Parameters settings of our FDLC that we used to assess the word embedding approaches.

TABLE 6. ER and TC of each word embedding methods without applying the Hadoop framework.

A. EXPERIMENT 1

In this experiment, we evaluated the effectiveness of Word2vec, GloVe, and Fast-text in terms of ER and TC. We combined each of the studied word embedding methods with our suggested FDLC to verify its performance. This experiment serves to identify the most efficient word embedding method for our work, in which we used two large datasets. We therefore kept the parameters of our proposed FDLC fixed, as presented in Table 5, switched the word embedding method (Word2vec, GloVe, and Fast-text), and applied each combination to both datasets.

Table 5 introduces the parameter settings of both deep learning models, CNN and FFNN. For the MFS classifier, we used the Triangular membership function for fuzzification and the Centroid of Area for defuzzification. In addition, the Hadoop framework is used in our work to parallelize the word embedding learning tasks across five machines: one master node and four slave nodes. Hadoop uses its HDFS to store the dataset to be embedded and the set of representation vectors (the result of applying the word embedding method), and its MapReduce programming framework to process the data [56]. The primary goal of the Hadoop framework here is to parallelize the embedding process across multiple machines to improve the AC and reduce the TC. Table 6 shows the ER and TC of each word embedding method without applying the Hadoop framework.

From the experimental results in Table 6, we remark that the GloVe method takes less execution time than the other techniques, namely 7.32s and 4.52s on the sentiment140 and COVID-19 Sentiments datasets, respectively. However, it has a higher error rate, equal to 35.68% and 21.39% on the sentiment140 and COVID-19 Sentiments datasets, respectively. Fast-text has a lower error rate than Word2vec and GloVe, equal to 11.02% and 8.66% on the sentiment140 and COVID-19 Sentiments datasets, respectively, but a higher execution time, equal to 20.15s and 12.46s. In summary, the Fast-text embedding method is the most efficient in terms of error rate. To overcome its execution time shortcoming, we employed the Hadoop framework to parallelize the embedding methods in order to minimize the execution time and raise the accuracy. Table 7 depicts the results obtained after combining the Hadoop framework with the word embedding methods.

TABLE 7. ER and TC of each word embedding methods with applying the Hadoop framework.

TABLE 8. ER, AC, and TC of fuzzification method/defuzzification approaches without using Hadoop framework.

From Tables 6 and 7, we deduce that the Hadoop framework decreases the execution time and raises the accuracy. For example, it reduces the execution time of the Fast-text method from 20.15s to 3.03s on sentiment140 and from 12.46s to 1.49s on COVID-19 Sentiments. Overall, Fast-text is the most efficient method. Therefore, we use only the Fast-text word embedding method in the rest of this work.
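As a rough check on these figures, the speedup implied by the Fast-text execution times can be computed directly. The helper `speedup` is a name introduced here for illustration; the timings are the ones quoted in the text.

```python
def speedup(sequential_s, parallel_s):
    """Ratio of execution time without Hadoop to execution time with Hadoop."""
    return sequential_s / parallel_s


# Fast-text execution times quoted in the text, in seconds: (without, with Hadoop).
FASTTEXT_TIMES = {
    "sentiment140": (20.15, 3.03),
    "COVID-19 Sentiments": (12.46, 1.49),
}

for name, (before, after) in FASTTEXT_TIMES.items():
    print(f"{name}: {speedup(before, after):.2f}x faster")
```

On five nodes, a speedup above 5 is more than linear scaling would give, so these ratios are worth reading alongside the details of how the timings were measured.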

B. EXPERIMENT 2

The main goal of this second experiment is to find the most efficient fuzzification approach and the best defuzzification method. In this experiment, the word embedding is Fast-text, and the parameter settings of both deep learning models, CNN and FFNN, are the same as those presented in Table 5. The part of our proposal that changes is the fuzzy classifier. As explained earlier, our fuzzy classifier uses three MFs for the fuzzification process (Triangular, Trapezoidal, and Gaussian) and four defuzzification approaches (Centroid of Area, Bisector of Area, Weighted Average, and Center of Sums). Therefore, in this experiment, we combine each fuzzification method with each defuzzification method, as illustrated in Table 8. This generates 12 combinations: Triangular MF/Centroid of Area, Triangular MF/Bisector of Area, Triangular MF/Weighted Average, Triangular MF/Center of Sums, Trapezoidal MF/Centroid of Area, Trapezoidal MF/Bisector of Area, Trapezoidal MF/Weighted Average, Trapezoidal MF/Center of Sums, Gaussian MF/Centroid of Area, Gaussian MF/Bisector of Area, Gaussian MF/Weighted Average, and Gaussian MF/Center of Sums. Table 8 presents the ER, AC, and TC obtained after applying our proposal to the Sentiment140 dataset without using the Hadoop framework.
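The three membership functions and the 12 combinations can be sketched as follows. These are the textbook forms of the triangular, trapezoidal, and Gaussian MFs; the parameter names (`a`, `b`, `c`, `d`, `mean`, `sigma`) are illustrative, and the breakpoints actually used by the authors are defined elsewhere in the paper.

```python
import math
from itertools import product


def triangular(x, a, b, c):
    """Triangular MF with feet at a and c and peak at b (assumes a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def trapezoidal(x, a, b, c, d):
    """Trapezoidal MF with feet at a and d and plateau [b, c] (a < b <= c < d)."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)


def gaussian(x, mean, sigma):
    """Gaussian MF centred at mean with spread sigma (sigma > 0)."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)


FUZZIFIERS = ["Triangular MF", "Trapezoidal MF", "Gaussian MF"]
DEFUZZIFIERS = ["Centroid of Area", "Bisector of Area",
                "Weighted Average", "Center of Sums"]
# Every fuzzification method paired with every defuzzification method.
COMBINATIONS = [f"{f}/{d}" for f, d in product(FUZZIFIERS, DEFUZZIFIERS)]

print(len(COMBINATIONS))
print(COMBINATIONS[-1])
```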

As presented previously, in this experiment we kept the parameters of the deep learning model fixed, as detailed in Table 5, and applied the Fast-text word embedding. We changed the fuzzification/defuzzification combination each time in order to find the most efficient one. From Table 8, we see that the best fuzzification method is the Gaussian MF and the most efficient defuzzification method is Center of Sums. Accordingly, we used the Gaussian MF to fuzzify both NSS and PSS values into the veryLow, low, moderate, high, and veryHigh fuzzy sets. After obtaining the output of the inference engine, we defuzzified this fuzzy output into a crisp output using the Center of Sums method. The Gaussian MF/Center of Sums combination raises the AC to 89.75% and decreases the ER to 10.25%; it is also the best in terms of time consumption, which is equal to 22.01s.

Table 9 describes the ER, AC, and TC obtained by applying the fuzzification approaches and defuzzification methods using the Hadoop framework. The primary purpose of this step is to demonstrate the effect of applying the Hadoop framework to the fuzzification and defuzzification processes. As before, the deep learning hybrid model keeps the parameters displayed in Table 5, Fast-text is used as the word embedding technique, and the fuzzification/defuzzification combination is changed each time.

TABLE 9. ER, AC, and TC of fuzzification method/defuzzification approaches using Hadoop framework.

TABLE 10. Parameters settings of our proposed (CNN+FNN) deep learning model.

From Table 9, we remark that the Hadoop framework reduces the execution time of all 12 fuzzification/defuzzification combinations. It also increases the AC and decreases the ER. For example, for the Gaussian MF/Center of Sums combination, the TC is reduced from 22.01s (Table 8) to 4.402s (Table 9), the AC is increased from 89.75% to 94.87%, and the ER is decreased from 10.25% to 5.13%. This experiment therefore leads to two observations. First, the Hadoop framework significantly improves the performance of our proposed FDLC. Second, the Gaussian MF/Center of Sums combination proves more effective than the eleven other combinations. For that reason, we use Gaussian MF/Center of Sums as the fuzzification approach/defuzzification method in our proposed FDLC.

C. EXPERIMENT 3

In this experiment, several FDLCs were constructed using different parameters for each layer. The parameter settings of our proposed deep learning (CNN+FFNN) model, such as the number of convolutional layers, number of pooling layers, number of fully connected layers, number of hidden layers, activation functions, filter size, regularizer, number of filters, dropout options, window size, vocabulary size, and number of epochs, are described in Table 10.

To evaluate our proposal in this experiment, we use 1, 2, 3, 4, 5, 6, or 7 convolutional layers and 1, 2, 3, 4, 5, 6, or 7 pooling layers, and we vary the filter size among 3 × 3, 4 × 4, 5 × 5, 7 × 7, 9 × 9, 10 × 10, 11 × 11, and 12 × 12. The number of filters is varied from 15 to 135. We also use 3, 4, 5, 6, 7, or 8 hidden layers. The other parameters, such as the embedding input matrix, dropout option, activation functions, regularizer, window size, vocabulary size, and number of epochs, were kept fixed throughout these changes because varying them did not improve the accuracy of our proposal. In all constructed FDLCs, the number of convolutional layers and the number of pooling layers were changed along with the size of filters, number of hidden layers, and number of filters, while the word embedding is set to Fast-text and the fuzzification/defuzzification combination is set to Gaussian MF/Center of Sums. The parameter settings of all the 18 parallel FDLCs are illustrated in Table 11.

TABLE 11. Parameters settings for our parallel FDLCs.

As presented in Table 11, FDLC1 and FDLC2 differ in the number of hidden layers (FDLC1 has three hidden layers and FDLC2 has four) and in the filter sizes (FDLC1 uses two filters of size 4 × 4 and 5 × 5, and FDLC2 uses two filters of size 5 × 5 and 7 × 7). From Table 12, we remark that the accuracy of FDLC1 is 91.30% and that of FDLC2 is 94.04%. So, there is a significant improvement from FDLC1 to FDLC2.

Based on FDLC1 and FDLC2, we built two new models, FDLC3 and FDLC4, in which we fixed the number of hidden layers and varied the size of the filters to find out what makes FDLC2 better than FDLC1. From FDLC2 and FDLC4, we note that the 7 × 7 filter increases the classification rate by 0.74%. Also, from FDLC1 and FDLC4, we remark that using four hidden layers in FDLC4 instead of three hidden layers in FDLC1 raises the classification rate by 4.74%.

At this point, the most efficient of the four constructed models is FDLC2. So, based on FDLC2, we build two other models, FDLC5 and FDLC6. In FDLC5, we add a new filter size of 9 × 9, and in FDLC6, we raise the number of filters to 45 and also add the 9 × 9 filter size to the other filter sizes. As a result, the 9 × 9 filter increases the classification rate by 0.94%, and raising the number of filters to 45 increases the classification rate by 3.69%.

At this stage, FDLC6 is better than all previously suggested models, so we build two other models based on FDLC6, called FDLC7 and FDLC8. In FDLC7, we add the 11 × 11 filter to the filter sizes used in FDLC6, and in FDLC8, we also add the 11 × 11 filter and raise the number of filters to 90. According to a comparative study between FDLC7 and FDLC8, the classification rate decreases by 0.09% when we add the 11 × 11 filter. Therefore, in the next experiment, we use the 10 × 10 and 12 × 12 filters to find the optimal filter among 9 × 9, 10 × 10, 11 × 11, and 12 × 12. In FDLC8, the classification rate increases by 5.24%. We conclude that raising the number of filters increases the classification rate.


TABLE 12. ER, AC and TC of different proposed parallel FDLCs.

Presently, the most powerful model is FDLC8. Therefore, based on FDLC8, we constructed the models FDLC9 and FDLC10. These new models keep the same configuration as FDLC8; the only difference is the added filter size. In FDLC9, we remove the 11 × 11 filter and adopt the 10 × 10 filter, and in FDLC10, we remove the 11 × 11 filter and add the 12 × 12 filter. The 11 × 11 filter is removed because it decreases the classification performance, as observed in FDLC7. According to the results in Table 12, the 10 × 10 filter used in FDLC9 increases the classification performance by 1.03%, and the 12 × 12 filter used in FDLC10 decreases the accuracy by 0.97% compared to FDLC8. In this experiment, we observe that increasing the filter size from 3 × 3 to 10 × 10 increases the classification rate, but once the filter size is greater than or equal to 11 × 11, the classification performance starts to decrease. Hence, we take 10 × 10 as the optimal filter size for our suggested FDLC.

According to all previously performed experiments, FDLC9 is the most efficient model. Thus, based on FDLC9, we form two different models: FDLC11 and FDLC12. In FDLC11, we vary the number of hidden layers, and in FDLC12, we raise the number of filters. From the empirical results, we notice that increasing the number of hidden layers increases the classification rate by 0.86% for FDLC11. For FDLC12, we observe that raising the number of filters increases the accuracy by 0.39%. Based on these results, we build a new model, FDLC13, that increases both the number of hidden layers and the number of filters. The empirical outcome shows that FDLC13 achieves the best accuracy so far, equal to 96.89%.

Once again, we create two new models, FDLC14 and FDLC15. In FDLC14, we increase the number of hidden layers to 6, and in FDLC15, we raise the number of filters to 360. According to the empirical results, FDLC14 increases the classification rate by 0.36% compared to FDLC13. However, the increase in the number of filters in FDLC15 causes a significant decrease in the classification rate compared to FDLC13: the accuracy drops from 96.89% in FDLC13 to 78.29% in FDLC15. This considerable decrease suggests that the optimal number of filters is close to 180.


FIGURE 29. AC and ER of all evaluated FDLCs of our proposal.

Based on the previous experiment, we create four other new models: FDLC16, FDLC17, FDLC18, and FDLC19. Their objective is to find the interval that contains the optimal number of filters. We use 270 filters in FDLC16, 225 in FDLC17, 200 in FDLC18, and 190 in FDLC19. According to the experimental results, the classification rate increases as the number of filters approaches 180. So the optimal value lies in the interval [180;190].

To determine the optimal value, we build three models: FDLC20, FDLC21, and FDLC22. In FDLC20, we use 185 filters; in FDLC21, 183 filters; and in FDLC22, 187 filters. The preliminary outcome showed that the optimal number of filters lies in the interval [183;185]. Since the previous experiment computed the classification rate for 183 and 185 filters, we create a new model, FDLC23, in which the number of filters is equal to 184. The experimental results of FDLC20, which uses 183 filters, show that its classification rate is equal to 97.25%, currently the highest. So, we deduce that the optimal number of filters is 183.

Based on FDLC20, we change the number of hidden layers to determine the most efficient number. To do that, we develop four models: FDLC24, FDLC25, FDLC26, and FDLC27. In FDLC24, we use seven hidden layers. In FDLC25, we change the number of convolutional layers to 2 and the number of pooling layers to 2. In FDLC26, we modify the number of convolutional, pooling, and hidden layers at the same time. In FDLC27, we change the number of hidden layers to eight. As a result, FDLC24 increases the classification rate by 0.78%, and FDLC25 increases it by 0.67% compared to FDLC20. The accuracy of FDLC26 rises to 99.17%, whereas FDLC27 has a lower accuracy than FDLC26, equal to 98.39%. From FDLC26 and FDLC27, we deduce that the optimal number of hidden layers is seven.

Based on the previous empirical results, we construct five models in which we vary the number of convolutional and pooling layers at the same time. These models are FDLC28, FDLC29, FDLC30, FDLC31, and FDLC32, as described in Table 11. According to the experimental results for these five models, 6 is the optimal number of convolutional and pooling layers. Therefore, according to the results presented in Table 12 and Fig. 29, the most efficient FDLC introduced in this work is FDLC31, with an accuracy equal to 99.83%. Fig. 31 illustrates the final architecture of our proposed fuzzy deep learning classifier after we carried out several experiments to find the optimal values in each phase.
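The tuning procedure followed in this experiment (change one parameter at a time, keep the change only if accuracy improves, then move on to the next parameter) can be summarised by the sketch below. It is an illustration of that general strategy, not the authors' code: `coordinate_search` and the `evaluate` callback are hypothetical names, and in practice `evaluate` would train and test an FDLC with the given configuration.

```python
def coordinate_search(base, candidates, evaluate):
    """Greedy one-parameter-at-a-time search.

    base:       dict of starting hyperparameters.
    candidates: dict mapping a parameter name to the values to try for it.
    evaluate:   callable(config) -> score, higher is better (hypothetical).
    """
    best = dict(base)
    best_score = evaluate(best)
    for name, values in candidates.items():
        for value in values:
            trial = {**best, name: value}  # change a single parameter
            score = evaluate(trial)
            if score > best_score:         # keep the change only if it helps
                best, best_score = trial, score
    return best, best_score
```

A search of this kind is cheap but order-dependent: it can miss combinations in which two parameters only help when changed together, which is one reason some of the later FDLCs modify several settings at once.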

From Table 12 and Fig. 30, we remark that our proposal has a higher classification rate but at the cost of a higher execution time. To address this problem, we use more Hadoop computing nodes. Fig. 32 presents the experimental results after executing our proposal on several Hadoop computing nodes. From Fig. 32, we observe that the execution time decreased from 25.97s with five computing machines to 0.0089s with twelve computing machines.

FIGURE 30. TC of all evaluated FDLCs of our proposal.

FIGURE 31. The architecture of our proposal after multiple experiments.

D. EXPERIMENT 4

This experiment compares the results obtained with our proposed fuzzy deep learning classifier against several other methods from the literature. The works selected for this experiment are the following. First, the "Multi-task learning model based on Multi-scale CNN and LSTM for sentiment classification" suggested by Jin et al. [29], which merges CNN and LSTM to improve sentiment analysis performance (86.25%). Second, the "Stacked Residual Recurrent Neural Networks With Cross-Layer Attention for Text Classification" proposed by Lan et al. [31], which integrates the stacked residual RNN and a cross-layer attention technique in order to capture more linguistic features and employ them for the sentiment analysis task (89%). Third, the "Comparison Enhanced Bi-LSTM with MultiHead Attention (CE-B-MHA)" developed by Lin et al. [32], which combines multi-head attention to extract global features with the strength of Bi-LSTM to discover local sequence features. Fourth, a "hybrid method for bilingual text sentiment classification based on deep learning" implemented by Liu et al. [33], which integrates the NB and SVM machine learning algorithms with the RNN and LSTM deep learning models. Finally, "Intelligent asset allocation via market sentiment views" designed by Xing et al. [36], which combines evolving clustering with the LSTM deep learning model. Our proposal and all these approaches from the literature are applied to both datasets used in this work, Sentiment140 and COVID-19_Sentiments. Fig. 33 depicts the results obtained in terms of classification rate and time consumption by applying our proposal and the other selected methods to the Sentiment140 dataset.

FIGURE 32. The execution time of our proposal illustrates in Fig. 31.

FIGURE 33. Experimental result of the comparative study carried out

between our proposal and other works for the literature using

Sentiment140 dataset.

Fig. 34 illustrates the results obtained after applying our suggested FDLC and the other selected techniques from the literature to the COVID-19_Sentiments dataset.

From Fig. 33 and Fig. 34, we observe that our proposal, based on the deep learning model (CNN+FFNN), the Hadoop framework, and the Mamdani fuzzy system, outperforms the other approaches (Jin et al. [29], Lan et al. [31], Lin et al. [32], Liu et al. [33], and Xing et al. [36]) with accuracy equal to 99.83% and 99.99% and execution time equal to 0.0089s and 0.00534s on the Sentiment140 and COVID-19_Sentiments datasets, respectively. The effectiveness and performance of our proposal are due to the application of fuzzy logic, the integration of the CNN and FFNN deep learning models, and the use of twelve computing nodes in the Hadoop cluster.

FIGURE 34. Experimental result of the comparative study carried out between our proposal and other works for the literature using COVID-19_Sentiments dataset.

To further evaluate our proposed fuzzy deep learning classifier, we performed another experiment that compares our proposal with the other selected approaches from the literature (Jin et al. [29], Lan et al. [31], Lin et al. [32], Liu et al. [33], and Xing et al. [36]). In this case, the evaluation criteria are TPR, FNR, TNR, FPR, PR, KS, and FS, as presented in the performance metrics subsection. This comparison is performed on both datasets, Sentiment140 and COVID-19_Sentiments. Its experimental results are illustrated in Table 13.

Based on the experimental results in Table 13, we observe that our proposal (CNN+FFNN+FuzzyLogic+Hadoop) outperforms the other selected approaches from the literature on both datasets (Sentiment140 and COVID-19_Sentiments) in terms of TPR (99.98%, 99.81%), FNR (0.02%, 0.19%), TNR (98.61%, 98.91%), FPR (1.39%, 1.09%), PR (98.55%, 97.75%), KS (97.96%, 98.96%), and FS (98.35%, 96.54%).

E. EXPERIMENT 5

In this last experiment, we evaluated our proposed FDLC in terms of complexity, convergence, and stability. This experiment compares our proposed FDLC with Jin et al. [29], Lan et al. [31], Lin et al. [32], Liu et al. [33], and Xing et al. [36] to identify the most efficient method among all evaluated approaches in terms of complexity, convergence, and stability.

F. COMPLEXITY

The complexity of a model is a measure of the time and space that the model consumes. In this subsection, we evaluate the time complexity and space complexity of our proposed FDLC, Jin et al. [29], Lan et al. [31], Lin et al. [32], Liu et al. [33], and Xing et al. [36]. Table 14 presents the experimental results obtained when measuring the space complexity of our FDLC model and the other chosen models in terms of the number of executed operations and the number of network parameters.

TABLE 13. The experimental outcome of TPR, FNR, TNR, FPR, PR, KS, and FS of our FDLC, Jin et al. [29], Lan et al. [31], Lin et al. [32], Liu et al. [33], and Xing et al. [36] approaches.

TABLE 14. Space complexity of our FDLC, Jin et al. [29], Lan et al. [31], Lin et al. [32], Liu et al. [33], and Xing et al. [36] approaches.

From Table 14, we note that our proposed FDLC performs (42M, 102M) operations on the COVID-19_Sentiments and Sentiment140 datasets, respectively. The number of parameters of our FDLC is equal to (21.76M, 47.82M) on the COVID-19_Sentiments and Sentiment140 datasets, respectively. As the experimental results show, our proposed FDLC has a much lower space complexity than the Jin et al. [29], Lan et al. [31], Lin et al. [32], Liu et al. [33], and Xing et al. [36] approaches.

Table 15 shows the empirical results obtained after measur-

ing the time computational complexity of our FDLC model

and other chosen models in terms of training time consump-

tion and testing time consumption.

From Table 15, we observe that our proposed FDLC consumes a training time equal to (0.00534s, 0.0089s) on the COVID-19_Sentiments and Sentiment140 datasets, respectively. The testing time of our FDLC model is equal to (0.00178s, 0.00287s) on the COVID-19_Sentiments and Sentiment140 datasets, respectively. As the empirical results show, our proposed FDLC has a much lower time complexity than the Jin et al. [29], Lan et al. [31], Lin et al. [32], Liu et al. [33], and Xing et al. [36] approaches.

G. CONVERGENCE

We check whether our proposed FDLC converges within a specific number of learning iterations, which is useful for controlling the time consumption. The following formula specifies the condition of the convergent trend:

$$Error_{former} - Error_{current} \geq T \qquad (40)$$

where $Error_{former}$ is the average error of our FDLC in the previous training iteration, $Error_{current}$ is the average error of our FDLC in the present training iteration, and $T$ is the threshold that determines the convergence rate value. After multiple experiments, we set this threshold to 0.0001.

TABLE 15. Time complexity of our FDLC, Jin et al. [29], Lan et al. [31], Lin et al. [32], Liu et al. [33], and Xing et al. [36] approaches.

FIGURE 35. Convergence rate of our proposed FDLC in both used

datasets.

Our FDLC average error is computed as follows:

$$Error = \frac{1}{2} \cdot \frac{\sum_{i=1}^{N} \sum_{j=1}^{M} (c - c_{label})^2}{N} \qquad (41)$$

where $N$ is the total number of sentences in the given dataset, $M$ is the total number of FDLC output units, $c$ is the required output class label, and $c_{label}$ is the obtained output class label.

If condition (40) is met, our proposed FDLC can be deemed convergent, and the algorithm is trained until the FDLC's average error satisfies the condition. Otherwise, our proposed FDLC is not convergent. Fig. 35 shows the convergence rate of our proposed FDLC on both datasets. From Fig. 35, we note that our proposed FDLC converged towards the threshold value of 0.0001 after 200 iterations on the COVID-19_Sentiments dataset and after 400 iterations on the Sentiment140 dataset.
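The check described by Eqs. (40) and (41) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `average_error` and `meets_condition_40`, and the layout of the labels as one list of M output values per sentence, are hypothetical choices made here.

```python
from typing import Sequence


def average_error(required: Sequence[Sequence[float]],
                  obtained: Sequence[Sequence[float]]) -> float:
    """Average error following Eq. (41).

    required[i][j] plays the role of c and obtained[i][j] the role of
    c_label for sentence i and output unit j (an assumed data layout).
    """
    n = len(required)
    if n == 0 or n != len(obtained):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_sum = 0.0
    for required_row, obtained_row in zip(required, obtained):
        for c, c_label in zip(required_row, obtained_row):
            squared_sum += (c - c_label) ** 2
    # One half of the summed squared differences, divided by N.
    return 0.5 * squared_sum / n


def meets_condition_40(error_former: float, error_current: float,
                       threshold: float = 0.0001) -> bool:
    """Evaluate the inequality of Eq. (40), with T given by threshold."""
    return error_former - error_current >= threshold
```

For two sentences and two output units, `average_error([[1, 0], [0, 1]], [[1, 0], [1, 0]])` sums two squared differences of 1 and returns 0.5 * 2 / 2 = 0.5. How the result of `meets_condition_40` drives the training loop is left to the caller, since the text specifies only the inequality and the threshold value.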

TABLE 16. Convergence rate of our FDLC, Jin et al. [29], Lan et al. [31],

Lin et al. [32], Liu et al. [33], and Xing et al. [36] approaches.

TABLE 17. Stability of our FDLC, Jin et al. [29], Lan et al. [31], Lin et al. [32], Liu et al. [33], and Xing et al. [36] over five cross-validations.

Table 16 presents the convergence rate of our FDLC and of the Jin et al. [29], Lan et al. [31], Lin et al. [32], Liu et al. [33], and Xing et al. [36] approaches. From Table 16, we deduce that our proposed FDLC converges faster than the other approaches.

H. STABILITY

We determine whether our FDLC is stable by computing the mean standard deviation (MSD) of the accuracy of each classification model over five cross-validations of the given dataset. Our FDLC is trained with the same hyper-parameters and configurations each time, but on a different one of the five cross-validation splits. Table 17 shows the average accuracy (AVA) and mean standard deviation obtained by our FDLC, Jin et al. [29], Lan et al. [31], Lin et al. [32], Liu et al. [33], and Xing et al. [36] over the five cross-validations of both datasets used in this work.

From Table 17, we note that our FDLC almost always achieves a higher average accuracy with a very low mean standard deviation. This suggests that our FDLC is more stable than the other algorithms.
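The stability summary described above can be illustrated with the Python standard library. This is a hedged sketch: the function name `stability_summary` is hypothetical, and the use of the population standard deviation is an assumption, because the text does not state which estimator underlies the MSD.

```python
import statistics
from typing import Sequence, Tuple


def stability_summary(fold_accuracies: Sequence[float]) -> Tuple[float, float]:
    """Return (average accuracy, standard deviation) over the folds."""
    if len(fold_accuracies) < 2:
        raise ValueError("at least two folds are needed")
    average = statistics.mean(fold_accuracies)
    # Population standard deviation; an assumption, see the lead-in.
    spread = statistics.pstdev(fold_accuracies)
    return average, spread
```

Calling the function on the five per-fold accuracies of one model gives one (AVA, MSD) pair; a lower second value indicates that accuracy varies less from fold to fold.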

VIII. CONCLUSION

The diversity of social media platforms has led to their massive use by individuals, who regard these platforms as an efficient communication tool. The feedback that users post on these platforms has therefore generated big data for sentiment analysis to learn from. At present, NLP, the Hadoop framework, and deep learning models provide a set of tools for capturing and detecting the sentiments that users express in the massive datasets collected from social media platforms. In this work, a novel parallel fuzzy deep learning classifier is developed. This classifier incorporates NLP text pre-processing methods, NLP word embedding approaches, a CNN+FFNN deep learning model, and a Mamdani fuzzy system. The proposal's primary goal is to determine a significant relationship between word embedding approaches and the two deep learning models used (CNN+FFNN). Another objective is to deal with ingrained ambiguities in the data by applying the Mamdani fuzzy system. The proposed classifier is parallelized using the Hadoop framework to avoid the long-running problem and to improve the classification rate.

In our parallel fuzzy deep learning classifier, we propose a new structure that combines pre-processing techniques and word embedding algorithms such as FastText, Word2vec, and GloVe with the FFNN, CNN, and MFS algorithms. The first step of our work is text pre-processing, which reduces noisy data. We then apply a word embedding method to transform the text-based data into numerical data, and employ the CNN deep learning model to detect and extract features from the resulting embedding matrix. Next, we use the FFNN to compute the NSS and PSS. Finally, we apply the Mamdani fuzzy approach to deal with the ingrained ambiguities in the NSS and PSS values.

Six experiments were performed to demonstrate the effectiveness of our developed classifier. In the first experiment, we executed our approach with and without text pre-processing techniques, and we deduce that applying text pre-processing methods reduces the classification error rate from 35.59% to 5.98% on the Sentiment140 dataset and from 29.04% to 3.61% on the COVID-19_Sentiments dataset.

In the second experiment, we evaluated the different word embedding methods used (i.e., FastText, GloVe, and Word2vec). The experimental results show that FastText yields a lower error rate than Word2vec and GloVe: 8.28% on the Sentiment140 dataset and 5.51% on the COVID-19_Sentiments dataset.

In the third experiment, we carried out a comparative study of twelve fuzzification/defuzzification combinations to find the most efficient fuzzification method and the best defuzzification approach. These combinations are Triangular MF/Centroid of Area, Triangular MF/Bisector of Area, Triangular MF/Weighted Average, Triangular MF/Center of Sums, Trapezoidal MF/Centroid of Area, Trapezoidal MF/Bisector of Area, Trapezoidal MF/Weighted Average, Trapezoidal MF/Center of Sums, Gaussian MF/Centroid of Area, Gaussian MF/Bisector of Area, Gaussian MF/Weighted Average, and Gaussian MF/Center of Sums. This experiment shows that the Gaussian MF/Center of Sums combination raises the classification rate to 89.75% and decreases the error rate to 10.25%. This combination is also the best in terms of execution time, which equals 22.01 s.
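To make the Gaussian MF/Center of Sums combination concrete, the following Python sketch runs a tiny Mamdani-style inference on the NSS and PSS values. It is an illustration under loud assumptions, not the rule base or parameters of this work: the two rules, the set centres and widths, the [0, 1] score range, and the names `gaussian_mf`, `center_of_sums`, and `sentiment_score` are all hypothetical.

```python
import math
from typing import Sequence


def gaussian_mf(x: float, mean: float, sigma: float) -> float:
    """Gaussian membership degree of x for a fuzzy set centred at mean."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)


def center_of_sums(universe: Sequence[float],
                   clipped_sets: Sequence[Sequence[float]]) -> float:
    """Discrete Center of Sums: overlapping areas count once per set."""
    numerator = 0.0
    denominator = 0.0
    for idx, x in enumerate(universe):
        summed = sum(fuzzy_set[idx] for fuzzy_set in clipped_sets)
        numerator += x * summed
        denominator += summed
    if denominator == 0.0:
        raise ValueError("no rule fired; the output is undefined")
    return numerator / denominator


def sentiment_score(nss: float, pss: float) -> float:
    """Map (NSS, PSS), both assumed in [0, 1], to a crisp score in [0, 1].

    A higher score means a more positive sentence (assumed convention).
    """
    universe = [i / 100 for i in range(101)]
    # Hypothetical input sets: "low" centred at 0, "high" centred at 1.
    nss_low, nss_high = gaussian_mf(nss, 0.0, 0.3), gaussian_mf(nss, 1.0, 0.3)
    pss_low, pss_high = gaussian_mf(pss, 0.0, 0.3), gaussian_mf(pss, 1.0, 0.3)
    # Hypothetical rules; the Mamdani AND operator is min.
    fire_negative = min(nss_high, pss_low)
    fire_positive = min(pss_high, nss_low)
    # Mamdani implication: clip each output set at its rule's firing strength.
    negative = [min(fire_negative, gaussian_mf(x, 0.0, 0.2)) for x in universe]
    positive = [min(fire_positive, gaussian_mf(x, 1.0, 0.2)) for x in universe]
    return center_of_sums(universe, [negative, positive])
```

With these placeholder sets, equal NSS and PSS values give a score near the midpoint, and the score moves towards 1 as PSS dominates. Center of Sums differs from Centroid of Area in that regions where clipped output sets overlap contribute once per set rather than once overall.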

In the fourth experiment, 32 deep learning models were built, harnessing different parameters for each layer. The most efficient FDLC model is FDLC31, which consists of a text pre-processing phase, one embedding layer, six convolutional layers, six pooling layers, 183 filters of sizes 5×5, 7×7, 9×9, and 10×10, one fully connected layer, seven hidden layers, one output layer, the Gaussian fuzzification method, the inference engine process, and the Center of Sums defuzzification process. FDLC31 achieved an accuracy of 99.97% on the COVID-19_Sentiments dataset and 99.83% on the Sentiment140 dataset.

In the fifth experiment, we compared our proposed FDLC31 with Jin et al. [29], Lan et al. [31], Lin et al. [32], Liu et al. [33], and Xing et al. [36]. We deduce that our proposal outperforms all of these approaches in terms of TPR, TNR, FPR, FNR, ER, PR, AC, KS, FS, and TC.
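For readers who want to reproduce such a comparison, the confusion-matrix measures can be computed as in the sketch below. The expansions used here are the usual ones and are an assumption where this section does not spell them out: TPR/TNR/FPR/FNR as the true/false positive/negative rates, ER as error rate, PR as precision, AC as accuracy, KS as Cohen's kappa statistic, and FS as the F1-score. TC (time consumption) is a timing measurement and is not covered. The function name `binary_metrics` is hypothetical, and every denominator is assumed to be non-zero.

```python
from typing import Dict


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Binary classification measures from confusion-matrix counts."""
    total = tp + tn + fp + fn
    tpr = tp / (tp + fn)            # true positive rate (recall)
    tnr = tn / (tn + fp)            # true negative rate
    fpr = fp / (fp + tn)            # false positive rate
    fnr = fn / (fn + tp)            # false negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / total
    # Agreement expected by chance, used by Cohen's kappa.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    f_score = 2 * precision * tpr / (precision + tpr)
    return {
        "TPR": tpr, "TNR": tnr, "FPR": fpr, "FNR": fnr,
        "ER": 1 - accuracy, "PR": precision, "AC": accuracy,
        "KS": kappa, "FS": f_score,
    }
```

The counts passed to `binary_metrics` would come from comparing predicted and true labels on a test split; any values used to try the function are placeholders rather than results from this work.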

Finally, we compared the performance of our proposed FDLC31 with that of Jin et al. [29], Lan et al. [31], Lin et al. [32], Liu et al. [33], and Xing et al. [36] in terms of complexity, convergence, and stability. We find that our approach converges faster, has much lower computational complexity, and is more stable than all the other approaches.

Our future work is to combine our approach with wireless sensor networks. The main goal of this future work is to classify the data collected by sensor nodes, taking into consideration multiple parameters associated with feature detection and extraction and with data aggregation.

REFERENCES

[1] M. Anagha, R. R. Kumar, K. Sreetha, and P. C. Reghu Raj, ‘‘Fuzzy logic based hybrid approach for sentiment analysis of Malayalam movie reviews,’’ in Proc. IEEE Int. Conf. Signal Process., Informat., Commun. Energy Syst. (SPICES), Feb. 2015, pp. 1–4, doi: 10.1109/SPICES.2015.7091512.

[2] J. B. Sathe and M. P. Mali, ‘‘A hybrid sentiment classiﬁcation method

using neural network and fuzzy logic,’’ in Proc. 11th Int. Conf. Intell. Syst.

Control (ISCO), Jan. 2017, pp. 93–96, doi: 10.1109/ISCO.2017.7855960.


[3] M. Biltawi, W. Etaiwi, S. Tedmori, and A. Shaout, ‘‘Fuzzy based sentiment classification in the Arabic language,’’ in Intelligent Systems and Applications (Advances in Intelligent Systems and Computing), vol. 868, K. Arai, S. Kapoor, and R. Bhatia, Eds. Cham, Switzerland: Springer, 2019, doi: 10.1007/978-3-030-01054-6_42.

[4] U. Kumari, A. K. Sharma, and D. Soni, ‘‘Sentiment analysis of smart

phone product review using SVM classiﬁcation technique,’’ in Proc. Int.

Conf. Energy, Commun., Data Anal. Soft Comput. (ICECDS), Aug. 2017,

pp. 1469–1474, doi: 10.1109/ICECDS.2017.8389689.

[5] Y. Thein and T. N. Khin, ‘‘Comparing SVM and KNN algorithms for Myanmar news sentiment analysis system,’’ in Proc. 6th Int. Conf. Comput. Data Eng. (ICCDE), New York, NY, USA: Association for Computing Machinery, 2020, pp. 65–69, doi: 10.1145/3379247.3379293.

[6] S. N. Singh and T. Sarraf, ‘‘Sentiment analysis of a product based on user reviews using random forests algorithm,’’ in Proc. 10th Int. Conf. Cloud Comput., Data Sci. Eng. (Confluence), Jan. 2020, pp. 112–116, doi: 10.1109/Confluence47617.2020.9058128.

[7] W. P. Ramadhan, A. Novianty, and C. Setianingsih, ‘‘Sentiment analysis using multinomial logistic regression,’’ in Proc. Int. Conf. Control, Electron., Renew. Energy Commun. (ICCEREC), Sep. 2017, pp. 46–49, doi: 10.1109/ICCEREC.2017.8226700.

[8] S. Chakraborty, A. Biswas, B. Bose, and S. Tiwari, ‘‘Sentiment analysis of review datasets using naive Bayes and K-NN classifier,’’ in Proc. IJIEEB, Kolkata, India, vol. 8, 2016, pp. 54–62, doi: 10.5815/ijieeb.2016.04.07.

[9] F. Es-Sabery and A. Hair, ‘‘An improved ID3 classification algorithm based on correlation function and weighted attribute,’’ in Proc. Int. Conf. Intell. Syst. Adv. Comput. Sci. (ISACS), Dec. 2019, pp. 1–8.

[10] A. Severyn and A. Moschitti, ‘‘Twitter sentiment analysis with deep

convolutional neural networks,’’ in Proc. 38th Int. ACM SIGIR Conf.