www.astesj.com 1683
Sentiment Analysis in English Texts
Arwa Alshamsi1, Reem Bayari1, Said Salloum2,3,*
1Faculty of Engineering & IT, The British University, Dubai, 345015, UAE
2Research Institute of Sciences & Engineering, University of Sharjah, Sharjah, 27272, UAE
3School of Science, Engineering, and Environment, University of Salford, Manchester, M5 4WT, UK
ARTICLE INFO
ABSTRACT
Article history:
Received: 23 September, 2020
Accepted: 24 December, 2020
Online: 28 December, 2020
The growing popularity of social media sites has generated a massive amount of data that
has attracted researchers, decision-makers, and companies to investigate people's opinions
and thoughts in various fields. Sentiment analysis has recently emerged as an important
research topic, and decision-makers, companies, and service providers alike consider it a
valuable tool for improvement. This research paper aims to obtain a dataset of tweets and
apply different machine learning algorithms to analyze and classify the texts. The paper
explores text classification accuracy when different classifiers are used to classify
balanced and unbalanced datasets. It was found that the performance of the classifiers
varied depending on the size of the dataset. The results also revealed that Naive Bayes
and ID3 gave better accuracy than the other classifiers, and their performance was better
on the balanced datasets, while the other classifiers (K-NN, Decision Tree, Random Forest,
and Random Tree) performed better on the unbalanced datasets.
Keywords:
Sentiment Analysis
Balanced Dataset
Unbalanced Dataset
Classification
1. Introduction
The recent widening expansion of social media has changed how people communicate,
share, and obtain information [1-4]. In addition, many companies use social media to
evaluate their business performance by analysing the contents of conversations [5].
This includes collecting customers' opinions about services, facilities, and products.
Exploring these data plays a vital role in consumer retention by improving the quality
of services [6, 7]. Social media sites such as Instagram, Facebook, and Twitter offer
valuable data that business owners can use not only to track and analyse customers'
opinions about their own businesses but also about those of their competitors [8-11].
Moreover, these valuable data have attracted decision-makers who seek to improve the
services provided [8, 9, 12, 13].
In this research paper, several studies of Twitter data classification and analysis
for different purposes were surveyed to investigate the methodologies and approaches
utilized for text classification. The authors aim to obtain open-source datasets and
then conduct text classification experiments using machine learning approaches by
applying different classification algorithms, i.e., classifiers. Several classifiers
were applied to two versions of the datasets: the first unbalanced and the second
balanced. The authors then compared the classification accuracy of each classifier on
both versions.
2. Literature Review
As social media websites have attracted millions of users, they store a massive
number of texts generated by those users [14-21]. Researchers are interested in
investigating these data for research purposes [17, 18, 22-25]. In this section, a
number of research papers that explored the analysis and classification of Twitter
data were surveyed to investigate different text classification approaches [26] and
their results.
The researchers of [27] investigated the gender of Twitter users. They noticed that
many Twitter users use the URL field of the profile to point to their blogs, and
those blogs provide valuable demographic information about the users. Using this
method, the authors created a corpus of about 184,000 Twitter users labeled with
their gender. The authors then arranged the dataset for
ASTESJ
ISSN: 2415-6698
*Corresponding Author: Said Salloum, University of Sharjah, UAE. Tel: +971507679647 Email: ssalloum@sharjah.ac.ae
Advances in Science, Technology and Engineering Systems Journal Vol. 5, No. 6, 1683-1689 (2020)
www.astesj.com
Special Issue on Multidisciplinary Sciences and Engineering
https://dx.doi.org/10.25046/aj0506200
experiments as follows: for each user, they specified four fields; the first field
contains the text of the tweets, and the remaining three fields come from the user's
Twitter profile, i.e., full name, screen name, and description. The authors then
conducted the experiments and found that using all of the dataset fields for
classifying a Twitter user's gender provides the best accuracy, 92%, while using the
tweet text alone provides an accuracy of 76%. In [28], the authors used machine
learning approaches for sentiment analysis. They constructed a dataset consisting of
more than 151,000 Arabic tweets, labeled as "75,774 positive tweets and 75,774
negative tweets". Several machine learning algorithms were applied, such as Naive
Bayes (NB), AdaBoost, Support Vector Machine (SVM), Maximum Entropy (ME), and Round
Robin (RR). The authors found that RR provided the most accurate classification
results, while the AdaBoost classifier's results were the least accurate. A study by
[29] was likewise interested in
sentiment analysis of Arabic texts. The authors constructed the Arabic Sentiment
Tweets Dataset (ASTD), which consists of 84,000 Arabic tweets; around 10,000 tweets
remained after annotation. The authors applied machine learning classifiers to the
collected dataset and reported the following: (1) the best-performing classifier on
the dataset is SVM; (2) classifying a balanced set is challenging compared to an
unbalanced set, since the balanced set has fewer tweets, which may negatively affect
the classification's reliability. In [30], the authors investigated the effects of
applying preprocessing methods before sentiment classification of the text, using
several classifiers and five datasets to evaluate those effects. Experiments were
conducted, and the researchers reported the following findings: removing URLs has
little effect; removing stop words has a slight effect; removing numbers has no
effect; expanding acronyms improves classification performance; the same
preprocessing methods have the same effects across classifiers; and NB and RF
classifiers showed more sensitivity than LR and SVM classifiers. In conclusion,
classifier performance for sentiment analysis improved after applying the
preprocessing methods. A study by [31] investigated Twitter
geotagged data to construct a national database of people's health behavior,
comparing indicators generated by machine learning algorithms to indicators generated
by humans. The authors collected around 80 million geotagged tweets; spatial-join
procedures were applied, and 99.8% of the tweets were successfully linked. The tweets
were then processed, and machine learning approaches were successfully applied to
classify tweets into happy and not happy with high accuracy.
The authors of [32] explored classifying sentiments in movie reviews. They
constructed a dataset of 21,000 tweets of movie reviews, split it into a train set
and a test set, and applied preprocessing methods; two classifiers, NB and SVM, were
then used to classify the tweet texts into positive or negative sentiment. The
authors found that SVM achieved the better accuracy, 75%, while NB achieved 65%.
Researchers of [33] used machine learning
methods and semantic analysis for analyzing tweet sentiments. They labeled the tweets
in a dataset consisting of 19,340 sentences as positive or negative, applied
preprocessing methods, and extracted features; they then applied machine learning
classifiers, i.e., Naïve Bayes, Maximum Entropy, and Support Vector Machine (SVM),
followed by semantic analysis. The authors found that Naïve Bayes provided the best
accuracy, 88.2%, followed by SVM at 85.5% and Maximum Entropy at 83.8%. They also
reported that after applying semantic analysis, the accuracy increased to 89.9%. In
[34], the authors analyzed sentiments by utilizing games.
They introduced TSentiment, a web-based game used for emotion identification in
Italian tweets, in which users compete to classify tweets in a dataset consisting of
59,446 tweets. Users first evaluate a tweet's polarity, i.e., positive, negative, or
neutral, then select the tweet's sentiment from a pre-defined list of 9 sentiments,
in which 3 sentiments are identified for positive polarity and 3 for negative
polarity; neutral polarity is used for tweets that have no sentiment expressions.
This approach to classifying tweets proved effective.
A study by [35] examined the possibility of enhancing the accuracy of predictions of
stock market indicators using Twitter sentiment analysis. The authors used a
lexicon-based approach to identify eight specific emotions in over 755 million tweets
and applied Support Vector Machine (SVM) and Neural Network (NN) algorithms to
predict the DJIA and S&P 500 indicators. Using the SVM algorithm on the DJIA, the
best average precision rate, 64.10 percent, was achieved. The authors indicated that
the accuracy could be increased by extending the training period and by improving the
sentiment analysis algorithms, but concluded that adding Twitter details does not
improve accuracy significantly. In [36],
the authors applied sentiment analysis to around 4,432 tweets to collect opinions on
Oman tourism, building a domain-specific ontology for Oman tourism using ConceptNet.
They constructed a sentiment lexicon based on three existing lexicons: SentiStrength,
SentiWordNet, and the Opinion Lexicon. The authors randomly divided the data into 80%
for training and 20% for testing, and used two types of semantic sentiment analysis:
contextual semantic sentiment analysis and conceptual semantic sentiment analysis.
Applying a Naïve Bayes supervised machine learning classifier, they found that
conceptual semantic sentiment analysis markedly improves sentiment analysis
performance. A study by [37] used sentiment
analysis and subjectivity analysis methods to analyze French tweets and predict the
French CAC 40 stock market. The author used a French dataset consisting of 1,000
positive and negative book reviews, trained a neural network with three input
features on 3/4 of the data, and tested on the remaining quarter. The achieved
accuracy was 80%, with a mean absolute percentage error (MAPE) of 2.97%, which is
lower than that of the work reported by Johan Bollen. The author suggested adding
more features as input to improve the performance. In [38], the authors
examined the relationship between social emotion on Twitter and the stock market. The
researchers collected millions of tweets through the Twitter API and retrieved the
NASDAQ market closing price for the same period, then applied the correlation
coefficient. They concluded that emotion-related terms have some degree of influence
on the overall stock market trend, but not enough to serve as a guide to stock market
prediction, while there was a fairly close association between positive, negative,
and angry mood words; sad language in particular tends to have a far greater
influence on the stock market than the other groups. In [39], the authors
investigated conversations about a telecommunications company on
Twitter ('Indihome', in Indonesia). The authors collected 10,839 raw tweets for
segmentation, gathered over five periods of the same year. They found that most of
the tweets (7,253) do not contain customers' perceptions of Indihome; only 3,586
tweets do. Most of the tweets containing a perception reveal a negative perception of
Indihome (3,119), and only 467 tweets contain positive perceptions. The largest
number of negative perceptions relate to product, the second to process, the third to
people, and the fourth to pricing. Researchers of [40] examined
prevalence and geographic variations of opinion polarities about e-cigarettes on
Twitter. The researchers collected data from Twitter using seven pre-defined keywords
and classified the tweets into categories: irrelevant to e-cigarettes, commercial
tweets, and organic tweets with attitudes (supporting, against, or neutral) toward
the use of e-cigarettes, along with geographic location information (city and state).
They selected six socio-economic variables from 2014 Census data that are associated
with smoking and health disparities. The tweets were classified using a combination
of human judgment and machine learning algorithms: two coders classified a random
sample of 2,000 tweets into five categories, and a multilabel Naïve Bayes classifier
was applied, achieving an accuracy of 93.6% on the training data. The machine
learning algorithm was then applied to the full set of collected tweets, and the
accuracy on the validation data was 83.4%. To evaluate the socio-economic factors
related to public perception of e-cigarette use in the USA, the researchers
calculated the Pearson correlation between the prevalence and percentage of opinion
polarities and the selected ACS variables for the 50 states and the District of
Columbia. In [41], the authors investigated the link between any
updates on certain brands and the reactions to them. The researchers gathered
geographic locations from the data to see the consumer distribution. They collected
Twitter data using the REST API, 3,200 tweets in total from ten different profiles,
then used sentiment analysis to differentiate between clustered data expressed
positively or negatively, resampling the results into an object model and clusters.
Every answer was evaluated for textual sentiment from the object model. The
researchers used the AFINN word list and the Sentiment of Emojis to run a
comprehensive sentiment analysis; for data not present in the word list, they added a
separate emoji-analysis layer on top of the sentiment analysis, but saw no difference
in the level of accuracy when applying this extra layer. The researchers found some
sentiment analysis weaknesses related to the misuse of emoji, the use of abbreviated
words or slang terms, and the use of sarcasm.
In [42], the authors proposed an application that can classify Twitter content as
spam or legitimate, using an integrated approach combining URL analysis, natural
language processing, and machine learning techniques. The authors analyzed the URLs
derived from the tweets: URLs were converted to their long form, compared with
blacklisted URLs, and then compared with a pre-defined list of spam expressions; the
presence of any of these expressions indicates that the URL is spam. After cleaning
the data, the stemmed keywords are compared with the pre-set list of identified spam
words, and if a pre-defined spam expression is found in the tweet, the user is
classified as a spammer. Six features were used for classification; the training set
had 100 instances with six features and a label, and the Naïve Bayes algorithm was
used. The authors manually examined 100 users (60 were legitimate and 40 were spam);
the sample was then checked by the application, and 98 were classified correctly.
3. Proposed Approach
In this work, the authors implemented and evaluated different classifiers for
classifying the sentiment of tweets, utilizing RapidMiner software. The classifiers,
Decision Tree, Naïve Bayes, Random Forest, K-NN, ID3, and Random Tree, were applied
to both balanced and unbalanced datasets.
4. Experiment Setup
In this section, the dataset is described, and the settings and evaluation techniques
used in the experiments are discussed. The prediction of the tweet category is tested
twice: the first time on the unbalanced datasets and the second time on the balanced
datasets, as follows.
Experiments on the unbalanced datasets: Decision Tree, Naïve Bayes, Random Forest,
K-NN, ID3, and Random Tree classifiers were applied to six unbalanced datasets.
Experiments on the balanced datasets: In this experiment, the challenges related to
unbalanced datasets were tackled by manual procedures to avoid biased predictions and
misleading accuracy. The majority class in each dataset was brought to almost the
same size as the minority classes, i.e., the numbers of positive, negative, and
neutral tweets are practically the same in each balanced dataset, as represented in
Table 3.
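The paper does not detail the manual balancing procedure; since Table 3 shows class counts growing after balancing, one plausible reading is that minority classes were resampled up to the majority size. A minimal Python sketch of that reading (the function name, seed, and toy rows are illustrative, not the authors' procedure):

```python
import random
from collections import Counter, defaultdict

def balance_by_oversampling(rows, seed=7):
    """Equalize class counts by resampling smaller classes up to the largest one."""
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, label))
    target = max(len(items) for items in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for label, items in by_label.items():
        balanced.extend(items)                                      # keep all originals
        balanced.extend(rng.choices(items, k=target - len(items)))  # resample the shortfall
    return balanced

rows = ([("tweet", "negative")] * 60
        + [("tweet", "positive")] * 25
        + [("tweet", "neutral")] * 15)
balanced = balance_by_oversampling(rows)
print(Counter(label for _, label in balanced))  # 60 of each class
```

After this step every class contributes roughly a third of the instances, matching the percentages reported in Table 3.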
4.1. Dataset Description
In this work, we obtained a dataset from Kaggle, one of the largest online data
science communities. It consists of more than 14,000 tweets, each labeled positive,
negative, or neutral. The dataset was also split into six datasets, each including
the tweets about one of six American airline companies (United, Delta, Southwest,
Virgin America, US Airways, and American). The details of the obtained datasets are
summarized in Table 1 below.
Table 1: Summary of obtained Dataset

                  Virgin America  United  Delta  Southwest  US Airways  American
Number of Tweets  504             3822    2222   2420       2913        2759
Positive Tweets   152             492     544    570        269         336
Negative Tweets   181             2633    955    1186       2263        1960
Neutral Tweets    171             697     723    664        381         463
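The per-airline split described above can be sketched as follows in Python (the column names `airline`, `airline_sentiment`, and `text` follow the public Kaggle airline-sentiment CSV and are assumptions here; the sample rows are invented):

```python
import csv
import io
from collections import defaultdict

def split_by_airline(csv_file):
    """Group (text, sentiment) pairs by the airline column of the labeled CSV."""
    per_airline = defaultdict(list)
    for row in csv.DictReader(csv_file):
        per_airline[row["airline"]].append((row["text"], row["airline_sentiment"]))
    return per_airline

# Tiny in-memory stand-in for the real ~14,000-tweet file.
sample = io.StringIO(
    "airline,airline_sentiment,text\n"
    "United,negative,late again\n"
    "Delta,positive,great crew\n"
    "United,neutral,nice view from seat 12A\n")
per_airline = split_by_airline(sample)
print({airline: len(tweets) for airline, tweets in per_airline.items()})
```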
4.2. Dataset Cleansing
H. Tariq et al. / Advances in Science, Technology and Engineering Systems Journal Vol. 5, No. 6, 1683-1689 (2020)
www.astesj.com 1686
In this section, the authors describe the procedure followed in the dataset
preparation. The authors utilized RapidMiner software for tweet classification,
following the steps described below:
1) Splitting the dataset into a training set and a test set.
2) Loading the dataset, i.e., the Excel file, into RapidMiner using the Read Excel
operator.
3) Applying preprocessing utilizing the operators below:
Transform Cases operator: to transform text to lowercase.
Tokenize operator: to split the text into a sequence of tokens.
Filter Stopwords operator: to remove stop words such as is, the, at, etc.
Filter Tokens (by length) operator: to remove tokens based on length; in this model
the minimum is 3 characters and the maximum is 20 characters, and any token that does
not match the rule is removed.
Stem operator: to convert words into their base form.
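The operator chain above can be sketched roughly in Python; the stop-word list and the crude suffix stripper below are simplified stand-ins for RapidMiner's Filter Stopwords and Stem operators, not their actual implementations:

```python
import re

STOP_WORDS = {"is", "the", "at", "a", "an", "and", "or", "of", "to", "in", "was", "were"}

def crude_stem(token):
    # Simplified stand-in for a real stemmer: strip a few common suffixes.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text, min_len=3, max_len=20):
    text = text.lower()                       # Transform Cases
    tokens = re.findall(r"[a-z']+", text)     # Tokenize
    tokens = [t for t in tokens               # Filter Stopwords + Filter Tokens (by length)
              if t not in STOP_WORDS and min_len <= len(t) <= max_len]
    return [crude_stem(t) for t in tokens]    # Stem

print(preprocess("The flight was DELAYED and the crews were unhelpful at boarding"))
# ['flight', 'delay', 'crew', 'unhelpful', 'board']
```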
4.3. Dataset Training
Each dataset was divided into two parts. The first part contains 66% of the total
number of tweets in the dataset and is used to train the machine to classify the data
under one attribute, which classifies the tweets as positive, negative, or neutral.
The remaining 34% of the tweets were used as the test set.
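The 66/34 split can be sketched as a shuffled cut with a fixed seed (in RapidMiner this is handled inside the validation operator; the seed and toy data here are illustrative):

```python
import random

def split_dataset(rows, train_frac=0.66, seed=42):
    """Shuffle a copy of the labeled tweets, then cut 66% for training, 34% for testing."""
    data = list(rows)
    random.Random(seed).shuffle(data)
    cut = int(len(data) * train_frac)
    return data[:cut], data[cut:]

labeled = [(f"tweet {i}", "positive" if i % 2 else "negative") for i in range(100)]
train_set, test_set = split_dataset(labeled)
print(len(train_set), len(test_set))  # 66 34
```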
Figure 1: Summarization of the Process Model
4.4. Dataset Classifying
In this section, the authors describe the steps of the tweet classification
technique.
The Set Role operator is used to identify sentiment as the target variable.
The Select Attributes operator is used to remove any attribute that has missing
values.
Then, in the validation operator, the dataset is divided into two parts (training and
test); two-thirds of the dataset is used to train the model and the remaining
one-third to evaluate it.
Different machine learning algorithms are used for training (Decision Tree, Naïve
Bayes, Random Forest, K-NN, ID3, and Random Tree).
For testing the model, the Performance operator is utilized to measure the
performance of the model.
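To make the classification step concrete, here is a minimal multinomial Naïve Bayes (one of the best-performing classifiers in the experiments) over token lists, with add-one smoothing. This is an illustrative from-scratch sketch, not RapidMiner's implementation, and the toy training tweets are invented:

```python
import math
from collections import Counter

class NaiveBayes:
    """Minimal multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
            self.vocab.update(tokens)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, tokens):
        def log_posterior(c):
            denom = self.totals[c] + len(self.vocab)  # Laplace-smoothed denominator
            return self.priors[c] + sum(
                math.log((self.counts[c][t] + 1) / denom) for t in tokens)
        return max(self.classes, key=log_posterior)

train_docs = [["great", "flight"], ["love", "crew"],
              ["delayed", "flight", "again"], ["lost", "bag"], ["on", "time"]]
train_labels = ["positive", "positive", "negative", "negative", "neutral"]
model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["delayed", "flight"]))  # negative
```

Accuracy, as reported by the Performance operator, is then simply the fraction of test tweets whose predicted label matches the annotated one.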
5. Experiment Results and Discussion
This section presents the experiment results in terms of the prediction accuracy of
each classifier on both types of datasets (balanced and unbalanced), along with a
comparison between the two experiments.
5.1. Experiment results for the unbalanced datasets
Figure 2 and Table 2 present the accuracy results of the
utilized classifiers on the datasets.
Table 2: Accuracy results on unbalanced dataset

               Virgin America  United  Delta   Southwest  US Airways  American
Dataset        504             3822    2222    2420       2913        2759
Training set   333             2523    1467    1597       1923        1821
Test set       171             1299    755     823        990         938
Decision Tree  31.86%          72.03%  42.08%  50.46%     82.72%      68.98%
Naïve Bayes    32.74%          72.38%  42.28%  51.01%     82.72%      72.21%
Random Forest  31.86%          72.03%  42.08%  50.46%     82.72%      68.98%
K-NN           39.82%          11.66%  35.27%  50.46%     82.72%      69.43%
ID3            32.74%          72.38%  42.28%  51.01%     82.72%      72.21%
Random Tree    31.86%          72.03%  42.08%  50.46%     82.72%      68.98%
Figure 2: Accuracy results on unbalanced airline datasets using different
classifiers
The classifiers' accuracy results were very high on some datasets and low on others.
All classifiers performed best on the US Airways and United datasets, which were the
largest. The Naïve Bayes classifier, Decision Tree, and ID3 were mostly better than
the other classifiers and gave almost the same accuracy level. The classifiers
reported the lowest accuracy on the Virgin America dataset, which is very small.
5.2. Experiment results for the balanced datasets
Decision Tree, Naïve Bayes, Random Forest, K-NN, ID3, and Random Tree classifiers
were applied to the five balanced datasets (United, Delta, Southwest, US Airways, and
American). Each dataset was divided into two parts: the first part contains 66% of
the total number of tweets and is used to train the machine to classify the data
under one attribute, which classifies the tweets as positive, negative, or neutral;
the remaining 34% of the tweets were used as the test set.
Table 3: Number of tweets before and after balancing

            Total tweets       Total tweets      Percentage of instances
            before balancing   after balancing   Positive  Negative  Neutral
United      3822               8276              33%       33%       34%
Delta       2222               2635              33%       33%       34%
Southwest   2420               5518              33%       33%       33%
US Airways  2913               6608              33%       33%       33%
American    2759               5924              34%       34%       33%
After applying the different algorithms to the five balanced datasets, the accuracy
results are reported in Table 4 and Figure 3 below:
Table 4: Accuracy results on the balanced dataset

               United   Delta    Southwest  US Airways  American
Dataset        8276     2635     5518       6608        5924
Training set   5464     1743     3642       4363        3911
Test set       2812     892      1876       2245        2013
Decision Tree  35.06%   34.63%   34.35%     35.06%      33.98%
Naïve Bayes    97.65%   36.99%   65.48%     97.65%      61.20%
Random Forest  35.06%   34.63%   34.35%     35.06%      33.98%
K-NN           38.79%   32.77%   35.32%     38.79%      39.47%
ID3            97.65%   36.99%   65.48%     97.65%      61.20%
Random Tree    35.06%   34.63%   34.35%     35.06%      33.98%
Figure 3: Accuracy results on balanced airline datasets using different classifiers
5.3. Comparison between the two experiments' results for each classifier
Comparing the performance of the classifiers on the balanced and unbalanced datasets,
the following was found, as seen in Figure 4 below:
5.3.1 Naïve Bayes and ID3
These gave better accuracy than the other classifiers in the two experiments, and
their accuracy on the balanced datasets was higher than on the unbalanced ones. On
the unbalanced datasets, the maximum accuracy for both classifiers was 82.7%; on the
balanced datasets, the accuracy reached 97.6%. These results confirm that these two
classifiers are the best among the selected classifiers in the two experiments.
5.3.2 K-NN and Decision Tree
These showed better performance on the unbalanced datasets, and the difference is
very apparent: the maximum accuracy on the balanced datasets is 39.4%, while it
reached 82.7% on the unbalanced datasets.
5.3.3 Random Forest and Random Tree
These also showed better performance on the unbalanced datasets, with an equally
apparent difference: the maximum accuracy on the balanced datasets is around 35%,
while it reached 82.7% on the unbalanced datasets.
In conclusion, Naive Bayes and ID3 gave better accuracy than the other classifiers,
and their performance was better on the balanced datasets, while the other
classifiers (K-NN, Decision Tree, Random Forest, and Random Tree) performed better on
the unbalanced datasets.
Figure 4: Accuracy results of classifiers on balanced and unbalanced datasets
6. Conclusions
Social media websites are gaining great popularity among people of different ages.
Platforms such as Twitter, Facebook, Instagram, and Snapchat allow people to express
their ideas, opinions, comments, and thoughts, so a huge amount of data is generated
daily, and written text is one of the most common forms of that data. Business
owners, decision-makers, and researchers are increasingly attracted by the valuable
and massive amounts of data generated and stored on social media websites. Sentiment
analysis is a natural language processing field that increasingly attracts
researchers, government authorities, business owners, service providers, and
companies seeking to improve products, services, and research. In this research
paper, the authors aimed to survey sentiment analysis approaches; therefore, 16
research papers that studied Twitter text classification and analysis were surveyed.
The authors also aimed to evaluate different machine learning algorithms used to
classify sentiment as positive, negative, or neutral. The experiments compare the
efficiency and performance of different classifiers (Decision Tree, Naïve Bayes,
Random Forest, K-NN, ID3, and Random Tree). In addition, the authors investigated the
effect of the balanced-dataset factor by applying the same classifiers twice, once on
the unbalanced datasets and once after balancing them. The targeted dataset included
six datasets about six American airline companies (United, Delta, Southwest, Virgin
America, US Airways, and American) and consists of about 14,000 tweets. The authors
reported that the classifiers' accuracy results were very high on some datasets and
low on others, and indicated that dataset size was the reason. On the unbalanced
datasets, the Naïve Bayes classifier, Decision Tree, and ID3 were mostly better than
the other classifiers and gave almost the same level of accuracy, and the classifiers
reported the lowest accuracy on the Virgin America dataset due to its small size.
Naive Bayes and ID3 gave better accuracy than the other classifiers when applied to
the balanced datasets, while K-NN, Decision Tree, Random Forest, and Random Tree
performed better on the unbalanced datasets.
Conflict of Interest
The authors declare no conflict of interest.
Acknowledgment
This is a part of a project done at The British University in Dubai.
References
[1] S.A. Salloum, C. Mhamdi, B. Al Kurdi, K. Shaalan, “Factors affecting the
Adoption and Meaningful Use of Social Media: A Structural Equation
Modeling Approach,” International Journal of Information Technology and
Language Studies, 2(3), 2018.
[2] M. Alghizzawi, S.A. Salloum, M. Habes, “The role of social media in
tourism marketing in Jordan,” International Journal of Information
Technology and Language Studies, 2(3), 2018.
[3] S.A. Salloum, W. Maqableh, C. Mhamdi, B. Al Kurdi, K. Shaalan, “Studying
the Social Media Adoption by university students in the United Arab
Emirates,” International Journal of Information Technology and Language
Studies, 2(3), 2018.
[4] S.A. Salloum, M. Al-Emran, S. Abdallah, K. Shaalan, Analyzing the arab
gulf newspapers using text mining techniques, 2018, doi:10.1007/978-3-
319-64861-3_37.
[5] F.A. Almazrouei, M. Alshurideh, B. Al Kurdi, S.A. Salloum, Social Media
Impact on Business: A Systematic Review, 2021, doi:10.1007/978-3-030-
58669-0_62.
[6] Alshurideh et al., “Understanding the Quality Determinants that Influence
the Intention to Use the Mobile Learning Platforms: A Practical Study,”
International Journal of Interactive Mobile Technologies (IJIM), 13(11),
157-183, 2019.
[7] S.A. Salloum, K. Shaalan, Adoption of E-Book for University Students,
2019, doi:10.1007/978-3-319-99010-1_44.
[8] S.A. Salloum, M. Alshurideh, A. Elnagar, K. Shaalan, “Mining in
Educational Data: Review and Future Directions,” in Joint European-US
Workshop on Applications of Invariance in Computer Vision, Springer: 92-102, 2020.
[9] S.A. Salloum, M. Alshurideh, A. Elnagar, K. Shaalan, “Machine Learning
and Deep Learning Techniques for Cybersecurity: A Review,” in Joint
European-US Workshop on Applications of Invariance in Computer Vision,
Springer: 50-57, 2020.
[10] S.A. Salloum, R. Khan, K. Shaalan, “A Survey of Semantic Analysis
Approaches,” in Joint European-US Workshop on Applications of
Invariance in Computer Vision, Springer: 61-70, 2020.
[11] K.M. Alomari, A.Q. AlHamad, S. Salloum, “Prediction of the Digital Game
Rating Systems based on the ESRB.”
[12] S.A. Salloum, M. Al-Emran, A.A. Monem, K. Shaalan, “A survey of text
mining in social media: Facebook and Twitter perspectives,” Advances in
Science, Technology and Engineering Systems, 2(1), 2017,
doi:10.25046/aj020115.
[13] S.A. Salloum, A.Q. AlHamad, M. Al-Emran, K. Shaalan, A survey of Arabic
text mining, Springer, Cham: 417-431, 2018, doi:10.1007/978-3-319-
67056-0_20.
[14] C. Mhamdi, M. Al-Emran, S.A. Salloum, Text mining and analytics: A case
study from news channels posts on Facebook, 2018, doi:10.1007/978-3-319-
67056-0_19.
[15] A.S. Alnaser, M. Habes, M. Alghizzawi, S. Ali, “The Relation among
Marketing ads, via Digital Media and mitigate (COVID-19) pandemic in
Jordan,” Dspace.Urbe.University, (July), 2020.
[16] M. Alshurideh, B. Al Kurdi, S. Salloum, “Examining the Main Mobile
Learning System Drivers’ Effects: A Mix Empirical Examination of Both
the Expectation-Confirmation Model (ECM) and the Technology
Acceptance Model (TAM),” in International Conference on Advanced
Intelligent Systems and Informatics, Springer: 406–417, 2019.
[17] M. Alghizzawi, M. Habes, S.A. Salloum, M.A. Ghani, C. Mhamdi, K.
Shaalan, “The effect of social media usage on students’e-learning acceptance
in higher education: A case study from the United Arab Emirates,”
International Journal of Information Technology and Language Studies, 3(3),
2019.
[18] M. Habes, S.A. Salloum, M. Alghizzawi, C. Mhamdi, “The Relation
Between Social Media and Students’ Academic Performance in Jordan:
YouTube Perspective,” in International Conference on Advanced Intelligent
Systems and Informatics, Springer: 382–392, 2019.
[19] M. Habes, S.A. Salloum, M. Alghizzawi, M.S. Alshibly, “The role of
modern media technology in improving collaborative learning of students in
Jordanian universities,” International Journal of Information Technology
and Language Studies, 2(3), 2018.
[20] B.A. Kurdi, M. Alshurideh, S.A. Salloum, Z.M. Obeidat, R.M. Al-dweeri,
“An empirical investigation into examination of factors influencing
university students’ behavior towards elearning acceptance using SEM
approach,” International Journal of Interactive Mobile Technologies, 14(2),
2020, doi:10.3991/ijim.v14i02.11115.
[21] S.A. Salloum, M. Al-Emran, M. Habes, M. Alghizzawi, M.A. Ghani, K.
Shaalan, “Understanding the Impact of Social Media Practices on E-
Learning Systems Acceptance,” in International Conference on Advanced
Intelligent Systems and Informatics, Springer: 360–369, 2019.
[22] M. Alghizzawi, M.A. Ghani, A.P.M. Som, M.F. Ahmad, A. Amin, N.A.
Bakar, S.A. Salloum, M. Habes, “The Impact of Smartphone Adoption on
Marketing Therapeutic Tourist Sites in Jordan,” International Journal of
Engineering & Technology, 7(4.34), 91–96, 2018.
[23] S.F.S. Alhashmi, S.A. Salloum, S. Abdallah, “Critical Success Factors for
Implementing Artificial Intelligence (AI) Projects in Dubai Government
United Arab Emirates (UAE) Health Sector: Applying the Extended
Technology Acceptance Model (TAM),” in International Conference on
Advanced Intelligent Systems and Informatics, Springer: 393–405, 2019.
[24] M. Alghizzawi, M. Habes, S.A. Salloum, The Relationship Between Digital
Media and Marketing Medical Tourism Destinations in Jordan: Facebook
Perspective, 2020, doi:10.1007/978-3-030-31129-2_40.
[25] R.S. Al-Maroof, S.A. Salloum, A.Q.M. AlHamadand, K. Shaalan, A Unified
Model for the Use and Acceptance of Stickers in Social Media Messaging,
2020, doi:10.1007/978-3-030-31129-2_34.
[26] K.S.A. Wahdan, S. Hantoobi, S.A. Salloum, K. Shaalan, “A systematic
review of text classification research based on deep learning models in Arabic
language,” Int. J. Electr. Comput. Eng, 10(6), 6629–6643, 2020.
[27] J.D. Burger, J. Henderson, G. Kim, G. Zarrella, “Discriminating gender on
Twitter,” in Proceedings of the 2011 Conference on Empirical Methods in
Natural Language Processing, 1301–1309, 2011.
[28] D. Gamal, M. Alfonse, E.-S.M. El-Horbaty, A.-B.M. Salem, “Twitter
Benchmark Dataset for Arabic Sentiment Analysis,” International Journal of
Modern Education and Computer Science, 11(1), 33, 2019.
[29] M. Nabil, M. Aly, A. Atiya, “Astd: Arabic sentiment tweets dataset,” in
Proceedings of the 2015 conference on empirical methods in natural
language processing, 2515–2519, 2015.
[30] Z. Jianqiang, G. Xiaolin, “Comparison Research on Text Pre-processing
Methods on Twitter Sentiment Analysis,” IEEE Access, 2017,
doi:10.1109/ACCESS.2017.2672677.
[31] Q.C. Nguyen, D. Li, H.-W. Meng, S. Kath, E. Nsoesie, F. Li, M. Wen,
“Building a national neighborhood dataset from geotagged Twitter data for
indicators of happiness, diet, and physical activity,” JMIR Public Health and
Surveillance, 2(2), e158, 2016.
[32] A. Amolik, N. Jivane, M. Bhandari, M. Venkatesan, “Twitter sentiment
analysis of movie reviews using machine learning techniques,” International
Journal of Engineering and Technology, 7(6), 17, 2016.
[33] G. Gautam, D. Yadav, “Sentiment analysis of twitter data using machine
learning approaches and semantic analysis,” in 2014 Seventh International
Conference on Contemporary Computing (IC3), IEEE: 437–442, 2014.
[34] M. Furini, M. Montangero, “TSentiment: On gamifying Twitter sentiment
analysis,” in 2016 IEEE Symposium on Computers and Communication
(ISCC), IEEE: 91–96, 2016.
[35] A. Porshnev, I. Redkin, A. Shevchenko, “Machine learning in prediction of
stock market indicators based on historical data and data from Twitter
sentiment analysis,” 2013 IEEE 13th International Conference on Data
Mining Workshops, 440–444, 2013, doi:10.1109/ICDMW.2013.111.
[36] V. Ramanathan, “Twitter Text Mining for Sentiment Analysis on People’s
Feedback about Oman Tourism,” 2019 4th MEC International Conference
on Big Data and Smart City (ICBDSC), 1–5, 2019.
[37] V. Martin, “Predicting the french stock market using social media analysis,”
Proceedings - 8th International Workshop on Semantic and Social Media
Adaptation and Personalization, SMAP 2013, 3–7, 2013,
doi:10.1109/SMAP.2013.22.
[38] Q. Li, B. Zhou, Q. Liu, “Can twitter posts predict stock behavior?: A study
of stock market with twitter social emotion,” Proceedings of 2016 IEEE
International Conference on Cloud Computing and Big Data Analysis,
ICCCBDA 2016, 359–364, 2016, doi:10.1109/ICCCBDA.2016.7529584.
[39] Indrawati, A. Alamsyah, “Social network data analytics for market
segmentation in Indonesian telecommunications industry,” 2017 5th
International Conference on Information and Communication Technology,
ICoICT 2017, 0(c), 2017, doi:10.1109/ICoICT.2017.8074677.
[40] H. Dai, J. Hao, “Mining social media data for opinion polarities about
electronic cigarettes,” Tobacco Control, 26(2), 175–180, 2017,
doi:10.1136/tobaccocontrol-2015-052818.
[41] A. Husnain, S.M.U. Din, G. Hussain, Y. Ghayor, “Estimating market trends
by clustering social media reviews,” Proceedings - 2017 13th International
Conference on Emerging Technologies, ICET 2017, 2018-January, 1–6, 2018,
doi:10.1109/ICET.2017.8281716.
[42] K. Kandasamy, P. Koroth, “An integrated approach to spam classification
on Twitter using URL analysis, natural language processing and machine
learning techniques,” 2014 IEEE Students’ Conference on Electrical,
Electronics and Computer Science, SCEECS 2014, 1–5, 2014,
doi:10.1109/SCEECS.2014.6804508.
... The KNN algorithm works well with noisy training data, and its implementation is simple. The disadvantage is that whenever new data arrive, the K nearest neighbors have to be recalculated, which in turn increases the computational time consumption [45], [50], [52], [53]. ...
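This trade-off (negligible training cost, but a full neighbor search at every prediction) can be illustrated with a minimal scikit-learn sketch; the toy feature vectors and labels below are invented for illustration, not data from the studies above:

```python
# Minimal k-NN illustration: "training" merely stores the data,
# while every prediction scans the stored neighbors again.
from sklearn.neighbors import KNeighborsClassifier

# Toy 2-D feature vectors with binary sentiment labels (illustrative only)
X_train = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
y_train = [0, 0, 1, 1]

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)  # cheap: the data is only stored

# Each new query recomputes distances to all stored points
print(clf.predict([[0.15, 0.15], [0.85, 0.85]]))
```

Because the distance computation happens at query time, prediction cost grows with the size of the stored training set, which is the drawback the excerpt describes.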
Article
Full-text available
Sentiment analysis on views and opinions expressed in Indian regional languages has become the current focus of research. However, compared to a globally accepted language like English, research on sentiment analysis in Indian regional languages like Malayalam is very limited. One of the major hindrances is the lack of publicly available Malayalam datasets. This work focuses on building a Malayalam dataset for facilitating sentiment analysis on Malayalam texts and studying the efficiency of a pre-trained deep learning model in analyzing the sentiments latent in Malayalam texts. In this work, a Malayalam dataset has been created by extracting 2,000 tweets from Twitter. The bidirectional encoder representations from transformers (BERT) is a pretrained model that has been used for various natural language processing tasks. This work employs a transformer-based BERT model for Malayalam sentiment analysis. The efficacy of BERT in analyzing the sentiments latent in Malayalam texts has been studied by comparing the performance of BERT with various machine learning models as well as deep learning models. The results show a substantial accuracy increase of 5% for BERT compared with Bi-GRU, the next best-performing model.
... Hasan, Maliha & Arifuzzaman (2019) developed a framework that contained preprocessed data using NLP and retrieved important tweet text from the data with the help of a bag of words (BoW) and term frequency-inverse document frequency (TF-IDF). Alshamsi et al. (2020) used about 14,000 tweets for sentiment analysis. They cleaned the tweets using NLP techniques and categorized the tweets into positive, negative, and neutral sentiments. ...
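The BoW and TF-IDF feature extraction mentioned in this excerpt can be sketched with scikit-learn; the three toy "tweets" below are invented for illustration and stand in for a cleaned corpus:

```python
# Bag-of-words vs. TF-IDF on a toy tweet corpus (illustrative texts,
# not the dataset used in the cited work).
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

tweets = ["good food great service", "bad food", "great great service"]

bow = CountVectorizer()
counts = bow.fit_transform(tweets)      # raw term counts per tweet
print(sorted(bow.vocabulary_))          # the learned vocabulary

tfidf = TfidfVectorizer()
weights = tfidf.fit_transform(tweets)   # counts re-weighted by term rarity
print(weights.shape)                    # (number of tweets, vocabulary size)
```

The resulting sparse matrices are what a downstream classifier (Naive Bayes, k-NN, decision trees, etc.) would be trained on.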
Article
Full-text available
With the rise of social media platforms, sharing reviews has become a social norm in today’s modern society. People check customer views on social networking sites about different fast food restaurants and food items before visiting the restaurants and ordering food. Restaurants can compete to better the quality of their offered items or services by carefully analyzing the feedback provided by customers. People tend to visit restaurants with a higher number of positive reviews. Accordingly, manually collecting feedback from customers for every product is a labor-intensive process; to overcome this, we use sentiment analysis, which automatically extracts meaningful information from the data. Existing studies predominantly focus on machine learning models; as a consequence, the performance of deep learning models, and of deep ensemble models in particular, has largely been neglected. To this end, this study adopts several deep ensemble models including bidirectional long short-term memory and gated recurrent unit (BiLSTM+GRU), LSTM+GRU, GRU+recurrent neural network (GRU+RNN), and BiLSTM+RNN models using self-collected unstructured tweets. The performance of lexicon-based methods is compared with deep ensemble models for sentiment classification. In addition, the study makes use of Latent Dirichlet Allocation (LDA) modeling for topic analysis. For the experiments, tweets for the top five fast food serving companies are collected, including KFC, Pizza Hut, McDonald’s, Burger King, and Subway. Experimental results reveal that deep ensemble models yield better results than the lexicon-based approach, and BiLSTM+GRU obtains the highest accuracy of 95.31% for three-class problems. Topic modeling indicates that the highest number of negative sentiments are represented for Subway restaurants with high-intensity negative words.
The majority of the people (49%) remain neutral regarding the choice of fast food, 31% seem to like fast food while the rest (20%) dislike fast food.
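The LDA topic-analysis step mentioned in the abstract above can be sketched as follows. The four toy reviews are invented, and scikit-learn's LatentDirichletAllocation stands in for whatever implementation the cited study actually used:

```python
# Toy LDA topic model over a handful of short "reviews" (illustrative data;
# the cited study applied LDA to self-collected fast-food tweets).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["burger fries tasty", "pizza cheese tasty",
        "slow service rude staff", "rude staff slow"]

counts = CountVectorizer().fit_transform(docs)  # term-count matrix for LDA
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)          # per-document topic mixture
print(doc_topics.shape)                         # (4 documents, 2 topics)
```

Each row of `doc_topics` is a probability distribution over the two topics, which is how topic prevalence per document (e.g. per restaurant) would be measured.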
... A classification algorithm classified the data from the data set to predict the variables that closely belong to a classification, thus finding a model that describes and distinguishes to predict the target variable. The ID3 is a classification algorithm that generates decision trees used in various disciplines such as semantic analysis (Alshamsi et al., 2020), medical data classification (Yang et al., 2018), optimization (Qiang et al., 2020), and detection (Abbas & Farooq, 2019) among others. ...
Article
Full-text available
An individualized teaching performance evaluation identifies how a teacher handles his or her class. Using multisource feedback in evaluating teaching performance defines the teacher's performance as observed by several evaluators. However, knowing the brand of teaching of a certain group is vitally important to define the group's identity in its delivery of instruction. Hence, modeling teaching performance to determine the brand of teaching enables academic administration to easily monitor and evaluate teaching performance. This study developed a model for teaching performance to determine the brand of teaching. The model used the faculty performance evaluation dataset of a university in the southern Philippines. The model pre-processed the data using the expectation-maximization algorithm and then classified them using the J48 classifier running in WEKA to generate decision trees. Furthermore, the model partitioned the original dataset into ten partitions, using nine for training and the remaining one for validation or testing under 10-fold cross-validation. Based on the dataset used, the model was able to identify prominent teaching brands and disregard attributes with no noticeable contribution to the classification.
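The decision-tree-with-10-fold-cross-validation setup described above can be approximated in scikit-learn. Note that WEKA's J48 implements C4.5 while scikit-learn's DecisionTreeClassifier is CART, so this is an analogy rather than a re-implementation, and the bundled iris data is only a stand-in for the faculty-evaluation dataset:

```python
# Approximation of a J48 + 10-fold cross-validation workflow using a CART
# decision tree (scikit-learn has no C4.5; iris is a placeholder dataset).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
print(len(scores))    # one accuracy score per fold
print(scores.mean())  # averaged cross-validated accuracy
```

Each of the ten folds serves once as the held-out test partition, mirroring the nine-training/one-test split the abstract describes.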
... And the dataset consists of 4 fields (text of the tweets, full name, screen name, and description). After applying classification algorithms, they have achieved 92% accuracy (Shamsi et al. 2021). ...
Article
Full-text available
The Urdu language is spoken by over 64 million people, and its Roman script is very popular, especially on social networking sites. Most users prefer Roman Urdu over English for communication on social networking platforms such as Facebook, Twitter, Instagram and WhatsApp. For research, Urdu is a resource-poor language, as only a few research papers and projects have been carried out for language and vocabulary enhancement in comparison to other languages, especially English. A lot of research has been done in the domain of sentiment analysis in English, but only limited work has been performed on the Roman Urdu language. Sentiment analysis is the method of understanding human emotions or points of view, expressed in textual form, about a particular thing. This article proposes a deep learning model to perform data mining on emotions and attitudes of people using Roman Urdu. The main objective of the research is to evaluate sentiment analysis on a Roman Urdu corpus (RUSA-19) using a faster recurrent convolutional neural network (FRCNN), RCNN, a rule-based model, and an N-gram model. For assessment, two series of experiments were performed on each model: binary classification (positive and negative) and tertiary classification (positive, negative, and neutral). Finally, the evaluation of the faster RCNN model is analyzed and a comparative analysis is performed on the outcomes of the four models. The faster RCNN model outperformed the others, achieving an accuracy of 91.73% for binary classification and 89.94% for tertiary classification.
... This study showed that the term frequency-inverse positive-negative document frequency classifier outperforms the standard TF-IDF technique. In addition, the results of this study highlight the importance of this analysis technique for imbalanced data sets, which, if not accounted for, could lead to erroneous results [54]. ...
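Why imbalance matters can be made concrete: on a 9:1 class split, always predicting the majority class already scores 90% accuracy. One common mitigation, class-sensitive weighting (not the TF-IPNDF weighting proposed in the cited study), is sketched here with scikit-learn on toy data:

```python
# On a 9:1 label split, a majority-class guess already reaches 90% accuracy,
# so the per-class weights below up-weight the minority class during training.
# Toy one-dimensional data, for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

y = np.array([0] * 90 + [1] * 10)            # heavily imbalanced labels
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)
print(weights)                               # minority class weighted up

X = np.arange(100).reshape(-1, 1)            # trivial separable feature
clf = LogisticRegression(class_weight="balanced").fit(X, y)
print(clf.score(X, y))
```

The "balanced" heuristic sets each class weight to n_samples / (n_classes * class_count), so the rare class contributes as much total loss as the common one.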
Article
Background Mixed reality (MR) devices provide real-time environments for physical-digital interactions across many domains. Owing to the unprecedented COVID-19 pandemic, MR technologies have supported many new use cases in the health care industry, enabling social distancing practices to minimize the risk of contact and transmission. Despite their novelty and increasing popularity, public evaluations are sparse and often rely on social interactions among users, developers, researchers, and potential buyers. Objective The purpose of this study is to use aspect-based sentiment analysis to explore changes in sentiment during the onset of the COVID-19 pandemic as new use cases emerged in the health care industry; to characterize net insights for MR developers, researchers, and users; and to analyze the features of HoloLens 2 (Microsoft Corporation) that are helpful for certain fields and purposes. Methods To investigate the user sentiment, we collected 8492 tweets on a wearable MR headset, HoloLens 2, during the initial 10 months since its release in late 2019, coinciding with the onset of the pandemic. Human annotators rated the individual tweets as positive, negative, neutral, or inconclusive. Furthermore, by hiring an interannotator to ensure agreements between the annotators, we used various word vector representations to measure the impact of specific words on sentiment ratings. Following the sentiment classification for each tweet, we trained a model for sentiment analysis via supervised learning. Results The results of our sentiment analysis showed that the bag-of-words tokenizing method using a random forest supervised learning approach produced the highest accuracy of the test set at 81.29%. Furthermore, the results showed an apparent change in sentiment during the COVID-19 pandemic period. During the onset of the pandemic, consumer goods were severely affected, which aligns with a drop in both positive and negative sentiment. 
Following this, there is a sudden spike in positive sentiment, hypothesized to be caused by the new use cases of the device in health care education and training. This pandemic also aligns with drastic changes in the increased number of practical insights for MR developers, researchers, and users and positive net sentiments toward the HoloLens 2 characteristics. Conclusions Our approach suggests a simple yet effective way to survey public opinion about new hardware devices quickly. The findings of this study contribute to a holistic understanding of public perception and acceptance of MR technologies during the COVID-19 pandemic and highlight several new implementations of HoloLens 2 in health care. We hope that these findings will inspire new use cases and technological features.
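A bag-of-words plus random forest pipeline of the kind the study above found most accurate can be sketched in a few lines of scikit-learn; the tweets and labels below are invented, not the annotated HoloLens 2 corpus:

```python
# Sketch of a bag-of-words + random forest sentiment pipeline
# (made-up tweets and labels; illustrative only).
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

texts = ["love this headset", "amazing display", "terrible battery",
         "awful tracking", "love the display", "terrible lag"]
labels = [1, 1, 0, 0, 1, 0]  # 1 = positive, 0 = negative

model = make_pipeline(CountVectorizer(), RandomForestClassifier(random_state=0))
model.fit(texts, labels)                 # tokenize, then fit the forest
print(model.predict(["love it", "terrible experience"]))
```

The pipeline object bundles the tokenizing step and the classifier, so new raw tweets can be scored directly without separate feature-extraction code.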
Chapter
This paper presents a new approach for processing entire books with Natural Language Processing algorithms. In particular, we proposed methods to evaluate books in terms of assessing the intensity of a book’s soft features, such as fantastic, touching, suspenseful, etc. Using Bag of Words and TF/IDF, we embedded books and conducted classification experiments to determine the most appropriate parameters for classifying the intensity of features. The obtained results showed that, for the considered problem, the Random Forests algorithm fitted best, achieving an accuracy of 95% and an F1 measure of 89%. The evaluation also included the selection of the best converter and data aggregation method. Keywords: Text representation; Word embeddings; Long text classification; Bag of words
Article
Full-text available
(1) Background: the ability to use social media to communicate without revealing one’s real identity has created an attractive setting for cyberbullying. Several studies targeted social media to collect their datasets with the aim of automatically detecting offensive language. However, the majority of the datasets were in English, not in Arabic, and even among the few Arabic datasets that were collected, none focused on Instagram despite it being a major social media platform in the Arab world. (2) Methods: we use the official Instagram APIs to collect our dataset. To consider the dataset as a benchmark, we use SPSS (Kappa statistic) to evaluate the inter-annotator agreement (IAA), as well as examine and evaluate the performance of various learning models (LR, SVM, RFC, and MNB). (3) Results: in this research, we present the first Instagram Arabic corpus (sub-class categorization (multi-class)) focusing on cyberbullying. The dataset is primarily designed for the purpose of detecting offensive language in texts. We end up with 200,000 comments, of which 46,898 comments were annotated by three human annotators. The results show that the SVM classifier outperforms the other classifiers, with an F1 score of 69% for bullying comments and 85% for positive comments.
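The Kappa statistic used above to validate annotator agreement can be computed directly; the two annotators and their labels below are hypothetical:

```python
# Cohen's kappa for inter-annotator agreement on six hypothetical comments.
# Kappa corrects raw agreement for the agreement expected by chance:
# 1.0 = perfect agreement, 0 = chance level.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["bully", "ok", "ok", "bully", "ok", "ok"]
annotator_b = ["bully", "ok", "ok", "bully", "ok", "bully"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(round(kappa, 3))
```

A kappa well above zero, as here, indicates that the annotators agree substantially more often than chance alone would predict.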
Chapter
Every successful business aims to know how customers feel about its brands, services, and products. People freely express their views, ideas, sentiments, and opinions on social media for their day-to-day activities, for product reviews, for surveys, and even for their public opinions. This process provides a wealth of valuable resources about the market for any type of business. Unfortunately, it's impossible to manually analyze this massive quantity of information. Sentiment analysis (SA) and opinion mining (OM), as new fields of natural language processing, have the potential benefit of analyzing such a huge amount of data. SA or OM is the computational treatment of opinions, sentiments, and subjectivity of text. This chapter introduces the reader to a survey of different text SA and OM proposed techniques and approaches. The authors discuss in detail various approaches to perform a computational treatment for sentiments and opinions with their strengths and drawbacks.
Article
Full-text available
Classifying or categorizing texts is the process by which documents are classified into groups by subject, title, author, etc. This paper undertakes a systematic review of the latest research in the field of the classification of Arabic texts. Several machine learning techniques can be used for text classification, but we have focused only on the recent trend of neural network algorithms. In this paper, the concept of classifying texts and classification processes are reviewed. Deep learning techniques in classification and its types are discussed in this paper as well. Neural networks of various types, namely, RNN, CNN, FFNN, and LSTM, are identified as the subject of study. Through systematic study, 12 research papers related to the field of the classification of Arabic texts using neural networks are obtained; for each paper, the methodology for each type of neural network and the accuracy ratio for each type is determined. The evaluation criteria used in the algorithms of different neural network types and how they play a large role in the highly accurate classification of Arabic texts are discussed. Our results provide some findings regarding how deep learning models can be used to improve text classification research in the Arabic language.
Article
Full-text available
There are several reasons why most universities implement E-learning. The extent of the E-learning programs being offered by higher educational institutes in the UAE is evidently expanding. However, very few studies have been carried out to validate the process of how E-learning is being accepted and employed by university students. The study involved a sample of 365 university students. To describe the acceptance process, the Structural Equation Modeling (SEM) method was used. On the basis of the technology acceptance model (TAM), the standard structural model that in
Article
Full-text available
This study investigates the influence of student social media usage on the acceptance of e-learning platforms at the British University in Dubai. A modified Technology Acceptance Model was developed and validated for the quantitative study, which comprised data collected from 410 graduate and postgraduate students via an electronic questionnaire. The findings showed that knowledge sharing, social media features and motivation to use social media systems, including Facebook, YouTube, and Twitter, positively affected the perceived usefulness and perceived ease-of-use of e-learning platforms, which, in turn, led to increased e-learning platform acceptance by students. The research model can be adapted to similar studies to assist in further research regarding how higher-education institutions in the UAE can maximize the benefits and uptake of e-learning platforms.
Conference Paper
Full-text available
This study aimed to analyze and discover the relation of using digital media sites (Facebook) to promoting medical tourism destinations in Jordan, and their impact on the behavior of tourists through the technologies provided by these means, away from the traditional methods in marketing. The researchers used the survey methodology with a sample of 560 tourists distributed in central Jordan in the Dead Sea area. To realize the study objective, a new framework was suggested to show the impact of Facebook on the behavior of tourists through demographic variables, Facebook features, and advertising, using the TAM model for the adoption of social media technology in tourism marketing for tourist destinations in Jordan. The proposed data were analyzed using the SmartPLS system by modeling structural equations (SEM). The outcome of the study showed that the advantages of Facebook, advertising, and demographic variables have a favorable effect on the tourist's perceived ease of use (PEOU) and perceived usefulness (PU) in the adoption of tourism behavior, and in turn PU and PEOU affect attitude (ATT), which led to the adoption of behavior around therapeutic tourism destinations in Jordan. By determining the impact of Facebook in marketing tourism in Jordan, it would be useful to conduct further research to provide better proposals for marketing tourist therapeutic destinations in Jordan.
Conference Paper
Full-text available
This study aims mainly at analyzing the relationship between social media and students’ academic performance in Jordan in the context of higher education from a YouTube perspective. It intends to explore the benefits this relationship may have in enhancing students’ learning and improving their academic performance. To successfully reach its aims, this study proposes a new model aiming at verifying the effect of Social Bookmarking, YouTube Features, Perceived Usefulness, and Use of Social Media on Jordanian students’ academic performance. To verify the validity of the proposed model, data were analyzed in SmartPLS using structural equation modeling (SEM). Data were collected from Yarmouk University in Jordan covering all the levels of study at the university. An electronic questionnaire was conducted for a target of 360 students who participated in this study. The findings of the study revealed that Social Bookmarking, YouTube Features, Perceived Usefulness, and Use of Social Media are important factors to predict students’ academic performance in relation to using social networking media for e-learning purposes in Jordan.
Conference Paper
Full-text available
There have been several longitudinal studies concerning learners’ acceptance of e-learning systems using the higher educational institutes (HEIs) platforms. Nonetheless, little is known regarding the determinants affecting e-learning acceptance through social media applications in HEIs. In keeping with this, the present study attempts to understand the influence of social media practices (i.e., knowledge sharing, social media features, and motivation and uses) on students’ acceptance of e-learning systems by extending the technology acceptance model (TAM) with these determinants. A total of 410 graduate and undergraduate students enrolled at the British University in Dubai, UAE, took part in the study by the medium of questionnaire surveys. Partial least squares-structural equation modeling (PLS-SEM) is employed to analyze the extended model. The empirical data analysis showed that social media practices including knowledge sharing, social media features, and motivation and uses have significant positive impacts on both perceived usefulness (PU) and perceived ease of use (PEOU). It is also imperative to report that the acceptance of e-learning systems is significantly influenced by both PU and PEOU. In summary, social media practices play an effective positive role in influencing the acceptance of e-learning systems by students.
Article
Full-text available
This study tries to find the best model for predicting video game rating categories. A representation from four rating categories (everyone, everyone 10+, teen, mature) was used for the analysis. The paper follows the CRISP-DM approach in RapidMiner software for business and data understanding, data preparation, model building, and evaluation. The researchers compared predictions among six models, and the results showed that the Generalized Linear Models (GLMs) achieved the best accuracy (0.9027); the results also highlighted eight important content descriptors as having the highest influence on prediction.
Conference Paper
Social media is a multifaceted phenomenon that significantly affects business competence, mainly because of spearheading the evolutionary process. The primary purpose of the systematic review is to encompass the evaluation of social media as a model that influences business enterprises at the local and international levels. The systematic review utilized four primary hypotheses to determine the influence of social media on businesses: social media (SM) significantly influences sales (SL) in business; SM has a strong relationship with business loyalty (LO); SM influences business through awareness (AW); and SM significantly influences the level of business performance (BP). Different research studies established that social media significantly contributes to the competence of firms, mainly because of the global effect. Examples of these social media facets include social media knowledge and various platforms such as Facebook, Instagram, Twitter, YouTube, and LinkedIn. In this case, social media fostered the emergence of various business capabilities. Examples of these capabilities encompass brand awareness, brand loyalty, and sales. Social media is a platform that profoundly influences the level of business competence through the advancement of business capabilities.
Conference Paper
This study aims to investigate the intention to use and actual use of Mobile Learning System (MLS) drivers by students within the UAE higher education setting. A set of factors were chosen to study and test the issue at hand. These factors are social influence, expectation-confirmation, perceived ease of use, perceived usefulness, satisfaction, continuous intention and finally the actual use of such MLS. This study adds more light to the MLS context because it combines between two models which are the Information Technology Acceptance Model (TAM) and Expectation-Confirmation Model (ECM). A set of hypotheses were developed based on such theoretical combination. The data collected from 448 students for the seek of primary data and analyzed using the Structural Equation Modeling (SEM) in particular (SmartPLS) to evaluate the developed study model and test the prepared hypotheses. The study found that both social influence and expectation-confirmation factors influence positively perceived ease of use, perceived usefulness and satisfaction and such three drivers influence positively students’ intention to use MLS. Based on previous proposed links, the study confirms that intention to use such mobile educational means affect strongly and positively the actual use. Scholars and practitioners should take care of learners’ intention to use and actual use of MLS and their determinants into more investigation especially the social influence and reference group ones within the educational setting. A set of limitation and future research venues were mentioned in details also.
Conference Paper
The combination of two technology model which are the Technology Acceptance Model (TAM) and Use of Gratifications Theory (U&G) to create an integrated model is the first step in predicting the importance of using emotional icons and the level of satisfaction behind this usage. The reason behind using these two theories into one integrated model is that U&G provides specific information and a complete understanding of usage, whereas TAM theory has proved its effectiveness with a variety of technological applications. A self-administered survey was conducted in University of Fujairah with college students to find out the social and cognitive factors that affect the usage of stickers in WhatsApp in the United Arab of Emirates. The hypothesized model is validated empirically using the responses received from an online survey of 372 respondents were analyzed using structural equation modeling (SEM-PLS). The results show that ease of use, perceived usefulness, cognition, hedonic and social integrative significantly affected the intention to use sticker by college students. Moreover, personal integrative had a significant influence on the intention to use sticker in UAE.