Investigating Gender-Specific
Discourse about Online Learning
during COVID-19 on Twitter using
Sentiment Analysis, Subjectivity
Analysis, and Toxicity Analysis
Nirmalya Thakur*, Shuqi Cui, Karam Khanna, Victoria Knieling, Yuvraj Nihal Duggal, Mingchen Shao
Posted Date: 3 October 2023
doi: 10.20944/preprints202310.0157.v1
Keywords: online learning; COVID-19; Twitter; Data Analysis; Natural Language Processing; Sentiment
Analysis; Subjectivity Analysis; Toxicity Analysis; Diversity Analysis
Copyright: This is an open access article distributed under the Creative Commons
Attribution License which permits unrestricted use, distribution, and reproduction in any
medium, provided the original work is properly cited.
Nirmalya Thakur 1*, Shuqi Cui 1, Karam Khanna 1, Victoria Knieling 2, Yuvraj Nihal Duggal 1 and Mingchen Shao 1
1 Department of Computer Science, Emory University, Atlanta, GA 30322, USA; nirmalya.thakur@emory.edu (N.T.), nicole.cui@emory.edu (S.C.), karam.khanna@emory.edu (K.K.), yuvraj.nihal.duggal@emory.edu (Y.N.D.), katie.shao@emory.edu (M.S.)
2 Department of Linguistics, Emory University, Atlanta, GA 30322, USA; victoria.knieling@emory.edu
* Correspondence: nirmalya.thakur@emory.edu
Abstract: This paper presents several novel findings from a comprehensive analysis of about 50,000 Tweets about online learning during COVID-19, posted on Twitter between November 9, 2021, and July 13, 2022. First, the results of sentiment analysis using VADER, Afinn, and TextBlob show that a higher percentage of these tweets were positive. The results of gender-specific sentiment analysis indicate that for positive, negative, and neutral tweets alike, males posted a higher percentage of the tweets than females. Second, the results of subjectivity analysis show that the percentages of least opinionated, neutral opinionated, and highly opinionated tweets were 56.568%, 30.898%, and 12.534%, respectively. The gender-specific results of subjectivity analysis indicate that for each subjectivity class, males posted a higher percentage of tweets than females. Third, toxicity detection was performed on the tweets to detect different categories of toxic content: toxicity, obscene, identity attack, insult, threat, and sexually explicit. The gender-specific analysis of the percentage of tweets posted by each gender in each of these categories revealed several novel insights; for instance, in the sexually explicit category, females posted a higher percentage of tweets than males. Fourth, gender-specific tweeting patterns for each of these categories of toxic content were analyzed to understand their trends. The results revealed multiple patterns of tweeting behavior; for instance, the intensity of obscene content in tweets about online learning by both males and females has decreased since May 2022. Fifth, the average activity of males and females per month was calculated. The findings indicate that the average activity of females was higher than that of males in all months except March 2022. Finally, country-specific tweeting patterns of males and females were analyzed, which presented multiple novel insights; for instance, in India, a higher percentage of the tweets about online learning during COVID-19 were posted by males than by females.
1. Introduction
In December 2019, the world was challenged by the outbreak of COVID-19, a disease caused by a coronavirus (CoV) known as SARS-CoV-2 that produces severe respiratory illness [1]. In some individuals, this illness may even progress to acute respiratory distress syndrome or extra-pulmonary organ failure due to the extreme inflammatory response to the virus [2]. The dangers of COVID-19 prompted a global effort to understand how the virus works and is contracted. SARS-CoV-2, so named because of its similarities with SARS-CoV, is believed to have initially been contracted by humans through animal-human contact and to have spread thereafter through human-human contact [3]. This is not unusual, as other CoVs, such as MERS-CoV, were also transmitted through animal-human contact [4]. While COVID-19 is believed to have originated in a seafood
market in Wuhan, China, the exact animal that may have infected the first identified patients remains unclear [3]. However, genome sequencing comparing SARS-CoV-2 with a coronavirus found in a bat from Yunnan Province, China suggests that bats may have been the originators of SARS-CoV-2, as SARS-CoV-2 has a 93.1% identity with the RaTG13 virus found in the bat [1].
After the initial outbreak, COVID-19 soon spread to different parts of the world, and on March 11, 2020, the World Health Organization (WHO) declared COVID-19 a pandemic [5]. As no treatments or vaccines for COVID-19 were available at that time, the virus spread largely unopposed across different countries, causing infections and deaths on a scale the world had not witnessed in centuries. As of September 21, 2023, there had been a total of 770,778,396 cases and 6,958,499 deaths due to COVID-19 [6]. In an attempt to mitigate the spread of the virus, several countries across the world imposed partial to complete lockdowns [7]. Such lockdowns affected the educational sector immensely. Universities, colleges, and schools across the world were left searching for solutions to best deliver course content online, engage learners, and conduct assessments during the lockdowns. During this time, online learning was considered a feasible solution. Online learning platforms are applications (web-based or software) that are used for designing, delivering, managing, monitoring, and accessing courses online [8]. This switch to online learning took place in more than 100 countries [9] and led to a tremendous increase in the need for educators, students, administrators, and staff at universities, colleges, and schools across the world to familiarize themselves with, adopt, and utilize online learning platforms [10].
In today’s Internet of Everything era [11], the usage of social media platforms has skyrocketed, as such platforms serve as virtual communities [12] for people to seamlessly connect with each other. Currently, around 4.9 billion individuals worldwide actively participate in social media, and it is projected that this number will reach 5.85 billion by 2027. On average, a social media user maintains approximately 8.4 social media profiles and spends roughly 145 minutes each day engaging with various social media platforms. Among the social media platforms available, Twitter has gained substantial popularity across diverse age groups [13,14]. The rapid transition to online learning resulted in a tremendous increase in the usage of social media platforms, such as Twitter, where individuals communicated their views, perspectives, and concerns about online learning, leading to the generation of Big Data of social media conversations. This Big Data holds the potential to provide insights into the paradigms of information-sharing and information-seeking behavior about online learning during COVID-19.
1.1. COVID-19: A Brief Overview
COVID-19 is caused by a type of coronavirus (CoV). CoVs are RNA viruses consisting of four structural proteins: the spike (S) protein, membrane (M) protein, envelope (E) protein, and nucleocapsid (N) protein. The S protein is involved in the attachment to and recognition of the host cell (infection); the M protein is involved in shaping virions; the E protein is responsible for packaging and reproduction; and the N protein packages RNA into a nucleocapsid. The virions also have polyproteins that are translated after entry into the host or target cell; these include pp1a and pp1ab [15,16]. The SARS-CoV-2 virus particle measures between 60 and 140 nanometers in diameter and has a positive-sense, single-stranded RNA genome approximately 29,891 bases in length [15].
Infection by SARS-CoV-2 occurs when the S protein binds to the surface receptor angiotensin-converting enzyme 2 (ACE2) and the virus enters type II pneumocytes, which are found in human lungs. The S protein is critical to transmission and infection by SARS-CoV-2: it has two domains, S1 and S2, where S1 is involved in binding ACE2 and S2 is involved in fusion with the host cell at its membrane. Similarly important is the cleavage of the S protein. Having two cleavage sites, the S protein must be cleaved by cellular proteases so that viral entry, and subsequent infection, of the host cell can occur. Previous research suggests that the S protein of SARS-CoV-2 has a higher binding efficiency, which may explain the virus’s high rate of transmissibility. The high transmissibility is also explained by a four-amino-acid insertion, P681, R682, R683, and A684, that has not been found in other CoVs
before, nor in the RaTG13 virus observed in the bat thought to have infected the first human patients of COVID-19 [17,18].
While infections involving various organs have been documented in different cases, the typical effect of the SARS-CoV-2 virus on patients centers on the respiratory system. Investigation of the infections caused in Wuhan in December 2019 showed that patients suffer from a range of symptoms during the initial days of contracting this virus. These symptoms encompass fever, a dry cough, breathing difficulties, headaches, dizziness, fatigue, nausea, and diarrhea. It is important to note that the symptoms of COVID-19 can vary from person to person, both in the nature of the symptoms and in the intensity of one or more symptoms [19,20].
1.2. Twitter: A Globally Popular Social Media Platform
Twitter ranks as the sixth most popular social platform in the United States and the seventh
globally [21,22]. Notably, 42% of Americans between the ages of 12 and 34 are active Twitter users,
marking a substantial 36.6% surge over the span of two years. The frequency of posting on the
platform appears to correlate with the number of accounts followed, as users who post more than
five Tweets per month tend to follow an average of 405 accounts, in contrast to those who post less frequently, who follow an average of 105 accounts [22]. Furthermore, users spend an average of 1.1
hours per week on Twitter, which equates to 4.4 hours per month on the platform [23].
In 2023, Twitter has 353.9 million monthly active users, constituting 9.4% of the global social media user base [24]. The majority of Twitter users, accounting for 52.9%, fall within the age range of 25 to 49 years. Notably, 17.1% of users belong to the 14-18 age group, 6.6% to the 13-17 age group, and the remaining 17% are aged 50 and above [25]. On average, U.S. adults spend approximately 34.1 minutes per day on Twitter [26]. Around 500 million tweets are published each day, equivalent to roughly 5,787 tweets per second. Furthermore, 42.3% of U.S. users utilize Twitter at least once a month, and it is currently the ninth most visited website globally. The countries with the highest numbers of Twitter users include the United States with 95.4 million users, Japan with 67.45 million, India with 27.25 million, Brazil with 24.3 million, Indonesia with 24 million, the UK with 23.15 million, Turkey with 18.55 million, and Mexico with 17.2 million [28,29]. On average, a Twitter user spends 5.1 hours per month on the platform, translating to approximately 10 minutes daily. A fifth of users under 30 visit the platform frequently, 25% use it every week, and 71% visit at least weekly. Twitter is a significant source of news, with 55% of users accessing it regularly for this purpose. Ninety-six percent of U.S. Twitter users report monthly usage. Additionally, 82% engage with Twitter for entertainment. In terms of activity, about 6,000 tweets are sent per second. Mobile usage is dominant, with 80% of active users accessing Twitter via smartphones [30,31].
Due to this ubiquity of Twitter, studying the multimodal components of information-seeking and information-sharing behavior has been of keen interest to scientists from different disciplines, as can be seen from recent works in this field that focused on the analysis of tweets about various emerging technologies [32-35], global affairs [36-38], humanitarian issues [39-41], and societal problems [42-44]. Since the outbreak of COVID-19, several research works have been conducted in this field (Section 2) in which researchers analyzed different components and characteristics of tweets to interpret the varying degrees of public perceptions, attitudes, views, and responses towards this pandemic. However, the tweeting patterns about online learning during COVID-19, with respect to the gender of Twitter users, have not been investigated in any prior work in this field.
1.3. Gender Diversity on Social Media Platforms
Gender differences in content creation online have been comprehensively studied by researchers
from different disciplines [45] as such differences have been considered important in the investigation
of digital divides that produce inequalities of experience and opportunity [46,47]. Analysis of gender
diversity and the underlying patterns of content creation on social media platforms has also been
widely investigated [48]. However, the findings are mixed. Some studies have concluded that males
are more likely to express themselves on social media than females [49-51], while others
found no such difference between genders [52-54]. The gender diversity related to the usage of social
media platforms has varied over the years in different geographic regions [55]. For instance, Figure
1 shows the variation in social media use by gender from the findings of a survey conducted by the
Pew Research Center from 2005 to 2021 [56].
Figure 1. The variation of social media use by gender from the findings of a survey conducted by the
Pew Research Center from 2005 to 2021.
Table 1. Gender Diversity in Different Social Media Platforms.

Social Media Platform    Percentage of Male Users    Percentage of Female Users
Twitter                  63                          37
Instagram                51.8                        48.2
Tumblr                   52                          48
WhatsApp                 53.2                        46.7
WeChat                   53.5                        46.5
Quora                    55                          45
Facebook                 56.3                        43.7
LinkedIn                 57.2                        42.8
Telegram                 58.6                        41.4
Sina Weibo               51                          49
QQ                       51.7                        48.3
SnapChat                 48.2                        51
In general, most social media platforms exhibit a notable preponderance of male users over female users; examples include WhatsApp [57], Sina Weibo [58], QQ [59], Telegram [60], Quora [61], Tumblr [62], Facebook, LinkedIn, Instagram [63], and WeChat [64]. Nevertheless, there are exceptions to this prevailing trend. Snapchat has male and female users accounting for
48.2% and 51% of its user base, respectively [65]. These statistics about the percentages of male and female users on different social media platforms are summarized in Table 1. As can be seen from Table 1, Twitter has the largest gender gap among several social media platforms, including Instagram, Tumblr, WhatsApp, WeChat, Quora, Facebook, LinkedIn, Telegram, Sina Weibo, QQ, and SnapChat. Therefore, the work presented in this paper focuses on the analysis of user-diversity-based (with a specific focus on gender) patterns of public discourse on Twitter in the context of online learning during COVID-19.
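The gender gap referred to above can be quantified as the absolute difference between the male and female user percentages on each platform. A minimal sketch in Python, using the figures from Table 1, confirms that Twitter's gap is the widest:

```python
# Male/female user percentages per platform, taken from Table 1.
platform_gender_split = {
    "Twitter": (63.0, 37.0),
    "Instagram": (51.8, 48.2),
    "Tumblr": (52.0, 48.0),
    "WhatsApp": (53.2, 46.7),
    "WeChat": (53.5, 46.5),
    "Quora": (55.0, 45.0),
    "Facebook": (56.3, 43.7),
    "LinkedIn": (57.2, 42.8),
    "Telegram": (58.6, 41.4),
    "Sina Weibo": (51.0, 49.0),
    "QQ": (51.7, 48.3),
    "SnapChat": (48.2, 51.0),
}

# Gender gap = |% male - % female| for each platform.
gaps = {p: abs(m - f) for p, (m, f) in platform_gender_split.items()}
widest = max(gaps, key=gaps.get)
print(widest, gaps[widest])  # Twitter has the widest gap: 26 percentage points
```

The next-largest gap, Telegram's, is 17.2 percentage points, well below Twitter's 26.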
The rest of this paper is organized as follows. Section 2 presents a comprehensive review of recent works in this field. Section 3 discusses the methodology that was followed for this work. The results and scientific contributions of this study are presented and discussed in Section 4. Finally, Section 5 summarizes the contributions of this study and outlines the scope of future research in this area.
2. Literature Review
This section is divided into two parts. Section 2.1 presents an overview of the recent works
related to sentiment analysis of tweets about COVID-19. In Section 2.2, a review of emerging works
in this field is presented where the primary focus was the analysis of tweets about online learning
during COVID-19.
2.1. A Brief Review of Recent Works related to Sentiment Analysis of Tweets about COVID-19
Villavicencio et al. [66] analyzed tweets to determine the sentiment of people towards the Philippine government’s response to COVID-19. They used the Naïve Bayes model to classify the tweets as positive, negative, or neutral, and their model achieved an accuracy of 81.77%. Boon-Itt et al. [67] conducted a study using Twitter data to gain insights into public awareness and concerns related to the COVID-19 pandemic; they performed sentiment analysis and topic modeling on a dataset of over 100,000 tweets related to COVID-19. Taking a slightly different angle, Marcec et al. [68] analyzed 701,891 tweets mentioning the COVID-19 vaccines, specifically AstraZeneca/Oxford, Pfizer/BioNTech, and Moderna. They used the AFINN lexicon to calculate the daily average sentiment. The findings of this work showed that sentiment towards Pfizer and Moderna remained consistently positive, whereas sentiment towards AstraZeneca showed a declining trend. Machuca et al. [69] focused on evaluating general public sentiment towards COVID-19; they used a Logistic Regression-based approach to classify tweets as positive or negative, achieving 78.5% accuracy. Kruspe et al. [70] used a neural network to perform sentiment analysis of Tweets about COVID-19 from Europe. Similarly, the works of Vijay et al. [71], Shofiya et al. [72], and Sontayasara et al. [73] focused on sentiment analysis of tweets about COVID-19 from India, Canada, and Thailand, respectively. Nemes et al. [74] used a Recurrent Neural Network for sentiment classification of tweets about COVID-19.
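Lexicon-based approaches such as AFINN assign an integer valence to each word and score a tweet by summing the valences of its words; the daily average sentiment is then the mean tweet score per day. A minimal sketch of this idea in Python, using a small hand-made valence lexicon standing in for the full AFINN word list (the tweets are invented examples):

```python
from collections import defaultdict

# Tiny hand-made lexicon standing in for AFINN, which assigns
# each word an integer valence from -5 to +5.
VALENCE = {"good": 3, "great": 3, "love": 3, "bad": -3, "terrible": -3, "hate": -3}

def tweet_score(text):
    """Sum the valences of the words in a tweet; unknown words score 0."""
    return sum(VALENCE.get(w, 0) for w in text.lower().split())

def daily_average_sentiment(tweets):
    """tweets: iterable of (date_string, text) pairs -> {date: mean score}."""
    totals, counts = defaultdict(float), defaultdict(int)
    for day, text in tweets:
        totals[day] += tweet_score(text)
        counts[day] += 1
    return {day: totals[day] / counts[day] for day in totals}

tweets = [
    ("2021-11-09", "online learning is great"),
    ("2021-11-09", "I hate remote classes"),
    ("2021-11-10", "good experience with e-learning"),
]
print(daily_average_sentiment(tweets))  # {'2021-11-09': 0.0, '2021-11-10': 3.0}
```

Plotting such daily averages over time is what makes trend comparisons, such as the vaccine-specific trends in [68], possible.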
Okango et al. [75] employed a dictionary-based method for detecting sentiments in tweets about COVID-19. Their work indicated that mental health issues and lack of supplies were a direct result of the pandemic. The work of Singh et al. [76] focused on a deep-learning approach for sentiment analysis of tweets about COVID-19; their algorithm was based on an LSTM-RNN network with feature weighting enhanced by attention layers. Kaur et al. [77] developed an algorithm, the Hybrid Heterogeneous Support Vector Machine (H-SVM), for sentiment classification. The algorithm was able to categorize tweets as positive, negative, or neutral, as well as detect the intensity of sentiments. In [78], Vernikou et al. implemented sentiment analysis using seven different deep-learning models based on LSTM neural networks. Sharma et al. [79] studied the sentiments of people from the USA and India towards COVID-19 using text mining-based approaches. The authors also discussed how their findings could help healthcare authorities tailor their support policies in response to the emotional state of their people. Sanders et al. [80] took a slightly different approach to aid policymakers, analyzing over one million tweets to illustrate public attitudes towards mask-wearing during the pandemic. They observed that both the volume and polarity of tweets relating to mask-wearing increased over time. Alabid et al. [81] used two machine learning
classification models, SVM and a Naïve Bayes classifier, to perform sentiment analysis of tweets related to COVID-19 vaccines. Mansoor et al. [82] used Long Short-Term Memory (LSTM) and Artificial Neural Networks (ANN) to perform sentiment analysis of the public discourse on Twitter about COVID-19. Singh et al. [83] studied two datasets, one of tweets from people all over the world and the second restricted to tweets by Indians only. They conducted sentiment analysis using the BERT model and achieved a classification accuracy of 94%. Imamah et al. [84] conducted sentiment classification of 355,384 tweets using Logistic Regression. The objective of their work was to study the negative effects of ‘stay at home’ orders on people’s mental health. Their model achieved a sentiment classification accuracy of 94.71%. As can be seen from this review, a considerable number of works in this field have focused on sentiment analysis of tweets about COVID-19. In the context of online learning during COVID-19, understanding the underlying patterns of public emotions becomes crucial, and this has been investigated in multiple prior works in this field. A review of the same is presented in Section 2.2.
2.2. Review of Recent Works in Twitter Data Mining and Analysis related to Online Learning during
COVID-19
Sahir et al. [85] used the Naïve Bayes classifier to perform sentiment analysis of tweets about online learning posted in October 2020 by individuals in Indonesia. The results showed that the percentages of negative, positive, and neutral tweets were 74%, 25%, and 1%, respectively. Althagafi et al. [86] analyzed tweets about online learning during COVID-19 posted by individuals from Saudi Arabia. They used the Random Forest approach and the K-Nearest Neighbor (KNN) classifier alongside Naïve Bayes and found that most tweets about online learning were neutral. Ali [87] used Naïve Bayes, Multinomial Naïve Bayes, KNN, Logistic Regression, and SVM to analyze public opinion towards online learning during COVID-19. The results showed that the SVM classifier achieved the highest accuracy, of 89.6%. Alcober et al. [88] reported the results of multiple machine learning approaches, such as Naïve Bayes, Logistic Regression, and Random Forest, for performing sentiment analysis of tweets about online learning.
While Remali et al. [89] also used Naïve Bayes and Random Forest, their research additionally utilized the Support Vector Machine (SVM) approach and Decision Tree-based modeling. The classifiers evaluated tweets posted between July 2020 and August 2020. The results showed that the SVM classifier using the VADER lexicon achieved the highest accuracy, of 90.41% [89]. The work of Senadhira et al. [90] showed that an Artificial Neural Network (ANN)-based approach outperformed an SVM-based approach for sentiment analysis of tweets about online learning. Lubis et al. [91] used a KNN-based method for sentiment analysis of tweets about online learning. The model achieved an accuracy of 88.5% and showed that a higher number of tweets were positive. These findings are consistent with another study [92], which reported that of the tweets posted between July 2020 and August 2020, 54% were positive. The findings of the work by Isnain et al. [93] indicated that public opinion towards online learning between February 2020 and September 2020 was positive. These results were computed with a KNN-based approach that reported an accuracy of 84.65%.
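A step shared by several of the reviewed approaches that use the VADER lexicon is mapping VADER's continuous compound score to discrete sentiment classes. A common convention (an assumption here, as the exact thresholds vary between studies) labels a tweet positive when the compound score exceeds 0.05, negative when it falls below -0.05, and neutral otherwise:

```python
def label_from_compound(score, pos_threshold=0.05, neg_threshold=-0.05):
    """Map a compound sentiment score in [-1, 1] to a discrete class."""
    if score > pos_threshold:
        return "positive"
    if score < neg_threshold:
        return "negative"
    return "neutral"

# Illustrative compound scores, not values from any cited study.
scores = [0.72, -0.41, 0.01]
print([label_from_compound(s) for s in scores])
# ['positive', 'negative', 'neutral']
```

Making the thresholds explicit parameters keeps the classification rule auditable when comparing results across lexicons.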
Aljabri et al. [94] analyzed results at different education stages. Using Term Frequency-Inverse Document Frequency (TF-IDF) for feature extraction and a Logistic Regression classifier, the model developed by the authors achieved an accuracy of 89.9%. The results indicated positive sentiment from elementary through high school, but negative sentiment for universities. The work by Asare et al. [95] aimed to cluster the most commonly used words into general topics or themes. The analysis of different topics found 48.9% of the tweets to be positive, with “learning,” “COVID,” “online,” and “distance” being the most used words. Mujahid et al. [96] used TF-IDF alongside Bag of Words (BoW) features for analyzing tweets about online learning. They also used SMOTE to balance the data. The results demonstrated that the Random Forest and SVM classifiers achieved an accuracy of 95% when used with the BoW features. Al-Obeidat [97] also used TF-IDF to classify sentiments related to online education during the pandemic. The study reported that students had generally negative feelings towards online learning. In view of the propagation of misinformation on Twitter during the
pandemic, Waheeb et al. [98] proposed eliminating noise using an AutoEncoder in their work. The results showed that their approach yielded higher accuracy for sentiment analysis, with an F1-score of 0.945. Rijal et al. [99] aimed to remove bias from sentiment analysis using concepts of feature selection. Their methodology involved the use of the AdaBoost approach on the C4.5 method. The results showed that the accuracy of C4.5 and Random Forest increased from 48.21% and 50.35%, respectively, to 94.47% for detecting sentiments in tweets about online learning. Martinez [101] investigated negative sentiments about “teaching and schools” and “teaching and online” using multiple concepts of Natural Language Processing. The study reported negativity towards both topics, with higher negative sentiment, along with expressions of anger, distrust, or stress, observed towards “teaching and school.”
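Several of the works above pair TF-IDF feature extraction with a standard classifier such as Logistic Regression. A minimal sketch of that pipeline in Python with scikit-learn, trained on a handful of invented example tweets (the texts and labels are illustrative only, not data from any cited study):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy examples; the cited studies train on thousands of labeled tweets.
texts = [
    "I love online learning, the classes are great",
    "remote education gives me so much flexibility",
    "online classes are engaging and convenient",
    "great experience with e-learning this semester",
    "I hate online learning, the lectures are terrible",
    "remote classes are exhausting and stressful",
    "online education feels isolating and boring",
    "terrible connection ruined another online lecture",
]
labels = ["positive"] * 4 + ["negative"] * 4

# TF-IDF turns each tweet into a weighted bag-of-words vector;
# Logistic Regression then learns a linear decision boundary over it.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.score(texts, labels))  # training accuracy on the toy data
```

On real data, held-out evaluation (e.g., a train/test split or cross-validation) rather than training accuracy is what the reported figures such as 89.9% [94] refer to.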
As can be seen from this review of works related to the analysis of public discourse on Twitter about online learning during COVID-19, such works have multiple limitations: a lack of reporting of results from multiple sentiment analysis approaches to explain the trends of sentiments, a lack of focus on subjectivity analysis, a lack of focus on toxicity analysis, and a lack of focus on gender-specific tweeting patterns. Addressing these research gaps serves as the main motivation for this work.
3. Methodology
This section presents the methodology that was followed for this research work and is divided into two parts. Section 3.1 presents a description of the dataset that was used, and Section 3.2 discusses the procedure and methods that were followed.
3.1. Data Description
The dataset used for this research was proposed in [102]. The dataset consists of about 50,000 unique Tweet IDs of Tweets about online learning during COVID-19, covering tweets posted on Twitter between November 9, 2021, and July 13, 2022. The dataset includes tweets in 34 different languages, with English being the most common, and spans 237 different days, with the highest tweet count recorded on January 5, 2022. These tweets were posted by 17,950 distinct Twitter users with a combined follower count of 4,345,192,697, and together received 3,273,263 favorites and 556,980 retweets. Furthermore, 5,722 tweets in the dataset were posted by verified Twitter accounts, while the rest came from unverified accounts. There are a total of 7,869 distinct URLs embedded in these tweets. The Tweet IDs present in this dataset are organized into nine .txt files based on the date range of the tweets. The dataset was developed by mining tweets that referred to COVID-19 and online learning at the same time. To do so, a collection of synonyms of COVID-19 (such as COVID, COVID19, coronavirus, and Omicron) and a collection of synonyms of online learning (such as online education, remote education, remote learning, and e-learning) were used. Thereafter, duplicate tweets were removed to obtain a collection of about 50,000 Tweet IDs. The standard procedure for working with such a dataset is the hydration of the Tweet IDs. However, as this dataset was developed by the first author of this paper, the tweets were already available, and hydration was not necessary. In addition to the Tweet IDs, the dataset file that was used comprised several characteristic properties of the Tweets and of the Twitter users who posted them, such as the Tweet source, Tweet text, retweet count, user location, username, user favorites count, user follower count, user friends count, user screen name, and user status count.
The dataset complies with the FAIR principles (Findability, Accessibility, Interoperability, and
Reusability) of scientific data management. It is designed to be findable through a unique and
permanent DOI. It is accessible online for users to locate and download. The dataset is interoperable
as it uses .txt files, enabling compatibility across various computer systems and applications. Finally,
it is reusable because researchers can obtain tweet-related information, such as user ID, username,
and retweet count, for all Tweet IDs through a hydration process, facilitating data analysis and
interpretation while adhering to Twitter policies.
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 3 October 2023 doi:10.20944/preprints202310.0157.v1
3.2. System Design and Development
At first, the data preprocessing of these Tweets was performed by writing a program in Python
3.11.5. The pseudocode of this program is shown in Algorithm 1. The data preprocessing involved
the following steps:
a) Removal of characters that are not alphabets.
b) Removal of URLs.
c) Removal of hashtags.
d) Removal of user mentions.
e) Detection of English words using tokenization.
f) Stemming and lemmatization.
g) Removal of stop words.
h) Removal of numbers.
Algorithm 1: Data Preprocessing
Input: Dataset
Output: New Attribute of Preprocessed Tweets
File Path
Read data as dataframe
English words: nltk.download('words')
Stopwords: nltk.download('stopwords')
Initialize an empty list to store preprocessed text
corpus = []
for i from 0 to n do
Obtain text of the Tweet ('text' column)
text = re.sub('[^a-zA-Z]', ' ', text) // RegEx to remove characters that are not alphabets
text = re.sub(r'http\S+', '', text) // RegEx to remove URLs
text = text.lower()
text = text.split()
ps = PorterStemmer() // stemming
all_stopwords = stopwords.words('english')
text = [ps.stem(word) for word in text if word not in set(all_stopwords)]
text = ' '.join(text)
// RegEx to remove hashtags, user mentions, and special characters
text = ' '.join(re.sub("(#[A-Za-z0-9]+)|(@[A-Za-z0-9]+)|([^0-9A-Za-z\t])|(\w+:\/\/\S+)", " ", text).split())
text = ''.join("" if c.isdigit() else c for c in text) // remove numbers
text = ' '.join(w for w in nltk.wordpunct_tokenize(text) if w.lower() in words)
corpus.append(text)
End of for loop
New Attribute Preprocessed Text (from corpus)
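The cleaning steps of Algorithm 1 can be sketched in plain Python. This minimal version relies only on the standard library's `re` module; the toy stop-word list stands in for NLTK's `stopwords.words('english')`, and the stemming, lemmatization, and dictionary-based word filtering of the actual pipeline are omitted.

```python
import re

# Toy stand-in for NLTK's English stop-word list (the real pipeline uses
# stopwords.words('english') together with PorterStemmer).
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in"}

def preprocess_tweet(text: str) -> str:
    """Apply the cleaning steps of Algorithm 1 (minus stemming/lemmatization)."""
    text = re.sub(r"http\S+", " ", text)         # remove URLs
    text = re.sub(r"#[A-Za-z0-9_]+", " ", text)  # remove hashtags
    text = re.sub(r"@[A-Za-z0-9_]+", " ", text)  # remove user mentions
    text = re.sub(r"[^a-zA-Z]", " ", text)       # keep alphabetic characters only
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    return " ".join(words)

cleaned = preprocess_tweet("Online learning in #COVID19 is hard! @user https://t.co/x 2022")
print(cleaned)  # → "online learning hard"
```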
After performing data preprocessing, the GenderPerformr package in Python developed by
Wang et al. [104,105] was applied to the usernames to detect their gender. GenderPerformr uses an
LSTM model built in PyTorch to analyze usernames and detect genders in terms of male or female.
The working of this algorithm was extended to classify usernames into four categories: male, female,
none, and maybe. The algorithm classified a username as 'male' if that username matched a male
name from the list of male names accessible to this Python package. Similarly, the algorithm classified
a username as 'female' if that username matched a female name from the list of female names
accessible to this Python package. The algorithm classified a username as 'none' if that username was
a word in the English dictionary that cannot be a person's name. Finally, the algorithm classified a
username as 'maybe' if the username was a word absent in the list of male and female names
accessible to this Python package and the username was also not an English word. The classification
performed by this algorithm was manually verified. Furthermore, all the usernames that were
classified as ‘maybe’ were manually classified as male, female, or none. The pseudocode that was
written in Python 3.11.5 to detect genders from Twitter usernames is presented as Algorithm 2.
Algorithm 2: Detect Gender from Twitter Usernames
Input: Dataset
Output: File with the Gender of each Twitter User
File Path
Read data as dataframe
procedure PREDICTGENDER(csv file)
gp ← Initialize GenderPerformr
output file ← Initialize empty text file
regex ← Initialize RegularExpression
df ← Read csv file into Dataframe
for each column in df do
if column is user name column then
name values ← Extract values of the column
end if
End of for loop
for each name in name values do
if name is "null", "nan", empty, or None then
Write name and "None" to Gender
else if name does not match regex and name is not None then
Write name to output file
Count number of words in name
if words > 1 then
splittedname ← Split name by spaces
name ← First element of splittedname
end if
str result ← Perform gender prediction using gp
gender ← Extract gender from str result
if gender is "M" then
Write "Male" to Gender
else if gender is "F" then
Write "Female" to Gender
else if gender is empty or whitespace then
Write "None" to Gender
else
if name in lowercase exists in set of common words then
Write "None" to Gender
else
Write "Maybe" to Gender
end if
end if
else
Write name and "None" to Gender
end if
End of for loop
end procedure
Write df with a new "Gender" attribute to a new .CSV file
Export .CSV file
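The decision logic surrounding the GenderPerformr prediction can be summarized in a small, self-contained sketch. The name lists and common-word set below are toy stand-ins for the resources the package and the pseudocode rely on, and a simple dictionary lookup replaces the package's LSTM prediction step.

```python
# Toy stand-ins (illustrative assumptions, not the package's actual resources).
MALE_NAMES = {"john", "david"}
FEMALE_NAMES = {"mary", "alice"}
COMMON_WORDS = {"sunshine", "news", "world"}

def predict_gender_label(username: str) -> str:
    """Return 'Male', 'Female', 'None', or 'Maybe' for a Twitter username."""
    if not username or not username.strip() or username.lower() in {"null", "nan"}:
        return "None"
    first = username.split()[0].lower()  # multi-word names: keep the first token
    if first in MALE_NAMES:
        return "Male"
    if first in FEMALE_NAMES:
        return "Female"
    if first in COMMON_WORDS:            # dictionary word, not a person's name
        return "None"
    return "Maybe"                       # unknown: resolved manually in the study

print(predict_gender_label("Alice B"))   # → "Female"
print(predict_gender_label("sunshine"))  # → "None"
```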
Thereafter, three different models for sentiment analysis, namely VADER, Afinn, and TextBlob, were
applied to the Tweets. VADER (Valence Aware Dictionary and sEntiment Reasoner), developed by
Hutto et al. [106] is a lexicon and rule-based sentiment analysis tool that is specifically attuned to
sentiments expressed in social media. The VADER approach can analyze a text and classify it as
positive, negative, or neutral. Furthermore, it also reports a compound sentiment score between -1
(most negative) and +1 (most positive) that captures the overall intensity of the sentiment expressed
in a given text; this score is computed from lexicon ratings of word valence on a scale from -4 to
+4. The AFINN lexicon developed by Nielsen is also used to analyze Twitter sentiment
[107]. The AFINN lexicon is a list of English terms manually rated for valence with an integer between
-5 (negative) and +5 (positive). Finally, TextBlob, developed by Loria [108], is a lexicon-based
sentiment analyzer that also uses a set of predefined rules to perform sentiment analysis and
subjectivity analysis. The sentiment score lies between -1 and 1, where -1 identifies the most negative
words, such as ‘disgusting’, ‘awful’, and ‘pathetic’, and 1 identifies the most positive words, like
‘excellent’ and ‘best’. The subjectivity score lies between 0 and 1 and reflects the amount of personal
opinion in a text: a subjectivity score close to 1 indicates that the text contains more personal
opinion than factual information. These three approaches for performing sentiment analysis of tweets
have been very popular, as can be seen from several recent works in this field that used
VADER [109–112], Afinn [113–116], and TextBlob [117–120]. The pseudocodes of the programs that
were written in Python 3.11.5 to apply VADER, Afinn, and TextBlob to these Tweets are shown in
Algorithms 3, 4, and 5, respectively.
Algorithm 3: Detect Sentiment of Tweets Using VADER
Input: Preprocessed Dataset (output from Algorithm 1)
Output: File with Sentiment of each Tweet
File Path
Read data as dataframe
Import VADER
sid_obj ← Initialize SentimentIntensityAnalyzer
for each row in df['PreprocessedTweet'] do
tweet text ← df['PreprocessedTweet'][row]
if tweet text is null then
sentiment score ← 0
else
sentiment_dict = sid_obj.polarity_scores(tweet text)
sentiment score ← sentiment_dict['compound']
end if
if sentiment score >= 0.05 then
sentiment ← 'positive'
else if sentiment score <= -0.05 then
sentiment ← 'negative'
else
sentiment ← 'neutral'
end if
df[row] ← sentiment and sentiment score
End of for loop
Write df with two new attributes sentiment class and sentiment score to a new .CSV file
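The compound-score thresholding in Algorithm 3 can be illustrated with a short, self-contained sketch. The scores passed in below are illustrative stand-ins for the values that `SentimentIntensityAnalyzer().polarity_scores(text)['compound']` would return with the vaderSentiment package installed.

```python
def vader_class(compound: float) -> str:
    """Map a VADER compound score to a sentiment class (thresholds from Algorithm 3)."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"

# Illustrative compound scores (real values come from VADER's polarity_scores()).
print(vader_class(0.42))   # → "positive"
print(vader_class(-0.01))  # → "neutral"
```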
Algorithm 4: Detect Sentiment of Tweets Using Afinn
Input: Preprocessed Dataset (output from Algorithm 1)
Output: File with Sentiment of each Tweet
File Path
Read data as dataframe
Import Afinn
afn ← Instantiate Afinn
for each row in df['PreprocessedTweet'] do
tweet text ← df['PreprocessedTweet'][row]
if tweet text is null then
sentiment score ← 0
else
sentiment score ← afn.score(tweet text)
end if
if sentiment score > 0 then
sentiment ← 'positive'
else if sentiment score < 0 then
sentiment ← 'negative'
else
sentiment ← 'neutral'
end if
df[row] ← sentiment and sentiment score
End of for loop
Write df with two new attributes sentiment class and sentiment score to a new .CSV file
Algorithm 5: Detect Polarity and Subjectivity of Tweets Using TextBlob
Output: File with metrics for polarity and subjectivity of each Tweet
File Path
Read data as dataframe
Import TextBlob
Initialize Lists for Blob, Polarity, Subjectivity, Polarity Class, and Subjectivity Class
for row in df[’PreprocessedTweet’] do
convert item to TextBlob and append to Blob List
End of for loop
for each blob in Blob List do
for each sentence in blob do
calculate polarity and subjectivity
append them to Polarity and Subjectivity Lists respectively
End of for loop
End of for loop
for each value p in Polarity List do
if p > 0 then
append 'Positive' to Polarity Class List
else if p < 0 then
append 'Negative' to Polarity Class List
else
append 'Neutral' to Polarity Class List
end if
End of for loop
for each value s in Subjectivity List do
if s > 0.6 then
append 'Highly Opinionated' to Subjectivity Class List
else if s < 0.4 then
append 'Least Opinionated' to Subjectivity Class List
else
append 'Neutral' to Subjectivity Class List
end if
End of for loop
Write df with four new attributes polarity, polarity class, subjectivity, and subjectivity
class to a new CSV file
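The classification rules of Algorithm 5 can be sketched as two small functions. With TextBlob installed, the inputs would come from `TextBlob(text).sentiment.polarity` and `.subjectivity`; the values used here are illustrative.

```python
def polarity_class(p: float) -> str:
    """Sign-based polarity class, as in Algorithm 5."""
    return "Positive" if p > 0 else "Negative" if p < 0 else "Neutral"

def subjectivity_class(s: float) -> str:
    """Subjectivity bins with the 0.4 / 0.6 cut-offs from Algorithm 5."""
    if s > 0.6:
        return "Highly Opinionated"
    if s < 0.4:
        return "Least Opinionated"
    return "Neutral"

# Illustrative scores (real values come from TextBlob's sentiment property).
print(polarity_class(0.3), subjectivity_class(0.9))  # → Positive Highly Opinionated
```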
Thereafter, toxicity analysis of these Tweets was performed using the Detoxify package [121]. It
includes three different trained models and outputs different toxicity categories. These models are
trained on data from the three Kaggle Jigsaw toxic comment classification challenges [122–124]. Using
this package, each Tweet received a score in terms of the degree of toxicity, obscene content, identity
attack, insult, threat, and sexually explicit content. The pseudocode of the program that was written
in Python 3.11.5 to apply the Detoxify package to these Tweets is shown in Algorithm 6.
Algorithm 6: Perform Toxicity Analysis of the Tweets Using Detoxify
Input: Preprocessed Dataset (output from Algorithm 1)
Output: File with metrics of toxicity for each Tweet
File Path
Read data as dataframe
Import Detoxify
Instantiate Detoxify
predictor = Detoxify('multilingual')
Initialize Lists for toxicity, obscene, identity attack, insult, threat, and sexually explicit
for each row in df['PreprocessedTweet'] do
data ← predictor.predict(df['PreprocessedTweet'][row])
toxic_value = data['toxicity']
obscene_value = data['obscene']
identity_attack_value = data['identity_attack']
insult_value = data['insult']
threat_value = data['threat']
sexual_explicit_value = data['sexual_explicit']
append these values to the lists for toxicity, obscene, identity attack, insult, threat, and sexually explicit
score[] ← scores for toxicity, obscene, identity attack, insult, threat, and sexually explicit
max_value = maximum value in score[]
label = class corresponding to max_value
append values to the corpus
End of for loop
data = []
for each i from 1 to n do
create an empty list tmp
append tweet id, text, score[], max_value, and label to tmp
append tmp to data
End of for loop
Write new attributes toxicity, obscene, identity attack, insult, threat, sexually explicit, and label to a new CSV file
Export .CSV file
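The label-assignment step of Algorithm 6 reduces to taking the arg-max over the per-category scores. The score dictionary below is an illustrative stand-in for the output of Detoxify's `predict()`.

```python
def toxicity_label(scores: dict) -> str:
    """Label a tweet with its highest-scoring toxicity category (Algorithm 6)."""
    return max(scores, key=scores.get)

example_scores = {          # illustrative Detoxify-style output
    "toxicity": 0.71,
    "obscene": 0.12,
    "identity_attack": 0.03,
    "insult": 0.44,
    "threat": 0.01,
    "sexual_explicit": 0.02,
}
print(toxicity_label(example_scores))  # → "toxicity"
```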
Figure 2 represents a flowchart summarizing the working of Algorithms 1 to 6. In addition to
the above, average activity analysis of different genders (male, female, and none) was also performed.
The pseudocode of the program that was written in Python 3.11.5 to compute and analyze the average
activity of different genders is shown in Algorithm 7. This program uses the formula for the total
activity calculation of a Twitter user, which was proposed in an earlier work in this field [125]. This
formula is shown in Equation (1):

Activity of a Twitter User = Author Tweets Count + Author Favorites Count (1)
Figure 2. A flowchart representing the working of Algorithm 1 to Algorithm 6 for the development
of the master dataset.
Algorithm 7: Compute Average Activity of different Genders on a monthly basis
Output: Average Activity per gender per month
File Path
Read data as dataframe
Initialize lists for distinct males, distinct females, and distinct none
for each row in df[’created_at’] do
extract month and year
append data
End of for loop
Create new attribute month_year to hold month and year
for each month in df[’month_year’] do
d_males = number of distinct males based on df[’user_id’] and df[’gender’]
d_females = number of distinct females based on df[’user_id’] and df[’gender’]
d_none = number of distinct none based on df['user_id'] and df['gender']
for each male in d_males do
activity = author tweets count + author favorites count
males_total_activity = males_total_activity + activity
End of for loop
males_avg_activity = males_total_activity / d_males
for each female in d_females do
activity = author tweets count + author favorites count
females_total_activity = females_total_activity + activity
End of for loop
females_avg_activity = females_total_activity / d_females
for each none in d_none do
activity = author tweets count + author favorites count
none_total_activity = none_total_activity + activity
End of for loop
none_avg_activity = none_total_activity / d_none
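Algorithm 7 can be sketched as follows, under the assumption of one row per distinct user per month; the rows of (user_id, gender, month_year, statuses_count, favourites_count) below are illustrative.

```python
from collections import defaultdict

# Illustrative rows: (user_id, gender, month_year, statuses_count, favourites_count).
rows = [
    ("u1", "Male",   "2022-03", 120, 300),
    ("u2", "Female", "2022-03",  80, 500),
    ("u3", "Female", "2022-03", 100, 100),
]

def avg_activity(rows):
    """Average activity per (month, gender), using Equation (1)."""
    totals = defaultdict(int)
    users = defaultdict(set)
    for user_id, gender, month, tweets, favorites in rows:
        key = (month, gender)
        totals[key] += tweets + favorites  # Equation (1): tweets + favorites
        users[key].add(user_id)            # distinct users per gender per month
    return {key: totals[key] / len(users[key]) for key in totals}

print(avg_activity(rows))  # → {('2022-03', 'Male'): 420.0, ('2022-03', 'Female'): 390.0}
```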
Finally, the trends in tweeting patterns related to online learning from different geographic
regions were also analyzed to understand the gender-specific tweeting patterns across these regions.
To perform this analysis, the PyCountry [127] package was used. Specifically, the program that was
written in Python applied the fuzzy search function available in this package to detect the country
of a Twitter user based on the publicly listed city, county, state, or region on their Twitter profile.
Algorithm 8 shows the pseudocode of the Python program that was written to perform this task. The
results of applying all these algorithms on the dataset are discussed in Section 4.
Algorithm 8: Detect Locations of Twitter Users, Visualize Gender-Specific Tweeting
Patterns
Input: Dataset
Output: File with locations (country) of each user, visualization of gender-specific
tweeting patterns
File Path
Read data as dataframe
Import PyCountry
Import Folium
Import Geodata data package
for each row in df['user_location'] do
location_values = columnSeriesObj.values
End of for loop
for each location in location_values do
if location is "null", "nan", empty, or None then
country = none
else
spaces = location.count(' ')
if spaces > 0 then
for word in location.split() do
country = pycountry.countries.search_fuzzy(word)
defaultcountry = country.name
End of for loop
end if
if spaces == 0 then
country = pycountry.countries.search_fuzzy(location)[0]
end if
end if
append values to corpus
End of for loop
Write new attribute "country" to the dataset
pivotdata ← pivot df with 'user_location' as the index and 'Gender' as attributes
pivotdata[attributes] ← 'Female', 'Male', and 'None'
pivotdata[total] ← sum of the 'Male', 'Female', and 'None' columns
Instantiate Folium map m
threshold scale ← list of threshold values for colored bins
choropleth layer ← custom color scale, ranges, and opacity
pivotdata[key] ← mapping
legend name ← pivotdata[attributes]
GenerateMap()
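The per-location lookup in Algorithm 8 can be sketched without the pycountry dependency. The toy lookup table below stands in for `pycountry.countries.search_fuzzy()`, and the null handling mirrors the pseudocode.

```python
# Illustrative entries only; the real pipeline resolves names via pycountry.
LOCATION_TABLE = {
    "delhi": "India",
    "mumbai": "India",
    "london": "United Kingdom",
    "texas": "United States",
}

def detect_country(location: str):
    """Resolve a free-text profile location to a country, or None."""
    if not location or location.lower() in {"null", "nan"}:
        return None
    for word in location.replace(",", " ").split():
        country = LOCATION_TABLE.get(word.lower())
        if country:
            return country
    return None

print(detect_country("Mumbai, Maharashtra"))  # → "India"
print(detect_country(""))                     # → None
```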
4. Results and Discussion
This section presents and discusses the results of this study. As stated in Section 3, Algorithm 2
was run on the dataset to detect the gender of each Twitter user. After obtaining the output from this
algorithm, the classifications were manually verified as well and the ‘maybe’ labels were manually
classified as either male, female, or none. Thereafter, the dataset contained only three labels for the
“Gender” attribute male, female, and none. Figure 3 shows a pie chart-based representation of the
same. As can be seen from Figure 3, out of the tweets posted by males and females, males posted a
higher percentage of the tweets.
Figure 3. A pie chart to represent different genders from the “Gender” attribute.
The results obtained from Algorithm 3 are presented next. Figure 4 presents a pie chart to show
the percentage of tweets in each of the sentiment classes (positive, negative, and neutral) by taking
all the genders together. As can be seen from this Figure, the percentages of positive, negative, and
neutral tweets as per VADER were 41.704%, 29.932%, and 28.364%, respectively.
Figure 4. A pie chart to represent the distribution of positive, negative, and neutral sentiments (as per
VADER) in the tweets.
Next, for each sentiment class (positive, negative, and neutral) the distribution in terms of tweets
posted by males, females, and twitter accounts assigned a none gender was calculated. The results of
the same are shown in Figures 5 to 7.
Figure 5. A pie chart to represent the percentage of positive tweets (as per VADER) posted by each
gender.
Figure 6. A pie chart to represent the percentage of negative tweets (as per VADER) posted by each
gender.
Figure 7. A pie chart to represent the percentage of neutral tweets (as per VADER) posted by each
gender.
As can be seen from Figures 5 to 7, for each sentiment label (positive, negative, and neutral)
between males and females, males posted a higher percentage of tweets. A similar analysis was
performed by applying Algorithm 4 and 5 on the dataset. The results of applying Algorithm 4 are
presented in Figures 8 to 11.
Figure 8. A pie chart to represent the distribution of positive, negative, and neutral sentiments (as per
Afinn) in the tweets.
Figure 9. A pie chart to represent the percentage of positive tweets (as per Afinn) posted by each
gender.
Figure 10. A pie chart to represent the percentage of negative tweets (as per Afinn) posted by each
gender.
Figure 11. A pie chart to represent the percentage of neutral tweets (as per Afinn) posted by each
gender.
As can be seen from Figure 8, the percentage of positive tweets (as per the Afinn approach for
sentiment analysis) was higher than the percentage of negative and neutral tweets. This is consistent
with the findings from VADER (presented in Figure 4). Furthermore, Figures 9 to 11 show that for each
sentiment label (positive, negative, and neutral) between males and females, males posted a higher
percentage of the tweets. This finding is also consistent with the results obtained from VADER as
shown in Figures 5 to 7. The results of applying TextBlob for performing sentiment analysis are
shown in Figures 12 to 15.
Figure 12. A pie chart to represent the distribution of positive, negative, and neutral sentiments (as
per TextBlob) in the tweets.
Figure 13. A pie chart to represent the percentage of positive tweets (as per TextBlob) posted by each
gender.
Figure 14. A pie chart to represent the percentage of negative tweets (as per TextBlob) posted by each
gender.
Figure 15. A pie chart to represent the percentage of neutral tweets (as per TextBlob) posted by each
gender.
From Figure 12, it can be inferred that, as per TextBlob, the percentage of positive tweets was higher
as compared to the percentage of negative and neutral tweets. This is consistent with the results of
VADER (Figure 4) and Afinn (Figure 8). Furthermore, Figures 13 to 15 show that for each sentiment
class (positive, negative, and neutral), between males and females, males posted a higher percentage
of the tweets. Once again, the results are consistent with the observations from VADER (Figures 5 to
7) and Afinn (Figures 9 to 11). In addition to sentiment analysis, TextBlob also computed the
subjectivity of each tweet and categorized each tweet as highly opinionated, least opinionated, or
neutral. The results of the same are shown in Figures 16 to 19.
Figure 16. A pie chart to represent the results of subjectivity analysis using TextBlob.
Figure 17. A pie chart to represent the percentage of highly opinionated tweets (as per TextBlob)
posted by each gender.
Figure 18. A pie chart to represent the percentage of least opinionated tweets (as per TextBlob) posted
by each gender.
Figure 19. A pie chart to represent the percentage of tweets for the neutral subjectivity class (as per
TextBlob) posted by each gender.
As can be seen from Figure 16, a majority of the tweets were least opinionated. To add to this,
Figures 17 to 19 show that for each subjectivity class (i.e., highly opinionated, least opinionated,
and neutral), between males and females, males posted a higher percentage of the tweets. The results
obtained from Algorithm 6 are discussed next. This algorithm analyzed all the tweets and categorized
each of them into one of six toxicity classes: toxicity, obscene, identity attack, insult, threat, and
sexually explicit. The number of tweets that were classified into each of these classes was 36,081;
8,729; 3,411; 1,165; 18; and 4, respectively. This is shown in Figure 20. Thereafter, the percentage
of tweets posted by each gender for each of these categories of toxic content was analyzed and the
results are presented in Figures 21 to 26.
Figure 20. Representation of the variation of different categories of toxic content present in the
tweets.
Figure 21. A pie chart to represent the percentage of tweets for the toxicity class (as per Detoxify)
posted by each gender.
Figure 22. A pie chart to represent the percentage of tweets for the obscene class (as per Detoxify)
posted by each gender.
Figure 23. A pie chart to represent the percentage of tweets for the identity attack class (as per
Detoxify) posted by each gender.
Figure 24. A pie chart to represent the percentage of tweets for the insult class (as per Detoxify) posted
by each gender.
Figure 25. A pie chart to represent the percentage of tweets for the threat class (as per Detoxify) posted
by each gender.
Figure 26. A pie chart to represent the percentage of tweets for the sexually explicit class (as per
Detoxify) posted by each gender.
From Figures 21 to 24, it can be seen that for the classes toxicity, obscene, identity attack, and
insult, between males and females, males posted a higher percentage of the tweets. Furthermore,
Figure 25 shows that no tweet posted by females was assigned the threat label. Figure 26
shows that for those tweets that were categorized as sexually explicit, between males and females,
females posted a higher percentage of those tweets. It is worth mentioning here that the results in
Figures 25 and 26 are based on data that constitute less than 1% of the tweets present in the dataset.
So, in a real-world scenario, these percentages could vary as a greater number of tweets are posted
for each of the two categories, threat and sexually explicit.
In addition to analyzing the varying trends in sentiments and toxicity, the content of the
underlying tweets was also analyzed using word clouds. For the generation of these word clouds, the
top 100 words (in terms of frequency) were considered. To perform the same, a consensus of sentiment
labels from the three different sentiment analysis approaches was considered. For instance, to prepare
a word cloud of positive tweets, all those tweets that were labeled as positive by VADER, Afinn, and
TextBlob were considered. A word cloud was developed to represent the same. Thereafter, for all the
positive tweets, gender-specific tweeting patterns were also analyzed to compute the top 100 words
used by males for positive tweets, the top 100 words used by females for positive tweets, and the top
100 words used by Twitter accounts associated with a none gender label. A high degree of overlap in
terms of the 100 words for all these scenarios was observed. More specifically, a total of 79 words
were common amongst the lists of the top 100 words for positive tweets, the top 100 words used by
males for positive tweets, the top 100 words used by females for positive tweets, and the top 100
words used by Twitter accounts associated with a none gender label. So, to avoid redundancy, Figure
27 shows a word cloud-based representation of the top 100 words used in positive tweets. Similarly,
a high degree of overlap in terms of the 100 words was also observed for the analysis of different lists
for negative tweets and neutral tweets. So, to avoid redundancy, Figures 28 and 29 show word cloud-
based representations of the top 100 words used in negative tweets and neutral tweets, respectively.
In a similar manner, the top 100 frequently used words for the different subjectivity classes were also
computed and word cloud-based representations of the same are shown in Figures 30 to 32.
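The consensus selection described above can be sketched as follows; the tweets and their (VADER, Afinn, TextBlob) labels below are illustrative.

```python
from collections import Counter

# Illustrative (text, (VADER, Afinn, TextBlob)) label triples.
tweets = [
    ("great online class today", ("positive", "positive", "positive")),
    ("remote learning is awful", ("negative", "negative", "neutral")),
    ("great teachers online",    ("positive", "positive", "positive")),
]

# Keep tweets labeled positive by all three analyzers, then rank words by frequency.
consensus_positive = [text for text, labels in tweets
                      if all(label == "positive" for label in labels)]
counts = Counter(w for text in consensus_positive for w in text.split())
top_words = [w for w, _ in counts.most_common(100)]  # the study used the top 100
print(top_words[:3])  # → ['great', 'online', 'class']
```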
Figure 27. A word cloud-based representation of the 100 most frequently used words in positive tweets.
Figure 28. A word cloud-based representation of the 100 most frequently used words in negative tweets.
Figure 29. A word cloud-based representation of the 100 most frequently used words in neutral tweets.
Figure 30. A word cloud-based representation of the 100 most frequently used words in tweets that
were highly opinionated.
Figure 31. A word cloud-based representation of the 100 most frequently used words in tweets that
were least opinionated.
Figure 32. A word cloud-based representation of the 100 most frequently used words in tweets that
were categorized as having a neutral opinion.
After performing this analysis, a similar word frequency-based analysis was performed for the
different categories of toxic content that were detected in the tweets using Algorithm 6. These classes
were toxicity, obscene, identity attack, insult, threat, and sexually explicit. As explained in Algorithm
6, each tweet was assigned a score for each of these classes, and the tweet was labeled with the class
that received the highest score. For instance, if the toxicity score for a tweet
was higher than the scores that the tweet received for the classes - obscene, identity attack, insult,
threat, and sexually explicit, then the label of that tweet was assigned as toxicity. Similarly, if the
obscene score for a tweet was higher than the scores that the tweet received for the classes - toxicity,
identity attack, insult, threat, and sexually explicit, then the label of that tweet was assigned as
obscene. The results of this word cloud-based analysis for the top 100 words (in terms of frequency)
for each of these classes are shown in Figures 33 to 38.
Figure 33. A word cloud-based representation of the 100 most frequently used words in tweets that
belonged to the toxicity category.
Figure 34. A word cloud-based representation of the 100 most frequently used words in tweets that
belonged to the obscene category.
Figure 35. A word cloud-based representation of the 100 most frequently used words in tweets that
belonged to the identity attack category.
Figure 36. A word cloud-based representation of the 100 most frequently used words in tweets that
belonged to the insult category.
Figure 37. A word cloud-based representation of the 100 most frequently used words in tweets that
belonged to the threat category.
Figure 38. A word cloud-based representation of the 100 most frequently used words in tweets that
belonged to the sexually explicit category.
As can be seen from Figures 33 to 38, the patterns of communication were diverse for each of the
categories of toxic content designated by the classes toxicity, obscene, identity attack, insult,
threat, and sexually explicit. At the same time, Figures 37 and 38 appear significantly different in terms of the
top 100 words used. This also shows that for tweets that were categorized as threat (Figure 37) and
as containing sexually explicit content (Figure 38) the paradigms of communication and information
exchange in those tweets were very different as compared to tweets categorized into any of the
remaining classes representing toxic content. In addition to performing this word cloud-based
analysis, the scores each of these classes received were analyzed to infer the trends of their intensities
over time. To perform this analysis, the mean value of each of these classes was computed per month
and the results were plotted in a graphical manner as shown in Figure 39.
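The per-month averaging behind Figure 39 can be sketched as follows, with illustrative (month, score) pairs standing in for the dataset's per-tweet toxicity scores.

```python
from collections import defaultdict

# Illustrative (month_year, toxicity_score) pairs.
rows = [("2022-01", 0.75), ("2022-01", 0.25), ("2022-02", 0.5)]

def monthly_mean(rows):
    """Mean score per month, as computed for each toxicity category."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for month, score in rows:
        sums[month] += score
        counts[month] += 1
    return {month: sums[month] / counts[month] for month in sums}

print(monthly_mean(rows))  # → {'2022-01': 0.5, '2022-02': 0.5}
```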
Figure 39. A graphical representation of the variation of the intensities of different categories of toxic
content on a monthly basis.
From Figure 39, several insights related to the tweeting patterns of the general public can be
inferred. For instance, the intensity of toxicity was higher than the intensity of obscene, identity
attack, insult, threat, and sexually explicit content. Similarly, the intensity of insult was higher than
the intensity of obscene, identity attack, threat, and sexually explicit content. Next, gender-specific
tweeting patterns for each of these categories of toxic content were analyzed to understand the trends
of the same. These results are shown in Figures 40 to 45. This analysis also helped to unravel multiple
paradigms of tweeting behavior of different genders in the context of online learning during COVID-
19. For instance, Figures 40 and 44 show that the intensity of toxicity and threat in tweets by males
and females has increased since July 2022. The analysis shown in Figure 41 indicates that the intensity
of obscene content in tweets by males and females has decreased since May 2022.
Figure 40. A graphical representation of the variation of the intensity of toxicity on a monthly basis
by different genders.
Figure 41. A graphical representation of the variation of the intensity of obscene content on a monthly
basis by different genders.
Figure 42. A graphical representation of the variation of the intensity of identity attacks on a monthly
basis by different genders.
Figure 43. A graphical representation of the variation of the intensity of insult on a monthly basis by
different genders.
Figure 44. A graphical representation of the variation of the intensity of threat on a monthly basis by
different genders.
Figure 45. A graphical representation of the variation of the intensity of sexually explicit content on a
monthly basis by different genders.
Figure 46. A graphical representation of the variation of the average activity on Twitter (in the context
of tweeting about online learning during COVID-19) on a monthly basis.
The result of Algorithm 7 is shown in Figure 46. As can be seen from this figure, between males
and females, the average activity of females was higher in all months other than March 2022.
The country-specific results are presented in Figures 47 and 48. Figure 47 shows the
trends in tweets about online learning during COVID-19 posted by males from different countries of
the world. Similarly, Figure 48 shows the trends in tweets about online learning during COVID-19
posted by females from different countries of the world.
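A minimal sketch of the "average activity per gender per month" metric follows: tweets per distinct user, averaged within each gender for each month. The column names and this particular aggregation are assumptions about Algorithm 7, whose details are given earlier in the paper.

```python
# Hypothetical sketch of per-gender average monthly activity (Figure 46).
import pandas as pd

df = pd.DataFrame({
    "month":  ["2022-03", "2022-03", "2022-03", "2022-04", "2022-04"],
    "gender": ["male", "male", "female", "female", "female"],
    "user":   ["u1", "u1", "u2", "u2", "u3"],
})

# Number of tweets posted by each user in each month
tweets_per_user = df.groupby(["month", "gender", "user"]).size()

# Average tweets per user, per gender, per month
avg_activity = tweets_per_user.groupby(level=["month", "gender"]).mean()
print(avg_activity)
```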
Figure 47. Representation of the trends in tweets about online learning during COVID-19 posted by
males from different countries of the world.
Figure 48. Representation of the trends in tweets about online learning during COVID-19 posted by
females from different countries of the world.
Figures 47 and 48 reveal the patterns of posting tweets by males and females about online
learning during COVID-19. These patterns include similarities as well as differences. For instance,
from these two figures, it can be inferred that in India a higher percentage of the tweets were posted
by males as compared to females. However, in Australia, a higher percentage of the tweets were
posted by females as compared to males. Finally, a comparative study is presented in Table 2, where
the focus area of this work is compared with the focus areas of prior works in this field to highlight its
novelty and relevance. As can be seen from this table, the work presented in this paper is the first
work in this area of research where the focus area has included text analysis, sentiment analysis,
analysis of toxic content, and subjectivity analysis of tweets about online learning during COVID-19.
It is worth mentioning here that the work by Martinez [101] considered only two types of toxic
content - insults and threats - whereas the work presented in this paper performs the detection of six
types of toxic content - toxicity, obscene, identity attack, insult, threat, and sexually explicit.
Furthermore, no prior work in this field has performed a gender-specific analysis of tweets about
online learning during COVID-19. As this paper analyzes the tweeting patterns in terms of gender,
the authors would like to clarify three aspects. First, the results presented and discussed in this paper
aim to address the research gaps in this field (as discussed in Section 2). These results are not
presented with the intention to comment on any gender directly or indirectly. Second, the authors
respect the gender identity of every individual and do not intend to comment on the same in any
manner by presenting these results. Third, the authors respect every gender identity and associated
pronouns [126]. The results presented in this paper take into account only three gender categories -
male, female, and none - as the GenderPerformr package (the current state-of-the-art method that
predicts gender from usernames) has limitations.
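The country-level gender breakdown behind Figures 47 and 48 can be sketched as a normalized cross-tabulation; the rows below are hypothetical examples, not the paper's data (which is available from the Zenodo record cited in the Data Availability Statement).

```python
# Hedged sketch: percentage of tweets posted by each gender within each
# country, via pandas.crosstab with row-wise normalization.
import pandas as pd

df = pd.DataFrame({
    "country": ["India", "India", "India", "Australia", "Australia", "Australia"],
    "gender":  ["male", "male", "female", "female", "female", "male"],
})

# normalize="index" divides each row by its total, giving within-country shares
shares = pd.crosstab(df["country"], df["gender"], normalize="index") * 100
print(shares.round(2))
```

With these toy rows, India skews male and Australia skews female, mirroring the qualitative pattern inferred from Figures 47 and 48.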
Table 2. A comparative study of this work with prior works in this field in terms of focus areas.
The table compares each work along four focus areas: text analysis of tweets about online learning
during COVID-19, sentiment analysis of such tweets, analysis of types of toxic content in such tweets,
and subjectivity analysis of such tweets. Works compared: Sahir et al. [20], Althagafi et al. [21],
Ali et al. [22], Alcober et al. [23], Remali et al. [24], Senadhira et al. [25], Lubis et al. [26],
Arambepola [27], Isnain et al. [28], Aljabri et al. [29], Asare et al. [30], Mujahid et al. [31],
Al-Obeidat et al. [32], Waheeb et al. [33], Rijal et al. [34], Martinez [36], and Thakur et al. [this work].
5. Conclusions
To reduce the rapid spread of the SARS-CoV-2 virus, several universities, colleges, and schools
across the world transitioned to online learning. This was associated with a range of emotions in
students, educators, and the general public who used social media platforms such as Twitter during
this time to share and exchange information, views, and perspectives related to online learning,
leading to the generation of Big Data. Twitter has been popular amongst researchers from different
domains for the investigation of patterns of public discourse related to different topics. Furthermore,
out of several social media platforms, Twitter has the highest gender gap as of 2023. There have been
a few works published in the last few months where sentiment analysis of tweets about online
learning during COVID-19 was performed. However, those works have multiple limitations centered
around a lack of reporting from multiple sentiment analysis approaches, a lack of focus on
subjectivity analysis, a lack of focus on toxicity analysis, and a lack of focus on gender-specific
tweeting patterns. The work presented in this paper aims to address these research gaps as well as
aims to contribute towards advancing research and development in this field. A dataset comprising
about 50,000 Tweets about online learning during COVID-19, posted on Twitter between November
9, 2021, and July 13, 2022, was analyzed for this study. This work reports multiple novel findings.
First, the results of sentiment analysis from VADER, Afinn, and TextBlob show that a higher
percentage of the tweets were positive. The results of gender-specific sentiment analysis indicate that
for positive tweets, negative tweets, and neutral tweets, between males and females, males posted a
higher percentage of the tweets. Second, the results from subjectivity analysis show that the
percentage of least opinionated, neutral opinionated, and highly opinionated tweets were 56.568%,
30.898%, and 12.534%, respectively. The gender-specific results for subjectivity analysis show that for
each subjectivity class (least opinionated, neutral opinionated, and highly opinionated) males posted
a higher percentage of tweets as compared to females. Third, toxicity detection was applied to the
tweets to detect different categories of toxic content - toxicity, obscene, identity attack, insult, threat,
and sexually explicit. The gender-specific analysis of the percentage of tweets posted by each gender
in each of these categories revealed several novel insights. For instance, males posted a higher
percentage of tweets that were categorized as toxicity, obscene, identity attack, insult, and threat, as
compared to females. However, for the sexually explicit category, females posted a higher percentage
of tweets as compared to males. Fourth, gender-specific tweeting patterns for each of these categories
of toxic content were analyzed to understand the trends of the same. These results unraveled multiple
paradigms of tweeting behavior of different genders in the context of online learning during COVID-
19. For instance, the results show that the intensity of toxicity and threat in tweets by males and
females has increased since July 2022. To add to this, the intensity of obscene content in tweets by
males and females has decreased since May 2022. Fifth, the average activity of males and females per
month in this time range was also investigated. The findings indicate that the average activity of
females has been higher in all months as compared to males other than March 2022. Finally, country-
specific tweeting patterns of males and females were also investigated which presented multiple
novel insights. For instance, in India, a higher percentage of tweets about online learning during
COVID-19 were posted by males as compared to females. However, in Australia, a higher percentage
of such tweets were posted by females as compared to males. To the best of the authors' knowledge,
no similar work has been done in this field thus far. Future work in this area would involve
performing gender-specific topic modeling to investigate the similarities and differences in terms of
the topics that have been represented in the tweets posted by males and females.
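The sentiment and subjectivity classes summarized above can be reproduced with simple threshold rules. The VADER cutoffs (±0.05 on the compound score) follow the convention of Hutto and Gilbert [106]; the subjectivity cutoffs below are assumptions, as the paper's exact thresholds are stated in its methodology rather than here.

```python
# Illustrative threshold rules for the sentiment and subjectivity classes.
def sentiment_label(compound: float) -> str:
    # Conventional VADER compound-score thresholds
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"

def subjectivity_class(score: float) -> str:
    # TextBlob subjectivity ranges from 0.0 (objective) to 1.0 (subjective);
    # the tercile cutoffs here are assumptions for illustration.
    if score < 1/3:
        return "least opinionated"
    if score <= 2/3:
        return "neutral opinionated"
    return "highly opinionated"

print(sentiment_label(0.6), "|", subjectivity_class(0.1))
```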
Author Contributions: Conceptualization, N.T.; methodology, N.T., S.C., K.K., Y.N.D.; software, N.T., S.C., K.K.,
Y.N.D., M.S.; validation, N.T.; formal analysis, N.T., K.K., S.C., Y.N.D., V.K.; investigation, N.T., K.K., S.C., Y.N.D.;
resources, N.T., K.K., S.C., Y.N.D.; data curation, N.T. and S.Q.; writing - original draft preparation, N.T., V.K.,
K.K., M.S., Y.N.D., S.C.; writing - review and editing, N.T.; visualization, N.T., S.C., K.K., Y.N.D.; supervision, N.T.;
project administration, N.T.; funding acquisition, Not Applicable.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The data analyzed in this study are publicly available at
https://doi.org/10.5281/zenodo.6837118.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Shi, Y.; Wang, G.; Cai, X.-P.; Deng, J.-W.; Zheng, L.; Zhu, H.-H.; Zheng, M.; Yang, B.; Chen, Z. An Overview
of COVID-19. J. Zhejiang Univ. Sci. B 2020, 21, 343–360, doi:10.1631/jzus.b2000083.
2. Fauci, A.S.; Lane, H.C.; Redfield, R.R. Covid-19 - Navigating the Uncharted. N. Engl. J. Med. 2020, 382,
1268–1269, doi:10.1056/nejme2002387.
3. Alanagreh, L.; Alzoughool, F.; Atoum, M. The Human Coronavirus Disease COVID-19: Its Origin,
Characteristics, and Insights into Potential Drugs and Its Mechanisms. Pathogens 2020, 9, 331,
doi:10.3390/pathogens9050331.
4. Ciotti, M.; Ciccozzi, M.; Terrinoni, A.; Jiang, W.-C.; Wang, C.-B.; Bernardini, S. The COVID-19
Pandemic. Crit. Rev. Clin. Lab. Sci. 2020, 57, 365–388, doi:10.1080/10408363.2020.1783198.
5. Cucinotta, D.; Vanelli, M. WHO Declares COVID-19 a Pandemic. Acta Bio Medica: Atenei Parmensis 2020, 91,
157, doi:10.23750/abm.v91i1.9397.
6. WHO Coronavirus (COVID-19) Dashboard Available online: https://covid19.who.int/ (accessed on 26
September 2023).
7. Allen, D.W. Covid-19 Lockdown Cost/Benefits: A Critical Assessment of the Literature. Int. J. Econ.
Bus. 2022, 29, 1–32, doi:10.1080/13571516.2021.1976051.
8. Kumar, V.; Sharma, D. E-Learning Theories, Components, and Cloud Computing-Based Learning
Platforms. Int. J. Web-based Learn. Teach. Technol. 2021, 16, 1–16, doi:10.4018/ijwltt.20210501.oa1.
9. Muñoz-Najar, A.; Gilberto, A.; Hasan, A.; Cobo, C.; Azevedo, J.P.; Akmal, M. Remote Learning during
COVID-19: Lessons from Today, Principles for Tomorrow. World Bank 2021.
10. Simamora, R.M.; De Fretes, D.; Purba, E.D.; Pasaribu, D. Practices, Challenges, and Prospects of Online
Learning during Covid-19 Pandemic in Higher Education: Lecturer Perspectives. Stud. Learn. Teach. 2020, 1,
185–208, doi:10.46627/silet.v1i3.45.
11. DeNardis, L. The Internet in Everything; Yale University Press: New Haven, CT, 2020; ISBN 9780300233070.
12. Gruzd, A.; Haythornthwaite, C. Enabling Community through Social Media. J. Med. Internet Res. 2013, 15,
e248, doi:10.2196/jmir.2796.
13. Belle Wong, J.D. Top Social Media Statistics and Trends of 2023 Available online:
https://www.forbes.com/advisor/business/social-media-statistics/ (accessed on 26 September 2023).
14. Morgan-Lopez, A.A.; Kim, A.E.; Chew, R.F.; Ruddle, P. Predicting Age Groups of Twitter Users Based on
Language and Metadata Features. PLoS One 2017, 12, e0183537, doi:10.1371/journal.pone.0183537.
15. Zhou, P.; Yang, X.-L.; Wang, X.-G.; Hu, B.; Zhang, L.; Zhang, W.; Si, H.-R.; Zhu, Y.; Li, B.; Huang, C.-L.; et
al. A Pneumonia Outbreak Associated with a New Coronavirus of Probable Bat Origin. Nature 2020, 579,
270–273, doi:10.1038/s41586-020-2012-7.
16. Zhou, P.; Yang, X.-L.; Wang, X.-G.; Hu, B.; Zhang, L.; Zhang, W.; Si, H.-R.; Zhu, Y.; Li, B.; Huang, C.-L.; et
al. Discovery of a Novel Coronavirus Associated with the Recent Pneumonia Outbreak in Humans and Its
Potential Bat Origin. bioRxiv 2020.
17. Wrapp, D.; Wang, N.; Corbett, K.S.; Goldsmith, J.A.; Hsieh, C.-L.; Abiona, O.; Graham, B.S.; McLellan, J.S.
Cryo-EM Structure of the 2019-NCoV Spike in the Prefusion Conformation. Science 2020, 367, 1260–1263,
doi:10.1126/science.abb2507.
18. Huang, Q.; Herrmann, A. Fast Assessment of Human Receptor-Binding Capability of 2019 Novel
Coronavirus (2019-NCoV). bioRxiv 2020.
19. Çalıca Utku, A.; Budak, G.; Karabay, O.; Güçlü, E.; Okan, H.D.; Vatan, A. Main Symptoms in Patients
Presenting in the COVID-19 Period. Scott. Med. J. 2020, 65, 127–132, doi:10.1177/0036933020949253.
20. Larsen, J.R.; Martin, M.R.; Martin, J.D.; Kuhn, P.; Hicks, J.B. Modeling the Onset of Symptoms of COVID-
19. Front. Public Health 2020, 8, doi:10.3389/fpubh.2020.00473.
21. Edison Research The Infinite Dial 2022 Available online: http://www.edisonresearch.com/wp-
content/uploads/2022/03/Infinite-Dial-2022-Webinar-revised.pdf (accessed on 26 September 2023).
22. Twitter ‘Lurkers’ Follow and Are Followed by Fewer Accounts Available online:
https://www.pewresearch.org/short-reads/2022/03/16/5-facts-about-twitter-lurkers/ft_2022-03-
16_twitterlurkers_03/ (accessed on 26 September 2023).
23. Shewale, R. Twitter Statistics in 2023 (Facts after “X” Rebranding) Available online:
https://www.demandsage.com/twitter-statistics/ (accessed on 26 September 2023).
24. Lin, Y. Number of Twitter Users in the US [Aug 2023 Update] Available online:
https://www.oberlo.com/statistics/number-of-twitter-users-in-the-us (accessed on 26 September 2023).
25. Twitter: Distribution of Global Audiences 2021, by Age Group Available online:
https://www.statista.com/statistics/283119/age-distribution-of-global-twitter-users/ (accessed on 26
September 2023).
26. Feger, A. TikTok Screen Time Will Approach 60 Minutes a Day for US Adult Users Available online:
https://www.insiderintelligence.com/content/tiktok-screen-time-will-approach-60-minutes-day-us-adult-
users/ (accessed on 26 September 2023).
27. Hootsuite Inc Digital Trends - Digital Marketing Trends 2022 Available online:
https://www.hootsuite.com/resources/digital-trends (accessed on 26 September 2023).
28. Demographic Profiles and Party of Regular Social Media News Users in the U.S Available online:
https://www.pewresearch.org/journalism/2021/01/12/news-use-across-social-media-platforms-in-
2020/pj_2021-01-12_news-social-media_0-04/ (accessed on 26 September 2023).
29. Countries with Most X/Twitter Users 2023 Available online:
https://www.statista.com/statistics/242606/number-of-active-twitter-users-in-selected-countries/ (accessed
on 26 September 2023).
30. Kemp, S. Twitter Users, Stats, Data, Trends, and More DataReportal Global Digital Insights Available
online: https://datareportal.com/essential-twitter-stats (accessed on 26 September 2023).
31. Singh, C. 60+ Twitter Statistics to Skyrocket Your Branding in 2023 Available online:
https://www.socialpilot.co/blog/twitter-statistics (accessed on 26 September 2023).
32. Albrecht, S.; Lutz, B.; Neumann, D. The Behavior of Blockchain Ventures on Twitter as a Determinant for
Funding Success. Electron. Mark. 2020, 30, 241–257, doi:10.1007/s12525-019-00371-w.
33. Kraaijeveld, O.; De Smedt, J. The Predictive Power of Public Twitter Sentiment for Forecasting
Cryptocurrency Prices. J. Int. Financ. Mark. Inst. Money 2020, 65, 101188, doi:10.1016/j.intfin.2020.101188.
34. Saura, J.R.; Palacios-Marqués, D.; Ribeiro-Soriano, D. Using Data Mining Techniques to Explore Security
Issues in Smart Living Environments in Twitter. Comput. Commun. 2021, 179, 285–295,
doi:10.1016/j.comcom.2021.08.021.
35. Mubin, O.; Khan, A.; Obaid, M. #naorobot: Exploring Nao Discourse on Twitter. In Proceedings of the
28th Australian Conference on Computer-Human Interaction - OzCHI ’16; ACM Press: New York, New
York, USA, 2016.
36. Siapera, E.; Hunt, G.; Lynn, T. #GazaUnderAttack: Twitter, Palestine and Diffused War. Inf. Commun.
Soc. 2015, 18, 1297–1319, doi:10.1080/1369118x.2015.1070188.
37. Chen, E.; Ferrara, E. Tweets in Time of Conflict: A Public Dataset Tracking the Twitter Discourse on the
War between Ukraine and Russia. Proceedings of the International AAAI Conference on Web and Social
Media 2023, 17, 1006–1013, doi:10.1609/icwsm.v17i1.22208.
38. Tao, W.; Peng, Y. Differentiation and Unity: A Cross-Platform Comparison Analysis of Online Posts’
Semantics of the Russian-Ukrainian War Based on Weibo and Twitter. Commun. Public 2023, 8, 105–124,
doi:10.1177/20570473231165563.
39. Jongman, B.; Wagemaker, J.; Romero, B.; de Perez, E. Early Flood Detection for Rapid Humanitarian
Response: Harnessing near Real-Time Satellite and Twitter Signals. ISPRS Int. J. Geoinf. 2015, 4, 2246–2266,
doi:10.3390/ijgi4042246.
40. Madichetty, S.; Muthukumarasamy, S.; Jayadev, P. Multi-Modal Classification of Twitter Data during
Disasters for Humanitarian Response. J. Ambient Intell. Humaniz. Comput. 2021, 12, 10223–10237,
doi:10.1007/s12652-020-02791-5.
41. Dimitrova, D.; Heidenreich, T.; Georgiev, T.A. The Relationship between Humanitarian NGO
Communication and User Engagement on Twitter. New Media Soc. 2022, 146144482210889,
doi:10.1177/14614448221088970.
42. Weller, K.; Bruns, A.; Burgess, J.; Mahrt, M.; Puschmann, C. Twitter and Society Available online:
https://journals.uio.no/TJMI/article/download/825/746/3768 (accessed on 26 September 2023).
43. Olza, I.; Koller, V.; Ibarretxe-Antuñano, I.; Pérez-Sobrino, P.; Semino, E. The #ReframeCovid Initiative:
From Twitter to Society via Metaphor. Metaphor Soc. World 2021, 11, 98–120, doi:10.1075/msw.00013.olz.
44. Li, M.; Turki, N.; Izaguirre, C.R.; DeMahy, C.; Thibodeaux, B.L.; Gage, T. Twitter as a Tool for Social
Movement: An Analysis of Feminist Activism on Social Media Communities. J. Community Psychol. 2021, 49,
854–868, doi:10.1002/jcop.22324.
45. Hargittai, E.; Walejko, G. THE PARTICIPATION DIVIDE: Content Creation and Sharing in the Digital
Age1. Inf. Commun. Soc. 2008, 11, 239–256, doi:10.1080/13691180801946150.
46. Trevor, M.C. Political Socialization, Party Identification, and the Gender Gap. Public Opin. Q. 1999, 63, 62–
89.
47. Verba, S.; Schlozman, K.L.; Brady, H.E. Voice and Equality: Civic Voluntarism in American Politics; Harvard
University Press: London, England, 1995; ISBN 9780674942936.
48. Bode, L. Closing the Gap: Gender Parity in Political Engagement on Social Media. Inf. Commun.
Soc. 2017, 20, 587–603, doi:10.1080/1369118x.2016.1202302.
49. Lutz, C.; Hoffmann, C.P.; Meckel, M. Beyond Just Politics: A Systematic Literature Review of Online
Participation. First Monday 2014, doi:10.5210/fm.v19i7.5260.
50. Strandberg, K. A Social Media Revolution or Just a Case of History Repeating Itself? The Use of Social
Media in the 2011 Finnish Parliamentary Elections. New Media Soc. 2013, 15, 1329–1347,
doi:10.1177/1461444812470612.
51. Vochocová, L.; Štětka, V.; Mazák, J. Good Girls Don’t Comment on Politics? Gendered Character of Online
Political Participation in the Czech Republic. Inf. Commun. Soc. 2016, 19, 1321–1339,
doi:10.1080/1369118x.2015.1088881.
52. Gil de Zúñiga, H.; Veenstra, A.; Vraga, E.; Shah, D. Digital Democracy: Reimagining Pathways to Political
Participation. J. Inf. Technol. Politics 2010, 7, 36–51, doi:10.1080/19331680903316742.
53. Vissers, S.; Stolle, D. The Internet and New Modes of Political Participation: Online versus Offline
Participation. Inf. Commun. Soc. 2014, 17, 937–955, doi:10.1080/1369118x.2013.867356.
54. Vesnic-Alujevic, L. Political Participation and Web 2.0 in Europe: A Case Study of Facebook. Public Relat.
Rev. 2012, 38, 466–470, doi:10.1016/j.pubrev.2012.01.010.
55. Krasnova, H.; Veltri, N.F.; Eling, N.; Buxmann, P. Why Men and Women Continue to Use Social
Networking Sites: The Role of Gender Differences. J. Strat. Inf. Syst. 2017, 26, 261–284,
doi:10.1016/j.jsis.2017.01.004.
56. Social Media Fact Sheet Available online: https://www.pewresearch.org/internet/fact-sheet/social-
media/?tabId=tab-45b45364-d5e4-4f53-bf01-b77106560d4c (accessed on 26 September 2023).
57. Global WhatsApp User Distribution by Gender 2023 Available online:
https://www.statista.com/statistics/1305750/distribution-whatsapp-users-by-gender/ (accessed on 26
September 2023).
58. Sina Weibo: User Gender Distribution 2022 Available online:
https://www.statista.com/statistics/1287809/sina-weibo-user-gender-distibution-worldwide/ (accessed on
26 September 2023).
59. QQ: User Gender Distribution 2022 Available online: https://www.statista.com/statistics/1287794/qq-user-
gender-distibution-worldwide/ (accessed on 26 September 2023).
60. Samanta, O. Telegram Revenue & User Statistics 2023 Available online:
https://prioridata.com/data/telegram-statistics/ (accessed on 26 September 2023).
61. Shewale, R. 36 Quora Statistics: All-Time Stats & Data (2023) Available online:
https://www.demandsage.com/quora-statistics/ (accessed on 26 September 2023).
62. Gitnux The Most Surprising Tumblr Statistics and Trends in 2023 Available online:
https://blog.gitnux.com/tumblr-statistics/ (accessed on 26 September 2023).
63. Social Media User Diversity Statistics Available online: https://blog.hootsuite.com/wp-
content/uploads/2023/03/twitter-stats-4.jpg (accessed on 26 September 2023).
64. WeChat: User Gender Distribution 2022 Available online:
https://www.statista.com/statistics/1287786/wechat-user-gender-distibution-worldwide/ (accessed on 26
September 2023).
65. Global Snapchat User Distribution by Gender 2023 Available online:
https://www.statista.com/statistics/326460/snapchat-global-gender-group/ (accessed on 26 September
2023).
66. Villavicencio, C.; Macrohon, J.J.; Inbaraj, X.A.; Jeng, J.-H.; Hsieh, J.-G. Twitter Sentiment Analysis towards
COVID-19 Vaccines in the Philippines Using Naïve Bayes. Information (Basel) 2021, 12, 204,
doi:10.3390/info12050204.
67. Boon-Itt, S.; Skunkan, Y. Public Perception of the COVID-19 Pandemic on Twitter: Sentiment Analysis and
Topic Modeling Study. JMIR Public Health Surveill. 2020, 6, e21978, doi:10.2196/21978.
68. Marcec, R.; Likic, R. Using Twitter for Sentiment Analysis towards AstraZeneca/Oxford, Pfizer/BioNTech
and Moderna COVID-19 Vaccines. Postgrad. Med. J. 2022, 98, 544–550, doi:10.1136/postgradmedj-2021-
140685.
69. Machuca, C.R.; Gallardo, C.; Toasa, R.M. Twitter Sentiment Analysis on Coronavirus: Machine Learning
Approach. J. Phys. Conf. Ser. 2021, 1828, 012104, doi:10.1088/1742-6596/1828/1/012104.
70. Kruspe, A.; Häberle, M.; Kuhn, I.; Zhu, X.X. Cross-Language Sentiment Analysis of European Twitter
Messages During the COVID-19 Pandemic. arXiv [cs.SI] 2020.
71. Vijay, T.; Chawla, A.; Dhanka, B.; Karmakar, P. Sentiment Analysis on COVID-19 Twitter Data. In
Proceedings of the 2020 5th IEEE International Conference on Recent Advances and Innovations in
Engineering (ICRAIE); IEEE, 2020.
72. Shofiya, C.; Abidi, S. Sentiment Analysis on COVID-19-Related Social Distancing in Canada Using Twitter
Data. Int. J. Environ. Res. Public Health 2021, 18, 5993, doi:10.3390/ijerph18115993.
73. Sontayasara, T.; Jariyapongpaiboon, S.; Promjun, A.; Seelpipat, N.; Saengtabtim, K.; Tang, J.; Leelawat, N.
Twitter Sentiment Analysis of Bangkok Tourism during COVID-19 Pandemic Using Support Vector
Machine Algorithm. J. Disaster Res. 2021, 16, 24–30, doi:10.20965/jdr.2021.p0024.
74. Nemes, L.; Kiss, A. Social Media Sentiment Analysis Based on COVID-19. J. Inf. Telecommun. 2021, 5, 1–15,
doi:10.1080/24751839.2020.1790793.
75. Okango, E.; Mwambi, H. Dictionary Based Global Twitter Sentiment Analysis of Coronavirus (COVID-19)
Effects and Response. Ann. Data Sci. 2022, 9, 175–186, doi:10.1007/s40745-021-00358-5.
76. Singh, C.; Imam, T.; Wibowo, S.; Grandhi, S. A Deep Learning Approach for Sentiment Analysis of COVID-
19 Reviews. Appl. Sci. (Basel) 2022, 12, 3709, doi:10.3390/app12083709.
77. Kaur, H.; Ahsaan, S.U.; Alankar, B.; Chang, V. A Proposed Sentiment Analysis Deep Learning Algorithm
for Analyzing COVID-19 Tweets. Inf. Syst. Front. 2021, 23, 1417–1429, doi:10.1007/s10796-021-10135-7.
78. Vernikou, S.; Lyras, A.; Kanavos, A. Multiclass Sentiment Analysis on COVID-19-Related Tweets Using
Deep Learning Models. Neural Comput. Appl. 2022, 34, 19615–19627, doi:10.1007/s00521-022-07650-2.
79. Sharma, S.; Sharma, A. Twitter Sentiment Analysis during Unlock Period of COVID-19. In Proceedings of
the 2020 Sixth International Conference on Parallel, Distributed and Grid Computing (PDGC); IEEE, 2020;
pp. 221–224.
80. Sanders, A.C.; White, R.C.; Severson, L.S.; Ma, R.; McQueen, R.; Alcântara Paulo, H.C.; Zhang, Y.; Erickson,
J.S.; Bennett, K.P. Unmasking the Conversation on Masks: Natural Language Processing for Topical
Sentiment Analysis of COVID-19 Twitter Discourse. AMIA Summits on Translational Science
Proceedings 2021, 2021, 555.
81. Alabid, N.N.; Katheeth, Z.D. Sentiment Analysis of Twitter Posts Related to the COVID-19
Vaccines. Indones. J. Electr. Eng. Comput. Sci. 2021, 24, 1727, doi:10.11591/ijeecs.v24.i3.pp1727-1734.
82. Mansoor, M.; Gurumurthy, K.; U, Anantharam R; Prasad, V.R.B. Global Sentiment Analysis of COVID-19
Tweets over Time. arXiv [cs.CL] 2020.
83. Singh, M.; Jakhar, A.K.; Pandey, S. Sentiment Analysis on the Impact of Coronavirus in Social Life Using
the BERT Model. Soc. Netw. Anal. Min. 2021, 11, doi:10.1007/s13278-021-00737-z.
84. Imamah; Rachman, F.H. Twitter Sentiment Analysis of Covid-19 Using Term Weighting TF-IDF and
Logistic Regresion. In Proceedings of the 2020 6th Information Technology International Seminar (ITIS);
IEEE, 2020; pp. 238–242.
85. Sahir, S.H.; Ayu Ramadhana, R.S.; Romadhon Marpaung, M.F.; Munthe, S.R.; Watrianthos, R. Online
Learning Sentiment Analysis during the Covid-19 Indonesia Pandemic Using Twitter Data. IOP Conf. Ser.
Mater. Sci. Eng. 2021, 1156, 012011, doi:10.1088/1757-899x/1156/1/012011.
86. Althagafi, A.; Althobaiti, G.; Alhakami, H.; Alsubait, T. Arabic Tweets Sentiment Analysis about Online
Learning during COVID-19 in Saudi Arabia. Int. J. Adv. Comput. Sci. Appl. 2021, 12,
doi:10.14569/ijacsa.2021.0120373.
87. Ali, M.M. Arabic Sentiment Analysis about Online Learning to Mitigate Covid-19. J. Intell. Syst. 2021, 30,
524–540, doi:10.1515/jisys-2020-0115.
88. Alcober, G.M.I.; Revano, T.F. Twitter Sentiment Analysis towards Online Learning during COVID-19 in
the Philippines. In Proceedings of the 2021 IEEE 13th International Conference on Humanoid,
Nanotechnology, Information Technology, Communication and Control, Environment, and Management
(HNICEM); IEEE, 2021.
89. Remali, N.A.S.; Shamsuddin, M.R.; Abdul-Rahman, S. Sentiment Analysis on Online Learning for Higher
Education during Covid-19. In Proceedings of the 2022 3rd International Conference on Artificial
Intelligence and Data Sciences (AiDAS); IEEE, 2022; pp. 142–147.
90. Senadhira, K.I.; Rupasingha, R.A.H.M.; Kumara, B.T.G.S. Sentiment Analysis on Twitter Data Related to
Online Learning during the Covid-19 Pandemic Available online:
http://repository.kln.ac.lk/handle/123456789/25416 (accessed on 27 September 2023).
91. Lubis, A.R.; Prayudani, S.; Lubis, M.; Nugroho, O. Sentiment Analysis on Online Learning during the
Covid-19 Pandemic Based on Opinions on Twitter Using KNN Method. In Proceedings of the 2022 1st
International Conference on Information System & Information Technology (ICISIT); IEEE, 2022; pp.
106–111.
92. Arambepola, N. Analysing the Tweets about Distance Learning during COVID-19 Pandemic Using
Sentiment Analysis Available online: https://fct.kln.ac.lk/media/pdf/proceedings/ICACT-2020/F-7.pdf
(accessed on 27 September 2023).
93. Isnain, A.R.; Supriyanto, J.; Kharisma, M.P. Implementation of K-Nearest Neighbor (K-NN) Algorithm for
Public Sentiment Analysis of Online Learning. IJCCS 2021, 15, 121, doi:10.22146/ijccs.65176.
94. Aljabri, M.; Chrouf, S.M.B.; Alzahrani, N.A.; Alghamdi, L.; Alfehaid, R.; Alqarawi, R.; Alhuthayfi, J.;
Alduhailan, N. Sentiment Analysis of Arabic Tweets Regarding Distance Learning in Saudi Arabia during
the COVID-19 Pandemic. Sensors (Basel) 2021, 21, 5431, doi:10.3390/s21165431.
95. Asare, A.O.; Yap, R.; Truong, N.; Sarpong, E.O. The Pandemic Semesters: Examining Public Opinion
Regarding Online Learning amidst COVID-19. J. Comput. Assist. Learn. 2021, 37, 1591–1605,
doi:10.1111/jcal.12574.
96. Mujahid, M.; Lee, E.; Rustam, F.; Washington, P.B.; Ullah, S.; Reshi, A.A.; Ashraf, I. Sentiment Analysis and
Topic Modeling on Tweets about Online Education during COVID-19. Appl. Sci. (Basel) 2021, 11, 8438,
doi:10.3390/app11188438.
97. Al-Obeidat, F.; Ishaq, M.; Shuhaiber, A.; Amin, A. Twitter Sentiment Analysis to Understand Students’
Perceptions about Online Learning during the Covid’19. In Proceedings of the 2022 International
Conference on Computer and Applications (ICCA); IEEE, 2022; Vol. 00, p. 1.
98. Waheeb, S.A.; Khan, N.A.; Shang, X. Topic Modeling and Sentiment Analysis of Online Education in the
COVID-19 Era Using Social Networks Based Datasets. Electronics (Basel) 2022, 11, 715,
doi:10.3390/electronics11050715.
99. Integrating Information Gain Methods for Feature Selection in Distance Education Sentiment Analysis
during Covid-19. TEM J. 2023, 12, 285–290.
100. Fauzan, M.; Setiawan, T. Acts of Hate Speech in News on Twitter Related to Covid-19 Available online:
http://icollate.uny.ac.id/sites/icollate.uny.ac.id/files/download-file/PROCEEDING%20ICOLLATE-
4%202021-Muhammad%20Fauzan1%2C.pdf (accessed on 27 September 2023).
101. Martinez, M.A. What Do People Write about COVID-19 and Teaching, Publicly? Insulators and Threats to
Newly Habituated and Institutionalized Practices for Instruction. PLoS One 2022, 17, e0276511,
doi:10.1371/journal.pone.0276511.
102. Thakur, N. A Large-Scale Dataset of Twitter Chatter about Online Learning during the Current COVID-19
Omicron Wave. Data (Basel) 2022, 7, 109, doi:10.3390/data7080109.
103. Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J.; Appleton, G.; Axton, M.; Baak, A.; Blomberg, N.; Boiten,
J.-W.; da Silva Santos, L.B.; Bourne, P.E.; et al. The FAIR Guiding Principles for Scientific Data Management
and Stewardship. Sci. Data 2016, 3, 19, doi:10.1038/sdata.2016.18.
104. Genderperformr Available online: https://pypi.org/project/genderperformr/ (accessed on 27 September
2023).
105. Wang, Z.; Jurgens, D. It’s Going to Be Okay: Measuring Access to Support in Online Communities. In
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing; Association
for Computational Linguistics: Stroudsburg, PA, USA, 2018; pp. 33–45.
106. Hutto, C.; Gilbert, E. VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media
Text. Proceedings of the International AAAI Conference on Web and Social Media 2014, 8, 216–225,
doi:10.1609/icwsm.v8i1.14550.
107. Nielsen, F.Å. A New ANEW: Evaluation of a Word List for Sentiment Analysis in Microblogs. arXiv
[cs.IR] 2011.
108. TextBlob Available online: https://media.readthedocs.org/pdf/textblob/latest/textblob.pdf (accessed on 27
September 2023).
109. Jumanto, J.; Muslim, M.A.; Dasril, Y.; Mustaqim, T. Accuracy of Malaysia Public Response to Economic
Factors during the Covid-19 Pandemic Using Vader and Random Forest. J. Inf. Syst. Explor. Res. 2022, 1, 49–
70, doi:10.52465/joiser.v1i1.104.
110. Bose, D.R.; Aithal, P.S.; Roy, S. Survey of Twitter Viewpoint on Application of Drugs by VADER Sentiment
Analysis among Distinct Countries 2021.
111. Borg, A.; Boldt, M. Using VADER Sentiment and SVM for Predicting Customer Response Sentiment. Expert
Syst. Appl. 2020, 162, 113746, doi:10.1016/j.eswa.2020.113746.
112. Newman, H.; Joyner, D. Sentiment Analysis of Student Evaluations of Teaching. In Lecture Notes in
Computer Science; Springer International Publishing: Cham, 2018; pp. 246–250; ISBN 9783319938455.
113. Gan, Q.; Yu, Y. Restaurant Rating: Industrial Standard and Word-of-Mouth -- A Text Mining and Multi-
Dimensional Sentiment Analysis. In Proceedings of the 2015 48th Hawaii International Conference on
System Sciences; IEEE, 2015.
114. Gabarron, E.; Dechsling, A.; Skafle, I.; Nordahl-Hansen, A. Discussions of Asperger Syndrome on Social
Media: Content and Sentiment Analysis on Twitter. JMIR Form. Res. 2022, 6, e32752, doi:10.2196/32752.
115. Lee, I.T.-L.; Juang, S.-E.; Chen, S.T.; Ko, C.; Ma, K.S.-K. Sentiment Analysis of Tweets on Alopecia Areata,
Hidradenitis Suppurativa, and Psoriasis: Revealing the Patient Experience. Front. Med. (Lausanne) 2022, 9,
doi:10.3389/fmed.2022.996378.
116. Nalisnick, E.T.; Baird, H.S. Character-to-Character Sentiment Analysis in Shakespeare’s Plays Available
online: https://aclanthology.org/P13-2085.pdf (accessed on 27 September 2023).
117. Hazarika, D.; Konwar, G.; Deb, S.; Bora, D.J. Sentiment Analysis on Twitter by Using TextBlob for Natural
Language Processing. In Proceedings of the Annals of Computer Science and Information Systems; PTI,
2020; Vol. 24.
118. Mas Diyasa, I.G.S.; Marini Mandenni, N.M.I.; Fachrurrozi, M.I.; Pradika, S.I.; Nur Manab, K.R.; Sasmita,
N.R. Twitter Sentiment Analysis as an Evaluation and Service Base On Python Textblob. IOP Conf. Ser.
Mater. Sci. Eng. 2021, 1125, 012034, doi:10.1088/1757-899x/1125/1/012034.
119. Mansouri, N.; Soui, M.; Alhassan, I.; Abed, M. TextBlob and BiLSTM for Sentiment Analysis toward
COVID-19 Vaccines. In Proceedings of the 2022 7th International Conference on Data Science and Machine
Learning Applications (CDMA); IEEE, 2022.
120. Hermansyah, R.; Sarno, R. Sentiment Analysis about Product and Service Evaluation of PT Telekomunikasi
Indonesia Tbk from Tweets Using TextBlob, Naive Bayes & K-NN Method. In Proceedings of the 2020
International Seminar on Application for Technology of Information and Communication (iSemantic);
IEEE, 2020.
121. Detoxify Available online: https://pypi.org/project/detoxify/ (accessed on 27 September 2023).
122. Jigsaw Unintended Bias in Toxicity Classification Available online: https://www.kaggle.com/c/jigsaw-
unintended-bias-in-toxicity-classification (accessed on 27 September 2023).
123. Jigsaw Multilingual Toxic Comment Classification Available online: https://www.kaggle.com/c/jigsaw-
multilingual-toxic-comment-classification (accessed on 27 September 2023).
124. Toxic Comment Classification Challenge Available online: https://www.kaggle.com/c/jigsaw-toxic-
comment-classification-challenge (accessed on 27 September 2023).
125. Sharma, S.; Gupta, V. Role of Twitter User Profile Features in Retweet Prediction for Big Data
Streams. Multimed. Tools Appl. 2022, 81, 27309–27338, doi:10.1007/s11042-022-12815-1.
126. Zambon, V. Gender Identity Available online: https://www.medicalnewstoday.com/articles/types-of-
gender-identity (accessed on 28 September 2023).
127. Pycountry Available online: https://pypi.org/project/pycountry/ (accessed on 28 September 2023).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those
of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s)
disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or
products referred to in the content.