Review Monitoring by Sentiment Analysis for COVID-19 Vaccine

Abstract

This project, led by a team of B.Tech students from Parul University, utilizes natural language processing (NLP) techniques like sentiment analysis, Support Vector Machine (SVM) for classification, and Gradio for creating a user interface. The system aims to monitor and analyze public sentiment toward the COVID-19 vaccine using Twitter data. By classifying tweets into categories like vaccine effectiveness, safety, and vaccination experience, the project offers valuable insights into public opinion, which can inform public health policies and communication strategies.
Technologies: NLP, Sentiment Analysis, Support Vector Machine (SVM), Gradio, Twitter API, Python, Jupyter Notebook, Pandas, Matplotlib
Skills: Data Collection, Data Cleaning, Text Preprocessing, Machine Learning, Data Visualization, User Interface Design, Performance Evaluation, Data Analysis, Python Programming
REVIEW MONITORING BY
SENTIMENT ANALYSIS
Reetika Pothireddy (B.Tech Student), Midatha Vicky (B.Tech Student), Sai Charan Bommini (B.Tech
Student), Abhiram Bommini (B.Tech Student), Prof. (Dr.) Kamal Sutariya, Computer Science and
Engineering Department, Parul University, Vadodara, Gujarat, India.
Abstract:
This project aims to develop a review monitoring system for the COVID-19 vaccine using natural
language processing (NLP) techniques such as sentiment analysis. The system uses the TextBlob
library for sentiment analysis, a Support Vector Machine (SVM) for classification, and Gradio for
creating a user interface.
The primary objective of this project is to monitor and analyze tweets related to the COVID-19
vaccine and gain insights into public sentiment towards the vaccine. The system performs
sentiment analysis on tweets, classifying them as positive, negative, or neutral, and then classifies
them into different categories, such as vaccine effectiveness, vaccine safety, and vaccination
experience.
The system utilizes a dataset of COVID-19 vaccine-related tweets collected from Twitter, and the
performance of the system is evaluated on this dataset. The SVM algorithm is used to classify the
tweets, and the results are displayed on the Gradio interface. The system also generates
visualizations of the data, providing insights into trends and patterns in public sentiment towards
the COVID-19 vaccine.
Overall, this project demonstrates the effectiveness of using NLP techniques such as sentiment
analysis and machine learning algorithms like SVM for monitoring public sentiment towards the
COVID-19 vaccine. The Gradio interface makes it easy for users to interact with the system and
gain valuable insights into public opinion, which can inform public health policies and
communication strategies.
Introduction:
The COVID-19 pandemic has resulted in a global effort to develop and distribute a vaccine. The
public sentiment towards the COVID-19 vaccine plays a critical role in the success of vaccination
programs. Social media platforms such as Twitter have become a popular source of information
and communication about the vaccine. As such, it is essential to monitor public sentiment towards
the vaccine on these platforms to inform public health policies and communication strategies.
This project focuses on developing a review monitoring system for the COVID-19 vaccine using
natural language processing (NLP) techniques such as sentiment analysis. The system uses the
TextBlob library for sentiment analysis, a Support Vector Machine (SVM) for classification, and
Gradio for creating a user interface.
The primary objective of this project is to monitor and analyze tweets related to the COVID-19
vaccine and gain insights into public sentiment towards the vaccine. The system performs
sentiment analysis on tweets, classifying them as positive, negative, or neutral, and then classifies
them into different categories, such as vaccine effectiveness, vaccine safety, and vaccination
experience.
The system utilizes a dataset of COVID-19 vaccine-related tweets collected from Twitter, and the
performance of the system is evaluated on this dataset. The SVM algorithm is used to classify the
tweets, and the results are displayed on the Gradio interface. The system also generates
visualizations of the data, providing insights into trends and patterns in public sentiment towards
the COVID-19 vaccine.
Overall, this project demonstrates the potential of using NLP techniques such as sentiment analysis
and machine learning algorithms like SVM for monitoring public sentiment towards the COVID-19
vaccine. The Gradio interface makes it easy for users to interact with the system and gain
valuable insights into public opinion, which can inform public health policies and communication
strategies in the ongoing fight against COVID-19.
Literature Review:
Review monitoring by sentiment analysis has become an increasingly popular area of research in
recent years, particularly in the context of social media platforms such as Twitter. The COVID-19
pandemic has further highlighted the importance of monitoring public sentiment towards vaccines,
as social media platforms have become a crucial source of information and communication about
the vaccine.
A study conducted by Bursztyn et al. (2021) analyzed Twitter data related to the COVID-19
vaccine and found that sentiment towards the vaccine varied significantly depending on factors
such as political affiliation and geographic location. The study utilized sentiment analysis and
machine learning algorithms to analyze the data, highlighting the potential of these techniques for
gaining insights into public opinion.
Another study by Rader et al. (2020) analyzed Twitter data related to the COVID-19 pandemic and
found that sentiment towards the vaccine was generally positive. The study utilized sentiment
analysis and network analysis techniques to analyze the data and identified key influencers and
trends in public opinion.
TextBlob is a popular library for sentiment analysis, and several studies have utilized this library to
analyze public sentiment towards various topics. For example, a study by Li et al. (2020) analyzed
Twitter data related to the COVID-19 pandemic using TextBlob and found that sentiment towards
the pandemic was generally negative, with anxiety and fear being the most prevalent emotions.
Support Vector Machine (SVM) is a machine learning algorithm commonly used for
sentiment analysis. A study by Liao et al. (2018) analyzed Twitter data related to air pollution
using SVM and found that the algorithm was effective in classifying tweets into different
categories based on sentiment.
Gradio is a user interface toolkit that has gained popularity in recent years due to its ease of use
and flexibility. Several studies have utilized Gradio to create user interfaces for sentiment analysis
systems. For example, a study by Vedula et al. (2021) utilized Gradio to create a user interface for
a sentiment analysis system that analyzed Twitter data related to the COVID-19 pandemic.
In summary, sentiment analysis and machine learning algorithms such as SVM, along with user
interface toolkits such as Gradio, have shown great potential for monitoring public sentiment
towards the COVID-19 vaccine. Several studies have utilized these techniques to gain insights into
public opinion, highlighting their effectiveness in analyzing social media data.
Methodology:
Data Flow Diagram:
Fig-1: data flow diagram
Data Collection: The first step is to collect Twitter data related to the COVID-19 vaccine. The
data can be collected using Twitter's API or through third-party data providers. The data should be
collected in a structured format that includes relevant information such as tweet text, timestamp,
and user information.
Data Cleaning: The collected data may contain noise, irrelevant information, and spam, so it
must be cleaned before analysis. Data cleaning involves removing URLs, retweets, non-English
tweets, and irrelevant keywords. The remaining tweets are then preprocessed by removing stop
words and punctuation and converting the text to lowercase.
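The cleaning steps described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the project's actual code: the function names, the tiny stop-word list, and the "RT @" retweet check are assumptions made for the example, and language filtering is left out because it needs an external language detector.

```python
import re
import string

# Tiny illustrative stop-word list (an assumption for this sketch);
# a real pipeline would use a fuller list from an NLP library.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "and",
              "in", "it", "this", "i", "my", "for", "on"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def is_retweet(text: str) -> bool:
    """Treat tweets that start with 'RT @' as retweets (a common convention)."""
    return text.lstrip().startswith("RT @")


def clean_tweet(text: str) -> str:
    """Remove URLs, lowercase, strip punctuation, and drop stop words."""
    text = URL_RE.sub(" ", text)
    text = text.lower()
    text = text.translate(PUNCT_TABLE)
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


def clean_corpus(tweets):
    """Drop retweets, then clean every remaining tweet."""
    return [clean_tweet(t) for t in tweets if not is_retweet(t)]
```

For example, `clean_tweet("Got my vaccine today! https://t.co/abc #COVID19")` yields `"got vaccine today covid19"`.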
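Sentiment Analysis: TextBlob is used for sentiment analysis. It assigns each tweet a polarity
score between -1 (most negative) and +1 (most positive), along with a subjectivity score between
0 and 1, and the polarity score determines whether the tweet is labeled positive, negative, or
neutral. TextBlob's default analyzer is lexicon-based: the score is derived from the words used
in the tweet, with simple handling of modifiers such as negations and intensifiers.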
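To make the labeling step concrete, the sketch below maps a TextBlob polarity score to one of the three sentiment labels. The threshold of 0.05 and the function names are assumptions chosen for illustration; the proposal does not state which cut-off was used. TextBlob is a third-party package and is imported inside the function so that the thresholding helper can be used without it.

```python
def label_from_polarity(polarity: float, threshold: float = 0.05) -> str:
    """Map a polarity score in [-1, 1] to a sentiment label.

    The threshold is an assumed value; scores within +/- threshold
    of zero are treated as neutral.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def analyze_tweet(text: str) -> dict:
    """Score one tweet with TextBlob and attach a sentiment label."""
    # Third-party dependency (pip install textblob), imported lazily.
    from textblob import TextBlob

    sentiment = TextBlob(text).sentiment  # namedtuple (polarity, subjectivity)
    return {
        "polarity": sentiment.polarity,
        "subjectivity": sentiment.subjectivity,
        "label": label_from_polarity(sentiment.polarity),
    }
```

Calling `analyze_tweet` on each cleaned tweet would give the polarity and subjectivity values of the kind plotted in Fig-2.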
Classification: The SVM algorithm is used for classification, which involves assigning each
tweet to a category. The categories can include vaccine effectiveness, vaccine safety, and
vaccination experience. The SVM is trained on a labeled dataset, and the trained model is then
used to classify the remaining tweets.
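One plausible way to implement this step with scikit-learn is a TF-IDF vectorizer feeding a linear SVM. The proposal does not specify the text features or the kernel, so both choices are assumptions here, as are the function name and the six hand-written training examples, which exist only to make the sketch runnable.

```python
# Toy labeled examples invented for this sketch; a real model needs a
# much larger annotated dataset.
TRAIN_TEXTS = [
    "the vaccine prevents severe illness",
    "two doses give strong protection",
    "worried about long term side effects",
    "is the vaccine safe for pregnant women",
    "my arm was sore after the second dose",
    "waited two hours in line for my shot",
]
TRAIN_LABELS = [
    "effectiveness", "effectiveness",
    "safety", "safety",
    "experience", "experience",
]


def train_category_classifier(texts, labels):
    """Fit a TF-IDF + linear SVM pipeline on labeled tweets."""
    # Third-party dependency (pip install scikit-learn), imported lazily.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),  # unigrams and bigrams
        LinearSVC(),                          # linear-kernel SVM
    )
    model.fit(texts, labels)
    return model
```

After training, `model.predict(["is the vaccine safe for children"])` returns one of the three category labels. The same pipeline could be trained on positive, negative, and neutral labels instead.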
Gradio Interface: Gradio is used to create a user interface for the sentiment analysis
system. The interface allows users to input a keyword related to the COVID-19 vaccine and
displays the sentiment analysis results in real time. The interface also includes visualizations of
the data, such as bar charts and word clouds.
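A Gradio front end of this kind can be wired up in a few lines. In the sketch below, `keyword_sentiment` is a hypothetical stand-in for the real model so that the example runs without a trained classifier; its word lists are invented for illustration. `build_demo` wraps any prediction function in a `gr.Interface` with a text box as input and a label widget as output.

```python
# Hypothetical word lists standing in for the trained model.
POSITIVE_WORDS = {"safe", "effective", "grateful", "relieved"}
NEGATIVE_WORDS = {"unsafe", "scared", "worried", "painful"}


def keyword_sentiment(text: str) -> dict:
    """Return label -> confidence, the dict format gr.Label can display."""
    words = text.lower().split()
    pos = sum(w in POSITIVE_WORDS for w in words)
    neg = sum(w in NEGATIVE_WORDS for w in words)
    total = pos + neg
    if total == 0:
        return {"positive": 0.0, "negative": 0.0, "neutral": 1.0}
    return {"positive": pos / total, "negative": neg / total, "neutral": 0.0}


def build_demo(predict_fn):
    """Wrap a text -> scores function in a simple Gradio interface."""
    # Third-party dependency (pip install gradio), imported lazily.
    import gradio as gr

    return gr.Interface(
        fn=predict_fn,
        inputs=gr.Textbox(label="Tweet text or keyword"),
        outputs=gr.Label(label="Predicted sentiment"),
        title="COVID-19 vaccine sentiment monitor",
    )
```

Running `build_demo(keyword_sentiment).launch()` starts a local web page. Replacing `keyword_sentiment` with a function that calls the trained model connects the interface to the actual system.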
Performance Evaluation: The performance of the sentiment analysis system is evaluated using
various metrics such as precision, recall, and F1 score. The evaluation is performed on a labeled
dataset of COVID-19 vaccine-related tweets.
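For reference, precision, recall, and F1 can be computed per class from the true and predicted labels as shown below. The proposal does not say how the per-class values were averaged into a single figure, so the macro average used here is an assumption, and the function names are illustrative.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Compute precision, recall, and F1 for every label."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this label, but it was wrong
            fn[truth] += 1  # missed the true label
    report = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


def macro_average(report, metric):
    """Unweighted mean of one metric across all labels."""
    return sum(scores[metric] for scores in report.values()) / len(report)
```

With `y_true = ["pos", "pos", "neg", "neu"]` and `y_pred = ["pos", "neg", "neg", "neu"]`, the "pos" class has precision 1.0 and recall 0.5, and the "neg" class has precision 0.5 and recall 1.0.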
Fig-2: Subjectivity & polarity of vaccines
Overall, the methodology involves collecting Twitter data related to the COVID-19 vaccine,
performing data cleaning and sentiment analysis using TextBlob, classifying tweets into different
categories using SVM, creating a user interface using Gradio, and evaluating the performance of
the sentiment analysis system.
Discussion:
Datasets:
The COVID vaccine Twitter datasets are a collection of tweets related to the COVID-19 vaccine.
There are several ways to obtain the dataset, such as through Twitter's API or through third-party
data providers. Once the dataset is obtained, it is preprocessed by removing noise, irrelevant
information, and spam. The remaining tweets are then cleaned and preprocessed using NLP
techniques, such as stop word removal and text normalization.
SVM:
SVM stands for Support Vector Machine, a supervised machine learning algorithm used for
classification tasks. In this methodology, SVM is used to classify tweets by sentiment. The
algorithm is trained on a labeled dataset of COVID-19 vaccine-related tweets to distinguish the
sentiment classes: positive, negative, and neutral.
Jupyter Notebook:
Jupyter Notebook is an open-source web application that allows users to create and share
documents that contain live code, equations, visualizations, and narrative text. It is widely used in
data science and machine learning projects for data exploration, data analysis, and data
visualization.
CSV:
CSV stands for Comma-Separated Values, which is a file format used to store tabular data. In this
methodology, the COVID vaccine Twitter datasets are stored in CSV format, which is easy to read
and manipulate using Python libraries such as Pandas.
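Loading such a CSV file with Pandas might look like the sketch below. The column names `text`, `created_at`, and `user` are assumptions based on the fields mentioned under Data Collection (tweet text, timestamp, and user information), and the three sample rows are invented for illustration.

```python
import io

# Invented sample rows; the column names are assumptions for this sketch.
SAMPLE_CSV = """text,created_at,user
Got my first dose today and feeling fine,2021-03-01 10:15:00,user_a
Worried about side effects,2021-03-01 11:40:00,user_b
Worried about side effects,2021-03-02 09:05:00,user_c
"""


def load_tweets(source):
    """Read tweets from a CSV path or file-like object into a DataFrame."""
    # Third-party dependency (pip install pandas), imported lazily.
    import pandas as pd

    df = pd.read_csv(source, parse_dates=["created_at"])
    df = df.dropna(subset=["text"])          # drop rows without tweet text
    df = df.drop_duplicates(subset=["text"])  # drop repeated tweets
    return df.reset_index(drop=True)
```

`load_tweets(io.StringIO(SAMPLE_CSV))` returns two rows, because the repeated tweet is dropped. Passing a file path instead of the `StringIO` object reads from disk.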
Python Libraries:
Python is a popular programming language used for data science and machine learning. In this
methodology, several Python libraries are used, such as:
a. TextBlob: TextBlob is a Python library used for natural language processing (NLP). It provides
an easy-to-use interface for performing common NLP tasks such as sentiment analysis, part-of-
speech tagging, and text classification.
b. Scikit-learn: Scikit-learn is a Python library used for machine learning tasks such as
classification, regression, and clustering. In this methodology, the SVM algorithm is implemented
using the Scikit-learn library.
c. Matplotlib: Matplotlib is a Python library used for data visualization. It provides a wide range
of tools for creating charts, graphs, and plots. In this methodology, Matplotlib is used to
visualize the sentiment analysis results, for example as bar charts and word clouds. (Matplotlib
has no built-in word cloud plot; word clouds are usually generated with a separate package and
then rendered through Matplotlib.)
d. Gradio: Gradio is a Python library used for creating web interfaces for machine learning
models. In this methodology, Gradio is used to create a user interface for the sentiment analysis
system. The interface allows users to input a keyword related to the COVID-19 vaccine and
displays the sentiment analysis results in real time.
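As an illustration of the Matplotlib usage described in item c, the sketch below counts sentiment labels and saves a bar chart. The function names, the output file name, and the headless "Agg" backend are assumptions made so the example runs outside a notebook; inside Jupyter the figure could simply be displayed inline.

```python
from collections import Counter


def sentiment_counts(labels):
    """Count tweets per sentiment label in a fixed display order."""
    counts = Counter(labels)
    return {name: counts.get(name, 0)
            for name in ("positive", "neutral", "negative")}


def plot_sentiment_counts(labels, path="sentiment_counts.png"):
    """Save a bar chart of sentiment counts and return the file path."""
    # Third-party dependency (pip install matplotlib), imported lazily.
    import matplotlib
    matplotlib.use("Agg")  # render to file without a display
    import matplotlib.pyplot as plt

    counts = sentiment_counts(labels)
    fig, ax = plt.subplots(figsize=(5, 3))
    ax.bar(list(counts.keys()), list(counts.values()))
    ax.set_xlabel("Sentiment")
    ax.set_ylabel("Number of tweets")
    ax.set_title("Sentiment of COVID-19 vaccine tweets")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return path
```

The list of labels passed in would come from the sentiment analysis step, one label per tweet.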
In conclusion, combining TextBlob, SVM, and Gradio on COVID-19 vaccine Twitter datasets
provides valuable insights into public sentiment towards the COVID-19 vaccine. The methodology
is implemented using Jupyter Notebook, CSV files, and several Python libraries, making it
accessible to data scientists and machine learning practitioners.
Results:
The sentiment analysis was done using two techniques: TextBlob and SVM. TextBlob is a
ready-made, lexicon-based sentiment analysis tool that requires no training, while SVM is a
machine learning algorithm that was trained on the dataset. Both techniques showed similar
results, with the majority of the tweets being positive or neutral towards the vaccine.
The results of the sentiment analysis provide valuable insights into public sentiment towards
the COVID-19 vaccine. They can be used by healthcare providers, policymakers, and researchers
to understand the public's concerns and opinions about the vaccine and to address any
issues or misconceptions. Overall, combining TextBlob, SVM, and Gradio on COVID-19 vaccine
Twitter datasets is an effective way to monitor public sentiment towards the COVID-19 vaccine.
The TextBlob analysis reached an accuracy of 78%, lower than the SVM model, which achieved an
accuracy of 85% with a precision of 0.86, a recall of 0.85, and an F1 score of 0.85. We also used
Gradio to create a user interface to display the sentiment analysis results, allowing users to
input their own text and view the sentiment analysis output.
Conclusion & future scope:
Conclusion:
Review monitoring by sentiment analysis using TextBlob, SVM, and Gradio on COVID-19
vaccine Twitter datasets proved to be an effective tool for analyzing public sentiment towards the
COVID-19 vaccine. The analysis showed that the majority of tweets related to the COVID-19
vaccine were positive or neutral, providing valuable insights into public sentiment towards the
vaccine.
However, the analysis was limited to Twitter data; sentiments expressed on other social media
platforms such as Facebook and Instagram were not included. In addition, the sentiment analysis
was performed on a relatively small dataset of 10,000 tweets, which may not be representative of
the entire population.
Future scope:
Sentiment analysis can be extended to other social media platforms and combined with other
data sources, such as news articles and government reports, to provide a more comprehensive
analysis of public sentiment towards the COVID-19 vaccine. Furthermore, the sentiment analysis
can be used to identify patterns and trends in public opinion towards the vaccine, which can help
healthcare providers and policymakers address public concerns and misconceptions.
Overall, review monitoring by sentiment analysis using TextBlob, SVM, and Gradio on
COVID-19 vaccine Twitter datasets has the potential to be a valuable tool for monitoring public
sentiment towards the COVID-19 vaccine and can aid in improving vaccine uptake and public
health outcomes.