All content in this area was uploaded by Mohd Majid Akhtar on Sep 01, 2020
Sentiment Analysis on YouTube Comments: A Brief Study
Mohd Majid Akhtar, M.Tech, JMI
18MCS011 akhtarmajid273@gmail.com
Data Mining Project
Under Mr. Mohd Zeeshan Ansari
Abstract
Sentiment analysis, or opinion mining, is the field of study concerned with analyzing the opinions,
sentiments, evaluations, attitudes, and emotions that users express on social
media and other online resources. The rise of social media sites has also drawn
users towards video-sharing sites such as YouTube, where they express their
opinions or sentiments on the videos that they watch. This project presents a
brief survey of techniques to analyze opinions posted by users about a particular video.
Keywords: Opinion Mining, Sentiment Analysis, Social Media, Social Networking, User
Reviews, Video Sharing, YouTube.
Problem Description
An opinion or comment that evaluates an attitude towards an individual entity is usually called a
sentiment. Everyone is free to post an opinion on YouTube, and hence
people can freely express what they think about the performance shown in a video. Because a large number of
critical comments can appear in a short amount of time, there is a need to analyze them through opinion
mining.
The process of searching or tracing natural-language text to find patterns or moods of society
towards certain products, people, or topics is called sentiment analysis. Sentiment analysis is
also often referred to as opinion mining.[1]
Sentiment analysis has received
considerable attention since the research of Pang, Turney, Goldberg and Zhu. Sentiment analysis
techniques can support decisions in many scenarios. This study uses three class labels,
namely positive, neutral and negative, because the comments that appear on the internet can be
positive, neutral or negative.[2]
TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for
diving into common natural language processing (NLP) tasks such as part-of-speech tagging,
noun phrase extraction, sentiment analysis, classification, translation, and more.
To install this library, run the following command:
pip install textblob
[1] G. Vinodhini and RM. Chandrasekaran, “Sentiment Analysis and Opinion Mining: A Survey”, International Journal of Advanced
Research in Computer Science and Software Engineering, Volume 2, 2012.
[2] Anto Satriyo Nugroho, Arief Budi Witarto, and Dwi Handoko, “Support Vector Machine – Teori dan Aplikasinya dalam
Bioinformatika”, 2003.
TextBlob finds all the words and phrases to which it can assign a polarity and a subjectivity, and
averages them together.
Output: each comment is assigned one polarity value, where polarity measures how positive or negative a
word or sentence is and ranges from -1 to +1: -1 is very negative and +1 is very positive.
This is not a sophisticated technique; it is simply a rule-based one.
In this project, we did not use subjectivity and concentrated only on the polarity of the
comments.
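As a minimal sketch of how a polarity score could be turned into one of the three classes used in this project, the snippet below wraps TextBlob's `sentiment.polarity` attribute and maps the result to 1, 0, or -1. The helper names and the `threshold` parameter are assumptions made for illustration; the project itself does not state which cut-off it used.

```python
def textblob_polarity(text):
    """Return TextBlob's polarity score (between -1.0 and 1.0) for a comment."""
    # Imported lazily so the rest of this sketch runs even without TextBlob installed.
    from textblob import TextBlob
    return TextBlob(text).sentiment.polarity


def polarity_to_label(polarity, threshold=0.0):
    """Map a continuous polarity to 1 (positive), 0 (neutral) or -1 (negative).

    `threshold` is an assumed cut-off, not a value taken from the project.
    """
    if polarity > threshold:
        return 1
    if polarity < -threshold:
        return -1
    return 0


# Hand-picked scores, used only to show the mapping.
for score in (0.8, 0.0, -0.35):
    print(score, "->", polarity_to_label(score))
```

With TextBlob installed, `polarity_to_label(textblob_polarity(comment))` would give the predicted class for one comment.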
Figure: How polarity and subjectivity work.
DATASET DESCRIPTION
In our case, the data comes from the YouTube video “Liverpool FC- The road to
Madrid- UCL2019” by the channel MRCLFCompilations. This video was chosen in order to analyze
football viewers and to learn how football fans reacted to the win of Liverpool
(an English Premier League football club) over Barcelona FC. The video covers the whole journey of
Liverpool FC in the astonishing tournament, the Champions League.
The second reason for choosing this video is my own interest in and passion for football. Because I
know football, it was easy for me to build a dataset that contains the text written by each
user along with a polarity value (assigned by me, acting as an expert in
football), which helps in computing accuracy, precision and recall in the later part of the project.
The ID of the video is Y-XHMlaJL-s.
Using this ID, which YouTube assigns uniquely to each video, we download the
comments and save them in JSON format. While downloading the comments, we
are asked to state the number of comments we would like to download. In my case, I
kept the data small, i.e., a dataset size of 50. The output file is named ‘football_video.json’.
Each record in the file has four key-value pairs, namely [cid, text, author, time]. Here, cid is the
comment ID.
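The record layout described above can be illustrated with a short sketch. It assumes the downloaded file stores one JSON object per line, which the report does not state; the two sample records are made up and only mirror the four keys [cid, text, author, time].

```python
import json


def load_comments(lines):
    """Parse an iterable of JSON lines into a list of comment dictionaries."""
    comments = []
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            comments.append(json.loads(line))
    return comments


# Made-up records with the same keys as the downloaded file.
sample = [
    '{"cid": "c1", "text": "What a comeback!", "author": "fan_a", "time": "2 weeks ago"}',
    '{"cid": "c2", "text": "Not our night.", "author": "fan_b", "time": "1 month ago"}',
]

comments = load_comments(sample)
print(sorted(comments[0].keys()))  # ['author', 'cid', 'text', 'time']
print(len(comments), "comments loaded")
```

For the real data, the same function could be given an open file handle, for example `open("football_video.json", encoding="utf-8")`, provided the file really is line-delimited.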
FEATURE DESCRIPTION
Even once we have the data, it is not yet in a useful form. The downloaded file is in JSON format, so we first
need to convert it to CSV (comma-separated values).
This is done by the convert_json_to_csv.ipynb file.
While converting to CSV, we keep only the column ['text'] and discard the rest,
because we only need the comment text as data.
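The notebook itself is not reproduced in this report, so the following is only a sketch of what such a conversion might look like using Python's standard `json` and `csv` modules. The function name `json_lines_to_csv` and the one-object-per-line input format are assumptions.

```python
import csv
import io
import json


def json_lines_to_csv(json_lines, out_file):
    """Write only the 'text' field of each JSON record to a one-column CSV.

    Returns the number of comment rows written (the header is not counted).
    """
    writer = csv.writer(out_file)
    writer.writerow(["text"])  # header row
    rows = 0
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        writer.writerow([record["text"]])  # cid, author and time are dropped
        rows += 1
    return rows


# Made-up records; the second one contains a comma to show CSV quoting.
sample = [
    '{"cid": "c1", "text": "What a comeback!", "author": "fan_a", "time": "2 weeks ago"}',
    '{"cid": "c2", "text": "Great goal, great team", "author": "fan_b", "time": "1 month ago"}',
]

buffer = io.StringIO()
written = json_lines_to_csv(sample, buffer)
print(written, "rows written")
print(buffer.getvalue())
```

When writing to a real file, the `csv` documentation recommends opening it with `newline=""` so that line endings are not translated twice.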
After obtaining the 50-row data in the CSV file, we look at each sentence and give it a rating
according to its polarity. Since I was the one supervising this step, I used my intuition and
domain knowledge to assign each comment a polarity of positive (1), negative (-1), or
neutral (0).
In each row, the first value is the comment text, and the value after the comma (,) is the polarity
that I assigned.
Using this, we then build our model.
Methodology
Steps
1. Place all files in the same directory.
2. Earlier, I tried to work with the YouTube Data API v3, but in my attempts it only provided 100 comments and
did not seem stable yet for retrieving comments. The alternative is a script
written in Python that downloads the video comments. This script asks for three
parameters: first the YouTube video ID, second the name of the output file, and
third the limit on the number of comments to download.
The script can also be run from the Command Prompt (cmd), in the Anaconda directory, with the following command:
python downloader.py --youtubeid [name] --output [output name] --limit [comment number]
3. Note: to run this you need the packages 'requests' and 'lxml' in the same directory folder; the script
downloads the data in JSON format.
4. Convert the JSON file to a CSV file for further data analysis, either with another program or simply
online through www.json-csv.com.
5. Open Jupyter and run the data analysis file (sentiment_analysis), giving the file name
including the .csv extension.
6. The output is shown in Jupyter as a bar graph.
Figure (workflow): download the dataset of YouTube video comments by running a script; convert the JSON file to a CSV file via the convert_json_to_csv.ipynb file; run the 'sentiment_analysis' Python file to show the positive, negative and neutral responses and the accuracy of the model.
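The bar graph in step 6 summarizes how many comments fall into each class. A rough, library-free sketch of that count is given below; the label list is made up and the text-based bars only stand in for the Jupyter plot, whose actual code is not shown in this report.

```python
from collections import Counter

# Class encoding used in this project: 1 positive, 0 neutral, -1 negative.
LABEL_NAMES = {1: "positive", 0: "neutral", -1: "negative"}


def label_distribution(labels):
    """Count how many comments fall into each of the three classes."""
    counts = Counter(labels)
    return {name: counts.get(value, 0) for value, name in LABEL_NAMES.items()}


def text_bar_chart(distribution):
    """Render the distribution as simple text bars, one line per class."""
    lines = []
    for name, count in distribution.items():
        lines.append(f"{name:<9}{'#' * count} ({count})")
    return "\n".join(lines)


labels = [1, 1, 0, -1, 1, 0]  # made-up predictions, not project results
dist = label_distribution(labels)
print(text_bar_chart(dist))
```

The resulting dictionary could be passed to a plotting library to draw the same kind of bar graph.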
Command Prompt (CMD) screenshot for downloading the YouTube comments:
The name of the video is Liverpool FC- The road to Madrid- UCL2019, and its YouTube ID is Y-XHMlaJL-s.
We want 50 comments from it.
Let's name the output file football_video.
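The downloader script's source is not included in this report. Purely as an illustration of how its three parameters might be read, here is a hypothetical `argparse` setup that accepts the same flag names (--youtubeid, --output, --limit); the help texts and defaults are assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the three flags named in the methodology (hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Download YouTube comments (illustrative sketch only)."
    )
    parser.add_argument("--youtubeid", required=True, help="ID of the YouTube video")
    parser.add_argument("--output", required=True, help="name of the output file")
    parser.add_argument("--limit", type=int, default=None,
                        help="maximum number of comments to download")
    return parser


# Values taken from this project: video ID, output file name and dataset size.
args = build_parser().parse_args(
    ["--youtubeid", "Y-XHMlaJL-s", "--output", "football_video.json", "--limit", "50"]
)
print(args.youtubeid, args.output, args.limit)
```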
Output
RESULTS:
ACCURACY: 70%
PRECISION: 100%
RECALL: 75%
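These are calculated as follows, where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives:
Accuracy = (TP + TN) / (TP + FP + FN + TN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)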
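To make the formulas concrete, the sketch below computes them from a list of expert labels and a list of predicted labels. The report does not say how the three classes were reduced to TP, TN, FP and FN, so this example assumes a one-vs-rest view with the positive class (1) as the target; the labels are made up and do not reproduce the project's results.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN, treating `positive` as the target class.

    Assumption: every label other than `positive` is lumped together as "rest".
    """
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def scores(tp, fp, fn, tn):
    """Apply the accuracy, precision and recall formulas, guarding against division by zero."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


y_true = [1, 1, 1, -1, 0, 1]  # made-up expert labels
y_pred = [1, 1, 0, -1, 0, 1]  # made-up model predictions
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy, precision, recall = scores(tp, fp, fn, tn)
print(tp, fp, fn, tn)
print(accuracy, precision, recall)
```

Here one positive comment is missed, so recall drops to 0.75 while precision stays at 1.0 because no comment is wrongly predicted as positive.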
Analysis by Graph
CONCLUSION
Classifying general events and detecting the sentiment polarity of user comments on
YouTube has so far been a challenging task for researchers. A lot of work has been done in this regard, but there is still
a long way to go to overcome this problem.
In this project, I have highlighted the following problems in finding the polarity of
comments given by YouTube users.
1) Current sentiment dictionaries have limitations.
2) Users write in informal language styles.
3) Estimating sentiment for community-created terms.
4) Assigning proper labels to events.
5) Achieving satisfactory classification performance.
6) Challenges involving social media sentiment analysis.
7) The dataset is not very clean. The proper approach would be to tokenize it, exclude stop words and
generate a bag of words from it. These steps were skipped in this project.
8) YouTube comments cannot be trusted much, as they can belong to any domain, and the TextBlob
technique only gives a polarity based on the text, not in relation to the context of the video.
For example, if the video is a mobile phone review and a comment is 'today is a wonderful
day', then this comment gets polarity 1 and is counted among the positive comments, even though
it is not at all related to the mobile phone review.
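The cleaning steps named in point 7 (tokenizing, excluding stop words, building a bag of words) were not part of this project. A minimal sketch of what they could look like with only the standard library follows; the tiny stop-word list and the regular-expression tokenizer are assumptions chosen for brevity, not a recommended configuration.

```python
import re
from collections import Counter

# A deliberately tiny, assumed stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "is", "was", "and", "of", "to", "in", "it", "this", "that"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text, stop_words=STOP_WORDS):
    """Count the remaining tokens after removing stop words."""
    return Counter(token for token in tokenize(text) if token not in stop_words)


bag = bag_of_words("This is the best comeback of the season, the best!")
print(bag.most_common(1))  # [('best', 2)]
```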
REFERENCES
1) Sentiment Analysis on YouTube: A Brief Survey. Muhammad Zubair Asghar
2) Choudhury, Smitashree, and John G. Breslin. "User sentiment detection: a YouTube use case."
(2010).
3) en.wikipedia.org
4) www.academia.edu
5) https://blog.exsilio.com/all/accuracy-precision-recall-f1-score-interpretation-of-performance-measures/
6) TextBlob read blogs. Version v0.15.2.
7) G. Vinodhini and RM. Chandrasekaran, “Sentiment Analysis and Opinion Mining: A Survey”,
International Journal of Advanced Research in Computer Science and Software Engineering,
Volume 2, 2012.
8) Anto Satriyo Nugroho, Arief Budi Witarto, and Dwi Handoko, “Support Vector Machine –
Teori dan Aplikasinya dalam Bioinformatika”, 2003.