www.tjprc.org editor@tjprc.org
International Journal of Computer Networking, Wireless and Mobile Communications (IJCNWMC)
ISSN(P): 2250-1568; ISSN(E): 2278-9448
Vol. 4, Issue 3, Jun 2014, 21-28
© TJPRC Pvt. Ltd.
DETECTION AND CHARACTERIZATION OF FAKE ACCOUNTS ON THE PINTEREST SOCIAL NETWORK
ENAS ELGELDAWI1, AHMED A RADWAN2, FATMA OMARA3, TAREK M MAHMOUD4 & HARSHA V MADHYASTHA5
1,2,4 Department of Computer Science, Faculty of Science, Minia University, Egypt
3 Department of Computer Science, Faculty of Computers and Information, Cairo University, Egypt
5 Department of Computer Science and Engineering, University of California, Riverside, USA
1. ABSTRACT
Since their inception, Online Social Networks (OSNs) have played a major role in the way people pursue their social lives. Almost everyone’s social life has become associated with social networks; building your own community and keeping in contact with friends has never been easier or more enjoyable. Some OSNs did not survive for long, while others have managed to endure. As any OSN grows rapidly and rises in popularity, problems such as online impersonation, fake accounts, and spam also arise. Pinterest, the new member of the OSN family, has become a star in almost no time. As Pinterest has gained publicity among users, spammers have found their way to it too. In this paper, we take a closer look at spammers’ activity on Pinterest in order to understand how they operate and how they target end users. Using our analysis, we built a detection system and ran it over our dataset of real user accounts from Pinterest; the system was able to distinguish between fake accounts and legitimate user accounts. The true positive rate of our system exceeds 90%, while the false positive rate was 0%.
2. KEYWORDS: Pinterest, Online Social Networks, Spammers, Fake Accounts
3. INTRODUCTION
Pinterest, the new member of the Online Social Networks family, has become a star in almost no time. It is considered the fastest growing social network, as a report from comScore (Perez, 2012a) indicated. In this report, Pinterest’s growth was calculated using both unique visitors and clicks from search engines, and it was found to be the fastest growing network in both categories, showing +4377% growth between May 2011 and May 2012, compared with growth rates of 58% for Twitter, 67% for LinkedIn, 168% for Tumblr, and 4% for Facebook during the same period. Moreover, Pinterest users were found to spend more time, buy more items, and conduct more transactions online than other social media buyers. Pinterest’s rapid growth has been a hot topic among users, reviewers, and reporters (Collins, 2012), (Constine, 2012), (Orsini, 2012a, 2012b, 2012c, 2012d), (Romari, 2013), (Semiocast SAS, 2013), (Slegg, 2013), (White, 2013).
A late 2012 survey by the Pew Research Center’s Internet & American Life Project (Pew Research Center, 2013) shows that the percentage of internet users using Pinterest is nearly the same as the percentage using Twitter.
As the popularity of Pinterest grows, so does the spammers’ traffic to it. Hence, many articles (Doshi, 2012), (Engauge Agency, 2012), (Greenfield, 2012), (Honigman, 2012), (Horwitz, 2013), (Lunden, 2013), (McHugh, 2012), (Protalinski, 2012), (Perez, 2012b) have shed light on the spam problem on Pinterest. In a blog post on April 13, 2012, Pinterest acknowledged its spam issues, explaining its ever-improving spam-fighting technology and its overall effort to make things better. In mid and late 2012, it made a noticeable effort against spam by banning several spamming accounts. However, spammers still find new ways to stay, usually by creating fake accounts to hide in, and here we dig deeper into this problem.
Impact Factor (JCC): 5.3963 Index Copernicus Value (ICV): 3.0
We define a fake account as an account that is not genuine; in other words, it belongs to a person who claims to be someone they are not and who carries out malicious and undesirable activity, causing problems for the social network and fellow users. This leads to the question: why would a person create a fake account?
There are many possible reasons, such as online impersonation to defame a person, manipulating people, stealing confidential information, campaigning for a person, and advertising a product. In our case, we claim that the reason for creating a fake account on Pinterest most probably falls into advertising or campaigning. Both involve the creation of a large number of fake accounts. Advertising is accomplished by making these accounts promote the same product(s), while campaigning raises the popularity of a certain account by making the mass of fake accounts follow that account.
Regardless of the reason for creating them, large numbers of fake accounts are usually created in an automated or semi-automated fashion, which makes the accounts look similar to each other in some way. Our goal here is to find fake accounts on Pinterest based on the similarity of the accounts.
We frame our contributions as follows:
Although Pinterest is increasingly becoming the target of spam attacks, this paper is, to the best of our knowledge, the first study in the literature to address the spam problem on Pinterest.
We built our dataset from the Pinterest web site; it contains 3920 accounts.
We identified characteristics of some spammers’ networks on Pinterest.
We developed a system to detect fake accounts, which found 1503 fake accounts.
The paper is organized as follows. Section 4 gives a brief description of the main terms of Pinterest. We discuss our experimental methodology in Section 5, and the validation methodology is given in Section 6. A thorough look at the output is contained in Section 7. Finally, we give the conclusions in Section 8.
4. PINTEREST: WHAT IS IT?
We quote the definition Pinterest.com uses for its own site: “Pinterest is a tool for collecting and organizing things you love”. Pinterest is an OSN that allows users to “Pin” materials they find “Interesting” onto virtual pinboards and share them with others. Below we list the basic terms used throughout the paper, which you should be aware of when using Pinterest (Radwan et al, 2014):
Pin: A Pin can be an image or a video. You add a Pin either by uploading it from your computer or by using the “Pin it” bookmarking button, which you install from the Pinterest website into your browser’s bookmark toolbar so that you can instantly pin any image or video you come across while browsing the Internet. You also have the option of setting a link for your Pin. A Pin can be repinned by other users, and all repinned pins should link back to the original source. When viewing someone’s pin, you have three options: like the pin, comment on the pin, or repin the pin to one of your own boards.
Board: A Board is where you group your pins together and organize them by topic, giving each board a meaningful name that describes the nature of the pins it contains.
Followers: The users who follow you. A user can follow you either by following all your boards or by following just the ones they are interested in.
Following: The users whom you follow.
Collaboration Board: Pinterest allows collaboration on boards, such that a group of users share the same board and each of them can add pins to it. The board owner has to enable contributors, a function only the board owner can perform. As a prerequisite, you, as a board owner, must follow one or more boards of the users you would like to add as contributors before adding them. The collaborative board appears in all of the contributors’ profiles as if it were one of their own boards.
For a thorough overview of Pinterest, the reader is referred to (Radwan et al, 2014).
5. EXPERIMENTAL METHODOLOGY
Our search for fake accounts went through two steps, which we discuss below.
5.1 Constructing Our Data Set
As Pinterest did not provide APIs at the time this paper was written, we had to build our own crawler system to collect detailed user information. The system was built in Java, and we extracted the following data:
For each account: number of Boards, Pins, Likes, and Followers; we also extracted some of the Followings, as we were more concerned with the Followers than with the Followings.
For each Board: Board Name and number of Pins in the Board.
For each Pin: Pin ID, Link, and Image.
We were able to collect 886,444 Pins in 3920 accounts. To the best of our knowledge, our collected data is the first dataset obtained from the Pinterest web site.
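The crawled fields listed above can be pictured as a small record structure. The following is a minimal sketch in Python, not the authors' Java crawler, which the paper does not show. All class and field names are assumptions chosen for illustration, and storing follower ids (rather than only a follower count) is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pin:
    pin_id: str
    link: str   # URL the pin points to (may be empty)
    image: str  # location of the pinned image (hypothetical representation)


@dataclass
class Board:
    name: str
    pins: List[Pin] = field(default_factory=list)

    @property
    def pin_count(self) -> int:
        return len(self.pins)


@dataclass
class Account:
    account_id: str
    likes: int = 0
    followers: List[str] = field(default_factory=list)   # ids of follower accounts (assumed)
    followings: List[str] = field(default_factory=list)  # only partially crawled per the text
    boards: List[Board] = field(default_factory=list)

    @property
    def board_count(self) -> int:
        return len(self.boards)

    @property
    def pin_count(self) -> int:
        # Total pins across all boards of the account.
        return sum(board.pin_count for board in self.boards)
```

With such records, the per-account counts (boards, pins, followers) are derived from the crawled items rather than stored separately.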
5.2 Building the Detection System
First, we started our search by manually identifying 50 fake accounts that use the same set of pins targeting the same set of web sites. We expected that tracing the followers’ hierarchy of one fake account would lead to discovering other accounts belonging to the same network, and maybe to other networks.
To widen our investigation, we used the search page in Pinterest to find other accounts that target the identified web sites and may belong to different networks.
We used these 50 accounts as a seed for our system to find more fake accounts that follow the same pattern.
We keep the set of URLs targeted by those 50 fake accounts in two lists, “Links” and “Domains”. The first list contains the whole URL of each web site, while the second list contains just the domains, after removing the top 100000 sites from the domain list. We want to emphasize that the web sites included in “Domains” are not necessarily blacklisted sites; the list may contain legitimate web sites, but they are domains targeted by the accounts we identified as fake.
It may seem that we need only one list, either “Links” or “Domains”, but in fact we need both. Some fake accounts that promote products on a web site create pins for product1, product2, etc., which all have the same domain but different URLs; in this case we are concerned with the domain itself and not with the individual product links. On the other hand, a fake account may use a specific resource on one of the top-rated web sites, for example a video on youtube.com that explains how to increase the number of followers or how to become a millionaire in no time. Because we exclude the top 10000 web sites from the “Domains” list, in such a case we look for the whole URL rather than just the domain.
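The construction of the two lists can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' implementation: the function names, the stripping of a leading "www.", and the `top_sites` input (a set of popular domains obtained from some ranking the paper does not name) are all hypothetical.

```python
from urllib.parse import urlparse


def domain_of(url):
    """Return the lower-cased host of a URL without a leading 'www.'.

    URLs without a scheme yield an empty string, because urlparse
    cannot identify their host.
    """
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def build_lists(seed_pin_urls, top_sites):
    """Split the URLs targeted by the seed fake accounts into Links and Domains.

    seed_pin_urls: iterable of URLs found on pins of the seed accounts.
    top_sites: set of popular domains kept out of Domains (assumed input).
    """
    links = set()
    domains = set()
    for url in seed_pin_urls:
        if not url:
            continue
        links.add(url)  # the whole URL always goes into Links
        domain = domain_of(url)
        if domain and domain not in top_sites:
            domains.add(domain)  # only non-popular domains go into Domains
    return links, domains
```

In this sketch, a spam video hosted on a popular site is still caught through its full URL in Links, while product pages on a lesser-known shop are caught through the shared entry in Domains.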
We keep the accounts to be tested in a list, “MaybeFake”. Each member of the list is a candidate fake account. The goal now is to classify each member of this list as either a fake account or a legitimate user account based on the web sites targeted by the pins of each account.
We built a two-stage detection algorithm. In stage one, to spot fake accounts in the “MaybeFake” list, we examine all the domains and links of the pins in the account under test against those in our “Domains” and “Links” lists. If the number of pins targeting web sites in either the “Links” list or the “Domains” list exceeds a certain threshold (Threshold 1 = 30), the user account undergoes stage two, in which we check the similarity between this account and previously identified fake accounts using another threshold (Threshold 2 = 50). The purpose of stage one is to reduce complexity: if the account is a genuine user account, it does not have to be compared with all the fake accounts identified before. We used a test dataset to determine our thresholds. We tested our algorithm against several threshold values; our main interest when determining these thresholds was to obtain the best precision value that does not produce false positives.
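The two-stage procedure described above can be sketched as follows. This is a hedged Python illustration, not the authors' Java code. The two threshold values come from the text, but the paper does not define the similarity measure used in stage two; the sketch assumes it is the number of distinct pin URLs an account shares with a known fake account. The function names and the dictionary-based account representation are also hypothetical.

```python
from urllib.parse import urlparse

THRESHOLD_1 = 30  # stage one: pins hitting Links/Domains (value from the text)
THRESHOLD_2 = 50  # stage two: similarity to a known fake account (value from the text)


def domain_of(url):
    """Lower-cased host of a URL without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def stage_one(pin_urls, links, domains, t1=THRESHOLD_1):
    """Cheap filter: count pins whose URL is in Links or whose domain is in Domains."""
    hits = sum(1 for url in pin_urls if url in links or domain_of(url) in domains)
    return hits > t1


def similarity(pin_urls, fake_pin_urls):
    """ASSUMED measure: number of distinct pin URLs shared by the two accounts."""
    return len(set(pin_urls) & set(fake_pin_urls))


def stage_two(pin_urls, known_fakes, t2=THRESHOLD_2):
    """Compare the candidate with every previously identified fake account."""
    return any(similarity(pin_urls, fake) > t2 for fake in known_fakes.values())


def classify(maybe_fake, links, domains, known_fakes):
    """Return the ids of candidates flagged as fake, in input order.

    maybe_fake and known_fakes map an account id to its list of pin URLs.
    """
    flagged = []
    for account_id, pin_urls in maybe_fake.items():
        # Stage two runs only for accounts that pass the cheaper stage one.
        if stage_one(pin_urls, links, domains) and stage_two(pin_urls, known_fakes):
            flagged.append(account_id)
    return flagged
```

Because `and` short-circuits, accounts that fail stage one are never compared against the known fake accounts, which mirrors the complexity argument given in the text.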
We tested the above procedure on a subset of the “MaybeFake” list in 4 different test data sets, and the result was 90.25% precision in identifying fake accounts. The system missed some fake accounts but produced zero false positives.
Figure 1 and Figure 2 summarize the crawling module and the classification module, respectively.
6. VALIDATION METHODOLOGY
Our validation methodology includes the following:
The accounts banned by Pinterest: Fake accounts discovered by Pinterest are usually banned. If an account from our dataset no longer exists, we take this as an indication that it has been discovered and banned by Pinterest; we then consider this account fake and use the information of the account (URLs of the Pins, Board Names, and Followers) as a source of validation.
Google blacklist: We used the Google Safe Browsing API to write a Java module that checks the URLs of the Pins posted by the user against the Google Safe Browsing list.
Spam content keywords: Certain keywords often appear in spam messages and spam web sites. Many of these sites attempt to sell the usual assortment of products. We built a set of well-known keywords that are indicative of spam, such as “Free”, “Survey”, and “SEO”. We then performed full-text searches on the Pin Link and classified each URL that matches our keyword set as malicious.
Manual analysis: Since widespread spam campaigns are likely to be reported and discussed by people on the web, we can use manual validation to confirm our results. All the accounts have been manually analyzed.
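The keyword-based check in the list above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the keyword set contains only the three examples named in the text (the authors' full set is not given), and matching whole tokens rather than arbitrary substrings is a choice made here to avoid hits such as "seo" inside "seoul"; the paper only says that full-text searches were performed on the Pin Link.

```python
import re

# Only the three example keywords named in the text; the full set is not published.
SPAM_KEYWORDS = {"free", "survey", "seo"}


def matches_spam_keywords(pin_link, keywords=SPAM_KEYWORDS):
    """Return True if any token of the pin link equals a spam keyword.

    The link is lower-cased and split on every run of characters
    that are not ASCII letters or digits.
    """
    tokens = re.split(r"[^a-z0-9]+", pin_link.lower())
    return any(token in keywords for token in tokens)
```

A substring search would be closer to a literal full-text search but would flag more legitimate links; which variant the authors used is not stated.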
Figure 1: Crawling Module
Figure 2: Classification Module
7. RESULTS AND DISCUSSIONS
We initiated our search using a seed sample of 50 manually checked fake accounts. We then tested our procedure on a subset of the “MaybeFake” list in 4 different test data sets, each containing 100 accounts, and the result was an average of 90.25% precision in identifying fake accounts. The system missed some fake accounts but produced zero false positives.
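To make the reported quantities concrete, the following Python sketch computes the true positive rate, false positive rate, and precision from confusion counts, and averages the true positive rate over several test sets. The counts in the usage example are hypothetical and chosen only to illustrate the arithmetic; the paper does not report per-set counts. One possible reading, offered here as an assumption, is that the 90.25% figure corresponds to the true positive rate TP / (TP + FN) quoted in the abstract, since precision computed as TP / (TP + FP) equals 1 whenever there are no false positives.

```python
def true_positive_rate(tp, fn):
    """Share of actual fake accounts that were flagged: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def false_positive_rate(fp, tn):
    """Share of legitimate accounts that were wrongly flagged: FP / (FP + TN)."""
    return fp / (fp + tn) if (fp + tn) else 0.0


def precision(tp, fp):
    """Share of flagged accounts that are actually fake: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def average_tpr(test_sets):
    """Mean of the per-set true positive rates; each set is (tp, fn, fp, tn)."""
    rates = [true_positive_rate(tp, fn) for tp, fn, _fp, _tn in test_sets]
    return sum(rates) / len(rates)


# Hypothetical counts for four test sets of 100 accounts each: (tp, fn, fp, tn).
example_sets = [(36, 4, 0, 60), (45, 5, 0, 50), (27, 3, 0, 70), (38, 4, 0, 58)]
example_average = average_tpr(example_sets)
```

In every hypothetical set above there are no false positives, so the false positive rate is 0 and the precision is 1, while the true positive rate stays below 1 because some fake accounts are missed.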
In our seed sample we were able to identify 3 different groups of spammer networks. In this section we study the seed data as well as the output of our detection system in order to draw out some characteristics of fake accounts.
7.1 A Closer Look at Fake Accounts Characteristics
A spamming network is a large number of fake accounts usually created by the same person. Although the accounts belonging to the same network differ in the number of boards, pins, followers, and followings, they share some characteristics:
They promote the same products.
They may use the same names for boards. We exclude from the board names the Pinterest default boards “Books worth reading”, “My style”, “Products I love”, “For the home”, and “Favorite places and spaces”; these were the default boards Pinterest used to initialize a new account, but this is no longer the case.
They use the same set of pins.
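The board-name comparison in the list above, with Pinterest's former default boards excluded, can be sketched as follows. This is an illustrative Python sketch; the function names and the case-insensitive, whitespace-trimmed comparison are assumptions, while the five default board names are taken from the text.

```python
# Former Pinterest default boards named in the text, lower-cased for comparison.
DEFAULT_BOARDS = {
    "books worth reading",
    "my style",
    "products i love",
    "for the home",
    "favorite places and spaces",
}


def custom_board_names(board_names):
    """Normalize board names and drop the former default boards."""
    return {name.strip().lower() for name in board_names} - DEFAULT_BOARDS


def shared_board_names(boards_a, boards_b):
    """Board names two accounts have in common, ignoring default boards."""
    return custom_board_names(boards_a) & custom_board_names(boards_b)
```

Excluding the defaults matters because two unrelated legitimate accounts created in the same period would otherwise appear to share board names.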
For most of the accounts, we find an inconsistency between the name of the board and the contents of the board (e.g., the account /maxy111/ has a board named “Books worth reading” which contains 2584 pins, none of which is a book; as another example, /activerog/ has a board “cars and motorcycle” which contains 223 pins, none of them related to cars or motorcycles).
These accounts follow each other, forming one large group. There are two types of accounts within a network: accounts with a large number of boards, pins, and followers, which we call “Star” accounts, and the rest of the network, which act as “Supporting role” accounts and usually have few boards, pins, and followers. While “Supporting role” accounts also help in promoting the same set of products, their main role is to support “Star” accounts by following them. The more followers an account has, the more popular it becomes, and the more popular an account becomes, the more legitimate users it can attract.
7.2 Classification of Our Seed Data
From our extensive study of hundreds of accounts on Pinterest, we were able to categorize our seed data into three groups. We explain here the characteristics of each group:
Group 1: All the members of this group share the same set or subset of pins, which are not repinned but rather uploaded by the user. Each account has a small number of boards, usually 1 or 2 for most members, containing a large number of pins. They use the same set of board names. Moreover, some members have many collaborative boards besides their own boards.
Group 2: In contrast, the members of this group have a large number of boards, with each board containing 1 or 2 pins. They also use the same set of board names.
Group 3: This group has special characteristics that differentiate it from the others: each account has exactly 4 boards, and the account id is exactly 8 or 9 characters long (e.g., /cadegown/, /kiditrio/). The accounts have about the same number of pins, followers, and followings. They do not use the same board names; instead they use the same board themes. The boards are organized such that one board is for favorite pins, a second for sports, a third for food recipes, and a fourth for crafts.
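The two structural properties stated for Group 3 (exactly 4 boards, account id of 8 or 9 characters) are simple enough to express as a check. The sketch below is a hypothetical Python illustration, not part of the authors' system. It tests only these two properties, so it is a necessary-condition filter at best: the similar pin and follower counts and the four board themes mentioned in the text are not checked.

```python
def looks_like_group3(account_id, board_count):
    """Check the two structural Group 3 properties stated in the text.

    account_id may be given with surrounding slashes, as in '/cadegown/'.
    """
    handle = account_id.strip("/")
    return board_count == 4 and len(handle) in (8, 9)
```

For instance, the two account ids quoted in the text both have 8 characters and satisfy the check when paired with 4 boards, whereas /maxy111/ (7 characters) does not.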
8. CONCLUSIONS
In this paper we presented a detection system that classifies an account as either a fake account or a real user account. Our technique classifies accounts based on the strong similarity between them in terms of the URL destinations of the pins and the names of the boards. After running our program on 886444 pins in 3920 accounts, it found 1503 fake accounts containing 345000 pins. We tested the program on 4 different test data sets, and the result was an average of 90.25% precision of classification. The system missed some fake accounts but produced zero false positives.
The key novelty of our paper lies in the following points:
Identifying characteristics of some spammers’ networks on Pinterest.
Developing a system to detect fake accounts, which found 1503 fake accounts.
Although Pinterest is increasingly becoming the target of spam attacks, to the best of our knowledge, our study is the first in the literature to investigate the problem of spam and fake accounts on the Pinterest social network.
9. REFERENCES
1. Black Hat World Forum. (March 16, 2012). Retrieved from http://www.blackhatworld.com/Blackhat-Seo/buy-sell-trade/419090-My-Personal-Pinterest-com-bot-collection.html
2. Matt Collins. (March 21, 2012). Total Pinterest. Retrieved from http://totalpinterest.com/revealed-the-fake-accounts-invading-pinterest/
3. Josh Constine. (February 7, 2012). Pinterest Hits 10 Million U.S. Monthly Uniques Faster Than Any Standalone Site Ever - comScore. Retrieved from http://techcrunch.com/2012/02/07/pinterest-monthly-uniques/
4. Nishant Doshi. (March 13, 2012). Symantec.com. Survey Scammers Moving To Pinterest. Retrieved from http://www.symantec.com/connect/blogs/survey-scammers-moving-pinterest
5. Engauge Agency. (March 6, 2012). Pinterest: A Review Of Social Media’s Newest Sweetheart. Retrieved from http://www.engauge.com/assets/pdf/engauge-pinterest.pdf
6. Rebecca Greenfield. (April 24, 2012). The Atlantic Wire. Pinterest's Spam Problems Are Getting Worse. Retrieved from http://www.theatlanticwire.com/Technology/2012/04/Pinterests-Spam-Problems-Are-Getting-Worse/51494/
7. Brian Honigman. (November 29, 2012). Huffington Post. 100 Fascinating Social Media Statistics And Figures From 2012. Retrieved from http://www.huffingtonpost.com/brian-honigman/100-fascinating-social-me_b_2185281.html
8. Josh Horwitz. (July 10, 2013). The Next Web. Semiocast: Pinterest Now Has 70 Million Users And Is Steadily Gaining Momentum Outside The US. Retrieved from http://thenextweb.com/socialmedia/2013/07/10/semiocast-pinterest-now-has-70-million-users-and-is-steadily-gaining-momentum-outside-the-us/
9. Ingrid Lunden. (June 10, 2013). Pinterest Pushes Global Growth With A Localized Version For France, Its First Non-English Site. Retrieved from http://techcrunch.com/2013/06/10/pinterest-pushes-global-growth-with-a-localized-version-for-france-its-first-non-english-site/
10. Molly McHugh. (March 29, 2012). Digital Trends. Inside The Underground Pinterest Spam Rings Turning Your Clicks Into Cash. Retrieved from http://www.digitaltrends.com/social-media/inside-the-underground-pinterest-spam-rings-turning-your-clicks-into-cash/
11. Nielsen Company. (December 4, 2012). State Of The Media: The Social Media Report 2012. Retrieved from http://www.nielsen.com/us/en/reports/2012/state-of-the-media-the-social-media-report-2012.html
12. Lauren Rae Orsini. (February 22, 2012a). The Daily Dot. How To Hijack Popular Brands On Pinterest For Free Publicity. Retrieved from http://www.dailydot.com/culture/how-to-hijack-popular-brands-pinterest/
13. Lauren Rae Orsini. (March 01, 2012b). The Daily Dot. Why Obama Looks Like A Total Bro On Pinterest. Retrieved from http://www.dailydot.com/news/barack-obama-pinterest-bropin/
14. Lauren Rae Orsini. (March 12, 2012c). The Daily Dot. Retrieved from http://www.dailydot.com/business/how-to-stop-spam-pinterest-collaboration-boards/
15. Lauren Rae Orsini. (March 27, 2012d). The Daily Dot. A Pinterest Spammer Tells All. Retrieved from http://www.dailydot.com/news/pinterest-steve-amazon-spammer-tells-all/
16. Richard Owen. (March 4, 2013). Boot Camp Media. Pinterest: The Fastest Growing Social Media Platform. Retrieved from http://www.bootcampmedia.co.uk/blog/pinterest-fastest-growing-social-network/
17. Pew Research Center. (Feb 14, 2013). The Demographics Of Social Media Users 2012. Retrieved from http://pewinternet.org/~/media//files/reports/2013/pip_socialmediausers.pdf
18. Pinterest official website. Retrieved from http://www.pinterest.com
19. Sarah Perez. (June 14, 2012a). TechCrunch. Comscore: U.S. Internet Report: YoY, Pinterest Up 4000+%, Amazon Up 30%, Android Top Smartphone & More. Retrieved from http://techcrunch.com/2012/06/14/comscore-us-internet-report-yoy-pinterest-up-4000-amazon-up-30-android-top-smartphone-more/
20. Sarah Perez. (December 11, 2012b). TechCrunch. Report: Roughly 20 Percent Of Pinterest’s Top 10 Users’ Followers Were Spammers And Fake Accounts. Retrieved from http://techcrunch.com/2012/12/11/roughly-20-percent-of-pinterests-top-10-users-followers-were-spammers-and-fake-accounts/
21. Emil Protalinski. (September 11, 2012). The Next Web. Pinterest Users Complain About Hacked Accounts As Spam Spills Onto Facebook, Twitter. Retrieved from
22. Ahmed A. Radwan, Harsha V. Madhyastha, Fatma Omara, Tarek M. Mahmoud & Enas Elgeldawi. (Feb 28, 2014). Pinterest Attraction Between Users And Spammers. International Journal of Science Engineering and Information Technology Research, Vol. 4, Issue 1, pp. 53-72.
23. Monica Jade Romari. (February 25, 2013). Social Media Today. The Soaring Popularity Of Pinterest. Retrieved from http://socialmediatoday.com/monica-romeri/1254696/soaring-popularity-pinterest-infographic
24. Semiocast SAS. (July 10, 2013). Pinterest Has 70 Million Users. More Than 70% Are In The U.S. Retrieved from http://semiocast.com/en/publications/2013_07_10_pinterest_has_70_million_users
25. Jennifer Slegg. (July 16, 2013). Search Engine Watch. Pinterest Tops 70 Million Users; 30% Pinned, Repinned, Or Liked In June [Study]. Retrieved from http://searchenginewatch.com/article/2282835/pinterest-tops-70-million-users-30-pinned-repinned-or-liked-in-june-study
26. Steven White. (February 9, 2013). Social Media Growth 2006 To 2012. Retrieved from http://dstevenwhite.com/2013/02/09/social-media-growth-2006-to-2012/