Pinned it! A Large Scale Study of the Pinterest Network∗
Sudip Mittal, Neha Gupta, Prateek Dewan, Ponnurangam Kumaraguru
Indraprastha Institute of Information Technology, Delhi (IIIT-D)
{sudip09068, neha1209, prateekd, pk}@iiitd.ac.in
ABSTRACT
Pinterest is an image-based online social network, which was
launched in the year 2010 and has gained a lot of traction,
ever since. Within 3 years, Pinterest has attained 48.7 million
unique users. This stupendous growth makes it interesting
to study Pinterest, and gives rise to multiple questions
about its users and content. We characterized Pinterest
on the basis of large scale crawls of 3.3 million user profiles,
and 58.8 million pins. In particular, we explored various
attributes of users, pins, boards, pin sources, and user loca-
tions, in detail and performed topical analysis of user gen-
erated textual content. The characterization revealed most
prominent topics among users and pins, top image sources,
and geographical distribution of users on Pinterest. We then
tried to predict gender of American users based on a set of
profile, network, and content features, and achieved an accu-
racy of 73.17% with a J48 Decision Tree classifier. We then
exploited the users’ names by comparing them to a corpus
of top male and female names in the U.S.A., and achieved
an accuracy of 86.18%. To the best of our knowledge, this
is the first attempt to predict gender on Pinterest.
Categories and Subject Descriptors
H.3.5 [Online Information Services]: Web-based services
Keywords
Online social networks, Pin, Classification
∗ The first two authors contributed equally to this work.
1. INTRODUCTION
Online Social Networks (OSNs) like Facebook, Twitter,
LinkedIn, and Google+ are web-based platforms that help
users to interact, share thoughts, interests, and activities.
These OSNs allow their users to imitate real life connections
over the Internet. A report by the International Telecommunication
Union states that the total number of online
social media users has crossed the 1 billion mark as of May,
2012 [27]. According to Nielsen’s Social Media Report, users
continue to spend more time on social networks than on any
other kind of websites on the Internet [31]. With this out-
burst in the number of social media users across the world,
online social media has moved to the next level of innova-
tion. While all the aforementioned conventional social media
services are mostly text-intensive, some of them have gone
beyond text, and have introduced images as their building
blocks. Services like Instagram and Tumblr have gained immense
popularity in recent years, with Instagram (now
part of Facebook) attaining 100 million monthly active users,
and 40 million photo uploads per day [17]. These numbers
indicate the successful entry of image-based social networks
into the world of online social media.
Pinterest is one of the most recent additions to this popu-
lar category of image-based online social networks. Within
a year of its launch, Pinterest was listed among the “50 Best
Websites of 2011” by Time Magazine [29]. It was also the
fastest site to break the 10 million unique visitors mark [8].
The number of users has increased since then, with Reuters
stating a figure of 48.7 million unique users in February
2013 [37]. Although fairly new to the social media fraternity,
Pinterest is being heavily used by many big business houses
like Etsy, The Gap, Allrecipes, Jetsetter, etc. to advertise
their products.¹ Further, Pinterest drives more revenue per
click than Twitter or Facebook, and is currently valued at
USD 2.5 billion [37, 45].
The immense upsurge and popularity of Pinterest has
given rise to multiple basic questions about this network.
What is the general user behavior on Pinterest? What are
the most common characteristics of users, pins, and boards?
What is the sentiment associated with user-generated tex-
tual content? What is the geographical distribution of users?
Is it possible to predict the gender of Pinterest users? There exists
little research work on Pinterest [2, 5, 22, 32, 42, 43], and
none of this work addresses the aforementioned basic questions.
To answer these questions, and get deeper insights
into Pinterest, we collected and analyzed a dataset comprising
user details (3,323,054), pin details (58,896,156),
board details (777,748), and images (498,433). We applied
multiple machine learning algorithms to predict gender on
a true positive dataset of 6,309 male and 6,309 female Pinterest
users living in the U.S.A.
¹ http://business.pinterest.com/stories/
Based on our analysis, some of our key contributions are
summarized as follows:
1. Topical analysis of user generated textual content on
Pinterest: We found that the most common topics
across users and pins were design, fashion, photography,
food, and travel.
2. User, pin, and board characterization: We analyzed
various user profile attributes, their geographical dis-
tribution, top pin sources, and board categories. Less
than 5% of all images on Pinterest are uploaded by
users; over 95% are pinned from pre-existing web
sources.
3. Gender prediction for American users: We extracted
true positive gender information from Facebook, for
over 66,000 Pinterest users from U.S.A., and were able
to achieve an accuracy of 86.18% while predicting gen-
der. We applied various machine learning algorithms
using multiple content and network based features from
Pinterest.
The rest of the paper is organized as follows. We discuss
the related work in Section 2. We then discuss Pinterest as
a social network in Section 3. In Section 4, we describe our
data collection methodology. Analysis of the collected data
and its results are covered in Section 5. Section 6 contains
discussion, limitations, and future work.
2. RELATED WORK
Social network characterization.
Online social networks, in general, have been studied in
detail by various researchers in the computer science com-
munity. Mislove et al. conducted a large scale measure-
ment study and analysis of Flickr, YouTube, LiveJournal,
and Orkut [30]. Their results confirmed power-law, small-
world, and scale-free properties of online social networks.
In a more recent work, Magno et al. performed a detailed
analysis of the Google+ network, and identified some key
differences and similarities between Google+, and existing
social networks like Facebook, and Twitter [28]. Ugander et
al. performed a large-scale analysis of the entire Facebook
social graph and found that 99.91% of all the users belonged
to a single large connected component [40]. They confirmed
the ‘six degrees of separation’ phenomenon and showed that
the value had dropped to 3.74 degrees of separation in the
entire Facebook network of active users.
Pinterest Introduction.
Considering the rapid growth rate of Pinterest since its
launch, there still exist only a few studies on this social network.
In closely related work, Gilbert et al. presented
a statistical overview of the Pinterest network, and
showed that female users get more repins but fewer followers
on Pinterest [12]. Their analysis was based on a smaller
dataset of 2.9 million pins, and 989,355 users, in contrast to
our dataset of over 58 million pins, and 3.3 million users.
Chang et al. worked towards finding activity patterns for
attracting attention on Pinterest. Some of the key findings
of this work revealed that male users were not particularly
interested in stereotypically male topics; sharing diverse con-
tent increases attention to a certain level; and homophily
drives repinning. Their dataset consisted of 46,365 users,
and 3.1 million pins [6]. Ottoni et al. analyzed Pinterest in
a gender-sensitive fashion, and found that the network was
heavily dominated by female users. Authors of this work
found that females on Pinterest make more use of lightweight
interactions than males, invest more effort in reciprocating
social links, are more active and generalist in content gen-
eration, and describe themselves using words of affection
and positive emotions. This study spanned across a large
dataset consisting of over 2 million users [32]. Kamath et al.
described a supervised model for board recommendation on
Pinterest. They used a content-based filtering approach for
recommending high quality information to users [21]. Du-
denhoffer et al. tried to use Pinterest as a library marketing
and information literacy tool at the Central Methodist Uni-
versity. They reported that the number of followers viewing
the library pinboards had outpaced usage of the text-based
lists in just one semester [10]. In another similar work by
Zarro et al., authors talked about how digital libraries and
other organizations could take advantage of Pinterest to ex-
pand the reach of their material, allowing users to create
personalized collections, incorporating their content [42]. In
their next piece of work, Zarro et al. found that Pinter-
est serves as infrastructure for repository building that sup-
ports discovery, collection, collaboration and publishing of
content, especially for professionals [43].
Gender prediction on other social networks.
Rao et al. attempted to predict gender of Twitter users
based on a rich set of profile, content, and network at-
tributes, and achieved an accuracy of 72.33% using a SVM
classifier. This was the first attempt to predict gender on
Twitter [36]. Burger et al. achieved 74% accuracy using a
Balanced Winnow2 classifier for predicting the gender of Twitter
users. Their corpus comprised 4.1 million tweets and
15.6 million distinct features [4]. Pennacchiotti et al. tried
to extract gender information from Twitter users’ profiles
by applying regular expressions to the users’ bio field. The authors
were able to extract gender information for 80% of users in
a sample of 14 million users, but with very low accuracy.
A manual annotation of over 15,000 users using only the profile
/ avatar picture revealed that only 57% of images were
correlated with a specific gender [33]. Tang et al. applied
a name-centric approach for predicting gender of New York
City Facebook users, and achieved an accuracy of 95.2% [39].
Zheleva and Getoor [44] proposed techniques to predict the
private attributes of users in four real-world datasets (including
Facebook) using general relational classification and
group-based classification. Their accuracy for gender inference
on their Facebook dataset was 77.2% based on users’
group affiliations, and the sample dataset used in their study
was quite small (1,598 users in Facebook). Other papers [15,
16, 26, 41] have also attempted to infer private information
inside social networks. Methods they used are mainly based
on link-based traditional Naive Bayes classifiers.
3. UNDERSTANDING PINTEREST
Pinterest is an image-based social bookmarking medium,
where users share images which are of interest to them, in
the form of pins on a pinboard. It emphasizes discovery
and curation of images rather than original content creation.²
This makes Pinterest a very promising conduit for
the promotion of commercial activities online.
² http://blogs.constantcontact.com/product-blogs/social-media-marketing/what-the-heck-is-pinterest-and-why-should-you-care/
Similar to other OSNs, Pinterest also uses some specific
terminology to refer to various elements and services it pro-
vides. Some terms are as follows:
1. Pins: A pin is an image that has some meta-data in-
formation associated with it. Pins can be thought of
as basic building blocks of Pinterest. The act of post-
ing a pin is known as pinning, and the user who posts
a pin is the pinner. Similar to images on Facebook,
pins can be liked and shared. Each of these pins has
the following meta-data associated with it: unique
pin number, description, number of likes, number of
comments, number of repins, board name, source, and
content in comments. The act of sharing an already
existing pin is referred to as repinning.
2. Pinboards: They are a themed collection of pins, or-
ganized by a user. Each board (“boards” and “pin-
boards” are used interchangeably) has a name, a de-
scription (optional), category (optional, e.g. Animals,
Art, Celebrities, Food and Drink, Design, Education,
Gardening), and an option to make it Secret. Secret
boards are only visible to the users who create them.
This analogy of pins and pin-boards replicates the real-
world concept of images on a scrapbook.
3. Source: Each pin on Pinterest has a source URL as-
sociated with it. As the name suggests, this is the
actual URL from which the image has been pinned by
a user. Images uploaded by users directly to Pinter-
est from their local computer, have pinterest.com as
their source, whereas images which are pinned from
an existing website (e.g. flickr.com) have this source
website (flickr.com) as the source.
4. Pin-It button: A Pin-It button is a browser book-
mark used to upload content to Pinterest. Some pop-
ular websites like Amazon, eBay, BHG, and Etsy also
provide their own pin-it button next to their product
images. This pin-it button makes it easier for a user
to share the content that she likes on Pinterest.
3.1 User Accounts
A user begins by creating an account using her Facebook
ID, Twitter ID, or an email address. On account creation,
Pinterest asks each new user to follow 5 boards to complete
the creation process, as a mandatory step to get started.
Each user has a profile page (Figure 1) that is publicly visible
to everyone, listing the user’s name, a description, location,
connected Facebook account (if available), connected Twit-
ter account (if available), a profile website, boards (which
are not secret), and associated pins, likes, followers, and fol-
lowees. A user also has a timeline where all pins from the
users she follows, are displayed.
3.2 Social Ties
A user has the option to follow a particular user or a
specific board of any other user. If a user follows another
user, she gets updates about all the boards owned by that
user. But, in case a user follows specific boards, she gets
updates only from those particular boards. This relationship
is quite similar to Twitter’s follower / followee relationship.
Interactions on Pinterest are in the form of pins. A user pins
an image, and can add a pin description to better describe
the pin. Other users can then repin the shared pin, like
it or share their views through a comment. These features
are similar to Facebook’s share, like, and comment features
respectively.
Figure 1: User Profile on Pinterest. The profile
description, websites, location, board, and pin are
marked separately in the screen snapshot.
4. DATA COLLECTION
In this section, we discuss the methodology that we applied
for data collection, and describe the data that we collected.
Given the size of the entire Pinterest network (48.7
million users), it would have been hard, and computationally
very expensive, to capture the entire network.
Pinterest does not provide a public API for data collec-
tion. Therefore, in order to collect data, we designed and
implemented a breadth first search (BFS) crawler in Python.
All data was collected using a Dell PowerEdge R620 server,
with 64 Gigabytes of RAM, 24 core processor, connected to a
1 Gbps Internet connection. The entire data collection pro-
cess spanned from December 26, 2012 to February 1, 2013.
Broadly, this process (Figure 2) was split into three phases
as described below:
4.1 User Handles Collection
The data collection process was initiated by selecting the
top 5 profiles in terms of the number of followers on Pinter-
est, as initial seeds, and feeding them into the crawler. The
crawler first extracted 4,995,974 direct followers of these 5 input
seeds, and then repeatedly crawled through the “followers
of followers”. We collected a total of 17,964,574 unique
user handles through this process, which is slightly over 36%
of the entire Pinterest population [37]. We call this the
userhandles dataset. This technique of snowball sampling
is commonly used in online social media research [32].
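The crawl described above can be sketched as a breadth-first traversal over follower lists. This is only an illustration under stated assumptions, not the crawler used in the study: `fetch_followers` is a hypothetical callable standing in for whatever page-scraping logic returns a user's followers, `max_users` is an added convenience, and the toy graph is invented for the example.

```python
from collections import deque


def crawl_user_handles(seeds, fetch_followers, max_users=None):
    """Breadth-first collection of unique user handles.

    seeds: iterable of starting handles (the study started from 5 seeds).
    fetch_followers: hypothetical callable, handle -> iterable of followers.
    max_users: optional cap on the number of handles collected.
    """
    seen = set()
    queue = deque()
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            queue.append(seed)
    while queue:
        user = queue.popleft()  # FIFO order gives breadth-first traversal
        for follower in fetch_followers(user):
            if follower in seen:
                continue
            if max_users is not None and len(seen) >= max_users:
                return seen
            seen.add(follower)
            queue.append(follower)
    return seen


# Toy follower graph (invented data): handle -> followers of that handle.
TOY_GRAPH = {
    "seed_a": ["u1", "u2"],
    "seed_b": ["u2", "u3"],
    "u1": ["u4"],
    "u2": [],
    "u3": ["u1", "u5"],
    "u4": [],
    "u5": ["seed_a"],
}

handles = crawl_user_handles(["seed_a", "seed_b"], lambda h: TOY_GRAPH.get(h, []))
print(sorted(handles))
```

The `seen` set is what keeps the handle list unique when the same user follows several already-visited profiles, which is common in a densely connected follower graph.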
4.2 User Data Collection
Next, we started data collection for user profiles of the
17.96 million user handles collected in the previous step,
and obtained a total of 3,323,054 user profiles, called the
userprofile dataset (we present the analysis on 3.3 million
user profiles in this paper, though our data collection process
is still active). This userprofile dataset includes user
display name, description field, profile picture, number of
followers, number of followees, number of boards, number
of pins, boards, profile website, Facebook handle, Twitter
handle, location, pins, and likes. Along with user profiles,
we extracted 777,748 boards and their corresponding details
(called the boards dataset). These details include board category,
number of followers, and number of pins for each pinboard.
Figure 2: Flow diagram depicting the flow sequence
of our data collection process. The darkened
blocks represent our initial seed users, and primary
datasets. The lighter blocks denote the additional
information extracted through the primary dataset.
[Figure 2 blocks: Seed Users (5); Userhandles (17,964,574); Userprofiles (3,323,054); Pins (58,896,156); Board Details (777,748); Source Details (58,896,156); Images (498,433); EXIF Data (9,950); Location (331,530); Facebook profile data (1,667,973); Twitter profile data (49,416)]
Many times users also mention their Facebook and / or
Twitter profile URLs on Pinterest. Using this information
from the userprofile dataset, we collected publicly available
Facebook information of 1,667,973 users (50.19% of the user-
profile dataset) and Twitter information of 49,416 users (1.4%
of the userprofile dataset). Many Pinterest users also men-
tion location in their profile. We found location details
for 331,530 users (9.93% of the userprofile dataset). Some
users mentioned only their country, whereas others mentioned
their city as well. Some users gave their location
as “The beach”, “mentally in lala land”, etc. In order to
verify the credibility of such location information, we used
the Yahoo PlaceFinder API³ and obtained the correct details
for 192,261 (57.99%) of these locations.
4.3 Pin Data Collection
Using user profiles as seeds, we collected 58,896,156 unique
pins and their related information. We call this the pin
dataset. This information consists of the pin description,
number of likes, number of comments, number of repins,
board name, and source for each pin. We also collected a
random sample of 498,433 images (called the images dataset)
from these pins. For each of these images, we extracted
their Exchangeable Image File Format (EXIF) information
for further analysis.⁴ The most common pieces of EXIF information
available were date, time, image description, artist,
copyright, and camera make / model. We also extracted information
about pin sources for each pin, referred to as the
source dataset.
³ http://developer.yahoo.com/boss/geo/docs/requests-pf.html
⁴ http://fotoforensics.com/tutorial-meta.php#EXIF
5. ANALYSIS
We now present our analysis of the users, pins, and boards
in detail.
5.1 User characterization
5.1.1 Profile description
From our userprofile dataset of 3,323,054 user profiles,
we found that only 589,193 (17.73%) users had a profile description.
We observed that users revealed private details
through this field, like age, marital status, personal traits,
email IDs, phone numbers, etc. The profile description of
one user said, “I am 35, happily married, love kids & cats,
and have a disturbing sense of humor!” We extracted the 100
most frequently occurring words from the profile descriptions,
and found topics like fashion, design, food, music, art,
photography, and travel as the most popular user interests
(Figure 3). We observed that the most common interests
were in line with the most common professions (like artist,
designer, cook, photographer) mentioned by the users. This
suggests that a large proportion of Pinterest users make
use of the network for professional activities.
Figure 3: Tag cloud of the top 100 words taken from
user’s profile description field.
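The word-frequency step behind such a tag cloud can be sketched as follows. The tokenization rule, the tiny stopword list, and the sample descriptions are all assumptions made for this illustration; the paper does not state which preprocessing was applied.

```python
import re
from collections import Counter

# Small illustrative stopword list (assumption); a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "i", "am", "of", "to", "in", "my", "have", "is"}


def top_words(descriptions, n=100):
    """Return the n most frequent non-stopword tokens across all descriptions."""
    counts = Counter()
    for text in descriptions:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS and len(t) > 1)
    return counts.most_common(n)


# Invented sample descriptions.
sample = [
    "Designer and food blogger",
    "I love food, travel and photography",
    "Photography, design, food",
]
print(top_words(sample, n=2))
```

With `n=100` and the real description field as input, the returned (word, count) pairs are the kind of data a tag cloud is drawn from.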
5.1.2 Social and commercial links
Another source of information on the user profile is the
“website” field, where users can provide URLs to their per-
sonal websites, and blogs. In our dataset, we found that
177,462 (5.34%) users had mentioned a website. The top-
most domain was Facebook, where 9,697 (5.46%) users had
mentioned a link to their Facebook profiles. Twitter, Etsy,
YouTube, Flickr, About.me, LinkedIn, etc. were the other
domains which constitute the top 10. Apart from the web-
site field, Pinterest separately provides users with an option
to connect their Facebook and / or Twitter accounts with
their Pinterest profiles. Out of over 3.3 million user profiles
that we collected, over 2.71 million users (81.78%) had con-
nected their Facebook profiles with Pinterest. Only 328,570
(9.88%) users connected their Twitter accounts with Pin-
terest. Less than 4% (132,553) users had connected both
Facebook and Twitter, while 12.3% (409,399) users had con-
nected neither. Further, we found that 86,641 (26.36%) out
of 328,570 users had identical usernames on Twitter and
Pinterest. However, only 5,419 (5.02%) out of 107,910 users
had identical usernames on Facebook and Pinterest. Two
hundred and ninety seven (0.22%) users had identical user-
names on all three networks. Analysis of usernames for the
same user on various social networks can be useful for iden-
tity resolution across multiple OSNs [19].
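A comparison of this kind can be sketched as below. The normalization (lower-casing, stripping a leading "@") is an assumption for illustration, since the paper does not say whether matching was exact or case-insensitive; the handle pairs are invented.

```python
def normalize(handle):
    """Lower-case a handle and strip whitespace and a leading '@' (assumed rule)."""
    return handle.strip().lstrip("@").lower()


def identical_username_rate(pairs):
    """pairs: iterable of (pinterest_handle, other_network_handle).

    Returns (matches, total, percentage of identical handles).
    """
    pairs = list(pairs)
    matches = sum(1 for p, o in pairs if normalize(p) == normalize(o))
    total = len(pairs)
    pct = 100.0 * matches / total if total else 0.0
    return matches, total, pct


# Invented example pairs: (Pinterest handle, Twitter handle).
linked = [
    ("craftyjane", "@CraftyJane"),
    ("bob_k", "bobk"),
    ("ana", "ana"),
    ("lee", "lee_pins"),
]
print(identical_username_rate(linked))
```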
5.1.3 Connections and popularity
The maximum number of followers for a user was found
to be 11,992,745 (as of January 2013). Table 1 lists the
description, number of followers and followees for the top
10 most followed users on Pinterest (to maintain users’ privacy,
we do not mention usernames anywhere). The average
number of followers per Pinterest user was found to be
approximately 176, as compared to 208 followers per Twitter
user [38]. Given that Pinterest has only about one-tenth as many users as
Twitter, this average number of followers suggests that the
Pinterest network is very well connected.
Followers     Followees   Interests / Profession
11,992,745    149         Designer / Blogger / Food
9,099,998     143         Designer / Magic / Food
8,056,723     1,176       Interior Designer
7,519,854     205         Not Mentioned
6,004,793     1,106       Lifestyle Blog
5,023,007     242         Beauty Enthusiast / Blogger
4,793,914     310         Architecture Student / Blogger
4,409,097     66          Not Mentioned
4,126,895     1,001       Artist
3,658,844     383         Freelancer / Blogger
Table 1: Top 10 user profiles on Pinterest based
on number of followers (as of January, 2013). The
table also shows number of followees for users, and
interests / profession as captured from the about
field.
We then plotted the ratio of the number of followers to
the number of followees for all users (except for the users
with 0 followees) on a log scale, as shown in Figure 4(a), and
found that more than 70% of users had more followees than
followers. The graph shows that only a very small fraction of
users had a heavily skewed ratio, and most users on Pinterest
in our dataset had a comparable number of followers and
followees. Krishnamurthy et al. [23] found a similar relation
between followers and followees for Twitter users.
From the 328,570 users who had connected their Twitter
accounts with Pinterest, we extracted the number of Twitter
followers and followees for 93,659 users. We then plotted
the ratio of followers / followees for these users on both
Pinterest and Twitter, on a log scale, as shown in Figure 4(b).
As the plot suggests, the ratio of followers / followees on
Pinterest was weakly correlated with the ratio of followers /
followees on Twitter (correlation = 0.32). Users who were
popular on Pinterest were not necessarily popular on Twitter
(and vice versa).
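A cross-network comparison of this kind can be sketched as follows. Two things here are assumptions, because the paper does not specify them: the coefficient is taken to be Pearson's, and it is computed on log-scaled ratios. The follower and followee counts are invented.

```python
import math


def log_ratio(followers, followees):
    """log10(followers / followees); both counts must be positive."""
    return math.log10(followers / followees)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Invented (followers, followees) counts for the same five users on each network.
pinterest = [(120, 300), (50, 45), (2000, 150), (10, 90), (400, 380)]
twitter = [(80, 200), (500, 100), (300, 310), (15, 60), (40, 400)]

# Keep a user only if all four counts are positive, so the two lists stay aligned.
pairs = [
    (log_ratio(pf, pe), log_ratio(tf, te))
    for (pf, pe), (tf, te) in zip(pinterest, twitter)
    if min(pf, pe, tf, te) > 0
]
r = pearson([p for p, _ in pairs], [t for _, t in pairs])
print(round(r, 2))
```

Filtering both networks jointly matters: dropping zero-count users from each list separately would pair one user's Pinterest ratio with a different user's Twitter ratio.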
5.1.4 Gender distribution
We extracted gender information from Facebook profiles
of over 1.85 million users who had linked their Pinterest pro-
files with Facebook. Over 1.61 million users (87.15%) were
females, and only 130,945 users (7.04%) were males. The
rest (5.81%) did not have their gender information publicly
available. This gender distribution is quite similar to the one
observed by Ottoni et al. in their work on Pinterest [32].
5.2 Pin characterization
5.2.1 Pin description
To understand the most common type of pins on Pinter-
est, we extracted the textual content present in the “pin
description” fields from all the pins, and analyzed the most
frequently occurring terms. Figure 4(c) represents the tag
cloud of the top 100 terms present in pin description. Similar
to user descriptions, terms related to food and creative arts
dominated the pin description. Other than food, decoration
and wedding related pins were also found to be very com-
mon in pin description. For example, “Printable Snowflake
Wedding Invitations”, “Silk Bride Bouquet Peony Flowers
Pink Cream Lavender Shabby Chic Wedding Decor. $94.99,
via Etsy.”, “Wedding dresses and bridals gowns by David
Tutera for Mon Cheri for every bride at an affordable price
Wedding Dress Style”, “Vintage Wedding Decorating Ideas”.
5.2.2 Statistics and topical analysis
From our dataset of over 58 million pins, the average number
of pins per user was 444.86 (min = 0, max = 100,135).
The average number of repins per pin was found to be 0.72
(min = 0, max = 20,212). Almost 79% of pins in our dataset
never got repinned. The average number of likes per pin was
0.21 (min = 0, max = 5,640). Also, 90.32% of pins were not
“liked” by anyone. These low rates of repins and likes
show that only a limited set of pins becomes popular,
and that a majority of pins go unnoticed. In the case of comments,
the results are even more skewed than for repins and likes.
The average number of comments on a pin was 0.0065 (min
= 0, max = 3,345), and 99.53% of pins had no comments. This
suggests that the comment feature is rarely used on Pinterest.
To get an insight about the content of these comments, we
randomly crawled 643,653 (1.1%) pins from our pin dataset,
and were able to extract 2,544 comments. We then applied
the Linguistic Inquiry and Word Count (LIWC) tool [34] on
these comments, pin descriptions (Section 5.2.1), and user
profile descriptions (Section 5.1.1). We found that a large
portion of the comments reflected positive emotion (Fig-
ure 5). A similar pattern of positive emotion was observed
for user description, as well as board names. In general, the
network was found to have a large fraction of social content
suggesting active human interaction. Presence of sad emo-
tion, anger, anxiety, and swear words was found to be min-
imal. Textual content depicting biological processes, work,
and leisure activities was also found in substantial quantity.
From all this analysis, we conclude that a user usually leaves
a positive remark for a pin on Pinterest, and posts positive
textual content in general.
5.3 Source Analysis
Each pin has a source embedded in it. This source is the
original URL of the image from where it is “pinned”.⁵ However,
if the user has directly uploaded an image to Pinterest,
the source field is set as “pinterest.com”. Table 2 shows that
the single largest source of images on Pinterest is the users
themselves, i.e. images that are directly uploaded
and pinned by users. Out of all the pins in our dataset,
2,768,851 pins (4.7%) were uploaded by users. The second spot
was taken by Google, which included images from Google
Image Search, and other Google products, followed by Etsy,
at the third spot. Not surprisingly, free image sharing platforms
dominated the top 10 sources. Six out of the top 10
sources on Pinterest were among the top 1,000 most visited
websites in the world [1]. The high rank of Etsy, a commercial website,
shows that a reasonable amount of user traffic
on Pinterest comes from e-commerce websites, and suggests
that commercial activity is widespread on Pinterest.
⁵ Example of an image source: www.cookingchanneltv.com/recipes/spanish-tortilla-recipe/index.html
Figure 4: (a) Followers / followees for the users on Pinterest, on a log scale. (b) The follower / followee ratio
on Pinterest was only weakly correlated with the ratio on Twitter. (c) Pin description on Pinterest. Similar to user
descriptions, pin descriptions were also dominated by terms related to food and creative arts, and partially
overlapped with terms present in user descriptions.
Figure 5: LIWC analysis of textual content on Pinterest.
The majority of the content comprised positive
sentiment words, or words indicating social interactions.
Source          Count       W.A.R.   Category
Pinterest.com   2,768,851   N/A      N/A
Google          1,293,749   1        Search engine
Etsy            1,157,815   164      Commercial
Flickr          625,686     70       Image sharing
Tumblr          486,984     31       Image sharing
Imgfave         376,179     9,462    Image sharing
Weheartit       306,443     970      Image sharing
Someecards      296,908     6,648    E-cards
Houzz           294,065     958      Home decor.
Marthastewart   292,128     2,439    Food / Art
Table 2: Top 10 image sources on Pinterest.
W.A.R. = Worldwide Alexa Rank. Apart from
free image sharing / social network platforms, top
sources include commercial platforms like Etsy.
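A ranking like Table 2 can be derived by reducing each pin's source URL to its host and counting. The sketch below is illustrative only: the first URL is the example source given in the paper's footnote, the remaining URLs are invented, and the rule of stripping a leading "www." is an assumption. It also does not merge sub-domains of one company, whereas the paper groups several Google products under a single entry.

```python
from collections import Counter
from urllib.parse import urlparse


def source_domain(url):
    """Return the host of a pin's source URL, without a leading 'www.'."""
    if "://" not in url:
        url = "http://" + url  # urlparse only fills netloc when a scheme is present
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def top_sources(urls, n=10):
    """Return the n most common source domains with their pin counts."""
    return Counter(source_domain(u) for u in urls).most_common(n)


pins = [
    "www.cookingchanneltv.com/recipes/spanish-tortilla-recipe/index.html",
    "http://www.etsy.com/listing/123",  # invented URL
    "https://etsy.com/listing/456",     # invented URL
    "http://pinterest.com/pin/789",     # invented URL
]
print(top_sources(pins, n=1))
```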
5.4 Pinboard analysis
In addition to the above pin analysis, we also analyzed
the names of pinboards. The most common terms occurring
in board names were home, style, recipes, food, wedding,
crafts, etc. Pinterest also provides 33 predefined categories
to choose from during board creation. We analyzed
the popularity of these categories based on 3 factors:
number of boards in each category, number of pins on these
boards, and number of followers of these boards under each
category. We saw that 69.37% of boards were created with
no standard category selected. Apart from these, the top
three categories for board creation were food drink (5.6%),
followed by hair beauty (2.4%), and diy crafts (2.3%). Followers
of boards in the “travel” category outnumbered all
the other boards by a big margin, and had the highest ratio
of followers per pin (23.69 followers per pin). The next most
popular boards in terms of followers per pin were education
(10.34 followers per pin), health fitness (5.37 followers per
pin), and home decor (4.71 followers per pin).
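The followers-per-pin figure for each category can be sketched as below. How the ratio was aggregated is not stated in the paper; this example assumes total followers divided by total pins within a category, and the board records are invented.

```python
from collections import defaultdict


def followers_per_pin(boards):
    """boards: iterable of (category, n_followers, n_pins) tuples.

    Returns {category: total followers / total pins}, skipping categories
    whose boards hold no pins at all.
    """
    followers = defaultdict(int)
    pins = defaultdict(int)
    for category, n_followers, n_pins in boards:
        followers[category] += n_followers
        pins[category] += n_pins
    return {c: followers[c] / pins[c] for c in followers if pins[c] > 0}


# Invented board records: (category, followers, pins).
boards = [
    ("travel", 900, 40),
    ("travel", 300, 10),
    ("education", 200, 20),
    ("food_drink", 150, 100),
    ("empty_category", 5, 0),
]
ratios = followers_per_pin(boards)
for category, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(category, round(ratio, 2))
```

Averaging per-board ratios instead would weight small boards more heavily, so the two aggregation choices can rank categories differently.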
5.5 Location analysis
We investigated location information to find the Pinterest
population distribution across the world. From our dataset,
we collected 192,261 valid user locations, and performed a
lookup using Yahoo PlaceFinder API. We inferred the top 10
countries in terms of number of users (Table 3) from Yahoo’s
API output. Similar to Facebook and Twitter [23], a majority
of Pinterest users came from the U.S.A., Canada,
U.K., Brazil, India, and Europe. We found very few users
from Africa, Russia, and China. Table 3 also lists Pinterest’s
regional traffic ranks taken from Alexa on 2nd June 2013.
These ranks show that Pinterest is among the most popular
sites in countries like U.S.A., Canada, U.K., Australia,
Brazil, India, etc., which are also the top user locations in
our dataset. After analyzing the country-wise distribution, we
did a city-level location analysis for these top 10 countries
(Table 3), and found that most Pinterest users belonged to
big metropolitan cities. More than half of the cities in the top
20 were from the U.S.A. Pinterest’s penetration was found
to be quite low in smaller cities.
As most Pinterest users in our dataset were females (Section
5.1.4), we analyzed gender distribution with respect to
location. We observed that approximately 88% of users from
the U.S.A. were females, and approximately 7% were males.
A similar trend was observed in the U.K., Australia, Europe,
and Brazil (Table 3). India was the only country in the top
10 where the number of male users (46.64%) was greater
than the number of female users (45.30%).

Countries:

| Rank | Country | P.R.R. | Females (%) | Males (%) |
|------|-------------|--------|-------------|-----------|
| 1 | U.S.A. | 15 | 83.88 | 8.80 |
| 2 | Canada | 21 | 82.73 | 10.66 |
| 3 | U.K. | 38 | 72.79 | 18.47 |
| 4 | Australia | 23 | 80.59 | 11.05 |
| 5 | Brazil | 73 | 73.94 | 18.47 |
| 6 | Spain | 54 | 66.83 | 24.56 |
| 7 | Italy | 142 | 62.91 | 27.04 |
| 8 | France | 183 | 70.36 | 22.53 |
| 9 | India | 20 | 45.30 | 46.64 |
| 10 | Netherlands | 29 | 75.88 | 16.52 |

Cities:

| Rank | World City | Count | Rank | World City | Count |
|------|---------------|-------|------|--------------|-------|
| 1 | New York | 5597 | 11 | Dallas | 1275 |
| 2 | London | 3424 | 12 | Austin | 1249 |
| 3 | Los Angeles | 3194 | 13 | San Diego | 1213 |
| 4 | Chicago | 2593 | 14 | Houston | 1169 |
| 5 | Toronto | 1752 | 15 | Sydney | 1157 |
| 6 | San Francisco | 1659 | 16 | Paris | 1078 |
| 7 | Atlanta | 1472 | 17 | Melbourne | 1034 |
| 8 | Washington | 1428 | 18 | Portland | 1010 |
| 9 | Seattle | 1332 | 19 | Vancouver | 959 |
| 10 | Boston | 1329 | 20 | Philadelphia | 851 |

Table 3: Top 10 countries, and top 20 cities in decreasing order of Pinterest population. Apart from India, all
other countries were dominated by female users. The penetration of Pinterest is maximum in big metropolitan
cities. P.R.R. = Pinterest Regional Rank.
5.6 Gender Prediction
As mentioned in Section 5.1.4, we collected gender information
for over 1.85 million Pinterest users (130,945 males, and 1.61
million females) who had connected their Facebook accounts
with their Pinterest profiles; the gender labels came from
Facebook. Treating this information as ground truth, we
attempted to predict the gender of Pinterest users on the basis
of profile, network, and content based features. For this
experiment, we limited our analysis to users from the USA,
since it had the largest proportion of users in terms of
country-wise user distribution (see Table 3).
5.6.1 Dataset and feature description
For gender prediction, our training dataset comprised
6,309 male users, and 60,047 female users from the USA.
To balance the class sizes for machine learning, and to
achieve better confidence, we drew six random samples of
6,309 female users from the 60,047 female users, and
averaged the accuracy over all of them. A similar technique
was used by Benevenuto et al. while classifying spam on
Twitter using unbalanced training data samples [3]. We used
a total of 9 features for classification, as listed below:
1. Number of followers: The number of users who follow
a given user.
2. Number of followees: The number of users whom the
given user follows.
3. Number of pins: The number of pins pinned by the
given user.
4. Number of boards: The number of boards created
by the given user.
5. Content from “about” field: We extracted the top
1000 most frequently occurring terms in the “about”
field of male and female users’ profiles, and normalized
these frequencies by the number of users in their
respective categories (male / female). Each data
instance was then assigned a male-female ratio score as
follows:
$$\text{About Ratio}_{m/f} = \frac{\sum_{i=1}^{n} W_i \times NM_{W_i}}{\sum_{i=1}^{n} W_i \times NF_{W_i}}$$
where
$W_i$ = words in the about field
$NM_{W_i}$ = normalized frequency for word $W_i$ for males
$NF_{W_i}$ = normalized frequency for word $W_i$ for females
6. Board names: Similar to the previous feature, we
extracted the top 1000 most frequently occurring board
names from male and female users’ profiles separately,
and normalized these frequencies by the total number
of users in their respective categories. Each data
instance was then assigned a male-female ratio score
as follows:
$$\text{Board Desc}_{m/f} = \frac{\sum_{i=1}^{n} P_i \times NM_{P_i}}{\sum_{i=1}^{n} P_i \times NF_{P_i}}$$
where
$P_i$ = individual pinboard in the user's set of pinboards
$NM_{P_i}$ = normalized frequency for pinboard $P_i$ for males
$NF_{P_i}$ = normalized frequency for pinboard $P_i$ for females
7. Presence of a linked Twitter account with pro-
file: True, if the Pinterest user has connected his / her
Twitter account; false otherwise.
8. Presence of personal website: True, if the Pinter-
est user has mentioned a website in their website field;
false otherwise.
9. Name: We obtained a list of the most common male and
female first names in the US population from the 1990
census (http://names.mongabay.com/), and assigned a
ternary integer score to each data instance according to
whether the user’s name was present in the list of male
names, female names, or both / none.
The last feature is independent of the Pinterest network.
We wanted to compare the performance of Pinterest-specific
features for predicting gender against that of the name
alone, which is a completely independent feature. We
performed all classification tasks using WEKA [14].
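The text-derived features (5, 6, and 9) can be sketched as below. This is a minimal illustration and not the code used in the study: whitespace tokenization, the small `eps` guard against a zero denominator, the helper names, and the 1 / -1 / 0 encoding of the ternary name score are all assumptions, since the text does not specify them.

```python
from collections import Counter


def normalized_frequencies(texts, top_k=1000):
    """Top-k term counts over one gender's texts, divided by the number of users."""
    counts = Counter(term for text in texts for term in text.lower().split())
    n_users = len(texts)
    return {term: count / n_users for term, count in counts.most_common(top_k)}


def ratio_score(text, male_freq, female_freq, eps=1e-9):
    """Male/female ratio score for one instance (features 5 and 6).

    Each term in the instance contributes its normalized male frequency to
    the numerator and its normalized female frequency to the denominator.
    eps (an assumption) avoids division by zero.
    """
    terms = text.lower().split()
    male = sum(male_freq.get(term, 0.0) for term in terms)
    female = sum(female_freq.get(term, 0.0) for term in terms)
    return male / (female + eps)


def name_score(full_name, male_names, female_names):
    """Ternary name score (feature 9); the 1 / -1 / 0 encoding is hypothetical."""
    parts = full_name.strip().lower().split()
    if not parts:
        return 0
    first = parts[0]
    is_male, is_female = first in male_names, first in female_names
    if is_male and not is_female:
        return 1
    if is_female and not is_male:
        return -1
    return 0  # present in both lists, or in neither
```

A score above 1 from `ratio_score` means the instance's terms are relatively more frequent among male users; a score below 1 points the other way.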
5.6.2 Classification results
First, we attempted to predict gender using a feature set
F8 consisting of only the first 8 features, i.e., the features
extracted from Pinterest. We applied 3 classifiers to our
dataset of 12,618 users, and achieved a maximum average
accuracy of 73.17% with 10-fold cross validation using the
J48 Decision Tree classifier. To improve the prediction
accuracy, we added the Name feature Fname to our feature
set. Note that this feature relies entirely on the name of the
user, and is independent of Pinterest. With this feature added,
we achieved a better accuracy of 86.18%, again using the J48
Decision Tree classifier. However, using only the Fname
feature for classification, we still achieved an accuracy of
83.64%. Table 4 summarizes the results.
| Classifier | Feature set | Accuracy (σ) | F-Measure (σ) |
|------------|-------------|---------------|---------------|
| NB | F8+Fname | 62.96% (5.85) | 0.586 (0.100) |
| NB | F8 | 60.88% (5.41) | 0.554 (0.096) |
| NB | Fname | 56.71% (0.18) | 0.533 (0.001) |
| J48 DT | F8+Fname | 86.18% (0.29) | 0.861 (0.003) |
| J48 DT | F8 | 73.17% (0.39) | 0.732 (0.004) |
| J48 DT | Fname | 83.64% (0.27) | 0.834 (0.003) |
| RF | F8+Fname | 85.26% (0.31) | 0.853 (0.003) |
| RF | F8 | 71.38% (0.41) | 0.713 (0.004) |
| RF | Fname | 83.64% (0.27) | 0.834 (0.003) |

Table 4: Classification results for Naive Bayesian (NB),
J48 Decision Tree (J48 DT), and Random Forest (RF)
classifiers. The accuracy and weighted average F-measure
scores are averaged over a labeled dataset of 6,309
male users, and 6 random samples of 6,309 instances
each, from 60,047 female users.
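The balancing and averaging procedure behind Table 4 can be sketched as follows. This is an illustrative outline only: the study used WEKA, whereas here `evaluate` is a hypothetical callable standing in for any routine that returns an accuracy for one balanced sample (for example, a 10-fold cross validation run). The use of the sample standard deviation for σ and the fixed random seed are assumptions.

```python
import random
import statistics


def balanced_samples(minority, majority, n_samples=6, seed=0):
    """Yield n_samples datasets, each pairing all minority instances with an
    equally sized random draw (without replacement) from the majority class."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        yield minority + rng.sample(majority, len(minority))


def average_accuracy(minority, majority, evaluate, n_samples=6, seed=0):
    """Mean and standard deviation of evaluate() over the balanced samples.

    evaluate is a hypothetical function: balanced dataset -> accuracy.
    """
    scores = [evaluate(sample)
              for sample in balanced_samples(minority, majority, n_samples, seed)]
    return statistics.mean(scores), statistics.stdev(scores)
```

With 6,309 male users as `minority` and 60,047 female users as `majority`, each sample would contain 12,618 instances, matching the dataset size reported above.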
Of the six random samples of training data we picked
for female users, the J48 Decision Tree classifier performed
the best on every sample individually. Across all samples,
we achieved a maximum accuracy of 73.51% using F8
(Pinterest-specific features), 86.53% using F8+Fname (all 9
features), and 83.99% using only Fname. Table 5 presents
the confusion matrix for these results. As expected,
Fname was the most informative feature, followed by board
names, number of pins, presence of personal website, number
of boards, content from about field, and presence of
linked Twitter account. The number of followers and
followees were the least informative features. Rao et al. [36]
achieved a similar score of 72.33% while predicting the gender
of Twitter users, with the help of a rich feature set using an
SVM classifier. Since the ratio of male to female users on
Pinterest is highly skewed [32], as opposed to Twitter (which
is fairly balanced; see
http://www.beevolve.com/twitter-statistics/#a1), the size of
our training dataset was limited. We believe that with this
limited training data, our classification accuracy is reasonable.
| Feature set | Cls | TP | FP | Precision | Recall | F-Meas |
|-------------|-----|-------|-------|-----------|--------|--------|
| F8+Fname | F | 0.842 | 0.112 | 0.883 | 0.842 | 0.862 |
| F8+Fname | M | 0.888 | 0.158 | 0.849 | 0.888 | 0.868 |
| F8 | F | 0.76 | 0.29 | 0.724 | 0.76 | 0.742 |
| F8 | M | 0.71 | 0.24 | 0.748 | 0.71 | 0.728 |
| Fname | F | 0.724 | 0.044 | 0.943 | 0.724 | 0.819 |
| Fname | M | 0.956 | 0.276 | 0.776 | 0.956 | 0.857 |

Table 5: Confusion matrix representing true positive,
false positive, precision, recall, and F-measure
scores for J48 Decision Tree classifier in the best
case. #F = Number of features; Cls = Class (Male
/ Female); TP = True Positive score; FP = False
Positive score.
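As a consistency check on Table 5, the precision and F-measure columns can be recomputed from the TP and FP scores. The sketch below assumes that TP and FP are rates and that both classes have the same number of instances (6,309 each), in which case precision reduces to TP / (TP + FP); the function name is illustrative.

```python
def per_class_metrics(tp_rate, fp_rate):
    """Precision, recall, and F-measure for one class, assuming equal class sizes."""
    precision = tp_rate / (tp_rate + fp_rate)
    recall = tp_rate  # recall is the true positive rate by definition
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Female class with F8+Fname (first row of Table 5).
precision, recall, f_measure = per_class_metrics(0.842, 0.112)
print(round(precision, 3), round(recall, 3), round(f_measure, 3))

# With balanced classes, overall accuracy is the mean of the two TP rates.
accuracy = (0.842 + 0.888) / 2
print(round(100 * accuracy, 1))
```

Under these assumptions the recomputed values agree with the table to within rounding, and the accuracy of about 86.5% matches the best-case figure of 86.53% reported for F8+Fname.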
These results imply that even though gender is not a pub-
licly accessible attribute on Pinterest, it is not difficult to
predict gender using a small number of other publicly avail-
able attributes.
5.7 Privacy and security issues
To study the privacy implications of the public nature of
Pinterest, we attempted to extract email addresses and phone
numbers from the publicly available description field of
users’ profiles. We found that a total of 9,926 users in
our dataset shared their email addresses publicly. We then
searched for phone numbers, which are widely considered to
be personally identifiable information (PII) [25], and found a
total of 1,046 phone numbers and / or BBM pins in the users’
profile description field. Research shows that it is possible
for third parties to link PII, which is leaked via OSNs, with
user actions both within OSN sites and elsewhere, resulting in
privacy leakage [24]. A recent study also investigated the
risks of sharing phone numbers publicly on Facebook and
Twitter, and highlighted the extent to which these phone
numbers could be exploited to gather much more private
information about a user [18].
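Extraction of this kind is commonly done with regular expressions. The sketch below is one possible approach and not the patterns used in the study, which the text does not describe: both expressions are deliberately simple assumptions, the minimum of 8 digits for a phone number is arbitrary, and BBM pins are not handled.

```python
import re

# Simplified patterns (assumptions); real-world data needs stricter validation.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"(?<!\w)\+?\d[\d\s().-]{7,}\d(?!\w)")


def extract_contacts(description, min_digits=8):
    """Return (emails, phone_numbers) found in a profile description string."""
    emails = EMAIL_RE.findall(description)
    phones = [
        match for match in PHONE_RE.findall(description)
        if sum(ch.isdigit() for ch in match) >= min_digits
    ]
    return emails, phones


emails, phones = extract_contacts(
    "Crafts and recipes. Contact: jane.doe@example.com or +1 (555) 010-9999"
)
print(emails, phones)
```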
While various brands are using Pinterest for legitimate
commercial purposes by promoting their work through
pinboards, Pinterest has also attracted spammers and malicious
users. With the growth in the number of users, there has
been a simultaneous growth in the number of spammers on
Pinterest
(http://mashable.com/2012/12/06/pinterest-spam-accounts/).
Numerous online scams have been reported in recent times
[7, 9, 20, 35], and Pinterest has taken measures to solve this
problem
(http://blog.pinterest.com/post/37347668045/fighting-spam).
To get a better understanding of the presence of spam and
malware on Pinterest, we used Google’s Safe Browsing API
(https://developers.google.com/safe-browsing/) to check for
malicious source URLs on the network. We analyzed the source
URLs of a random sample of 5.5 million pins from our pin
dataset and found 1,322 (0.024%) unique malware pins. Despite
numerous reported incidents of spam and malware, such a low
number suggests that the techniques deployed by Pinterest to
avert malware are indeed effective. Since we collected these
pins in January 2013, we wanted to check if the captured
malware continued to exist on Pinterest. We crawled
these 1,322 pins again in May 2013 and observed that 33
of these pins no longer existed. It is hard to tell whether the
users themselves deleted these pins or Pinterest removed them.
We also re-checked the source URLs of the 1,322 malware pins
in May 2013, and found that 223 source URLs no longer
existed. Corresponding to the 1,322 malware pins, we identified
1,171 unique users from our dataset. Re-crawling these user
accounts in May 2013 revealed that 100 of these 1,171
user accounts no longer existed. This suggests that, in addition
to removing malicious content, Pinterest also takes measures to
remove malicious user profiles.
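The proportions implied by these counts can be worked out directly. The short calculation below uses only the numbers reported above; the helper and variable names are illustrative.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


SAMPLED_PINS = 5_500_000   # random sample of pins checked (January 2013)
MALWARE_PINS = 1_322       # pins flagged as malware
PINS_GONE = 33             # flagged pins missing in May 2013
URLS_GONE = 223            # source URLs missing in May 2013
MALWARE_USERS = 1_171      # unique users behind the flagged pins
USERS_GONE = 100           # of those, accounts missing in May 2013

report = {
    "malware share of sampled pins": percent(MALWARE_PINS, SAMPLED_PINS),
    "flagged pins removed": percent(PINS_GONE, MALWARE_PINS),
    "flagged pins still present": percent(MALWARE_PINS - PINS_GONE, MALWARE_PINS),
    "source URLs no longer existing": percent(URLS_GONE, MALWARE_PINS),
    "user accounts no longer existing": percent(USERS_GONE, MALWARE_USERS),
}
for label, value in report.items():
    print(f"{label}: {value:.3f}%")
```

Roughly 2.5% of the flagged pins had disappeared by May 2013, so about 97.5% were still present four months after collection.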
6. DISCUSSION
In this work, we characterized the Pinterest social network,
and tried to predict its users’ gender using profile,
content, and network based features. We collected 17,964,574
unique user handles, 3,323,054 complete user profiles, 777,748
boards with their corresponding details, and 58,896,156 unique
pins with their related information, using snowball sampling
[13]. Our analysis was based on a partial subgraph
of the Pinterest network, and suggests that Pinterest is a
social network dominated by “fancy” topics like fashion,
design, food, travel, and love across users, boards, and pins.
A large part of the network was found to have a comparable
number of followers and followees. Only a small fraction
of people had a large number of followers as compared
to followees, and vice versa. The largest contributors of
content (images) on Pinterest were the users themselves, with
2,768,851 pins (4.7%) being original uploads; the remaining
content (95.3%) was pinned from pre-existing web
sources. Google Images and Etsy followed as the next most
popular sources from which images are pinned onto
Pinterest. The USA, Canada, and the UK contributed the largest
proportion of users, together accounting for over 73% of the
total Pinterest population.
We then focused our analysis on predicting the gender of
Pinterest users based on their profile, content, and network
features. Our labeled dataset consisted of a total of 12,618 user
profiles from the USA, split equally between males and females,
and we were able to achieve an accuracy of 73.17% using only
Pinterest specific features. Addition of the “name” feature
increased the accuracy to 86.18%. Using only the name feature,
we were able to achieve an accuracy of 83.64%, which shows
that adding Pinterest specific features helps very little in
predicting the gender of a USA Pinterest user.
Finally, we did some preliminary analysis to explore the
privacy and security implications associated with Pinterest,
and found multiple instances of publicly available PII leakage
due to the all-public nature of Pinterest. We also found
malware on the network, and discovered that most of it
continued to exist for at least 4 months, the interval between
our two crawls of the network. Given that Pinterest is fairly
new in the social media fraternity, we suspect that the amount
of malware will only grow in the near future.
We picked the top 5 most followed users on Pinterest as the
initial seeds for our data collection process. We understand
that this technique suffers from bias, and that the sample
taken is not completely random. We crawled only partial
sub-graphs for all the 5 seed users. Similarly, on the next
level of our BFS crawl, we crawled not more than 48 followers
for each user. Since there is not much prior work
on Pinterest, we do not have enough academic literature to
claim that our dataset is representative of the whole Pinterest
population. However, previous work by Ottoni
et al. [32], Gilbert et al. [12], and a report from Engauge,
a digital marketing agency [11], show gender distributions
for users, and topic distributions for boards and pins, that are
similar to those in our dataset.
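The seeded, capped breadth-first crawl described above can be outlined as follows. This is a sketch under stated assumptions rather than the crawler used in the study: `get_followers` is a hypothetical callable that returns a user's follower list, and the `max_depth` limit is illustrative, since the text only specifies the cap of 48 followers per user.

```python
from collections import deque


def capped_bfs(seeds, get_followers, max_followers=48, max_depth=2):
    """Breadth-first crawl from seed users, expanding at most max_followers
    followers per user and stopping expansion at max_depth levels."""
    visited = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    crawl_order = []
    while queue:
        user, depth = queue.popleft()
        crawl_order.append(user)
        if depth >= max_depth:
            continue  # record the user but do not expand further
        for follower in get_followers(user)[:max_followers]:
            if follower not in visited:
                visited.add(follower)
                queue.append((follower, depth + 1))
    return crawl_order
```

Capping the followers per user keeps the crawl tractable, but it also means the resulting subgraph depends on the order in which followers are returned, which is one source of the sampling bias acknowledged above.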
In the future, we would like to perform a deeper analysis of
gender prediction on Pinterest. Our current set of 9 features
can be expanded to accommodate features based
on natural language, content, profile attributes, and the
network. Users’ about field and comments can be
utilized for this purpose. More network based features like
betweenness and closeness centrality can also be explored.
We would also like to generalize gender prediction to the
entire geographic Pinterest population, rather than limiting
it to users from the USA only.
To the best of our knowledge, this is one of the first
attempts to characterize Pinterest, and study its various
components in depth, on such a large scale. For this analysis,
we used profile information for only about 3.3 million users
from the over 17 million unique user handles that we had in our
dataset. Our data collection process is still active, and we
would like to redo our analysis on the largest connected
component (LCC) of the complete Pinterest network. We would
also like to perform a more detailed analysis of image spam
and copyright violations on this network. Given that Pinterest
has been the fastest growing social network in recent
times, it would be interesting to see to what extent malicious
users are targeting Pinterest.
7. REFERENCES
[1] Alexa Internet Inc. Alexa: The web information
company. http://www.alexa.com/, 2013.
[2] K. K. Ana-Maria Popescu and J. Caverlee. Mining top
users in pinterest categories. UMAP, 2013.
[3] F. Benevenuto, G. Magno, T. Rodrigues, and
V. Almeida. Detecting spammers on twitter. In CEAS,
volume 6, 2010.
[4] J. D. Burger, J. Henderson, G. Kim, and G. Zarrella.
Discriminating gender on twitter. In EMNLP, pages
1301–1309. Association for Computational Linguistics,
2011.
[5] C. Carpenter. Copyright infringement and the second
generation of social media websites: Why pinterest
users should be protected from copyright infringement
by the fair use defense. Available at SSRN 2131483,
2012.
[6] S. Chang, V. Kumar, E. Gilbert, and L. Terveen.
Specialization, homophily, and gender in a social
curation site: Findings from pinterest. 2014.
[7] G. Cluley. Pinterest spam promotes acai berry diet.
http://nakedsecurity.sophos.com/2012/04/02/pinterest-spam-acai-berry-diet/,
2012.
[8] J. Constine. Pinterest hits 10 million u.s. monthly
uniques faster than any standalone site ever - comscore.
http://techcrunch.com/2012/02/07/pinterest-monthly-uniques/,
2012.
[9] Consumer Threat Alerts. New wave of social scams
target pinterest users, mcafee warns.
http://blogs.mcafee.com/consumer/consumer-threat-notices/new-wave-of-social-scams-target-pinterest-users-mcafee-warns,
2012.
[10] C. Dudenhoffer. Pin it! pinterest as a library
marketing and information literacy tool. College &
Research Libraries News, 73(6):328–332, 2012.
[11] Engauge Insights. Pinterest: A review of social
media’s newest sweetheart.
http://www.engauge.com/assets/pdf/Engauge-Pinterest.pdf,
2012.
[12] E. Gilbert, S. Bakhshi, S. Chang, and L. Terveen. I
need to try this?: A statistical overview of pinterest.
In CHI, pages 2427–2436. ACM, 2013.
[13] L. A. Goodman. Snowball sampling. The annals of
mathematical statistics, 32(1):148–170, 1961.
[14] M. Hall, E. Frank, G. Holmes, B. Pfahringer,
P. Reutemann, and I. H. Witten. The weka data
mining software: an update. ACM SIGKDD
Explorations Newsletter, 11(1):10–18, 2009.
[15] J. He, W. W. Chu, and Z. V. Liu. Inferring privacy
information from social networks. In Intelligence and
Security Informatics, pages 154–165. Springer, 2006.
[16] R. Heatherly, M. Kantarcioglu, and
B. Thuraisingham. Preventing private information
inference attacks on social networks. 2009.
[17] Instagram Press Center.
http://instagram.com/press/, 2013.
[18] P. Jain, P. Jain, and P. Kumaraguru. Call me maybe:
Understanding nature and risks of sharing mobile
numbers on online social networks. ACM COSN, 2013.
[19] P. Jain, P. Kumaraguru, and A. Joshi. @i seek ‘fb.me’:
identifying users across multiple online social
networks. In WoLE, pages 1259–1268. IW3C2, 2013.
[20] I. Jelea. Scammers blur lines between pinterest and
facebook.
http://www.hotforsecurity.com/blog/scammers-blur-lines-between-pinterest-and-facebook-1283.html,
2012.
[21] K. Kamath, A.-M. Popescu, and J. Caverlee. Board
recommendation in pinterest. UMAP, 2013.
[22] K. Y. Kamath, A.-M. Popescu, and J. Caverlee. Board
coherence in pinterest: non-visual aspects of a visual
site. In WWW, pages 49–50. IW3C2, 2013.
[23] B. Krishnamurthy, P. Gill, and M. Arlitt. A few chirps
about twitter. In WOSN, pages 19–24. ACM, 2008.
[24] B. Krishnamurthy and C. E. Wills. On the leakage of
personally identifiable information via online social
networks. In WOSN, pages 7–12. ACM, 2009.
[25] P. Kumaraguru and N. Sachdeva. Privacy in india:
Attitudes and awareness v 2.0. Available at SSRN
2188749, 2012.
[26] J. Lindamood, R. Heatherly, M. Kantarcioglu, and
B. Thuraisingham. Inferring private information using
social network data. In WWW, pages 1145–1146.
ACM, 2009.
[27] I. Lunden. There are now over 1 billion users of social
media worldwide, most on mobile.
http://techcrunch.com/2012/05/14/itu-there-are-now-over-1-billion-users-of-social-media-worldwide-most-on-mobile/,
2012.
[28] G. Magno, G. Comarela, D. Saez-Trumper, M. Cha,
and V. Almeida. New kid on the block: Exploring the
google+ social graph. In IMC, pages 159–170. ACM,
2012.
[29] H. McCracken. Pinterest - the 50 best websites of 2011
- time.
http://www.time.com/time/specials/packages/article/0,28804,2087815_2088159_2088155,00.html,
2011.
[30] A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel,
and B. Bhattacharjee. Measurement and analysis of
online social networks. In IMC, pages 29–42. ACM,
2007.
[31] Nielsen. Social media report 2012: Social media comes
of age.
http://www.nielsen.com/us/en/newswire/2012/social-media-report-2012-social-media-comes-of-age.html,
2012.
[32] R. Ottoni, J. P. Pesce, D. Las Casas,
G. Franciscani Jr, W. Meira Jr, P. Kumaraguru, and
V. Almeida. Ladies first: Analyzing gender roles and
behaviors in pinterest. ICWSM, 2013.
[33] M. Pennacchiotti and A.-M. Popescu. A machine
learning approach to twitter user classification. In
ICWSM, 2011.
[34] J. W. Pennebaker, C. K. Chung, M. Ireland,
A. Gonzales, and R. J. Booth. The development and
psychometric properties of liwc2007. Austin, TX,
LIWC.net, 2007.
[35] A. Pichel. Survey scams find their way into pinterest.
http://blog.trendmicro.com/trendlabs-security-intelligence/survey-scams-find-their-way-into-pinterest/,
2012.
[36] D. Rao, D. Yarowsky, A. Shreevats, and M. Gupta.
Classifying latent user attributes in twitter. In SMUC,
pages 37–44. ACM, 2010.
[37] Reuters. Start-up pinterest wins new funding, $2.5
billion valuation.
http://www.reuters.com/article/2013/02/21/net-us-funding-pinterest-idUSBRE91K01R20130221,
2013.
[38] C. Smith. By the numbers: 16 amazing twitter stats.
Digital Marketing Ramblings.
http://expandedramblings.com/index.php/march-2013-by-the-numbers-a-few-amazing-twitter-stats/,
May, 2013.
[39] C. Tang, K. Ross, N. Saxena, and R. Chen. What’s in
a name: A study of names, gender inference, and
gender behavior in facebook. In DASFAA, pages
344–356. Springer, 2011.
[40] J. Ugander, B. Karrer, L. Backstrom, and C. Marlow.
The anatomy of the facebook social graph. arXiv
preprint arXiv:1111.4503, 2011.
[41] W. Xu, X. Zhou, and L. Li. Inferring privacy
information via social relations. In ICDEW, pages
525–530. IEEE, 2008.
[42] M. Zarro and C. Hall. Pinterest: Social collecting for
#linking #using #sharing. In JCDL, 2012, pages
417–418.
[43] M. Zarro, C. Hall, and A. Forte. Wedding dresses and
wanted criminals: Pinterest.com as an infrastructure
for repository building. In ICWSM, 2013.
[44] E. Zheleva and L. Getoor. To join or not to join: the
illusion of privacy in social networks with mixed
public and private user profiles. In WWW, pages
531–540. ACM, 2009.
[45] J. Zwelling. Pinterest drives more revenue per click
than twitter or facebook.
http://venturebeat.com/2012/04/09/pinterest-drives-more-revenue-per-click-than-twitter-or-facebook/.