Available via license: CC BY-NC-ND 4.0
IDENTIFYING PROFESSIONAL PHOTOGRAPHERS THROUGH
IMAGE QUALITY AND AESTHETICS IN FLICKR ∗
Sofia Strukova, Rubén Gaspar Marco, José A. Ruipérez-Valiente, Félix Gómez Mármol
Department of Information and Communications Engineering
University of Murcia
Murcia (Spain)
{strukovas, ruben.gasparm, jruiperez, felixgm}@um.es
ABSTRACT
In our generation, there is an undoubted rise in the use of social media and specifically photo and
video sharing platforms. These sites have proved their ability to yield rich data sets through the
users’ interaction which can be used to perform a data-driven evaluation of capabilities. Nevertheless,
this study reveals the lack of suitable data sets in photo and video sharing platforms and evaluation
processes across them. In this way, our first contribution is the creation of one of the largest labelled
data sets in Flickr with the multimodal data which has been open sourced as part of this contribution.
Predicated on these data, we explored machine learning models and concluded that it is feasible
to properly predict whether a user is a professional photographer or not based on self-reported
occupation labels and several feature representations out of the user, photo and crowdsourced sets.
We also examined the relationship between the aesthetics and technical quality of a picture and the
social activity of that picture. Finally, we depicted which characteristics differentiate professional
photographers from non-professionals. As far as we know, the results presented in this work represent
an important novelty for the users’ expertise identification which researchers from various domains
can use for different applications.
Keywords Artificial Intelligence · Photography Capabilities · User Expertise · Computational Social Science · Data-driven Evaluation · Data Mining
1 Introduction
Nowadays, we observe the emergence of a wide range of online technology-mediated portals. They have proved their
ability to generate rich data sets through the users’ interaction, which can be used to perform a data-driven evaluation
of competencies and capabilities [Strukova et al., 2022a]. Among them, there is a wide group of photo and video
sharing platforms which are currently gaining considerable popularity. This is explained by the fact that, in
accordance with a comprehensive survey, users have five primary social and psychological motives to use one of the
rising photo-sharing social networking services, which are social interaction, archiving, self-expression, escapism, and
peeking [Lee et al.,2015]. More than that, images and photos are powerful tools based on their potential impact on
people’s knowledge, attitudes, and perceptions regarding diverse topics [Hameleers et al.,2020]. In this way, the mobile
applications of most photo and video sharing platforms came onto the market at the right moment in the history of
technology and made them the dominant image-sharing social media in the second decade of the 21st century [Fung
et al.,2020].
Regardless of the high acceptance of photo and video sharing platforms throughout all segments of the world’s
population and their escalating use, there is no publicly available data set containing multiple data types and covering a
considerable fraction of users from these platforms. Besides, although there is ample evidence of validated methods
for measuring the expertise of users across the group of sites that can be called content sharing and consumption
[Strukova et al., 2022b], little attention has been given to the in-depth exploration of photo and video sharing
platforms, which hold much potential to infer not only common metrics like popularity [Ding et al., 2019a] but also
a range of users’ competencies or capabilities. One of the most valuable skills to detect and explore in
this context is photography capability. As could be expected, photography skills are subjective and people often disagree
with each other on matters of taste, because it is hard to conclude which photo is the best in terms
of aesthetic and technical qualities. Since it is already not a trivial task for a person to identify technically sound and
aesthetically attractive pictures, it is even more complicated for a machine to evaluate the quality of a picture, because
machines have to cope with noise in the picture represented by intensity levels, colour saturation,
lighting, compression, artefacts, etc. [Ding et al., 2019b]. Also, machines do not have prior knowledge and struggle to
understand some aspects of our world. As a solution to the challenge of image preprocessing, Convolutional
Neural Networks (CNN) trained with human-labelled data hold the potential to fill this gap [Gyawali et al., 2020].
∗This is a pre-print version of the article.
arXiv:2307.01756v1 [cs.CY] 4 Jul 2023
Strukova et al.
Data generated on the photo and video sharing platforms hold the potential to be used in various contexts. Across them,
we can highlight the possibility of the creation of pathways for learning about the user’s behaviour, general traits of Web
navigation and the ability to perform data-driven content analysis. More than that, this knowledge could be valuable
for informal learning focused on acquiring new attainments or competencies [Mehrvarz et al.,2021]. From another
perspective, online content can yield violence in the user community, which is considered one of the most important
problems of the 21st century [Dikwatta and Fernando,2019]. In this way, data generated online hold the potential
not only to infer valuable information about the users but also about vulnerabilities surrounding virtual life. Besides,
a data set with multimodal data from any photo and video sharing platform would make it possible to infer the photography
capabilities of users. This would open an opportunity to automatically detect good photographers on the Web and offer
personalised aesthetic-based photo recommendations.
In this work, we examine several photo and video sharing platforms and the existing studies focusing on analysing
data available across them. Based on these grounds and the encountered gaps, our first step was to create one of the
largest data sets available from the Flickr platform [Gaspar Marco et al.,2022]. We collected data from 27,538 users
who uploaded photos to Flickr in December 2021, specifically those who specified their occupation. Additionally, we
enriched the data set with features resulting from the automated analysis of the photos and their comments including
three Image Quality Assessment scores representing aesthetic and technical aspects of the photos. Also, we labelled the
data to indicate whether the user is a professional photographer. We are releasing the data set as part of this paper’s
contribution. Thus, it is open sourced and is available at the following URL: [dat, 2022]. Next, we propose our
method to infer if a user is a professional photographer or not based on self-reported occupation labels, which is a
novel contribution to the literature. Finally, to the best of our knowledge, this is the first time that characterisation of
professional and non-professional users is presented in any photo and video platform.
Accordingly, the first objective of the paper at hand was to create a data set focused on the Flickr photo and video
sharing platform with multimodal data including crowdsourced, user and photo features that would allow us to answer the
following Research Questions (RQs):
• RQ1. What model is better to infer if a user is a professional photographer? One based on photo features
including aesthetics and technical quality scores, one based on the social network activity of the photographer,
or one based on crowdsourced features that represent the interaction of other users with the photo?
• RQ2. What is the relationship between the aesthetics, the technical quality and the social activity of a given
picture?
• RQ3. What characteristics differentiate professional photographers from non-professionals?
The remainder of this paper is structured as follows. In Section 2, we focus on the background of our study uncovering
the subject of photo and video sharing platforms. In Section 3, we present our research methodology. We expand this
section by selecting the photo and video sharing platform and explaining the data collection process. Next, we depict
the final data collection and describe machine learning (ML) algorithms to identify professional photographers. Our
findings are outlined in Section 4, while we extend the results in Section 5. Finally, we draw our conclusions and future
research directions in Section 6.
2 Background
2.1 Photo & Video Sharing Platforms
The main goal of photo and video sharing platforms is to allow their users to share various multimedia content, including
photos and videos. Some of the platforms have built-in editing filters and organisation by hashtags and geographical
tagging. Most of these sites also include a social networking service permitting users to connect with each other
through comments or messages, browse other users’ content, share and receive feedback. In this way, some material
can be shared publicly or with pre-approved followers. In Table 1, we present a comparison of the leading photo
and video sharing platforms, namely, Flickr, 500px², Instagram³, 1x.com⁴, SmugMug⁵ and Pinterest⁶, across several
characteristics.
Photo & Video portal | Foundation year | Area served | Languages available        | Registration to browse/contribute | Free version                   | Number of monthly users | API | Photo editing features | Community of photographers | Ability to write comments
Flickr               | 2004            | Worldwide   | 10 languages incl. English | ✗/✓                               | 1,000 photos                   | Over 60 million         | ✓   | ✓                      | ✓                          | ✓
500px                | 2009            | Worldwide   | English                    | ✗/✓                               | 7 photos per week, 2000 photos | N/A                     | ✗   | ✗                      | ✓                          | ✓
Instagram            | 2010            | Worldwide   | 32 languages incl. English | Limited/✓                         | Unlimited photos               | Over 2,000 million      | ✓   | ✓                      | ✓                          | ✓
1x.com               | 2007            | Worldwide   | English                    | ✗/✓                               | ✗                              | N/A                     | ✗   | ✗                      | ✓                          | ✓
SmugMug              | 2002            | Worldwide   | English                    | ✗/✓                               | Up to 500MB                    | N/A                     | ✓   | ✓                      | ✓                          | ✓
Pinterest            | 2010            | Worldwide   | 37 languages incl. English | ✗/✓                               | 200,000 photos                 | Over 430 million        | ✓   | ✗                      | ✓                          | ✓
Table 1: Photo & Video Sharing Platforms Comparison
There were three fundamental points of comparison for our research: the number of monthly users, the access to an
Application Programming Interface (API), and the ability to write comments. We could not find an up-to-date
number of active users per month for 500px, 1x.com and SmugMug, while among the others, the most visited portal is
Instagram with 2,000 million users per month, followed by Pinterest and Flickr with 430 and 60 million users,
respectively. Most of the portals that we explored offer an API, except 1x.com and 500px, the latter having shut down
its API access in 2018. Finally, all six platforms allow users to write comments.
Flickr was a pioneer in online photo sharing and nowadays is one of the leading photo-sharing platforms worldwide
which attracts extensive research attention [Höpken et al.,2020]. Its users include diverse profiles of both professional
and amateur photographers who want to share their portfolios. In 2018, it was acquired by SmugMug, a paid photo-
sharing service. Similarly, SmugMug is characterised as a premium online photo and video sharing service business
which currently has material uploaded by amateur and professional photographers around the world [Erturk and
Obrutsky, 2016]. 500px and 1x.com are also more suitable for serious photographers and they offer an image-focused
design. On the contrary, Instagram is a social photo-sharing service launched in 2010 as an iPhone application suited
to non-professional users. Its users can take and manipulate photographs by adding filters and frames that enhance
the users’ experience. They can also share them online, where other users can react by means of comments and “likes”.
Instagram offers an opportunity to communicate experiences through both the choice of photo subject and the ways to
manipulate and present them [Weilenmann et al., 2013]. Lastly, Pinterest was launched in 2010 as a Web site where
users can save an image (known as a “pin”) that they upload or find on a Web page onto a collection of these pins. A
more detailed description of these platforms can be found in [Strukova and Ruipérez-Valiente,2022].
2.2 Related work
There exist many studies disclosing the potential of Web portals to yield a significant amount of data, which can allow the
detection of potential experts. On the whole, expertise finding is focused on detecting topical authority in a selected
topic in forums and question and answer websites (e.g., Reddit [Strukova et al., 2022b, Lim et al., 2017] or Quora [Patil
and Lee, 2016]). Moreover, most of the research in this domain is centred on proficiency in different programming
languages, libraries or tools across portals highly related to the field of computer science such as GitHub [Saxena and
Pedanekar, 2017] and StackOverflow [Constantinou and Kapitsaki, 2016]. However, there is not much work done on
discovering artistic skills, which are crucial to look at things from different perspectives and to remain competitive
globally [Wesołowski, 2022]. We also did not find any study aiming to identify professional photographers through
image quality, aesthetics or any other photo-related features; thus, both our novel data set [Gaspar Marco et al., 2022]
and our research in this study contribute significantly to the literature.
From another perspective, the vast majority of the studies make use of single-mode data sources. For example,
Kantharaju et al. utilised clickstream data to trace player knowledge in educational games [Kantharaju et al., 2019]
and Pal et al. extracted textual data represented in questions and answers of users of a question and answer portal [Pal
et al., 2011]. On the contrary, very few researchers decided to employ multimodal data sources. One example
of such an approach is [van Dijk et al., 2015], which demonstrates the use of textual, behavioural and time-aware features
in StackOverflow. The results of this work proved the utility of adding behavioural and time-aware features to the
baseline method with an accuracy improvement for early detection of expertise. Even though there is a clear trend towards
using multimodal data, we did not find previous studies that used various types of data in photo and video sharing
platforms.
² https://500px.com/
³ https://www.instagram.com/
⁴ https://1x.com/
⁵ https://www.smugmug.com/
⁶ https://www.pinterest.com
Also, we saw a heightened interest towards photo and video sharing platforms which could be able to reflect important
information about users. A few studies are revealing that rich data sets from these portals could be used to explicitly or
implicitly perform a data-driven evaluation of diverse capabilities. For example, Pal et al. presented a novel approach
to finding topical authorities in Instagram [Pal et al., 2016]. Their method is based on the self-described interests of the
follower base of popular accounts. Similarly, Purba et al. carried out an analysis of popularity trends and predictions on
Instagram, using a set of features acquired from users’ metadata, posts, hashtags, image assessment, and history of
actions [Purba et al., 2021]. In the analysis of popularity trends, engagement grade is used for comparison in order to
account for the lower engagement rate of users with a higher number of followers. It was found that image quality, posting
time, and type of image highly impact engagement rate. However, neither of these studies of Instagram focused on photography
capabilities as we do.
Finally, despite the growing relevance of photo and video sharing platforms and research across them, there are no
publicly available data sets that could be used for the exploration of users’ personality traits or capabilities. This is an
important gap existing in the current domain.
3 Methodology
In this section, we describe the methodology process of building a supervised learning model for the final goal of
professional photographers’ identification. First, we explain the photo and video sharing platform selection followed by
the description of its API service. Next, we give details on the feature engineering process. Then, we describe the final
data collection and the ground truth. Finally, we explain the ML models that we chose for the stated goal and evaluation
metrics to estimate their performance.
3.1 Methodology overview
To answer the RQs stated at the beginning of our study, we pursued the methodology process presented in Figure 1.
In the first step, we selected the photo and video sharing platform based on various metrics presented in Table 1. In the
second step, we downloaded the photos from the selected site. In the third, fourth and fifth steps, we obtained the user,
crowdsourced and photography features and the ground truth in order to build the ML model in the sixth step and evaluate
it in the seventh step. In the eighth step, we chose the best model to infer if a user is a professional photographer or
not based on self-reported occupation
labels. Next, in the ninth step, we explored the relationship between the aesthetics and technical quality of a picture and
its social activity. Finally, in the tenth step, we found the common characteristics of professional photographers and
non-professionals.
3.2 Photo and video sharing platform selection
Based on Table 1 presented in the previous section, we can conclude that the most active photo and video sharing
platforms are Instagram, Pinterest and Flickr, being visited by 2,000, 430 and 60 million active users every month,
respectively. Flickr has a PRO service where a user can get unlimited storage, making it one of the cheapest hosting
sites around. To keep Pinterest running smoothly, its users can create up to 200,000 pins and 2,000 boards, which are
collections where users save specific pins. In contrast, Instagram allows its users to upload an unlimited quantity of
photos. Moreover, as discussed earlier, these three platforms offer API services that can be helpful while acquiring the
needed data.
Although Instagram has the huge privilege of having many registered users and the facility of uploading an unlimited
number of photos, for our study we see it as a disadvantage because it could be hard to find those users whose behaviour
on the website would correspond with the profile of professional photographers. Moreover, it does not allow uploading
original-sized photos. On the other hand, Pinterest is not focused on pictures taken by users themselves but rather on
drawings, paintings or artworks created on a computer. Therefore, we focus on the photo and video sharing platform
Flickr as a proxy for the photography skills of users.
Figure 1: Overview of the methodology to identify professional photographers in Flickr
[Flowchart stages: photo and video sharing portal selection; downloading photos (2,647,928 pictures taken by 27,516
users); obtaining user, crowdsourced and photography features; applying DL models to obtain aesthetic and technical
scores; deriving users' self-reported occupation (ground truth; 4,108 professional photographers); building and
evaluating the ML model; choosing the best model to infer if a user is a professional photographer (RQ1); determining
the relationship between the aesthetics and technical quality of a picture and its social activity (RQ2);
differentiating professional photographers from non-professionals by their common characteristics (RQ3).]
Flickr also differs from Instagram by providing online communities and other groups on numerous social media and
other platforms to improve customer relationships. Groups are a place to share ideas and photos with other like-minded
members. Some group administrators first have to approve the users’ request to join. Flickr allows its users to create
profiles with personal information, albums/photosets which are helpful to organise their photos and galleries/collections
to which they can add other users’ media. Flickr is also geared toward beginners and enables them to edit the photos
directly on the platform, such as adjusting brightness and contrast and applying various filters. There is also a concept
of photostream which is a collection of media files that solely belongs to a user (public – others can visit the profile and
see what the user uploaded, private – only the user and the list of permitted users will be able to view the content). All
users have a list of their favourite photos. There is also an ability to connect with other users. As a social photo-sharing
site, Flickr allows users to maintain a list of contacts. From the perspective of a registered user of Flickr, there are five
categories of people on Flickr: the user, the user’s family, the user’s friends, the user’s contacts who are neither family
nor friends, and everyone else [Yee,2008]. Statistics for a free account show the total number of views, favourites, and
comments it has. From another point of view, users can use tags to categorise and search for photos. There are several
ways to tag pictures, either one at a time or in batches. Flickr lets users add up to 75 tags to each picture including the
geotagging feature. Finally, every user can set a license representing the copyright permission for a given picture.
3.3 Flickr API – data collection
Flickr provides an API service which significantly facilitates the process of data collection. Primarily, we decided to
download the data of only those users who had filled in the occupation field in their profile. This selection is explained by
the fact that we wanted to avoid bias derived from assuming their profession. Intending to obtain a representative and
comprehensive sample of the platform’s active users, we singled out those users who were sufficiently active during the
month of December 2021. We searched for all the photos of the month of December discarding screenshots and videos.
There were 225,590 users who uploaded photos in December 2021.
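The search for December 2021 uploads can be illustrated with a minimal sketch that only builds the request URL and makes no network call. The paper does not state which API methods or parameters were used, so the choice of `flickr.photos.search` with these filters is an assumption, and the API key is a placeholder.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

FLICKR_REST_ENDPOINT = "https://www.flickr.com/services/rest/"


def december_2021_search_url(api_key: str, page: int = 1) -> str:
    """Build a flickr.photos.search request for photos uploaded in December 2021."""
    start = datetime(2021, 12, 1, 0, 0, 0, tzinfo=timezone.utc)
    end = datetime(2021, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,                        # placeholder, supply your own key
        "min_upload_date": int(start.timestamp()),  # Unix timestamps
        "max_upload_date": int(end.timestamp()),
        "content_type": 1,                         # photos only, no screenshots
        "media": "photos",                         # exclude videos
        "per_page": 500,
        "page": page,
        "format": "json",
        "nojsoncallback": 1,
    }
    return FLICKR_REST_ENDPOINT + "?" + urlencode(params)


url = december_2021_search_url("YOUR_API_KEY")
print(url)
```

The result pages returned by such a query would then have to be walked one by one to collect the distinct owners of the photos.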
For the user selection, we discarded those users whose number of photos uploaded in December 2021 was equal to or
greater than 20% of their total activity, in order to filter out those users without a minimum activity on the platform. We
also filtered out the 5% of users from both ends of the distribution of total photos uploaded to avoid outliers. As a result,
the final number of users we selected is 151,468.
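The two selection rules can be sketched as follows. The record layout (`december_photos`, `total_photos`) is hypothetical, and applying the share filter before the trimming is an assumption, since the text does not fix the order.

```python
def select_users(users, max_december_share=0.20, trim_fraction=0.05):
    """Apply the two user filters to a list of dicts with
    'id', 'december_photos' and 'total_photos' keys (hypothetical layout)."""
    # Rule 1: discard users whose December uploads are 20% or more of all uploads.
    kept = [
        u for u in users
        if u["december_photos"] < max_december_share * u["total_photos"]
    ]
    # Rule 2: sort by total uploads and cut 5% of users from each end.
    ranked = sorted(kept, key=lambda u: u["total_photos"])
    cut = int(len(ranked) * trim_fraction)
    return ranked[cut:len(ranked) - cut]


# Toy data: 20 long-standing users plus one whose activity is mostly from December.
users = [{"id": i, "december_photos": 10, "total_photos": 100 * i} for i in range(1, 21)]
users.append({"id": 99, "december_photos": 50, "total_photos": 100})
selected = select_users(users)
print(len(selected), selected[0]["total_photos"], selected[-1]["total_photos"])
```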
Due to time constraints, it was impractical to extract the data from all the photos of all the users. Finally,
we downloaded all pictures of a selection of 27,538 users. The complete process of downloading data with the Flickr API
is thoroughly explained in [Gaspar Marco et al., 2022].
3.4 Feature engineering for the ML model
3.4.1 Deep learning models
Child states that the photographer has to pre-visualise, pre-produce and create an environment using not only selected
equipment, subject matter and props but also, far more importantly, light [Child, 2008]. Image quality can be affected by the
noise, the blur and the used technical requirements and equipment. From another perspective, the aesthetics of the
photo depend on the colours’ balance (their compatibility and what feelings they evoke), contrast (variance between
light and dark), lighting in general, camera to subject diagram, camera angle and height, meter readings of light ratios,
composition, subject choice and symmetry.
Despite the fact that evaluating these points might be hard for an ordinary user, some models perform well in this
regard. After exploring several surveys including a comprehensive performance evaluation of image quality assessment
algorithms [Athar and Wang,2019], we selected two algorithms based on their high performance and the ability to
rebuild them. The first is Neural Image Assessment (NIMA), a deep CNN that is trained to predict which images a typical
user would rate as looking good (technically) or attractive (aesthetically) [Talebi and Milanfar, 2018]. Other models
classify images as having low/high scores, while the NIMA model produces a distribution of ratings for any given image: on
a scale of 1 to 10, NIMA assigns likelihoods to each of the possible scores. Various functions of the NIMA vector score
(such as the mean) can then be used to rank photos aesthetically. The authors replaced the last layer of the baseline CNN
with a fully-connected layer with 10 neurons followed by soft-max activations. Baseline CNN weights are initialised by
training on ImageNet, and then an end-to-end training on quality assessment is performed.
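As a minimal sketch of how a 10-way soft-max output can be reduced to a single score, the snippet below computes the mean and the spread of a rating distribution over the scores 1 to 10. The function names are illustrative, and the logits are made-up inputs rather than outputs of the actual NIMA network.

```python
import math


def softmax(logits):
    """Turn 10 raw network outputs into a probability distribution."""
    top = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_score(probabilities):
    """Expected rating: sum over s = 1..10 of s * p(s)."""
    return sum(s * p for s, p in enumerate(probabilities, start=1))


def score_std(probabilities):
    """Standard deviation of the rating distribution."""
    mu = mean_score(probabilities)
    return math.sqrt(sum(p * (s - mu) ** 2 for s, p in enumerate(probabilities, start=1)))


uniform = softmax([0.0] * 10)               # every rating equally likely
print(mean_score(uniform), score_std(uniform))
```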
Secondly, Photo Aesthetics Ranking Network with Attributes and Content Adaptation proposes to train a deep
convolutional neural network to rank photo aesthetics in which the relative ranking of photo aesthetics is directly
modelled in the loss function [Kong et al.,2016]. This model incorporates joint learning of meaningful photographic
attributes and image content information which can help regularise the complicated photo aesthetics rating problem.
This model returns ratings for any given image – on a scale of 0 to 1.
3.4.2 Comments preprocessing
To analyse the comments, we followed several data preprocessing steps. First, we changed all the
comments to lowercase. Next, we replaced emojis in comments with their description codes with the use of the Python
Demoji library⁷. Moreover, we cleaned the comments from hyperlinks and non-alphanumeric text. Finally, we removed
empty comments (including those that consisted only of stop words). For every comment, we computed the features further
explained in Section 3.4.3.
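The preprocessing steps above can be sketched with the standard library only. This is not the authors' code: Unicode character names stand in for the Demoji descriptions, the stop-word list is a tiny illustrative one, and "non-alphanumeric" is interpreted as anything outside ASCII letters, digits and whitespace.

```python
import re
import unicodedata

STOP_WORDS = {"a", "an", "and", "the", "is", "it", "of", "to", "this", "so"}  # illustrative only
URL_PATTERN = re.compile(r"(https?://|www\.)\S+")


def describe_emojis(text):
    """Stand-in for Demoji: replace 'Symbol, other' characters by their Unicode name."""
    pieces = []
    for ch in text:
        if unicodedata.category(ch) == "So":
            pieces.append(" " + unicodedata.name(ch, "").lower() + " ")
        else:
            pieces.append(ch)
    return "".join(pieces)


def preprocess_comment(comment):
    """Return the cleaned comment, or None if nothing but stop words remains."""
    text = comment.lower()                       # 1. lowercase
    text = describe_emojis(text)                 # 2. emojis -> textual description
    text = URL_PATTERN.sub(" ", text)            # 3. drop hyperlinks
    text = re.sub(r"[^a-z0-9\s]", " ", text)     # 3. drop non-alphanumeric characters
    tokens = text.split()
    if all(token in STOP_WORDS for token in tokens):
        return None                              # 4. empty or stop-word-only comment
    return " ".join(tokens)


print(preprocess_comment("Great shot 📷 see https://example.com/x!"))
```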
3.4.3 Description of the final data collection
The final data set used for further investigation consists of 2,647,927 pictures of 27,538 users. Each picture was
downloaded in a size such that the smallest side of the image measures more than 230 pixels because the NIMA model
takes images of 224x224 pixels size as input and Photo Aesthetics Ranking Network with Attributes and Content
Adaptation works with 227x227 pixels images.
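A small sketch of the size constraint: given the sizes in which a photo is offered, pick the smallest one whose shorter side exceeds 230 pixels, so that it can be reduced to both model inputs without upscaling. The list of candidate sizes is hypothetical and the selection rule is an assumption about how the constraint could be applied.

```python
NIMA_INPUT = 224   # pixels per side expected by NIMA
KONG_INPUT = 227   # pixels per side expected by the Kong et al. network
MIN_SIDE = 230     # threshold stated in the text


def pick_download_size(available_sizes):
    """Return the smallest (width, height) whose shorter side exceeds MIN_SIDE, or None."""
    suitable = [size for size in available_sizes if min(size) > MIN_SIDE]
    if not suitable:
        return None
    return min(suitable, key=lambda size: size[0] * size[1])


# Hypothetical size variants offered for one photo.
sizes = [(100, 75), (240, 180), (320, 240), (500, 375), (1024, 768)]
print(pick_download_size(sizes))
```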
We have grouped the features that we obtained into three families – photography, crowdsourced and user-author. The
photography feature set includes the following features:
1. Publication date – Number of days since the photo was uploaded to Flickr.
2. Update date – Number of days since the last update of the photo metadata (visits, favourites, comments, etc.).
3. Groups number – Number of groups in which the photo has been posted.
4. NIMA technical score – Technical score from NIMA model implementation.
5. NIMA aesthetic score – Aesthetic score from NIMA model implementation.
6. Kong score – Aesthetic score from Photo Aesthetics Ranking Network with Attributes and Content Adaptation
implementation.
⁷ https://pypi.org/project/demoji/
Crowdsourced features. These involve information or opinions from a group of people who submit their views via the
Flickr site.
1. Comments number – Number of comments written on the photo page.
2. Views number – Number of views the photo got.
3. Favourites number – Number of users who added the photo to the list of their favourites.
4. Average polarity of the comments – Computed with TextBlob, a Python library for processing textual
data [Loria, 2018].
5. Average subjectivity of the comments – Average number of subjective words in the posted comments computed with
TextBlob.
6. Average readability of the comments – Two metrics indicating how difficult a passage in English is to
understand: the number of difficult words and the reading time.
7. Average entropy of the comments – A statistical parameter that measures how much information is produced on
average for each letter of a text in a language.
8. Average comment length – Character count of the comment.
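The entropy feature in the list above can be illustrated with a character-level Shannon entropy, measured in bits per character. This is one common reading of the definition; the exact formulation used for the data set is not given here, so treat it as an assumption.

```python
import math
from collections import Counter


def character_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def average_entropy(comments):
    """Mean character entropy over a list of comments (0.0 for an empty list)."""
    if not comments:
        return 0.0
    return sum(character_entropy(c) for c in comments) / len(comments)


print(character_entropy("abab"), character_entropy("abcd"))
```

A comment that repeats a single character carries no information under this measure, while a comment in which every character differs reaches the maximum for its length.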
The user feature set includes the following features:
1. Photos number – Total number of photos uploaded by the user to the platform.
2. Join date – Number of days since the user became a member of the platform.
3. Following number – Number of users followed by the user.
4. Groups number – Number of groups to which the user belongs.
5. Flickr PRO – Indication of whether the user has the paid membership Flickr PRO⁸. Flickr PRO provides advanced
statistics on photos and videos of the user. Also, it allows ad-free browsing on Flickr for the PRO user and
their visitors. Moreover, it permits unlimited uploads at full resolution and easy backup. Finally, the user can
establish detailed privacy settings for every photo.
Next, we aggregated crowdsourced and photo features by every user. As a result, for every user, there is a representation
of every feature in terms of a minimum, a maximum and an average value.
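The per-user aggregation can be sketched as below, assuming photo records are dictionaries with a `user_id` key; the field names and the `_min`/`_max`/`_avg` suffixes are illustrative, not taken from the released data set.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_user(photos, feature_names):
    """Collapse photo-level features into min, max and average values per user."""
    values_per_user = defaultdict(lambda: defaultdict(list))
    for photo in photos:
        for name in feature_names:
            values_per_user[photo["user_id"]][name].append(photo[name])

    aggregated = {}
    for user_id, features in values_per_user.items():
        row = {}
        for name, values in features.items():
            row[f"{name}_min"] = min(values)
            row[f"{name}_max"] = max(values)
            row[f"{name}_avg"] = mean(values)
        aggregated[user_id] = row
    return aggregated


photos = [
    {"user_id": "u1", "views": 10, "favourites": 1},
    {"user_id": "u1", "views": 30, "favourites": 3},
    {"user_id": "u2", "views": 5, "favourites": 0},
]
per_user = aggregate_by_user(photos, ["views", "favourites"])
print(per_user["u1"])
```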
Finally, for a better understanding, as social activity features to answer RQ2, we consider the minimum, maximum
and average values of the following variables – the number of comments and the number of favourites (how many
people added a picture to their list of favourites).
3.4.4 Ground truth
To collect the ground truth values, we first obtained the occupation self-indicated by the user. Based on it, we detected
whether the occupation is related to photography. This was computed with the use of regular expressions in several
languages that use the Latin alphabet. The regular expression includes the following terms – “fot”, “phot”, “valokuv”,
“zdjęcie”, “dealbh”, “bild”, “grianghraf”, “nuotrauk”, “pictur”, “myndin”, “billed”, “ljósmyndari”, “ritratt”.
Accordingly, as the ground truth value, we label as professional photographers those users who have a
photography-related occupation. This being the case, there are 4,108 users (≈15%) fulfilling this criterion.
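A minimal sketch of such a labelling rule is shown below. The term list is the one given in the text; joining the terms into a single case-insensitive alternation and matching them as substrings is an assumption about how the expression was built.

```python
import re

PHOTO_TERMS = [
    "fot", "phot", "valokuv", "zdjęcie", "dealbh", "bild", "grianghraf",
    "nuotrauk", "pictur", "myndin", "billed", "ljósmyndari", "ritratt",
]
# One alternation of all terms, matched anywhere in the occupation string.
PHOTO_PATTERN = re.compile("|".join(re.escape(term) for term in PHOTO_TERMS), re.IGNORECASE)


def is_photography_occupation(occupation):
    """True if the self-reported occupation contains a photography-related term."""
    return bool(PHOTO_PATTERN.search(occupation or ""))


for occupation in ["Freelance Photographer", "Fotógrafo", "Software engineer"]:
    print(occupation, is_photography_occupation(occupation))
```

Substring matching keeps the rule language-agnostic, but it can also produce false positives: for instance, the German "Bildhauer" (sculptor) contains "bild".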
3.5 ML model to identify professional photographers
Following the scope of our study, we will compare the performance of both interpretable and non-interpretable
classification techniques over the features mentioned in the previous section to fulfil the goal stated in Section 1. We
selected two interpretable classification techniques, a probabilistic and a logit model – Gaussian
Naïve Bayes and Logistic Regression (LR) – and pit them against non-interpretable techniques, a bagging and a
boosting model – Random Forest (RF) and Gradient Boosting Classifier. We believe that our choice of algorithms
covers a wide spectrum of attribute-based learning approaches. Hence, we restrict our case study to the use of these
algorithms with the final goal of selecting the best model out of a set of classifiers with various feature representations.
We have trained our model applying a 10-fold cross-validation. Given the significant data imbalance, we have trained
the model to maximise the quality metric AUC, which takes into account the data imbalance. Moreover, we also report
F1-score and the accuracy of the model.
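To make the validation scheme concrete, here is a minimal pure-Python sketch of a 10-fold split on imbalanced labels. Stratifying the folds (keeping roughly 15% positives in each) is an assumption; the text only states that 10-fold cross-validation was applied.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=10, seed=0):
    """Yield (train_indices, test_indices) pairs that preserve the class ratio per fold."""
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)

    folds = [[] for _ in range(n_folds)]
    for indices in indices_by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)   # deal indices round-robin

    for k in range(n_folds):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, test


labels = [1] * 150 + [0] * 850        # toy labels with 15% professionals
folds = list(stratified_folds(labels))
print(len(folds), len(folds[0][1]), sum(labels[i] for i in folds[0][1]))
```

Each classifier would be fitted on the training indices of a fold and scored on its test indices, with the metric averaged over the ten folds.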
8https://www.flickr.com/account/upgrade/pro/
4 Results
4.1 RQ1. Professional and non-professional photographers
From Table 2, we can observe the fact that all the models – interpretable which include Gaussian Naïve Bayes and LR
and non-interpretable ones represented by RF and Gradient Boosting Classifier – perform in a similar way. The results
show that, in general, most competing algorithms were fairly accurate for our data set. The surprising observation is
that the accuracy score ranges from 0.85 to 0.92 in most tests of models and feature sets. This can be explained
by the fact that accuracy is a simple evaluation measure for binary classification that is more suitable for cases
in which the data are perfectly balanced. This is not the case in our study, as explained previously in Section 3.4.4:
with roughly 15% of users labelled as professional, always predicting the majority class already yields an accuracy of
about 0.85.
Consequently, we computed AUC and F1 scores which help us to observe the precision and the recall providing more
insight into the differences between classifiers.
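The effect of class imbalance on accuracy can be checked with a short sketch on toy labels that mimic the 15%/85% split. The metric implementations are plain textbook definitions (binary F1 for the positive class, AUC as the probability that a positive is ranked above a negative) and are not necessarily the variants used to produce Table 2.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_positive(y_true, y_pred):
    """F1 score of the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


y_true = [1] * 15 + [0] * 85            # toy sample: 15% professionals
always_negative = [0] * 100             # trivial majority-class predictions
constant_scores = [0.0] * 100           # the same score for every user

print(accuracy(y_true, always_negative))      # high despite learning nothing
print(f1_positive(y_true, always_negative))
print(roc_auc(y_true, constant_scores))
```

A classifier that never identifies a professional photographer still looks accurate, while its AUC stays at chance level, which is why AUC is the more informative target in this setting.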
The comprehensive comparison revealed that RF demonstrates the best performance when using the user and photo
features, reaching an accuracy of 0.92, an AUC score of 0.73 and an F1 score of 0.89. Gradient
Boosting Classifier was almost as successful as RF. It showed only slightly worse results
when comparing the AUC and F1 measures of each model for every set of features. Nevertheless, its prediction with the set
including user features showed sufficiently good results, with an accuracy of 0.92, an AUC score of 0.72 and an F1
score of 0.89. In this way, the non-interpretable algorithms exhibited superior evaluation performance compared with the
rest of the contenders.
Regarding the comparison of interpretable models, between Gaussian Naïve Bayes and LR, the best-performing model is
LR with the photo features, with an accuracy of 0.92, an AUC score of 0.68 and an F1 score of 0.88. From Table 2,
we note that the performance of Gaussian Naïve Bayes is less competitive than that of LR. However, even though these
algorithms achieve the lowest AUC scores, they reach comparably high accuracy and F1 scores. We also observe that
combining sets of features does not always yield a clear improvement in performance over using each set of features
alone.
4.2 RQ2. The aesthetics and technical quality and the social activity of photos
To answer the question regarding the relationship between the aesthetics and technical quality of a picture and the
social activity of that picture, we first explored the correlation matrix of the social activity features and the NIMA technical
score, NIMA aesthetic score and Kong score, represented in Figure 2. As can be observed, the technical and
aesthetic scores are correlated with each other. As a case in point, the correlation between the average of
the Kong scores and the average of the NIMA aesthetic scores is 0.45, which indicates a clear positive
correlation. As depicted in the figure, many variables are relatively highly correlated with their own different
representations (a minimum, a maximum and an average). Variables that represent different concepts are not positively
correlated with each other, with the exception of the aesthetic and technical scores.
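A correlation matrix of this kind can be computed as in the sketch below. It assumes the Pearson coefficient, which the text does not state explicitly, and the column names and values are hypothetical placeholders rather than data from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict with the pairwise correlation of every column pair."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical per-user averages; real values would come from the data set.
columns = {
    "avg_kong_score": [0.48, 0.52, 0.55, 0.50, 0.58],
    "avg_nima_aesthetic": [4.4, 4.7, 4.9, 4.5, 4.8],
    "avg_views": [120, 90, 3000, 400, 250],
}

matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```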
Then, we examined the performance of the best-performing model selected in the previous section, RF, separately on
the social activity feature set and on the features related to the aesthetics and technical quality of pictures. The results
reported in Table 3 indicate that the algorithm has better predictive power with the photo features that include the NIMA
technical score, NIMA aesthetic score and Kong score, reaching an accuracy of 0.92, an AUC score of 0.67 and an F1
score of 0.88. On the other hand, with the social activity set of features, RF shows a lower
AUC score of 0.6 but the same F1 score of 0.88. This means that, despite the subjectivity of art, the aesthetic
and technical scores computed by CNN models are reliable.
4.3 RQ3. Common characteristics of professional and non-professionals
To answer this RQ, we aggregated the predictions of the best-performing model, RF with user and photo
features. We computed average metrics for professional photographers and for non-professionals over all user and photo
features explained in Section 3.4.3. The aggregation of these two types of users predicted by RF, with their
most common characteristics and differences, is shown in Figure 3. The model identified 974 users as professional
photographers (≈ 12%) and 7,279 users as non-professionals. It is worth noting that there are many more
non-professional users than professional ones. This is consistent with the distribution of the ground truth presented in
Section 3.4.4, where we explained the class imbalance.
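The aggregation step can be sketched as a per-group average over the predicted labels. The helper `average_by_group`, the field names and the four example rows are hypothetical; only the two user counts are taken from the text.

```python
from collections import defaultdict
from statistics import mean


def average_by_group(rows, group_key, feature_keys):
    """Average every listed feature separately for each value of group_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    return {
        group: {key: mean(r[key] for r in members) for key in feature_keys}
        for group, members in groups.items()
    }


# Hypothetical per-user rows labelled with the model's prediction.
rows = [
    {"predicted": "professional", "avg_views": 3000, "avg_groups": 12},
    {"predicted": "professional", "avg_views": 4000, "avg_groups": 16},
    {"predicted": "non-professional", "avg_views": 200, "avg_groups": 4},
    {"predicted": "non-professional", "avg_views": 300, "avg_groups": 6},
]
averages = average_by_group(rows, "predicted", ["avg_views", "avg_groups"])
print(averages)

# Share of users predicted as professional, using the counts reported above.
professional_share = 974 / (974 + 7279)
print(round(professional_share, 3))  # 0.118
```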
In Figure 3, we depict these two types of users and the metrics that differentiate them. We performed a multivariate
analysis of variance (MANOVA) to ascertain whether the differences between these types of users are statistically significant.
This was confirmed by an F-value = 3,268 and a p-value ≪ 0.0. Thus, we can confirm that the two types
of users have statistically significantly different characteristics. Also, we conducted an analysis of variance (ANOVA) for
Algorithm                     Feature set                     Accuracy  AUC   F1 score
Gaussian Naïve Bayes          Crowdsourced features           0.86      0.51  0.85
Gaussian Naïve Bayes          User features                   0.86      0.53  0.85
Gaussian Naïve Bayes          Photo features                  0.85      0.52  0.85
Gaussian Naïve Bayes          Crowdsourced + user features    0.92      0.5   0.85
Gaussian Naïve Bayes          Crowdsourced + photo features   0.85      0.52  0.86
Gaussian Naïve Bayes          User + photo features           0.86      0.53  0.85
Gaussian Naïve Bayes          All features                    0.92      0.5   0.85
Logistic Regression           Crowdsourced features           0.92      0.57  0.89
Logistic Regression           User features                   0.92      0.66  0.89
Logistic Regression           Photo features                  0.92      0.68  0.88
Logistic Regression           Crowdsourced + user features    0.92      0.53  0.88
Logistic Regression           Crowdsourced + photo features   0.92      0.67  0.89
Logistic Regression           User + photo features           0.92      0.63  0.88
Logistic Regression           All features                    0.92      0.61  0.88
Random Forest                 Crowdsourced features           0.92      0.58  0.88
Random Forest                 User features                   0.92      0.69  0.88
Random Forest                 Photo features                  0.92      0.68  0.88
Random Forest                 Crowdsourced + user features    0.92      0.53  0.88
Random Forest                 Crowdsourced + photo features   0.92      0.64  0.88
Random Forest                 User + photo features           0.92      0.76  0.89
Random Forest                 All features                    0.92      0.65  0.88
Gradient Boosting Classifier  Crowdsourced features           0.92      0.61  0.89
Gradient Boosting Classifier  User features                   0.92      0.72  0.88
Gradient Boosting Classifier  Photo features                  0.92      0.7   0.88
Gradient Boosting Classifier  Crowdsourced + user features    0.92      0.57  0.89
Gradient Boosting Classifier  Crowdsourced + photo features   0.92      0.68  0.88
Gradient Boosting Classifier  User + photo features           0.92      0.69  0.88
Gradient Boosting Classifier  All features                    0.92      0.66  0.88
Table 2: Results comparison of Gaussian Naïve Bayes, Logistic Regression (LR), Random Forest and Gradient
Boosting Classifier models by accuracy, AUC and F1 score metrics in the set of features – crowdsourced, user, photo
and their combinations.
Algorithm                     Feature set                        Accuracy  AUC   F1 score
Random Forest                 Aesthetics and technical features  0.92      0.67  0.88
Random Forest                 Social activity features           0.92      0.6   0.88
Table 3: Results comparison of RF model by accuracy, AUC and F1 score metrics in the aesthetics and technical quality
features and social activity features.
Figure 2: Correlation matrix between the social activity features and NIMA technical score, NIMA aesthetic score and
Kong score
each individual feature to see which of them differ significantly; these features are represented in Figure 3. The
average technical and aesthetic scores show that photos of professional photographers obtained a higher
NIMA aesthetic score (4.86 versus 4.54), NIMA technical score (5.16 versus 4.78) and Kong score (0.55 versus 0.51).
Moreover, photos of professional photographers tend to be viewed more often, with an average of
3,602 views, while photos of non-professional users get an average of 236 views. Pictures of professional users
also differ from those of non-professionals in the average number of groups where they are published: 14 versus 4. Finally,
the number of users followed by professional photographers is somewhat higher: 1,547 versus 1,340 in the case of
non-professional users.
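The per-feature ANOVA can be illustrated with the F statistic of a one-way analysis of variance, sketched below for a single feature and two groups. The score lists are hypothetical, and the sketch stops at the F statistic: obtaining a p-value would additionally require the F distribution, which the Python standard library does not provide.

```python
def one_way_anova_f(groups):
    """F statistic: between-group mean square divided by within-group mean square."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    ss_between = sum(
        len(group) * (group_mean - grand_mean) ** 2
        for group, group_mean in zip(groups, group_means)
    )
    ss_within = sum(
        (value - group_mean) ** 2
        for group, group_mean in zip(groups, group_means)
        for value in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical average NIMA aesthetic scores for a handful of users per group.
professionals = [4.9, 4.8, 5.0, 4.7, 4.9]
non_professionals = [4.5, 4.6, 4.4, 4.6, 4.5]
f_value = one_way_anova_f([professionals, non_professionals])
print(f_value)
```

A large F value means that the difference between the group means is large relative to the spread inside each group, which is the criterion used to decide which features separate the two types of users.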
Professional photographers (974 users): average NIMA technical score = 5.16; average NIMA aesthetic score = 4.86;
average Kong score = 0.55; average number of views = 3,602; number of following = 1,547; average number of groups = 14.
Non-professional photographers (7,279 users): average NIMA technical score = 4.78; average NIMA aesthetic score = 4.54;
average Kong score = 0.51; average number of views = 236; number of following = 1,340; average number of groups = 4.
Figure 3: Aggregation of professional photographers and non-professional users with their most common characteristics
5 Discussion
In this section, we first discuss the obtained results. Then, we talk about the potential application of our study in real
scenarios. Finally, we raise the limitations of our work.
5.1 Obtained results
To summarise, the experiments conducted on the data set extracted from Flickr suggest that all competing interpretable
and non-interpretable algorithms (Gaussian Naïve Bayes, LR, RF and Gradient Boosting Classifier) provide
meaningful results. It is worth mentioning that the studies examined in Section 2.2, which applied models to the task of expert
finding in technical fields or narrowly defined areas, reported better results. This can be explained by the fact that artistic
skills, and specifically photography skills, are ill-defined, and there are no established methods and features to determine
and measure them. Also, based on the results presented in Table 2, we can notice that some feature sets are more
informative than others. For example, the results obtained with photo and user features show that these features characterise
professional users more precisely. This indicates that the Flickr data of users and photos can be used to identify
professional photographers and non-professional users based on self-reported occupation labels. Other researchers
can use our findings as a basis for finding more powerful models in order to strengthen the detection of experts in the
photography field.
After recognising the satisfactory performance of the above-mentioned algorithms, we focused on RQ2, which questions the
relationship between the aesthetics and technical quality of a picture and the social activity of that picture. The fact
that we did not see much correlation between social activity features and the technical and aesthetic scores of the photos
was not unexpected. It can be explained by the fact that many existing studies in the literature on image aesthetic
assessment are based on data sets such as A Large-Scale Database for Aesthetic Visual Analysis [Murray et al., 2012]
or the Tampere Image Database [Ponomarenko et al., 2015]. These data sets were annotated with semantic and aesthetic
labels and rated by users unidentifiable to researchers. Therefore, it is not clear whether these annotators align with the
photography enthusiast community. Besides, aesthetic beauty is subjective, since the same picture can be perceived
differently by different viewers. Moreover, due to the social network features of the photo and video sharing
portal, the behaviour of users might prevail over aesthetics.
Finally, to answer RQ3 on the characteristics that differentiate professional photographers from
non-professionals, we aggregated the predictions of RF per user. Based on the statistically significant
features, we noticed that users differ in the average NIMA aesthetic score, the average NIMA technical score, the
average Kong score, the average number of groups, the average number of views and the number of followed users.
All these features are higher for professional photographers. The fact that photos from non-professional
users are viewed less often can be explained by two other variables: the number of followers and the average number of groups. The
number of followers a user has largely explains the number of views that their pictures can get. On the other hand,
for a picture to be published in a group, the user has to take action by uploading it there. Besides,
some groups require administrator approval for the picture to be included in the group. This suggests that group
membership is related to the technical and aesthetic scores, which are higher for professional photographers.
5.2 Application in real scenarios
The results presented in this paper can be applied to the task of assessing the photography quality of users. The
automatic detection of professional photographers can be used to build more reliable photo and video sharing
platforms by establishing high standards of skills for users.
Moreover, our findings can be applied in different contexts apart from the stated identification of professional
photographers. Through computer vision, we can detect inappropriate content. This is a relevant issue nowadays since
the proliferation of social media enables people to express their opinions widely online, leading to the emergence of
conflict and hate. The lack of a universal hate classifier generalising across various training sets and contexts was addressed
by [Salminen et al., 2020]. The authors developed a cross-platform online hate classifier which performs well for
detecting hateful comments across multiple social media platforms including YouTube, Reddit, Wikipedia and Twitter.
However, the data sets used for that study mainly include manually labelled comments from these sites. We believe that
our work can contribute significantly to improving the coverage of this existing classifier. Furthermore, nowadays
even a seemingly harmless meme can become a multimodal type of hate speech, considered a direct attack on
people based on ethnicity, religious affiliation, gender, etc. [Velioglu and Rose, 2020].
Furthermore, our results can serve as a basis for creating a platform which could offer personalised aesthetic-based
photo recommendations. Such recommendation tools are already implemented in several portals, for example the Netflix recommender
system [Gomez-Uribe and Hunt, 2016], and there is a need to extend them to other platforms. They can help photography
websites better serve the needs of non-professional and professional photographers [Zhou et al., 2018]. Content-based
image search does not fully satisfy the needs of such users since they are usually not interested in content alone. Instead,
they are often looking for photos with certain photographic aesthetics, which may include monochromaticity, light
contrast, and style.
Another important topic that can be addressed with the help of this study is privacy. We can detect sensitive
places and photographs violating the community terms and conditions. Most websites nowadays are
taking measures against spam messages and inappropriate content. However, malefactors invent new
ways of circumventing these measures every day. Also, not all systems can detect photographs of places whose disclosure can make users vulnerable.
5.3 Limitations
We would like to discuss the research gaps that currently arise within the topic of the identification of professional
photographers on Flickr.
It is important to mention that our results show the potential of ML models to be used in several domains. However, the
expected role of image quality and aesthetics as a basis for professional identification was not confirmed. The MANOVA test
suggests a certain level of association, but not enough to ensure that pictures of professionals have higher
scores according to the models described in Section 3.4.1.
Furthermore, the ground truth for this study is based on the self-proclaimed occupation of the users. We believe that
there might be more characteristics of professional users that could be used as the basis of the ML model.
Finally, it would be useful to repeat the study with the other portals explored in Section 2.1. Flickr, being a large photo and
video sharing platform, might not be representative enough with regard to the proportion of non-professional users.
6 Conclusions and Future Work
This work aimed to address the lack of open data sets on photo and video sharing platforms. We provided a
significant contribution to the literature by collecting one of the largest labelled data sets on Flickr with multimodal data
including crowdsourced, user and photo features. From 225,590 users who uploaded photos in December 2021, we
filtered out users without a minimum activity on the platform and the 5% of users at both ends of the distribution
of total photos uploaded to avoid outliers. As a result, the final number of users we selected is 151,468. Due to time
constraints, we downloaded all pictures of a selection of 27,538 users. Based on these data, we addressed the
task of identifying professional photographers and non-professional users on Flickr. We used several feature
sets and tested four models on them and their combinations. Among the interpretable classification techniques (Gaussian
Naïve Bayes and LR) and the non-interpretable techniques (RF and Gradient Boosting Classifier), RF showed the best
performance using user and photo features. Our results demonstrated that it is feasible to properly predict whether a
user is a professional photographer or not based on self-reported occupation labels. We also deduced that the technical
and aesthetic scores of a picture are not highly correlated with the social activity around that picture. Finally,
based on statistically significant features, we infer that professional photographers can be distinguished
from non-professional users by a higher average NIMA aesthetic score, average NIMA technical score, average Kong
score, average number of groups, average number of views and number of followed users.
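The outlier filter mentioned above, which drops the 5% of users at each end of the distribution of total photos uploaded, can be sketched as follows. The helper `trim_extremes`, the field name `total_photos` and the synthetic users are hypothetical, and the separate minimum-activity filter is not reproduced because its threshold is not given here.

```python
def trim_extremes(users, key, percent=5):
    """Drop the lowest and highest `percent` of users when ordered by `key`."""
    ordered = sorted(users, key=key)
    cut = len(ordered) * percent // 100  # number of users removed at each end
    return ordered[cut:len(ordered) - cut]


# Hypothetical users with 1 to 100 uploaded photos.
users = [{"user_id": i, "total_photos": i} for i in range(1, 101)]
kept = trim_extremes(users, key=lambda user: user["total_photos"])

print(len(kept))                                  # 90
print(kept[0]["total_photos"], kept[-1]["total_photos"])  # 6 95
```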
We will devote our future work to generalising the models to detect professional photographers on the other photo and
video platforms that we described in this study as sites holding potential for this type of task. Moreover, we will
expand and replicate the study in other environments. Also, there is a clear need to validate
our findings, e.g., through manual labelling or through other platforms such as LinkedIn. Besides, following the
presented results, we would like to explore additional potential applications of this work, e.g., automatic detection of
good photographers on the Web.
References
Sofia Strukova, José A. Ruipérez-Valiente, and Félix Gómez Mármol. A survey on data-driven evaluation of
competencies and capabilities across multimedia environments. International Journal of Interactive Multimedia and Artificial
Intelligence, 2022a. doi: 10.9781/ijimai.2022.10.004.
Eunji Lee, Jung-Ah Lee, Jang Ho Moon, and Yongjun Sung. Pictures speak louder than words: Motivations for using
instagram. Cyberpsychology, behavior, and social networking, 18(9):552–556, 2015.
Michael Hameleers, Thomas E. Powell, Toni G.L.A. Van Der Meer, and Lieke Bos. A picture paints a thousand lies?
the effects and mechanisms of multimodal disinformation and rebuttals disseminated via social media. Political
Communication, 37(2):281–301, 2020. doi: 10.1080/10584609.2019.1674979.
Isaac Chun-Hai Fung, Elizabeth B Blankenship, Jennifer O Ahweyevu, Lacey K Cooper, Carmen H Duke, Stacy L
Carswell, Ashley M Jackson, Jimmy C Jenkins III, Emily A Duncan, Hai Liang, et al. Public health implications of
image-based social media: a systematic review of instagram, pinterest, tumblr, and flickr. The Permanente Journal,
24, 2020. doi: 10.7812/TPP/18.307.
Sofia Strukova, José A. Ruipérez-Valiente, and Félix Gómez Mármol. Identifying experts in question & answer portals:
A case study on data science competencies in reddit, 2022b.
Keyan Ding, Ronggang Wang, and Shiqi Wang. Social media popularity prediction: A multiple feature fusion approach
with deep neural networks. In Proceedings of the 27th ACM International Conference on Multimedia, pages
2682–2686, 2019a.
Ling Ding, Huyin Zhang, Jinsheng Xiao, Bijun Li, Shejie Lu, and Mohammad Norouzifard. An improved image mixed
noise removal algorithm based on super-resolution algorithm and cnn. Neural Computing and Applications, 31(1):
325–336, 2019b. doi: https://doi.org/10.1007/s00521-018-3777-6.
Dipesh Gyawali, Alok Regmi, Aatish Shakya, Ashish Gautam, and Surendra Shrestha. Comparative analysis of multiple
deep cnn models for waste classification. 2020. doi: 10.48550/ARXIV.2004.02168.
Mahboobe Mehrvarz, Elham Heidari, Mohammadreza Farrokhnia, and Omid Noroozi. The mediating role of digital
informal learning in the relationship between students’ digital competence and their academic performance. Computers
& Education, 167:104184, 2021. ISSN 0360-1315. doi: https://doi.org/10.1016/j.compedu.2021.104184.
U Dikwatta and TGI Fernando. Violence detection in social media-review. Vidyodaya Journal of Science, 22(2), 2019.
Ruben Gaspar Marco, Sofia Strukova, José A. Ruipérez-Valiente, and Félix Gómez Mármol. Annotated flickr dataset
for identification of professional photographers. Data in Brief, 2022.
Annotated flickr dataset, 2022. URL http://dx.doi.org/10.17632/2nc8ytfw5x.1.
Wolfram Höpken, Marcel Müller, Matthias Fuchs, and Maria Lexhagen. Flickr data for analysing tourists’ spatial
behaviour and movement patterns: A comparison of clustering techniques. Journal of Hospitality and Tourism
Technology, 11(1):69–82, 2020. doi: 10.1108/JHTT-08-2017-0059.
Emre Erturk and Santiago Obrutsky. Multimedia storage in the cloud using amazon web services: Implications for
online education, 2016.
Alexandra Weilenmann, Thomas Hillman, and Beata Jungselius. Instagram at the museum: Communicating the
museum experience through social photo sharing. In Proceedings of the SIGCHI Conference on Human Factors in
Computing Systems, CHI ’13, page 1843–1852, New York, NY, USA, 2013. Association for Computing Machinery.
ISBN 9781450318990. doi: 10.1145/2470654.2466243.
Sofia Strukova and José A Ruipérez-Valiente. Using online digital data to infer valuable skills for the modern workforce.
In Handbook of Research on New Media, Training, and Skill Development for the Modern Workforce, pages 89–109.
IGI Global, 2022. doi: 10.4018/978-1-6684-3996-8.ch005.
Wern Han Lim, Mark James Carman, and Sze-Meng Jojo Wong. Estimating relative user expertise for content
quality prediction on Reddit. In Proceedings of the 28th ACM Conference on Hypertext and Social Media, HT
’17, page 55–64, New York, NY, USA, 2017. Association for Computing Machinery. ISBN 9781450347082. doi:
10.1145/3078714.3078720. URL https://doi.org/10.1145/3078714.3078720.
Sumanth Patil and Kyumin Lee. Detecting experts on Quora: by their activity, quality of answers, linguistic
characteristics and temporal behaviors. Social network analysis and mining, 6(1):5, 2016.
Rohit Saxena and Niranjan Pedanekar. I know what you coded last summer: Mining candidate expertise from
GitHub repositories. In Companion of the 2017 ACM Conference on Computer Supported Cooperative Work and
Social Computing, CSCW ’17 Companion, page 299–302, New York, NY, USA, 2017. Association for Computing
Machinery. ISBN 9781450346887. doi: 10.1145/3022198.3026354. URL https://doi.org/10.1145/3022198.3026354.
E. Constantinou and G. M. Kapitsaki. Identifying developers’ expertise in social coding platforms. In 2016 42nd
Euromicro Conference on Software Engineering and Advanced Applications (SEAA), pages 63–67, 2016. doi:
10.1109/SEAA.2016.18.
Piotr Wesołowski. Enhancing architectural engineering students’ acquisition of artistic technical competences and soft
skills. Cogent Arts & Humanities, 9(1):2043997, 2022. doi: 10.1080/23311983.2022.2043997.
Pavan Kantharaju, Katelyn Alderfer, Jichen Zhu, Bruce Char, Brian Smith, and Santiago Ontañón. Tracing player
knowledge in a parallel programming educational game, 2019.
Aditya Pal, Rosta Farzan, Joseph A Konstan, and Robert E Kraut. Early detection of potential experts in question
answering communities. In International Conference on User Modeling, Adaptation, and Personalization, pages
231–242. Springer, 2011.
David van Dijk, Manos Tsagkias, and Maarten de Rijke. Early detection of topical expertise in community question
answering. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in
Information Retrieval, SIGIR ’15, page 995–998, New York, NY, USA, 2015. Association for Computing Machinery.
ISBN 9781450336215. doi: 10.1145/2766462.2767840. URL https://doi.org/10.1145/2766462.2767840.
Aditya Pal, Amaç Herdagdelen, Sourav Chatterji, Sumit Taank, and Deepayan Chakrabarti. Discovery of topical
authorities in instagram. In Proceedings of the 25th International Conference on World Wide Web, WWW ’16, page
1203–1213, Republic and Canton of Geneva, CHE, 2016. International World Wide Web Conferences Steering
Committee. ISBN 9781450341431. doi: 10.1145/2872427.2883078.
Kristo Radion Purba, David Asirvatham, and Raja Kumar Murugesan. Instagram post popularity trend analysis and
prediction using hashtag, image assessment, and user history features. Int. Arab J. Inf. Technol., 18(1):85–94, 2021.
Raymond Yee. Pro Web 2.0 mashups: remixing data and web services. Apress, 2008.
John Child. Studio Photography: Essential Skills. Routledge, 2008. doi: 10.4324/9780080926933.
Shahrukh Athar and Zhou Wang. A comprehensive performance evaluation of image quality assessment algorithms.
IEEE Access, 7:140030–140070, 2019. doi: 10.1109/ACCESS.2019.2943319.
Hossein Talebi and Peyman Milanfar. Nima: Neural image assessment. IEEE transactions on image processing, 27(8):
3998–4011, 2018.
Shu Kong, Xiaohui Shen, Zhe Lin, Radomir Mech, and Charless Fowlkes. Photo aesthetics ranking network with
attributes and content adaptation. In European conference on computer vision, pages 662–679. Springer, 2016.
Steven Loria. textblob documentation. Release 0.15, 2:269, 2018.
Naila Murray, Luca Marchesotti, and Florent Perronnin. Ava: A large-scale database for aesthetic visual analysis. In
2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 2408–2415, 2012.
doi: 10.1109/CVPR.2012.6247954.
Nikolay Ponomarenko, Lina Jin, Oleg Ieremeiev, Vladimir Lukin, Karen Egiazarian, Jaakko Astola, Benoit Vozel,
Kacem Chehdi, Marco Carli, Federica Battisti, and C.-C. Jay Kuo. Image database tid2013: Peculiarities, results
and perspectives. Signal Processing: Image Communication, 30:57–77, 2015. ISSN 0923-5965.
doi: https://doi.org/10.1016/j.image.2014.10.009.
Joni Salminen, Maximilian Hopf, Shammur A Chowdhury, Soon-gyo Jung, Hind Almerekhi, and Bernard J Jansen.
Developing an online hate classifier for multiple social media platforms. Human-centric Computing and Information
Sciences, 10(1):1–34, 2020.
Riza Velioglu and Jewgeni Rose. Detecting hate speech in memes using multimodal deep learning approaches:
Prize-winning solution to hateful memes challenge, 2020.
Carlos A. Gomez-Uribe and Neil Hunt. The netflix recommender system: Algorithms, business value, and innovation.
ACM Trans. Manage. Inf. Syst., 6(4), dec 2016. ISSN 2158-656X. doi: 10.1145/2843948. URL
https://doi.org/10.1145/2843948.
Yu Qing Zhou, Ga Wu, Scott Sanner, and Putra Manggala. Aesthetic features for personalized photo recommendation,
2018.