Application of Machine Learning in the Social Network

Authors:
  • Indian Institute of Technology Jodhpur

Abstract and Figures

This chapter provides a survey of different metaheuristic machine learning algorithms used for various interesting research problems in the domain of social networks and big data. It illustrates the flow of content from social media to a big data storage system and the analysis by machine learning and natural language processing. The chapter discusses the regression‐based concepts and their application in social networks. It illustrates several applications, such as spam content classification, labeling data available in an online social network, medical data classification, human behavior analysis, and sentiment analysis. Sentiment analysis classifies the users' emotions from the text they share on social media and microblogging sites. The chapter provides a few examples where deep learning and evolutionary computing have been used to solve research issues in social networks. Community detection is one of the most important problems of social networks where evolutionary algorithms have been effectively used.
Trim Size: 170mm x 244mm Single Column Souravde551591 c04.tex V1 - 02/20/2020 1:49pm
Author Queries
AQ1 Please provide Table 4.1 citation.
4
Application of Machine Learning in the Social Network
Belfin R. V.1, E. Grace Mary Kanaga1, and Suman Kundu2,3
1Department of Computer Science and Engineering, Karunya Institute of Technology and Sciences,
Coimbatore, India
2Department of Computer Science and Engineering, Indian Institute of Technology, Jodhpur, India
3Department of Computational Intelligence, Wroclaw University of Science and Technology, Wroclaw, Poland
4.1 Introduction
Social media platforms have become an integral part of day-to-day life for a majority of
the world’s internet users. People increasingly turn to social media for information. Apart
from consuming information, people can create content for social media to showcase their
skills. An example is the video resume, which professionals create and publish on social
media to establish their presence. Content can take different forms such as images, text,
emoticons, and videos. Since there are few limits on content creation on social media, users
generate a massive amount of data that shows all the characteristics of big data. This data
can be used for different analytical and predictive applications for business. Selling data
through APIs for business and educational purposes is also a business for many data giants.
Structured Query Language (SQL) alone is not sufficient to mine information from big data;
complex statistical and machine learning (ML) approaches are needed to glean information
from this massive data. This chapter provides a survey of different metaheuristic machine
learning algorithms used for various interesting research problems in the domain of social
networks and big data.
4.1.1 Social Media
A critical entity of the World Wide Web is social media, which comes in different forms
including social blogs, forums, professional networks, picture sharing applications, social
gaming sites, chatting applications, and most importantly social networks. The scale of
social media is striking: estimates predict 3.02 billion monthly active social media users
by 2021. A forecast by Statista.com (2018) shows that China alone will have 750 million
users by 2022 and India will have one-third of a billion users. On average, internet users
worldwide spend 135 minutes surfing social media. This user density has resulted in
marketers promoting their products on social media in a new field named social media
marketing or social digital media advertising. Recently, there has been a complete
transformation in the usage of social networking sites, switching from being used on personal
Recent Advances in Hybrid Metaheuristics for Data Clustering, First Edition.
Edited by Sourav De, Sandip Dey, and Siddhartha Bhattacharyya.
© 2020 John Wiley & Sons Ltd. Published 2020 by John Wiley & Sons Ltd.
computers to now being used more often on mobile devices. The social networking giants
like Facebook, Twitter, and many others give away their mobile applications to customers.
There are even location-based microblogging and many other services offered to their
customers through mobile applications.
4.1.2 Big Data
The amount of data generated by social networks and social media is enormous. It exhibits
all four significant features of big data, the so-called 4V’s: volume, velocity, variety, and
veracity. When all four are present in the generated social media data, analyzing the data
becomes complex. Leaving this complex data unanalyzed is not a wise decision for the
technology giants. These social media organizations have started analyzing the generated
data to offer better features to their users. The users of these features are happy and
excited to see applications built on their data. The application users can personalize the
content and share it with their friends on social media. To leverage the content generated
on social media, branding and advertising departments of the top companies create
marketing plans and budgets accordingly. These companies also need to understand the
outcome of their advertisements, the preferences of their customers, and even the negative
reviews. Since the amount of data is enormous, it is impossible to do the analysis manually.
Information from historical transactions and social media data alone is not enough for top
officials to decide on their future goals. The organizations have to stay ahead of their
competitors. Machine learning models come to the rescue to help top management make
decisions.
4.1.3 Machine Learning
Machine learning and artificial intelligence (AI) are important concepts today. Much human
work will be replaced by machines. For example, in the future, bots will replace most of the
humans in the armed forces of a country. Restaurants can replace waiters with AI bots, and
such bots are already in service in a few restaurants. There are machine learning approaches
that can teach bots to understand the environment and act accordingly. Classification,
clustering, regression, and deep learning are some of the models in machine learning.
As shown in Figure 4.1, machine learning algorithms can be divided into four
types, namely, supervised learning, unsupervised learning, semisupervised learning, and
reinforcement learning. Supervised learning algorithms are used when the target variable
is continuous or categorical. Some use cases for supervised learning are regression
analysis for housing price prediction and the classification of medical images. Unsupervised
learning algorithms are used when there is no target variable. Clustering of marketing
data for customer segmentation and market basket analysis or association rule mining
of supermarket transaction data are use cases of unsupervised learning algorithms.
Semisupervised algorithms can be used when the target variable is categorical and only
part of the data is labeled. The text classification of news data and lane finding in GPS
data using clustering are some of the use cases of semisupervised learning algorithms.
Reinforcement learning is an advanced type of learning algorithm that learns the
environment and acts accordingly. It can be applied when the target variable in the data is
categorical or when there is no target variable. The use cases for reinforcement learning
are driverless cars and optimizing the marketing cost of a business.

[Figure 4.1: tree diagram. Machine Learning branches into Supervised Learning,
Unsupervised Learning, Semi-Supervised Learning, and Reinforcement Learning. Leaf labels:
Regression, Classification, Clustering, Association, Classification, Clustering,
Classification, Control.]
Figure 4.1 Classification of machine learning algorithms
4.1.4 Natural Language Processing (NLP)
The amount of content generated by the users of social media is increasing exponentially.
Text data cannot be processed by a machine as efficiently as other formats of data.
A machine needs to understand human slang and language to analyze the text content.
Natural language processing (NLP) helps machines understand human slang and language
in the text content generated on social media. The flow of content from social media to a
big data storage system and the analysis by ML and NLP are illustrated in Figure 4.2.
In recent times, machine learning and artificial intelligence have played a vital role in
engaging millions of social media users. Recent studies show that customers are more loyal
to the companies that respond to them promptly. Bots or machine learning programs
automatically understand the customers’ queries using NLP and respond to them immediately.
This advancement helps companies retain their customers and build stronger relationships
with them. The basic model of a social media chatbot is illustrated in Figure 4.3.
[Figure 4.2: diagram with three blocks. Social Media Platforms; Big Data Storage ("Big
data generated from social media platforms is stored in the cloud for further processing");
Machine Learning and Natural Language Processing Engine, containing ML and NLP ("Data
scientists use ML and NLP to tap useful information from the big data").]
Figure 4.2 Workflow of big data, machine learning, and social media

[Figure 4.3: diagram with the labels NLP, Machine Learning, Bot Logic, Messaging
Platforms, APIs, and Social Media Content.]
Figure 4.3 Chatbot schematic diagram
4.1.5 Social Network Analysis
Social network analysis (SNA) is a method of analyzing social relationships, usually with
the concepts of networks and graph theory. In SNA the social actors are usually denoted
by nodes, and the relationships are denoted by edges of the graph. There are different
variants of these networks, such as directed, undirected, and weighted networks. In recent
times multilayer representations have been used to represent complex social structures.
Although graph theory has been at the forefront of social network analysis (Belfin et al., 2018;
Belfin and Grace Mary Kanaga, 2018), there have been attempts to use other theories such as
game theory (Narayanam and Narahari, 2011) and granular computing (Kundu and Pal, 2015a;
Pal and Kundu, 2017; Kundu and Pal, 2018) to solve social network issues. This chapter is a
summary of the various applications and machine learning methods available in the social
network and big data literature.
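
The node-and-edge representation described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name build_graph, the actor names, and the interaction weights are hypothetical and do not come from any of the cited works.

```python
def build_graph(edges, directed=False):
    """Return an adjacency map {node: {neighbor: weight}} from (u, v, w) triples."""
    adjacency = {}
    for source, target, weight in edges:
        adjacency.setdefault(source, {})[target] = weight
        adjacency.setdefault(target, {})  # make sure the target exists as a node
        if not directed:
            # An undirected tie is stored in both directions.
            adjacency[target][source] = weight
    return adjacency


# Hypothetical actors; each weight counts interactions between two actors.
edges = [("ana", "bo", 3), ("bo", "cy", 1), ("cy", "ana", 2)]

follows = build_graph(edges, directed=True)   # directed, weighted network
friends = build_graph(edges, directed=False)  # undirected, weighted network

print(follows["ana"])
print(friends["ana"])
```

In the directed variant an edge is stored only at its source, so "ana" has a single outgoing tie, whereas in the undirected variant the same edge list gives "ana" two neighbors.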
This chapter compiles classification methods and applications in Section 4.2, followed by
the clustering methods and applications in Section 4.3. The regression-based concepts
and their application in social networks are discussed in Section 4.4. Finally, the applications
of evolutionary algorithms and deep learning methods are discussed in Section 4.5.
4.2 Application of Classification Models in Social Networks
Classification divides content into groups of related items. Machine learning
classification is performed on data that has labels associated with it. For instance, say a
user has a massive number of emails in an inbox. Classifying those emails based on topics
like work, promotions, and social might help the user to prioritize their work. In this
example, work, social, and promotions are the labels. This process is similar to placing
colored balls in the right baskets with similarly colored balls. In social networks, there are
several applications where classification concepts are instrumental. The following
subsections present several applications reported in the literature, such as spam content
classification, labeling data available in an online social network, medical data
classification, human behavior analysis, and sentiment analysis.
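
The email example above can be made concrete with a small labeled classifier. The sketch below uses a multinomial naive Bayes model with Laplace smoothing, which is one common choice and is our assumption here; the training messages and the function names train_naive_bayes and classify are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Count documents and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-posterior score."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen during training
            smoothed = word_counts[label][word] + 1  # Laplace smoothing
            score += math.log(smoothed / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled inbox: the labels are work, promotions, and social.
training = [
    ("project deadline meeting agenda", "work"),
    ("quarterly report meeting notes", "work"),
    ("discount sale offer today", "promotions"),
    ("exclusive offer coupon sale", "promotions"),
    ("friend tagged you photo", "social"),
    ("new friend request photo", "social"),
]
model = train_naive_bayes(training)
print(classify("sale offer ends today", model))
```

The labels play the role of the baskets in the colored-balls analogy: the model learns which words tend to fall into which basket and places a new message accordingly.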
4.2.1 Spam Content Detection
The digital age has given businesses many strategies to market their products, and they
pump large sums of money into digital marketing. These marketing strategies have generated
a great deal of promotional content dispersed across social media. Most of the content that
reaches users is irrelevant to them. Separating relevant user information from the irrelevant
information is called spam content classification. Benevenuto et al. (2009) identified spam
users who spread impure information on YouTube, using real YouTube users and content. Zhu
et al. (2012) proposed a method for spam content classification to solve the problems in
content and topology-based classification models. The data for the experiment was taken
from China’s largest social network, Renren.com. A work by Ahmed and Abulaish (2013)
proposed a statistical method to analyze and filter spam content in Facebook and Twitter
data. The proposed algorithm generates 14 generic statistical features to detect a user who
spreads spam content.
Gender classification in social media data is an important aspect for law enforcement,
targeted advertising, and other social-related problems. Alowibdi et al. (2013) proposed an
algorithm for classifying profiles as male or female. The algorithm used five features
to classify the gender. The features may be the color of the profile background picture
or the set of text used to post the content on social media. Li and Xu (2014) introduced a
rule-based classification system based on sociology concepts to identify and label emotions
in microblog posts. They used Chinese microblog post data for the experiment. SPADE
(Wang et al., 2014) is a social media classifier that separates spam from useful messages
across a social network. The proposed method is a generic solution for multiple social
networks using cross-domain and associative classification. Bots in a social network create
unrealistic text and spread false information. A classifier that separates human accounts
from bot accounts was designed by Igawa et al. (2016). They used random forests and
multilayer perceptron classifiers to test their model on a set of scraped data related to the
2014 FIFA World Cup. Tacchini et al. (2017) proposed a work that focuses on misinformation
detection in social networks. They used Facebook posts as the data for their experiment.
This method uses logistic regression and a Boolean crowdsourcing algorithm to build the
classifier model.
4.2.2 Topic Modeling and Labeling
Topic modeling (TM) is one of the crucial areas of research in big data analytics. TM is a
process where the text content in extensive data is summarized into specific groups. An
example of this method is the grouping of news content into sports, economics, and
politics. This section contains a brief discussion of the literature available on topic
modeling and labeling in social network data. Tuulos and Tirri (2004) used social network
chat room data. They tried to break up the dynamic nature of the chat data and model it into
topics. Location annotation is a critical method to group locations. Ye et al. (2011) used a
support vector machine (SVM) classifier to annotate and tag locations and, finally, to
assign each location to a category. SocioDim by Tang and Liu (2011) is a classification
model for social media that considers the heterogeneity of the social network.
Wanichayapong et al. (2011) worked on topic modeling with traffic congestion data from
social media and broke it into two categories, point and link. McAuley and Leskovec
(2012) tried to find the interdependencies between images by considering the metadata
of the images on social media. They also considered the social community of the users
involved: the data for the work came from the comments section below each image, the
person who uploaded the image, and that person’s network of friends.
In addition, social media users add their dining, shopping, and other preferences on social
media. This generated content can help marketing experts recommend products to the
users. The approach in Song et al. (2013) is an iterative learning-based classifier that learns
each user’s content and classifies users into different user buckets. The algorithm also
takes the content of the user’s friends into account and provides a personalized
recommendation. Customer churn prediction is another important aspect in business. Churn
analysis forecasts the loyalty of customers. Verbeke et al. (2014) used real
telecommunication datasets to predict customer churn. The algorithm uses a combination of
relational and nonrelational classification models to predict the churn. Emails are an
important part of everyone’s professional life. Classification of emails can be done to
separate spam emails from the critical emails and to classify the subject of the mail content.
Alsmadi and Alhami (2015) proposed a method using n-grams to classify spam emails in English
and Arabic. Nowadays many users of the internet have accounts on multiple online social
network sites. Peled et al. (2016) developed a classifier to match the entities between
online social network accounts. They used data collected from Facebook and Xing to
experiment with their classifier. Himelboim et al. (2017) classified Twitter tweets by using
the information in the text and the patterns visible in the network. The authors used the
density, modularity, centrality, and the fraction of independent users in the network to
build the classification model. The previous works are centered around the users and not on
the entire network structure. Adverse drug reactions are considered to be one of the
determinants of mortality in the medical field. The work in Yang et al. (2015) classifies the
experiences shared by doctors and patients on social media, microblogging sites, and forums.
Finally, the data is classified to form a drug reaction database.
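
To illustrate the n-gram features mentioned above for spam email classification, the following sketch extracts word n-grams from text and counts them. It is a generic illustration, not the procedure of Alsmadi and Alhami (2015); the function names and example messages are hypothetical.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the list of word n-grams (as tuples) in reading order."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_profile(texts, n):
    """Count how often each n-gram occurs across a collection of texts."""
    profile = Counter()
    for text in texts:
        profile.update(word_ngrams(text, n))
    return profile


# Hypothetical spam messages; frequent bigrams can serve as classifier features.
spam_messages = ["free prize now", "claim free prize"]
bigram_counts = ngram_profile(spam_messages, 2)
print(bigram_counts.most_common(1))
```

N-grams keep short word sequences together, so a feature such as ("free", "prize") carries more signal than either word alone. Whitespace tokenization is a simplification; a production system would need language-aware tokenization, particularly for Arabic.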
4.2.3 Human Behavior Analysis
This type of classification method analyzes the data and groups the users according
to their behavior in online social networks. An example of human behavior analysis is
inferring a user’s gender from their behavior in online social networks. This
section summarizes state-of-the-art literature that classifies human behavior in social
networks. Eleta and Golbeck (2014) classified patterns of communication on Twitter while
considering its multilingual nature. The work resulted in an understanding of the global
reach of social media and the flow of multilingual communication in social networks. This
work also studied how multilingual users of Twitter mediate information sharing from a
different language. User personality classification is an essential aspect for a criminology
department and also for business. The work on user personality classification done by
Lima and de Castro (2014) takes the group of text shared by the user and learns from it using
different machine learning approaches such as naïve Bayes, support vector machine, and
multilayer perceptron neural network. Bayot and Gonçalves (2018) classified the age and
gender of the users in a social network using deep convolutional neural networks (CNNs).
Epilepsy is a brain disorder commonly correlated with abnormal cortical and subcortical
functional networks. Zhang et al. (2011) used functional MRI data that is classified with the
help of social network analysis theories to detect epilepsy.
4.2.4 Sentiment Analysis
Sentiment analysis classifies the users’ emotions from the text they share on social media
and microblogging sites. An example of this might be classifying the happy, neutral, and
unhappy customers from the feedback data. This method aims to understand the content
generated by the user and decide its emotion with computational or statistical methods.
Web technology is the most significant technological advancement of the past decade.
It changed the way people think and the way they purchase items. Lo and Potdar (2009)
discussed opinion mining and sentiment analysis from the feedback data generated by
users for e-commerce products. Batool et al. (2013) analyzed Twitter data to understand
the emotion of each tweet. The proposed algorithm includes a synonym binder module
and a knowledge enhancement module to classify and summarize the tweets. Sentiment
analysis and classification on Facebook status data was done by Akaichi (2013). This
algorithm builds sentiment lexicons based on emoticons, interjections, and acronyms to
classify the sentiments in the status text. Vázquez et al. (2014) explain the recent trend
among e-commerce customers to look at the feedback of other people to help them decide
whether to buy a product. This work classifies the microblog posts based on the reviews
posted by users. Burnap et al. (2015) experimented with suicide-related communication
using machine learning classification methods on Twitter data. This proposed work
classifies the text that refers to suicidal content using the lexical, structural, emotive, and
psychological features extracted from Twitter posts.
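
A lexicon-based approach of the kind described above can be sketched as follows. The tiny lexicons and the scoring rule are hypothetical and chosen only for illustration; they are not the lexicons or the algorithm of Akaichi (2013) or of any other cited work.

```python
# Hypothetical sentiment lexicons mixing words, emoticons, and an acronym.
POSITIVE = {"good", "great", "love", "happy", ":)", ":d", "lol"}
NEGATIVE = {"bad", "poor", "hate", "sad", ":(", "ugh"}


def normalize(token):
    """Keep emoticons intact; strip trailing punctuation from ordinary words."""
    if token in POSITIVE or token in NEGATIVE:
        return token
    return token.strip(".,!?;")


def sentiment(text):
    """Label a text happy, unhappy, or neutral by counting lexicon hits."""
    tokens = [normalize(token) for token in text.lower().split()]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "happy"
    if score < 0:
        return "unhappy"
    return "neutral"


print(sentiment("Love the new phone :)"))
print(sentiment("ugh, delivery was bad :("))
print(sentiment("the parcel arrived today"))
```

Counting lexicon hits ignores negation and sarcasm, which is one reason the literature surveyed here also turns to learned classifiers.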
4.3 Application of Clustering Models in Social Networks
Clustering is the concept of automatically finding subgroups in massive data. In a social
network the same idea can be called community detection. There are many related works
that discuss community detection in social networks (Belfin et al., 2018; Belfin and Grace
Mary Kanaga, 2018). Grouping methods can be utilized for many applications. One
example, the clustering of a real word-adjacency graph (Fortunato, 2010), is depicted in
Figure 4.4. The dataset is the adjacency network of common adjectives and nouns in the
book David Copperfield by Charles Dickens. Some of the other applications of clustering
mentioned in the literature are discussed next.
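
One widely used way to judge whether a proposed grouping of nodes is a good set of communities is Newman's modularity, Q = sum over communities c of [L_c/m - (d_c/2m)^2], where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes. The sketch below computes Q for a given partition of an undirected, unweighted graph without self-loops or duplicate edges. Choosing modularity here is our assumption for illustration; the toy graph is hypothetical.

```python
def modularity(edges, community_of):
    """Newman modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    degree = {}
    internal_edges = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if community_of[u] == community_of[v]:
            c = community_of[u]
            internal_edges[c] = internal_edges.get(c, 0) + 1
    # Total degree of the nodes in each community.
    degree_sum = {}
    for node, d in degree.items():
        c = community_of[node]
        degree_sum[c] = degree_sum.get(c, 0) + d
    return sum(
        internal_edges.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Hypothetical graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

print(modularity(edges, two_groups))
print(modularity(edges, one_group))
```

Splitting the graph at the bridge scores higher than lumping all nodes together, which matches the intuition that a community is denser inside than across its boundary.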
Figure 4.4 Clustering in the network data using a word adjacency dataset
4.3.1 Recommender Systems
More people are traveling today than ever before, and they often use recommendations
from blogs and forums. Since each post on a microblogging site is written by one person,
its recommendations might not be the best for every traveler. A recommender system needs
a learning engine that provides the best recommendation aggregated from the content of
multiple travelers. Cenamor et al. (2017) designed a system that takes previous data,
clusters it into a daily travel plan, and makes personalized recommendations to the user.
Chen et al. (2017) proposed a new recommender system that suggests clustered urban
functional areas with the help of collected building-level social media data. The proposed
work was implemented in the Yuexiu District, Guangzhou, China with K values of 2 and 4.
These recommender systems can be used for urban planning in smart city projects. Feng et al.
(2015) proposed a personalized movie recommender system that uses the community to
recommend a movie. The community detection in the proposed work is done based on
association rule mining. This recommender system was tested with the MovieLens and Netflix
datasets.
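
The role of a K value in clustering can be illustrated with a minimal k-means sketch. The text above does not name the clustering algorithm used by Chen et al. (2017), so k-means is purely our assumption, as are the two-dimensional points and the naive initialization from the first k points.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain k-means; centroids start at the first k points (naive, deterministic)."""
    centroids = [points[i] for i in range(k)]
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(point, centroids[c]))
            for point in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [pt for pt, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignment, centroids


# Hypothetical feature vectors, for example two activity measures per building.
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 9.0), (9.0, 10.0)]
assignment, centroids = kmeans(points, k=2)
print(assignment)
```

Running the same data with a different k yields a coarser or finer partition, which is why studies often report results for more than one K value.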
Table 4.1 Summary of classification applications

Reference | Problem | Dataset | Data type | Classification type
Ye et al. (2011) | Text annotation | Facebook check-in data | Human/Social | Binary support vector machine (SVM) classifier
Song et al. (2013) | Personalized recommendation | Sina Weibo | Human/Social | Gradient descent learning
Batool et al. (2013) | Synonym and knowledge enhancement | Twitter data | Human/Social | Domain-specific learning
Ahmed and Abulaish (2013) | Spam filtering | Facebook and Twitter data | Human/Social | Naive Bayes, Jrip, and J48
Akaichi (2013) | Complexities in conveyed texts | Facebook status data | Human/Social | Support vector machine (SVM) and naive Bayes
Li et al. (2014) | All traditional models use statistical methods | Chinese micro-blog posts | Human/Social | Rule based
Vázquez et al. (2014) | Costly sentiment analysis | English and Spanish social media data | Human/Social | Rule based
Lima and de Castro (2014) | Omission of social media metadata | Twitter data | Human/Social | Naive Bayes, SVM, multilayer perceptron neural network
Yang, Kiang, and Shang (2015) | Adverse drug reactions (ADRs) | Medhelp website | Web text data | Latent Dirichlet allocation modeling
Igawa et al. (2016) | Text bots | 2014 FIFA World Cup data | Web text data | Random forests and multilayer perceptrons
Tacchini et al. (2017) | Misinformation classifier | Facebook data | Human/Social | Logistic regression
Bayot and Gonçalves (2018) | Gender classification | Adience for age and gender | Human/Social | Deep CNN

4.3.2 Sentiment Analysis
Community detection plays a vital role in analyzing the effect of real-world events.
Ou et al. (2017) examined the emotion surrounding an event that occurred in the real world.
The proposed algorithm finds a community, detects the community emotion, aggregates the
community emotion, and detects any community emotion burst. Brand communities are
another application: most people are comfortable with the brand they use for a given
product. Companies promote their brands on social media to build their customer base. The
brand community enables customers to know more about products and creates a strong
relationship with the customer base. Habibi et al. (2014) proposed a model for an
overlapping brand community that creates a positive influence and brand trust among the
customers.
4.3.3 Information Spreading or Promotion
Information dispersion is one of the significant areas of social network analysis according
to Shaji et al. (2018). Information dispersion studies the flow of information on a
social network. Social network information spreading is used in product promotion. Target
marketing is an area where the marketing is targeted to a group of individuals or a
community. Johnston (2017) proposed a theoretical model in which statutory agencies
leverage social media to communicate with the community. Sitter
and Curnew (2016) proposed an innovative model and described how social media can
be used by social workers to share YouTube videos with community members. Croitoru
et al. (2015) studied how to use the big data generated from social media after an event.
Their experiment was carried out with two real-world datasets from social media. The data
used includes the user-generated content and propagation data after the events Occupy
Wall Street in November 2011 and the Boston Marathon bombing in April 2013. Alsmadi
and Alhami (2015) proposed a method that clusters events on Twitter. The clustered events
are then spread across communities. Schirr (2013) proposed a method for community-based
learning and sharing educational information and curriculum development for classroom
training. Zhou et al. (2012) claimed that social network communication is community
specific and not individual specific. They proposed a method named COCOMP that shares
a message with a community that is similar. Lakkaraju and Ajmera (2011) introduced a
community-based application that predicts the future reach of a brand or content.
Conover et al. (2011) experimented with the political affiliation of Twitter users using a
hidden community structure. Ang (2011) proposed a model of a community with customer
relationship management (CRM) data to involve customers in building products. The model
suggests the CRM phases connect, converse, create, and collaborate. Ebner and Reinhardt
(2009) proposed a method to build a scientific community using the Twitter community.
4.3.4 Geolocation-Specific Applications
Epidemiology is the study of disease outbreaks and the spreading process.
Community studies can help the health department to quickly find the epidemic and the
path of a disease. Hossain et al. (2016) studied the 2014 Ebola outbreak in Africa and
experimented with how a social media community study can help to contain the spread
proactively. Social media is an essential tool nowadays to report disasters, and social
networks also support rescue teams in locating affected people and areas. Bakillah et al.
(2015) proposed a method for geospatial clustering to spread information during a disaster.
The case study used for the work was Typhoon Haiyan in the Philippines, with data
from Twitter. Geolocation applications are attractive to work on because they are location
specific. Atzmanstorfer et al. (2014) proposed a citizen-oriented and location-aware spatial
planning system. The social media users from the location-aware community can
participate in the discussion and brainstorm and implement various planning and functional
activities in and around their location. The case study used for this experiment was the
participatory land-zoning process in the Capital District of Quito, Ecuador.
4.4 Application of Regression Models in Social Networks
Regression is a well-known machine learning technique used for finding relationships
between independent and dependent variables in data. With people’s lives intertwined
with social networks, it is natural to expect that human emotions, behaviors, and sentiments
depend on their personal and organizational social networks. Over the past few years
scientists have been trying to figure out how one’s social network affects their personal
behaviors, emotions, performance, and other human attributes in relation to different
life activities. Regression analysis has been at the forefront of these scientific explorations.
Positional analysis of social networks started in the late 20th century and intensified in the
last decades due to the availability of technology that made data collection an easy task
for researchers. Different interesting problems have been investigated by scholars, with
regression analysis being used as the major instrument for studying social network data.
In this section, we provide a few examples of studies and show how regression analysis
facilitated the understanding of correlations between different aspects of human nature
and social network properties.
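
As a minimal illustration of how regression relates a dependent variable to an independent one, the sketch below fits a straight line by ordinary least squares using the closed-form slope and intercept. The variable names and the data (a network measure against a performance score) are hypothetical and are not taken from any of the studies discussed in this section.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: number of contacts (x) and a performance score (y).
contacts = [2, 4, 5, 7, 9]
performance = [3.1, 4.2, 4.8, 6.1, 7.0]
slope, intercept = fit_line(contacts, performance)
print(f"performance = {slope:.2f} * contacts + {intercept:.2f}")
```

A positive slope indicates that the score tends to rise with the network measure, and a negative slope that it tends to fall. The studies below use multiple and hierarchical regression with several such measures at once, but the interpretation of each coefficient follows the same idea.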
4.4.1 Social Network and Human Behavior
Human behavior is a complex output of a person’s psychological and physiological states within
the individual and social contexts. Sometimes one’s social network can affect job
performance, whether individual or group performance. A field study was
conducted by Sparrowe et al. (2001) with 190 employees in 38 different groups. These
190 employees were from 5 different organizations. The study was conducted over two social
networks between these members on an organization basis. One is an advice network, and
the other is a hindrance network. Using regression analysis, they showed that individual
performance is positively related to an individual’s in-degree centrality score in the
advice network and negatively related to it in the hindrance network. Group performance
was also studied, and they found that hindrance network density has a significant
negative relationship with group performance. Both of these networks were constructed
from the informal relationships between two individuals in a group, and the data
was collected by interviewing all 190 participants. While the advice network was comprised
of the relationships through which employees share resources and information, the
hindrance network was formed from the negative relationships such as interference, threat,
sabotage, and rejections. In a similar way, Collins and Clark (2003) worked with the top
management networks of technology firms to study their effect on firm performance
in terms of sales growth and stock returns. In this case, the social network was formed from
the top management and their internal and external contacts. Instead of a person-to-person
network, this network used weighted links between the members of the top management
team and different internal departments such as sales and marketing, research and
development, etc., and external parties such as suppliers, financial institutions, customers,
etc. The weight of the links in these person-to-department networks was based on the number
of contacts, time span of interactions, and intensity of their relations as reported by the
management team members through a survey. Hierarchical regression was performed to
find the relationship between these networks and the firm’s growth.
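To make the in-degree analysis of Sparrowe et al. described earlier in this subsection concrete, the following minimal sketch computes normalized in-degree centrality on a toy advice network and regresses a performance score on it with ordinary least squares. The network, the ratings, and the function names are hypothetical and chosen only for illustration; they are not the data or the exact model of the original study.

```python
from statistics import mean

def in_degree_centrality(nodes, edges):
    """Normalized in-degree: number of incoming ties divided by (n - 1)."""
    counts = {v: 0 for v in nodes}
    for _src, dst in edges:
        counts[dst] += 1
    scale = len(nodes) - 1
    return {v: counts[v] / scale for v in nodes}

def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical advice network: an edge (u, v) means "u seeks advice from v".
employees = ["a", "b", "c", "d", "e"]
advice_edges = [("a", "b"), ("c", "b"), ("d", "b"), ("e", "b"),
                ("a", "c"), ("d", "c"), ("e", "d")]
# Made-up performance ratings, for illustration only.
performance = {"a": 2.0, "b": 5.0, "c": 3.5, "d": 3.0, "e": 2.5}

centrality = in_degree_centrality(employees, advice_edges)
x = [centrality[v] for v in employees]
y = [performance[v] for v in employees]
intercept, slope = simple_ols(x, y)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

A positive slope on this toy data mirrors the direction of the advice-network finding; the same two functions applied to a hindrance network would be expected to give a negative slope.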
Network measures used as the independent variables included network size, network
range (Powell and Brantley, 1992; Scott, 2000), and the strength of ties (Granovetter,
1973). The regression results showed that the range and strength of an external network were
significantly related to a firm’s sales growth and stock returns, but the size of the external
network had no significant effect. In contrast, the size of an internal network
was significant for sales growth but not for stock returns, while the range
of an internal network was significantly related to stock returns but not to sales
growth. In an interesting research work, Cimenler et al. (2014) tried to find a correlation
between researchers’ social network metrics and their citation performance.
In this work, they collected four different social networks of 100 researchers from the
College of Engineering at the University of South Florida. These networks included a personal
communication network, a joint grant network, a co-authorship network, and a joint patent
network. The H-index was taken as the dependent variable characterizing citation
performance. Seven different network measures, specifically degree centrality, closeness
centrality, betweenness centrality, eigenvector centrality, average tie strength, Burt’s
efficiency coefficient, and local clustering coefficient, were taken as independent variables.
In addition to this, researchers’ demographic attributes such as gender, race, and
department were taken as input variables. With this massive attribute set, they ran a separate
bi-variate Poisson regression model for each attribute obtained from the four different social
networks. They found that degree, closeness, eigenvector, and betweenness centrality, average
tie strength, and local clustering coefficient of the co-authorship network have a statistically
significant effect on citation performance. Degree, closeness, and betweenness centrality,
average tie strength, and efficiency coefficient of the patent network, and only degree,
closeness, eigenvector centrality, and local clustering coefficient of the grant proposal
network, have a positive significant effect on citation performance. Interestingly, for the
communication network, only closeness and eigenvector centrality had a statistically
significant effect on citation performance.
In the preceding paragraphs we showed how a person’s performance can be enhanced or
diminished by their social position (centrality) in their personal and work social networks.
Now we will see how perception within the social network can change a person’s attitude toward
different events. Tucker (2014) studied an interesting phenomenon of human behavior
using regression. Tucker reported that when people think that the social network
platform is honoring their privacy by providing some software configuration, they are more
likely to accept personalized content even though every other parameter of the
personalized content remains the same. The study was conducted over the social network platform
Facebook to see users’ responses to personalized advertisements and media from a few
NGOs. Fortunately, in the middle of the campaign, Facebook introduced a privacy control
on the platform. Regression analysis of the ad click-through before and after the
introduction of the privacy control showed a different pattern: people were more
responsive to the personalized content after the introduction of the privacy control. Hence, it
provides evidence that perception can really affect our responses in a network.
In another experiment, Paluck et al. (2016) worked with 24,191 students of 56
schools to support the theory that one’s behavior adjusts
toward the societal norm. In this experiment, the students’ social network was formed
by surveying the 56 schools. Then they selected a few students from
randomly chosen schools and trained them as an anti-conflict squad. Linear regression was
performed over the data collected during one year of studies. The result of the regression
analysis shows a more than 30% reduction in per-student conflicts in the schools where
seed students played the role of anti-conflict agents. What is more interesting is that
the reduction in conflict was greater where the seed sets were chosen from among the
social referents. Another interesting problem of peer influence on human behavior was studied
by Bapna and Umyarov (2015). This experiment was conducted over the large-scale online
music social network Last.fm. The network contains more than 23 million friendship links
and 3.8 million users. They scanned several snapshots and extracted the user subscription
data. In addition, demographic information and social activity reports were
collected from the website. Logistic regression was performed with this massive data. It
emerged that once a person subscribes to a premium service, the chances of subscription
increase in that person’s neighborhood. Thus, the peer influence has a statistically and
economically significant causal effect. In addition, the regression revealed that the strength
of the peer influence is inversely proportional to the size of the friendship circle.
4.4.2 Emotion Contagion through Social Networks
Emotion contagion is an interesting research problem built on the idea that human emotions such
as happiness, loneliness, and depression can be transferred from person to person. Evidence
has been found that two socially connected individuals have similar emotions. However,
this similarity may be attributed to either contagion or homophily, so a causal effect is
hard to establish. Coviello et al.
(2014) conducted experiments with a massive amount of Facebook data to see whether
emotions diffuse through the friendship links in online social networks. Regression with
an instrumental variable was used to determine the emotional contagion in the network. They
chose rainfall as the instrument, and two different regressions were used to establish the
hypotheses that (i) rainfall is correlated with negative emotions in human beings and (ii) these
negative emotions diffuse to geographically distant friends through online social
networks. Although they found proof of social contagion of emotions, the ratio of the indirect
to the direct effect of rainfall was quite low compared to economic or political contagions.
A similar experiment with a massive amount of Facebook data was conducted by Kramer
et al. (2014). This experiment was conducted with 689,003 Facebook users. The controlled
experiments were done by reducing friends’ positive and negative emotional posts in
the users’ news feeds. Poisson regression was performed with the percentage of
reduction as a regression weight. An interesting finding of this regression analysis was that
omitting emotional content reduced the number of words the person subsequently
produced, irrespective of the type of emotion. Later they performed weighted linear
regression to show that when positive content was omitted, negative emotional posts
increased whereas positive posts decreased. The reverse was found to be true when negative
content was omitted. Thus, they found that human emotion is contagious over the online
platform Facebook.
4.4.3 Recommender Systems in Social Networks
Recommender systems try to predict a user’s affinity for a product or service based on either
the user’s or similar users’ past experiences (collaborative filtering) or the attributes of
similar products (content-based filtering) in a social network. With the growth of social media,
network recommender systems have become more relevant in recent times.
Collaborative topic regression (CTR) (Wang and Blei, 2011) combines both
techniques to better recommend the topics most relevant to a user. Purushotham et al. (2012)
went a bit further and integrated CTR with the social matrix factorization model, which
takes the social relations of users into account. The main motivation for the
idea came from the fact that social relations form between two users because they have
similarities. Thus, incorporating social correlations can improve the accuracy of the
recommendations. They experimented with two real-world online social networks: the online
music station Last.fm and the online bookmark sharing platform Delicious. One of the
challenges for accurate recommender systems is to identify the geographical location of the user.
McGee et al. (2013) worked in this direction to predict a user’s location based on tie
strength.
A study was conducted on Twitter, and decision tree regression was used to improve
prediction. Very recently, Tacchini et al. (2017) worked on an interesting project where they
tried to answer the question “Can a hoax be identified based on the users who liked it?”.
The authors proposed a logistic regression-based technique to classify a post as a hoax from
user activities (likes) on that post. The experiment was conducted on a large amount of
Facebook data collected during 2016. An interesting fact about the user activities is
that, on average, hoax posts have more likes than non-hoax posts.
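The idea of classifying a post from the set of users who liked it can be sketched with a small logistic regression in which each feature is one user (1 if that user liked the post, 0 otherwise). The toy data, the learning rate, and the function names below are hypothetical; this is a minimal illustration trained by stochastic gradient ascent, not the implementation of Tacchini et al. (2017).

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def predict(weights, bias, likes):
    """Estimated probability that a post is a hoax, given its like vector."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, likes)))

def train_logistic(rows, labels, lr=0.5, epochs=500):
    """Fit weights and bias by stochastic gradient ascent on the log-likelihood."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for likes, target in zip(rows, labels):
            err = target - predict(weights, bias, likes)
            bias += lr * err
            weights = [w + lr * err * x for w, x in zip(weights, likes)]
    return weights, bias

# Hypothetical like matrix: one row per post, one column per user.
posts = [
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
]
labels = [1, 1, 1, 0, 0, 0]   # 1 = hoax, 0 = non-hoax (made-up labels)

weights, bias = train_logistic(posts, labels)
new_post = [1, 1, 1, 0, 0]    # liked only by users who liked earlier hoaxes
print(f"P(hoax) = {predict(weights, bias, new_post):.3f}")
```

Each learned weight can be read as how strongly a like from that user shifts the odds toward the hoax class, which is what makes a likes-only classifier possible.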
4.5 Application of Evolutionary Computing and Deep Learning in Social Networks
Deep learning is a rapidly growing machine learning technique. It is a hierarchical learning
technique that learns the structures inside the data. At its core, deep learning is a feed-forward
artificial neural network with many hidden layers. Evolutionary computing, on the other hand,
is a family of global optimization techniques inspired by biological evolution. Both
of these tools have been used to learn from and optimize over social network data. In this
section, we provide a few such examples where deep learning and evolutionary computing have been
used to solve research issues in social networks.
4.5.1 Evolutionary Computing and Social Network
One of the rst attempts to use genetic algorithms with social network analysis was by
Wilson and Banzhaf (2009). The research was conducted over the huge amount of email
communication data of Enron Corporation. The main objective of their study was to nd
the key players among the 151 employees of the organization. The social networkwas inte-
grated into the genetic algorithm through the tness function of the genetic algorithm. The
tness function used was derived from social network measures such as degree, density,
and proximity prestige.
Community detection (Kundu and Pal, 2015b) is one of the most important problems of
social networks where evolutionary algorithms have been effectively used. One such study
was conducted by Gong et al. (2012). In this work, a multi-objective evolutionary algorithm
was used to optimize two important properties of the communities. They simultaneously
maximized the internal link density and minimized the density of links between
communities. They used a modified version of the multi-objective evolutionary algorithm based
on decomposition proposed by Qingfu Zhang and Hui Li (2007). Liu et al. (2014) used a
multi-objective evolutionary algorithm to detect communities in a signed network. A signed
social network is a network where both friend and foe relationships are present. This
algorithm tried to optimize two contradictory objectives of a community. The algorithm
maximizes positive links within a community while minimizing the negative links from
it. Very recently, Rizman Žalik (2019) used a multi-objective genetic algorithm to detect
communities. Here, both the objective functions were minimized to get the end results.
These objective functions were based on the node’s centrality measure and ratio of edges.
To use the genetic algorithm in community detection, they modified different steps such as
initialization, mutation, and crossover of the genetic algorithm.
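The two competing objectives described for community detection, dense links inside communities and sparse links between them, can be evaluated for any candidate partition as in the sketch below. The density definitions used here are simplified stand-ins chosen for illustration, and the function name and toy graph are hypothetical; they are not the exact objective functions of Gong et al. (2012). A multi-objective evolutionary algorithm would call such a function as its fitness evaluation for every individual in the population.

```python
def community_objectives(edges, communities):
    """Return (intra_density, inter_density) for a partition of the nodes.

    intra_density: internal edges / possible node pairs inside communities.
    inter_density: cross edges / possible node pairs across communities.
    """
    label = {v: i for i, members in enumerate(communities) for v in members}
    internal = sum(1 for u, v in edges if label[u] == label[v])
    external = len(edges) - internal
    internal_pairs = sum(len(c) * (len(c) - 1) // 2 for c in communities)
    n = len(label)
    external_pairs = n * (n - 1) // 2 - internal_pairs
    intra = internal / internal_pairs if internal_pairs else 0.0
    inter = external / external_pairs if external_pairs else 0.0
    return intra, inter

# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

good = community_objectives(edges, [{0, 1, 2}, {3, 4, 5}])
bad = community_objectives(edges, [{0, 1, 3}, {2, 4, 5}])
print("good partition:", good)
print("bad partition: ", bad)
```

The partition that follows the two triangles scores higher on the first objective and lower on the second, so it dominates the other candidate in the Pareto sense.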
4.5.2 Deep Learning and Social Networks
Deep learning was rst used in social networks by Perozzi et al. (2014). In this work, the
authors used deep learning to represent social graphs with a latent representation in contin-
uous vector space. This allows other well-known statistical and machine learning models
to be used with social network data easily. To learn the social representation, they used a
stream of short random walks. In 2015, Nikfarjam et al. (2015) used deep learning tech-
niques to analyze user posts in social networks. Their objective was to learn about adverse
drug reactions. Deep learning tools were mainly used to interpret natural languages that
automatically classify unlabeled user posts. Li et al. (2014) uses the conditional temporal
restricted Boltzmann machine to predict future links in dynamic social networks. A con-
ditional temporal restricted Boltzmann machine was inherited from the original restricted
Boltzmann machine (Hinton and Salakhutdinov, 2006). Multiple snapshots of the network
at dierent timestamps were used as the input to the model and nodes’ transitional pat-
tern, and inuence of local neighbors are used as the conditional and temporal properties
for the model.
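The "stream of short random walks" used for learning social representations can be generated as in the sketch below. The graph, the parameter values, and the function names are assumptions made for this example. In DeepWalk the resulting walks are then treated like sentences and passed to a skip-gram language model to obtain the node vectors; that training step is omitted here.

```python
import random

def random_walk(adjacency, start, length, rng):
    """One truncated random walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbours = adjacency[walk[-1]]
        if not neighbours:   # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbours))
    return walk

def build_corpus(adjacency, walks_per_node, walk_length, seed=0):
    """Start `walks_per_node` walks from every node, in shuffled order."""
    rng = random.Random(seed)
    nodes = list(adjacency)
    corpus = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)
        for node in nodes:
            corpus.append(random_walk(adjacency, node, walk_length, rng))
    return corpus

# Hypothetical undirected friendship graph as adjacency lists.
graph = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "cat"],
    "cat": ["ann", "bob", "dan"],
    "dan": ["cat", "eve"],
    "eve": ["dan"],
}

corpus = build_corpus(graph, walks_per_node=2, walk_length=5)
for walk in corpus[:3]:
    print(" -> ".join(walk))
```

Nodes that share many neighbors tend to appear in similar walk contexts, which is why a language model trained on such a corpus places them close together in the vector space.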
Hate speech classication is one eld of deep learning known to perform well. However,
a study conducted by Aroyehun and Gelbukh (2018) concluded the opposite. The study
Trim Size: 170mm x 244mm Single Column Souravde551591 c04.tex V1 - 02/20/2020 1:49pm Page 76
k
k k
k
76 4 Application of Machine Learning in the Social Network
Table 4.2 Summary of Clustering Applications
Reference Problem Dataset Data Type Clustering Type
(Chen et al. 2017) Clustering
urban
functional
areas
Building-level
social media data,
Yuexiu District,
Guangzhou, China
Human/
social
K-Medoids
(Feng et al. 2015) dynamic user
interests
MovieLens and
Netix datasets
Movie
data
Time-weighted
association
rule mining
(Alsaedi and
Burnap 2015)
Events
clustering
Twitter Social Online
clustering
method
(Bakillah, Li, and
Liang 2015)
Geolocated
communities
Twitter: typhoon
Haiyan in the
Philippines
Social Spatial
clustering
(Habibi, Laroche,
and Richard 2014)
Inuence
brand trust
Ecommerce social
media data
Social Overlapping
community
(Atzmanstorfer
et al. 2014)
GeoCitizen
platform
Case study: Capital
District of Quito,
Ecuador
Geo
Social
Spatial
clustering
(Conover et al.
2011)
Cluster
political
aliation
Twitter Human/
social
Latent
semantic
analysis
(Ebner and
Reinhardt 2009)
Scientic
community
Twitter Human/
social
Online
communities
objective was to compare dierent deep learning techniques to identify aggression or hate
speech against the baseline support vector machine with naive Bayes (Wang and Manning,
2012). The other goal of the study was to see the performance of dierent deep neural net-
works in the presence of varying sizes of data. They found that on average the deep learning
technique needed more data points to perform better than the baseline SVM algorithm. In
another study, an interesting problem related to inuence maximization (Pal et al., 2014)
was attempted by Qiu et al. (2018). Here the deep learning technique was used to predict
user actions for neighbors in the network, which in turn provided a way to predict a user’s
inuence in the network. The experiment was conducted over a large-scale online social
network and demonstrated its applicability in proling the inuence of a node in a social
network.
4.6 Summary
In recent times, social networks have been changing the way people operate. As discussed
in the chapter, social network usage happens in almost all areas of life. Some of the
applications discussed will be an eye-opener for many researchers and may bring about many
interdisciplinary applications in the future. The literature discussed is summarized in
tables: the classification applications in Table 4.1, the clustering applications in
social media in Table 4.2, and the application of regression in social networks in
Table 4.3.

Table 4.3 Summary of Regression Applications

| Reference | Problem | Dataset | Data Type | Regression Type |
|---|---|---|---|---|
| Sparrowe et al. (2001) | Individual performance and group performance in an employee advice and hindrance network | 190 employees, 38 groups, 5 organizations | Human/Social | Simple |
| Collins and Clark (2003) | Firms’ performance based on top management social network | 73 companies, avg. empl. 1,742 | Human/Social | Hierarchical |
| Tucker (2014) | Personalized advertising and privacy controls | 1.2 million Facebook users | Online | Logistic |
| Purushotham et al. (2012) | Recommendation systems | Lastfm: 1,892 users; Delicious: 1,867 users | Online | Collaborative topic regression |
| Cimenler et al. (2014) | Researchers’ citation performance based on their social network | 100 researchers, 4 different social networks | Human/Social | Poisson |
| Coviello et al. (2014) | Emotion contagion | Massive Facebook data | Online | With instrumental variables |
| Kramer et al. (2014) | Emotion contagion | Facebook with 689,003 users | Online | Poisson |
| Bapna and Umyarov (2015) | Peer influence in a music website | Last.fm: 3.8m users, 23m edges | Online | Logistic |
| Paluck et al. (2016) | Reducing the conflict between students using SNA | 24,191 students, 56 schools | Human/Social | Linear & least-square |
| Tacchini et al. (2017) | Hoax post identification | Facebook | Online | Logistic |
Acknowledgments
Suman Kundu acknowledges the National Science Center, Poland, for the grant 2016/23/
B/ST6/01735.
References
Ahmed, F. and Abulaish, M. (2013) A generic statistical approach for spam detection in Online
Social Networks. Computer Communications,36 (10-11), 1120–1129,
doi:10.1016/j.comcom.2013.04.004.
Akaichi, J. (2013) Social networks’ Facebook’ statutes updates mining for sentiment
classification, in Proceedings - SocialCom/PASSAT/BigData/EconCom/BioMedCom 2013, pp.
886–891, doi:10.1109/SocialCom.2013.135.
Alowibdi, J.S., Buy, U.A., and Yu, P. (2013) Language independent gender classification on
Twitter, in Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social
Networks Analysis and Mining - ASONAM ’13, pp. 739–743, doi:10.1145/2492517.2492632.
Alsmadi, I. and Alhami, I. (2015) Clustering and classification of email contents. Journal of
King Saud University - Computer and Information Sciences, 27 (1), 46–57,
doi:10.1016/j.jksuci.2014.03.014.
Ang, L. (2011) Community relationship management and social media. Journal of Database
Marketing and Customer Strategy Management,18 (1), 31–38, doi:10.1057/dbm.2011.3.
Aroyehun, S.T. and Gelbukh, A. (2018) Aggression Detection in Social Media: Using Deep
Neural Networks, Data Augmentation, and Pseudo Labeling, in Proceedings of the First
Workshop on Trolling, Aggression and Cyberbullying, pp. 90–97.
Atzmanstorfer, K., Resl, R., Eitzinger, A., and Izurieta, X. (2014) The GeoCitizen-approach:
Community-based spatial planning - An Ecuadorian case study. Cartography and
Geographic Information Science,41 (3), 248–259, doi:10.1080/15230406.2014.890546.
Bakillah, M., Li, R.Y., and Liang, S.H. (2015) Geo-located community detection in Twitter with
enhanced fast-greedy optimization of modularity: the case study of typhoon Haiyan.
International Journal of Geographical Information Science,29 (2), 258–279,
doi:10.1080/13658816.2014.964247.
Bapna, R. and Umyarov, A. (2015) Do Your Online Friends Make You Pay? A Randomized
Field Experiment on Peer Influence in Online Social Networks. Management Science, 61 (8),
1902–1920, doi:10.1287/mnsc.2014.2081.
Batool, R., Khattak, A.M., Maqbool, J., and Lee, S. (2013) Precise tweet classification and
sentiment analysis, in 2013 IEEE/ACIS 12th International Conference on Computer and
Information Science, ICIS 2013 - Proceedings, pp. 461–466, doi:10.1109/ICIS.2013.6607883.
Bayot, R.K. and Gonçalves, T. (2018) Age and gender classification of tweets using
convolutional neural networks, in Lecture Notes in Computer Science (including subseries
Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10710 LNCS,
pp. 337–348, doi:10.1007/978-3-319-72926-8_28.
Beln, R.V., E., G.M.K., and Bródka, P. (2018) Overlapping community detection using
superior seed set selection in social networks. Computers and Electrical Engineering,
doi:10.1016/j.compeleceng.2018.03.012.
Beln, R.V. and Grace Mary Kanaga, E. (2018) Parallel seed selection method for overlapping
community detection in social network. Scalable Computing, doi:10.12694/scpe.v19i4.1429.
Benevenuto, F., Rodrigues, T., Almeida, J., Gonçalves, M., and Almeida, V. (2009) Detecting
spammers and content promoters in online video social networks, in Proceedings - IEEE
INFOCOM, doi:10.1109/INFCOMW.2009.5072127.
Burnap, P., Colombo, W., and Scourfield, J. (2015) Machine Classification and Analysis of
Suicide-Related Communication on Twitter, in Proceedings of the 26th ACM Conference on
Hypertext & Social Media - HT ’15, pp. 75–84, doi:10.1145/2700171.2791023. 0305058.
Cenamor, I., de la Rosa, T., Núñez, S., and Borrajo, D. (2017) Planning for tourism routes using
social networks. Expert Systems with Applications,69, 1–9, doi:10.1016/j.eswa.2016.10.030.
Chen, Y., Liu, X., Li, X., Liu, X., Yao, Y., Hu, G., Xu, X., and Pei, F. (2017) Delineating urban
functional areas with building-level social media data: A dynamic time warping (DTW)
distance based k-medoids method. Landscape and Urban Planning,160, 48–60.
Cimenler, O., Reeves, K.A., and Skvoretz, J. (2014) A regression analysis of researchers’ social
network metrics on their citation performance in a college of engineering. Journal of
Informetrics,8(3), 667–682, doi:10.1016/j.joi.2014.06.004.
Collins, C.J. and Clark, K.D. (2003) Strategic human resource practices, top management team
social networks, and rm performance: The role of human resource practices in creating
organizational competitive advantage, doi:10.2307/30040665.
Conover, M.D., Gonçalves, B., Ratkiewicz, J., Flammini, A., and Menczer, F. (2011) Predicting
the political alignment of twitter users, in Proceedings - 2011 IEEE International Conference
on Privacy, Security, Risk and Trust and IEEE International Conference on Social Computing,
PASSAT/SocialCom 2011, doi:10.1109/PASSAT/SocialCom.2011.34.
Coviello, L., Sohn, Y., Kramer, A.D., Marlow, C., Franceschetti, M., Christakis, N.A., and
Fowler, J.H. (2014) Detecting emotional contagion in massive social networks. PLoS ONE,
9(3), e90 315, doi:10.1371/journal.pone.0090315.
Croitoru, A., Wayant, N., Crooks, A., Radzikowski, J., and Stefanidis, A. (2015) Linking cyber
and physical spaces through community detection and clustering in social media feeds.
Computers, Environment and Urban Systems,53, 47–64, doi:10.1016/j.compenvurbsys.2014.
11.002.
Ebner, M. and Reinhardt, W. (2009) Social networking in scientific conferences Twitter as tool
for strengthen a scientific community, in telearnnoekaleidoscopeorg, vol. 2, pp. 1–8.
Eleta, I. and Golbeck, J. (2014) Multilingual use of Twitter: Social networks at the language
frontier. Computers in Human Behavior,41, 424–432, doi:10.1016/j.chb.2014.05.005.
Feng, H., Tian, J., Wang, H.J., and Li, M. (2015) Personalized recommendations based on
time-weighted overlapping community detection. Information and Management,52 (7),
789–800, doi:10.1016/j.im.2015.02.004.
Fortunato, S. (2010) Community detection in graphs. Physics Reports,486 (3-5), 75–174,
doi:10.1016/j.physrep.2009.11.002.
Gong, M., Ma, L., Zhang, Q., and Jiao, L. (2012) Community detection in networks by using
multiobjective evolutionary algorithm with decomposition. Physica A: Statistical Mechanics
and its Applications,391 (15), 4050–4060, doi:10.1016/j.physa.2012.03.021.
Granovetter, M.S. (1973) The strength of weak ties. American Journal of Sociology,78 (6),
1360–1380.
Habibi, M.R., Laroche, M., and Richard, M.O. (2014) The roles of brand community and
community engagement in building brand trust on social media. Computers in Human
Behavior,37, 152–161, doi:10.1016/j.chb.2014.04.016.
Himelboim, I., Smith, M.A., Rainie, L., Shneiderman, B., and Espina, C. (2017) Classifying
Twitter Topic-Networks Using Social Network Analysis. Social Media +Society,3(1),
205630511769 154, doi:10.1177/2056305117691545.
Hinton, G.E. and Salakhutdinov, R.R. (2006) Reducing the dimensionality of data with neural
networks. Science (New York, N.Y.),313 (5786), 504–7, doi:10.1126/science.1127647.
Hossain, L., Kam, D., Kong, F., Wigand, R.T., and Bossomaier, T. (2016) Social media in Ebola
outbreak. Epidemiology and Infection,144 (10), 2136–2143, doi:10.1017/S095026881600039X.
Igawa, R.A., Barbon, S., Paulo, K.C.S., Kido, G.S., Guido, R.C., Júnior, M.L.P., and da Silva, I.N.
(2016) Account classication in online social networks with LBCA and wavelets.
Information Sciences,332, 72–83, doi:10.1016/j.ins.2015.10.039.
Johnston, J. (2017) Courts’ use of social media: A community of practice model. International
Journal of Communication,11, 669–683, doi:10.1021/am504320h.
Kramer, A.D.I., Guillory, J.E., and Hancock, J.T. (2014) Experimental evidence of massive-scale
emotional contagion through social networks. Proceedings of the National Academy of
Sciences,111 (24), 8788–8790, doi:10.1073/pnas.1320040111.
Kundu, S. and Pal, S.K. (2015a) FGSN: Fuzzy Granular Social Networks - Model and
applications. Information Sciences,314, 100–117, doi:10.1016/j.ins.2015.03.065.
Kundu, S. and Pal, S.K. (2015b) Fuzzy-rough community in social networks. Pattern
Recognition Letters,67, 145–152, doi:10.1016/j.patrec.2015.02.005.
Kundu, S. and Pal, S.K. (2018) Double bounded rough set, tension measure, and social link
prediction. IEEE Transactions on Computational Social Systems,5(3), 841–853,
doi:10.1109/TCSS.2018.2861215.
Lakkaraju, H. and Ajmera, J. (2011) Attention prediction on social media brand pages, in
Proceedings of the 20th ACM international conference on Information and knowledge
management - CIKM ’11, p. 2157, doi:10.1145/2063576.2063915.
Li, W. and Xu, H. (2014) Text-based emotion classification using emotion cause extraction.
Expert Systems with Applications,41 (4 PART 2), 1742–1749, doi:10.1016/j.eswa.2013.08.073.
Li, X., Du, N., Li, H., Li, K., Gao, J., and Zhang, A. (2014) A Deep Learning Approach to Link
Prediction in Dynamic Networks, in Proceedings of the 2014 SIAM International Conference
on Data Mining, Society for Industrial and Applied Mathematics, Philadelphia, PA, pp.
289–297, doi:10.1137/1.9781611973440.33.
Lima, A.C.E. and de Castro, L.N. (2014) A multi-label, semi-supervised classification approach
applied to personality prediction in social media. Neural Networks,58, 122–130,
doi:10.1016/j.neunet.2014.05.020.
Liu, C., Liu, J., and Jiang, Z. (2014) A multiobjective evolutionary algorithm based on similarity
for community detection from signed social networks. IEEE Transactions on Cybernetics,
44 (12), 2274–2287, doi:10.1109/TCYB.2014.2305974.
Lo, Y.W. and Potdar, V. (2009) A review of opinion mining and sentiment classification
framework in social networks, in 2009 3rd IEEE International Conference on Digital
Ecosystems and Technologies, DEST ’09, pp. 396–401, doi:10.1109/DEST.2009.5276705.
McAuley, J. and Leskovec, J. (2012) Image labeling on a network: Using social-network
metadata for image classification, in Lecture Notes in Computer Science (including subseries
Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 7575 LNCS,
pp. 828–841, doi:10.1007/978-3-642-33765-9_59. 1207.3809.
McGee, J., Caverlee, J., and Cheng, Z. (2013) Location prediction in social media based on tie
strength, in Proceedings of the 22nd ACM international conference on Conference on
information & knowledge management - CIKM ’13, pp. 459–468,
doi:10.1145/2505515.2505544. 1111.2904.
Narayanam, R. and Narahari, Y. (2011) A Shapley value-based approach to discover influential
nodes in social networks. IEEE Transactions on Automation Science and Engineering,8(1),
130–147.
Nikfarjam, A., Sarker, A., O’Connor, K., Ginn, R., and Gonzalez, G. (2015) Pharmacovigilance
from social media: mining adverse drug reaction mentions using sequence labeling with
word embedding cluster features. Journal of the American Medical Informatics Association,
22 (3), 671–681, doi:10.1093/jamia/ocu041.
Ou, G., Chen, W., Wang, T., Wei, Z., Li, B., Yang, D., and Wong, K.F. (2017) Exploiting
Community Emotion for Microblog Event Detection, in Social Media Content Analysis,pp.
439–456, doi:10.1142/9789813223615_0027.
Pal, S.K. and Kundu, S. (2017) Granular Social Network: Model and Applications, in Handbook
of Big Data Technologies (eds A.Y. Zomaya and S. Sakr), Springer International Publishing,
Cham, pp. 617–651, doi:10.1007/978-3-319-49340-4_18.
Pal, S.K., Kundu, S., and Murthy, C.A. (2014) Centrality measures, upper bound, and influence
maximization in large scale directed social networks. Fundamenta Informaticae,130 (3),
317–342.
Paluck, E.L., Shepherd, H., and Aronow, P.M. (2016) Changing climates of conflict: A social
network experiment in 56 schools. Proceedings of the National Academy of Sciences of the
United States of America,113 (3), 566–71, doi:10.1073/pnas.1514483113.
Peled, O., Fire, M., Rokach, L., and Elovici, Y. (2016) Matching entities across online social
networks. Neurocomputing,210, 91–106, doi:10.1016/j.neucom.2016.03.089.
Perozzi, B., Al-Rfou, R., and Skiena, S. (2014) DeepWalk: Online Learning of Social
Representations, in Proceedings of the 20th ACM SIGKDD international conference on
Knowledge discovery and data mining - KDD ’14, pp. 701–710, doi:10.1145/2623330.2623732.
Powell, W.W. and Brantley, P. (1992) Competitive Cooperation in Biotechnology: Learning
through Networks?, in Networks and Organizations: Structure, Form, and Action, Harvard
Business School Press, Boston, pp. 366–394.
Purushotham, S., Liu, Y., and Kuo, C.C.J. (2012) Collaborative Topic Regression with Social
Matrix Factorization for Recommendation Systems, in Proceedings of the 29th International
Conference on Machine Learning, Edinburgh, pp. 759–766,
doi:10.1016/j.jhydrol.2004.11.010. 1206.4684.
Qingfu Zhang and Hui Li (2007) MOEA/D: A Multiobjective Evolutionary Algorithm Based on
Decomposition. IEEE Transactions on Evolutionary Computation, 11 (6), 712–731,
doi:10.1109/TEVC.2007.892759.
Qiu, J., Tang, J., Ma, H., Dong, Y., Wang, K., and Tang, J. (2018) DeepInf: Social Influence
Prediction with Deep Learning, in Proceedings of the 24th ACM SIGKDD International
Conference on Knowledge Discovery & Data Mining - KDD ’18, ACM Press, New York, New
York, USA, pp. 2110–2119, doi:10.1145/3219819.3220077.
Rizman Žalik, K. (2019) Evolution Algorithm for Community Detection in Social Networks
Using Node Centrality, pp. 73–87, doi:10.1007/978-3-319-77604-0_6.
Schirr, G.R. (2013) Community-Sourcing a New Marketing Course: Collaboration in Social
Media. Marketing Education Review, 23 (3), 225–240, doi:10.2753/MER1052-8008230302.
Scott, J. (2000) Social network analysis: a handbook, SAGE Publications.
Shaji, A., Beln, R., and Grace Mary Kanaga, E. (2018) An innovated SIRS model for
information spreading,vol.645, doi:10.1007/978-981-10-7200-0_37.
Sitter, K.C. and Curnew, A.H. (2016) The application of social media in social work community
practice. Social Work Education, 35 (3), 271–283, doi:10.1080/02615479.2015.1131257.
Song, Y., Lu, Z., Leung, C.W.K., and Yang, Q. (2013) Collaborative boosting for activity
classification in microblogs, in Proceedings of the 19th ACM SIGKDD international conference
on Knowledge discovery and data mining - KDD ’13, p. 482, doi:10.1145/2487575.2487661.
Sparrowe, R.T., Liden, R.C., Wayne, S.J., and Kraimer, M.L. (2001) Social networks and the
performance of individuals and groups. Academy of Management Journal, 44 (2), 316–325,
doi:10.2307/3069458.
Statista.com (2018) Social Media Statistics & Facts, Statista. URL
https://www.statista.com/topics/1164/social-networks/.
Tacchini, E., Ballarin, G., Della Vedova, M.L., Moret, S., and de Alfaro, L. (2017) Some Like it
Hoax: Automated Fake News Detection in Social Networks. 1704.07506.
Tang, L. and Liu, H. (2011) Leveraging social media networks for classification. Data Mining
and Knowledge Discovery, 23 (3), 447–478, doi:10.1007/s10618-010-0210-x.
Tucker, C.E. (2014) Social networks, personalized advertising, and privacy controls. Journal of
Marketing Research, 51 (5), 546–562, doi:10.1509/jmr.10.0355.
Tuulos, V.H. and Tirri, H. (2004) Combining topic models and social networks for chat data
mining, in Proceedings - IEEE/WIC/ACM International Conference on Web Intelligence, WI
2004, pp. 206–213, doi:10.1109/WI.2004.10025.
Vázquez, S., Muñoz-García, Ó., Campanella, I., Poch, M., Fisas, B., Bel, N., and Andreu, G.
(2014) A classication of user-generated content into consumer decision journey stages.
Neural Networks,58, 68–81, doi:10.1016/j.neunet.2014.05.026.
Verbeke, W., Martens, D., and Baesens, B. (2014) Social network analysis for customer churn
prediction. Applied Soft Computing Journal, 14 (Part C), 431–446,
doi:10.1016/j.asoc.2013.09.017.
Wang, C. and Blei, D.M. (2011) Collaborative topic modeling for recommending scientific
articles, in Proceedings of the 17th ACM SIGKDD international conference on Knowledge
discovery and data mining - KDD ’11, p. 448, doi:10.1145/2020408.2020480.
arXiv:1411.2581v1.
Wang, D., Irani, D., and Pu, C. (2014) SPADE: a social-spam analytics and detection framework.
Social Network Analysis and Mining, 4 (1), 1–18, doi:10.1007/s13278-014-0189-1.
Wang, S. and Manning, C.D. (2012) Baselines and bigrams: Simple, good sentiment and topic
classication, in Proceedings of the 50th Annual Meeting of the Association for Computational
Linguistics: Short Papers - Volume 2, Association for Computational Linguistics, Stroudsburg,
PA, USA, ACL ’12, pp. 90–94.
Wanichayapong, N., Pruthipunyaskul, W., Pattara-Atikom, W., and Chaovalit, P. (2011)
Social-based trac information extraction and classication, in 2011 11th International
Conference on ITS Telecommunications, ITST 2011, pp. 107–112,
doi:10.1109/ITST.2011.6060036.
Wilson, G. and Banzhaf, W. (2009) Discovery of email communication networks from the
Enron corpus with a genetic algorithm using social network analysis, in 2009 IEEE Congress
on Evolutionary Computation, CEC 2009, pp. 3256–3263, doi:10.1109/CEC.2009.4983357.
Yang, M., Kiang, M., and Shang, W. (2015) Filtering big data from social media - Building an
early warning system for adverse drug reactions. Journal of Biomedical Informatics, 54,
230–240, doi:10.1016/j.jbi.2015.01.011.
Ye, M., Shou, D., Lee, W.C., Yin, P., and Janowicz, K. (2011) On the semantic annotation of
places in location-based social networks, in Proceedings of the 17th ACM SIGKDD
international conference on Knowledge discovery and data mining - KDD ’11, p. 520,
doi:10.1145/2020408.2020491.
Zhang, X., Tokoglu, F., Negishi, M., Arora, J., Winstanley, S., Spencer, D.D., and Constable, R.T.
(2011) Social network theory applied to resting-state fMRI connectivity data in the
identication of epilepsy networks with iterative feature selection. Journal of Neuroscience
Methods,199 (1), 129–139, doi:10.1016/j.jneumeth.2011.04.020.
Zhou, W., Jin, H., and Liu, Y. (2012) Community discovery and profiling with social messages,
in Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and
data mining - KDD ’12, p. 388, doi:10.1145/2339530.2339593.
Zhu, Y., Wang, X., Zhong, E., Liu, N., Li, H., and Yang, Q. (2012) Discovering spammers in
social networks, in Association for the Advancement of Artificial Intelligence, pp. 171–177.