Figure 9 - uploaded by Huan Liu
Source publication
The increasing popularity of social media is shortening the distance between people. Social activities, e.g., tagging in Flickr, bookmarking in Delicious, and tweeting on Twitter, are reshaping people's social life and redefining their social roles. People with shared interests tend to form their groups in social media, and users within the sam...
Context in source publication
Context 1
... a tag cloud, size of a tag is representative of its frequency or importance in a set of tags or phrases. Figure 9 shows the tag cloud for Category Health (category-health) including all tags of this category. The most frequent 5 tags, health, weight loss, diet, fitness and nutrition, are all about health. ...
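The relationship between tag frequency and display size described above can be sketched in a few lines of Python. This is only an illustration: the source does not say how sizes were computed for Figure 9, so the linear scaling, the `min_size` and `max_size` parameters, the function name `tag_font_sizes`, and the sample tag list are all assumptions made for the example.

```python
from collections import Counter


def tag_font_sizes(tags, min_size=10, max_size=40):
    """Map each distinct tag to a font size that grows linearly with its frequency.

    Linear scaling is an assumption for illustration; real tag clouds often
    use logarithmic or rank-based scaling instead.
    """
    counts = Counter(tags)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    sizes = {}
    for tag, n in counts.items():
        # When every tag is equally frequent, fall back to the smallest size.
        scale = (n - lo) / span if span else 0.0
        sizes[tag] = round(min_size + scale * (max_size - min_size))
    return sizes


# Hypothetical tag list, not data from the paper.
sample = ["health"] * 5 + ["diet"] * 3 + ["fitness"] * 1
sizes = tag_font_sizes(sample)
```

With this sample, the most frequent tag receives `max_size` and the least frequent receives `min_size`, with the remaining tags interpolated between them.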
Similar publications
Spam and wildly varying documents make searching in Twitter challenging. Most Twitter search systems generally treat a Tweet as plain text when modeling relevance. However, a series of conventions allows users to Tweet in structural ways using a combination of different blocks of texts. These blocks include plain texts, hashtags, links, mentions,...
Opinion retrieval deals with finding relevant documents that express either a negative or positive opinion about some topic. Social Networks such as Twitter, where people routinely post opinions about almost any topic, are rich environments for opinions. However, spam and wildly varying documents make opinion retrieval within Twitter challenging....
The fundamental building block of social influence is for one person to elicit a response in another. Researchers measuring a "response" in social media typically depend either on detailed models of human behavior or on platform-specific cues such as re-tweets, hash tags, URLs, or mentions. Most content on social networks is difficult to model bec...
Databases of places have become increasingly popular to identify places of a given type that are close to a user-specified location. As it is important for these systems to use an up-to-date database with a broad coverage, there is a need for techniques that are capable of expanding place databases in an automated way. In this paper we discuss how...
Twitter is a fabulous source for information to keep track of latest happenings and concerns in the world. Whenever somet include hash tags, allowing us to selectively search for tweets about a certain event or thing. Many twitter users also engage in conversations, and looking at these conversations a leaders and frequent actors. devastating. For...
Citations
... Such groups are usually formed due to affinities and common interests. As organizations increase their utilization, overlaps (both in terms of individuals and purposes) among groups may occur [80,104], yielding ambiguity and gaps in the information. This may also imply doubts and misguiding indications among individuals, generating irrelevant information to the problem under investigation. ...
This paper investigates the impact of social media utilization on problem-solving routines in organizations undergoing Lean Production (LP) implementation. A multi-case study was conducted in three firms from different sectors with distinct maturity levels of LP implementation. Empirical evidence was collected through complementary ways, such as semi-structured interviews, secondary data, and in loco non-participant observation. Data were then analyzed and triangulated, leading to propositions on the effects of social media on problem-solving activities in lean organizations. This research was grounded on the concepts of Information Manipulation Theory. Our findings suggest that, while social media may contribute to the amount of information that is shared for solving problems, the relevance and the level of details of such information may be shallow, overburdening the help chain mechanisms and generating wastes like overprocessing. The identification of the impact of social media on problem-solving activities enables a better comprehension of how new information and communication technologies can promote (or impair) the intra- and inter-organizational links. It also helps identify improvement opportunities in integrating social media into problem-solving routines, resulting in more responsive and competitive organizations.
... In this section, we introduce the related work from various aspects. Some work uses similarity-based methods [13] or clustering [3,7,21,22,26], meta-task learning [4,10,23,27], matrix factorization [8], classification-based methods [1] and rule-based methods [17]. Knowledge graph embedding models [2,14,18,24] and GNN-based models [5,6,9,12,16,19,20,25] can also be used in recommender systems to address lookalike modeling. ...
Lookalike models are based on the assumption that user similarity plays an important role in selling products and enhancing existing advertising campaigns across a very large user base. The challenges associated with these models lie in the heterogeneity of the user base and its sparsity. In this work, we propose a novel framework that unifies customers' different behaviors or features, such as demographics, buying behaviors on different platforms, and customer loyalty behaviors, and builds a lookalike model to improve customer targeting for Rakuten Group, Inc. Extensive experiments on real e-commerce and travel datasets demonstrate the effectiveness of our proposed lookalike model for the user targeting task.
... The anomaly analytics methods are reviewed in Sections 3 to 6. In Section 7, we outline several real-world applications of anomaly analytics that can be solved with deep graph models, and discuss some future research directions and challenges in Section 8. Finally, we briefly conclude this survey in Section 9.
[26] | Undirected network | Node | Karate, Dolphin, Jazz, US Power grid, Ego-Facebook | Rumor source detection
GAS [54] | HIN | Node&Edge | - | Spam review detection
SpecAE [56] | Attributed networks | Node | Cora, Pubmed, PolBlog [76] | -
DOMINANT [22] | Attributed networks | Node | BlogCatalog [103], Flickr [89], ACM [88] | -
GCNwithMRF [108] | Directed graph | Node | TwitterSH [53], 1KS-10KN [116] | Social spammer detection
Bi-GCN [7] | - | Edge | Weibo [65], Twitter [66] | Rumor detection
GCAN [63] | Weighted graph | Node | Twitter | Fake news detection
TPC-GCN [134] | HIN | Node | Weibo [65], Reddit [37] | Controversy detection
AANE [29] | - | Edge | Disney, Enron | -
HMGNN [142] | Vanilla graph | Node | - | Fraud invitation
GraphRfi [127] | Bipartite Graph | Node | Yelp [79], Amazon [70] | -
AddGraph [131] | Dynamic graph | Edge | UCI [73], Digg [18] | -
ST-GCAE [68] | ST Graph | Graph | ShanghaiTech [64] | Anomalous action
StrGNN [10] | Temporal Graph | Node | UCI, Digg [18] | -
ANEMONE [45] | - | Node | Cora, Citeseer, Pubmed | -
CoLA [60] | Attributed network | Node | [103], [89], [88] | -
GCCAD [13] | Attributed network | Node | Aminer, MAS, Alpha, Yelp | -
GAT
HAGNE [100] | HINs | Graph | - | Unknown malware
HACUD [41] | Attributed HINs | Node | - | Cash-out user detection
SemiGNN [95] | Multiview graph | Node | - | Financial fraud
AA-HGNN [81] | HINs | Node | BuzzFeed | Fake news detection
mHGNN [32] | Attributed HINs | Subgraph | - | Illicit traded product
GDN [19] | Directed graph | Node | SWaT [69], WADI [1] | Anomalous sensors
TGBULLY [34] | Temporal graph | Subgraph | Instagram [39], Vine [77] | Cyberbullying detection
TADDY [61] | Dynamic graph | Node | Email, AS-Topology | -
GAE
AEHE [31] | HINs | Path | ACM [88] | Co-authored event
AEGIS [21] | Attributed networks | Node | BlogCatalog, Flickr, ACM | -
DONE [6] | Attributed 2.
Illustration of the whole process of detecting anomalies in graph data with deep graph models. The models are mainly divided into two parts according to whether anomaly score is calculated by latent representation or directly generated by end-to-end models. ...
Anomaly analytics is a popular and vital task in various research contexts and has been studied for several decades. At the same time, deep learning has shown its capacity for solving many graph-based tasks such as node classification, link prediction, and graph classification. Recently, many studies have extended graph learning models to solve anomaly analytics problems, resulting in beneficial advances in graph-based anomaly analytics techniques. In this survey, we provide a comprehensive overview of graph learning methods for anomaly analytics tasks. We classify them into four categories based on their model architectures, namely graph convolutional network (GCN), graph attention network (GAT), graph autoencoder (GAE), and other graph learning models. The differences between these methods are also compared in a systematic manner. Furthermore, we outline several graph-based anomaly analytics applications across various domains in the real world. Finally, we discuss five potential future research directions in this rapidly growing field.
... Social Networks: We use Facebook (FB) (Pfeiffer III et al., 2015; Moore and Neville, 2017), BlogCatalog (BLOG) (Wang et al., 2010), and Reddit dataset (Hamilton et al., 2017). In the FB dataset, the nodes are FB users and the task is to predict the political views of a user given the gender and religious view of the user as features. ...
Many real-world applications deal with data that have an underlying graph structure associated with it. To perform downstream analysis on such data, it is crucial to capture relational information of nodes over their expanded neighborhood efficiently. Herein, we focus on the problem of Collective Classification (CC) for assigning labels to unlabeled nodes. Most deep learning models for CC heavily rely on differentiable variants of Weisfeiler-Lehman (WL) kernels. However, due to current computing architectures' limitations, WL kernels and their differentiable variants are limited in their ability to capture useful relational information only over a small expanded neighborhood of a node. To address this concern, we propose the framework, I-HOP, that couples differentiable kernels with an iterative inference mechanism to scale to larger neighborhoods. I-HOP scales differentiable graph kernels to capture and summarize information from a larger neighborhood in each iteration by leveraging a historical neighborhood summary obtained in the previous iteration. This recursive nature of I-HOP provides an exponential reduction in time and space complexity over straightforward differentiable graph kernels. Additionally, we point out a limitation of WL kernels where the node's original information is decayed exponentially with an increase in neighborhood size and provide a solution to address it. Finally, extensive evaluation across 11 datasets showcases the improved results and robustness of our proposed iterative framework, I-HOP.
... Besides being interesting from a theoretical point of view (see above), the necessity of using overlapping clusters has also been acknowledged in empirical research (Banerjee et al., 2005). First, overlapping clustering is often successfully used in community detection problems (Tang & Liu, 2009; Wang et al., 2010; Fellows et al., 2011; Bonchi et al., 2013). Take as an example social networks in which subjects could be a member of multiple communities (Azaouzi et al., 2019), biological protein networks where proteins are part of various protein complexes simultaneously (Palla et al., 2008), or genetic networks, in which genes influence multiple cellular functions and as such belong to several metabolic pathways (Segal et al., 2003; Battle et al., 2004; Hastie et al., 2000). ...
In various scientific fields, researchers make use of partitioning methods (e.g., K -means) to disclose the structural mechanisms underlying object by variable data. In some instances, however, a grouping of objects into clusters that are allowed to overlap (i.e., assigning objects to multiple clusters) might lead to a better representation of the underlying clustering structure. To obtain an overlapping object clustering from object by variable data, Mirkin’s ADditive PROfile CLUStering (ADPROCLUS) model may be used. A major challenge when performing ADPROCLUS is to determine the optimal number of overlapping clusters underlying the data, which pertains to a model selection problem. Up to now, however, this problem has not been systematically investigated and almost no guidelines can be found in the literature regarding appropriate model selection strategies for ADPROCLUS. Therefore, in this paper, several existing model selection strategies for K -means (a.o., CHull, the Caliński-Harabasz, Krzanowski-Lai, Average Silhouette Width and Dunn Index and information-theoretic measures like AIC and BIC) and two cross-validation based strategies are tailored towards an ADPROCLUS context and are compared to each other in an extensive simulation study. The results demonstrate that CHull outperforms all other model selection strategies and this especially when the negative log-likelihood, which is associated with a minimal stochastic extension of ADPROCLUS, is used as (mis)fit measure. The analysis of a post hoc AIC-based model selection strategy revealed that better performance may be obtained when a different—more appropriate—definition of model complexity for ADPROCLUS is used.
... Datasets. We use 11 publicly available real-world datasets, in which Blog is from [5,49] and the others are downloaded from SNAP [29] and Webgraph [6]. These graphs are preprocessed as unlabelled and undirected simple graphs. ...
Fig. 11: The average number of i-hop reachable neighbors for each dataset.
Structural node similarity is widely used in analyzing complex networks. As one of the structural node similarity metrics, role similarity has the merit of indicating automorphism (isomorphism). Existing algorithms to compute role similarity (e.g., RoleSim and NED) suffer from severe performance bottlenecks and thus cannot handle large real-world graphs. In this paper, we propose a new framework, namely StructSim, to compute nodes' role similarity. Under this framework, we first prove that StructSim is an admissible role similarity metric based on the maximum matching. Because the maximum matching is still too costly to scale, we then devise the BinCount matching, which is not only efficient to compute but also guarantees the admissibility of StructSim. BinCount-based StructSim admits a precomputed index to query a single pair of nodes in O(k log D) time, where k is a small user-defined parameter and D is the maximum node degree. To build the index, we further devise an FM-sketch-based technique that can handle graphs with billions of edges. Extensive empirical studies show that StructSim performs much better than existing works in both effectiveness and efficiency when applied to compute structural node similarities on real-world graphs.
... Because of the potential overlap between memberships and interests, many researchers have studied communities with an overlapping nature, for example, Rees and Gallagher (2012). Overlapping user communities based on co-clustering of tags and users have been studied by Wang et al. (2010). The proposed methodology framework for ad hoc community detection aims at identifying discrete social groups of shared interest topics, as opposed to social groups with an overlapping nature. ...
The contents of this volume are contributions from invited speakers at a workshop entitled “Data Analysis for Cyber-Security”, hosted by the University of Bristol in March 2013. We are grateful for the generous support of the Heilbronn Institute for Mathematical Research, an academic research unit of the University of Bristol with interests related to cyber-security.
... Two approaches known as RaRe (starts the clustering process from high ranking nodes) and IS (uses candidate clusters as the starting point) were developed in [16] in an attempt to find an optimal solution in overlapping clustering on synthetic and real world graphs. Relationships between nodes were used to discover overlapped clusters in a social network [17], where information among users (user data) and tags (connections between users) was utilized. A soft clustering approach based on a genetic algorithm [18] has an application in this type of clustering too. ...
... This measure compares different clusters, and whenever its value is high, it means that the two clusters are similar (Amelio and Pizzuti, 2017). If clusters X and Y are precisely the same, their NMI is equal to one (Wang et al., 2010). ...
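As a rough illustration of the property stated above, the following sketch computes normalized mutual information (NMI) for two non-overlapping labelings of the same items. It is not the implementation used in any of the cited papers: the function names `entropy` and `nmi` are hypothetical, normalizing by the arithmetic mean of the two entropies is one of several common conventions and is assumed here, and overlapping-community variants of NMI are defined differently.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(x, y):
    """NMI of two hard (non-overlapping) labelings of the same items.

    Assumption: mutual information is normalized by the arithmetic mean
    of the two entropies.
    """
    if len(x) != len(y):
        raise ValueError("labelings must cover the same items")
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # I(X;Y) = sum over co-occurring label pairs of p(a,b) * log(p(a,b) / (p(a) p(b)))
    mi = sum(
        (c / n) * math.log((c * n) / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )
    denom = (entropy(x) + entropy(y)) / 2
    # Convention chosen here: two trivial single-cluster labelings count as identical.
    return mi / denom if denom > 0 else 1.0


same = nmi([0, 0, 1, 1], [1, 1, 0, 0])         # same grouping, labels renamed
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])  # groupings share no information
```

Here `same` evaluates to 1 up to floating-point error, because the two labelings describe the same clusters under different names, while `independent` evaluates to 0.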
Social Networking Services (SNSs) connect people worldwide, where they communicate through sharing contents, photos, videos, posting their first-hand opinions, comments, and following their friends. Social networks are characterized by velocity, volume, value, variety, and veracity, the 5 V’s of big data. Hence, big data analytic techniques and frameworks are commonly exploited in Social Network Analysis (SNA). By the ever-increasing growth of social networks, the analysis of social data, to describe and find communication patterns among users and understand their behaviors, has attracted much attention. In this paper, we demonstrate how big data analytics meets social media, and a comprehensive review is provided on big data analytic approaches in social networks to search published studies between 2013 and August 2020, with 74 identified papers. The findings of this paper are presented in terms of main journals/conferences, yearly distributions, and the distribution of studies among publishers. Furthermore, the big data analytic approaches are classified into two main categories: Content-oriented approaches and network-oriented approaches. The main ideas, evaluation parameters, tools, evaluation methods, advantages, and disadvantages are also discussed in detail. Finally, the open challenges and future directions that are worth further investigating are discussed.
... For example, Palla et al. [97] introduced an approach to analyze the main statistical features of the interwoven sets of overlapping communities based on the Clique Percolation Method (CPM).
• Use link partition rather than node partition to detect overlapping communities [98], [99]. The main idea underlying this method is that a node can only belong to one community but it may have several edges, which means that a node can be assigned to multiple communities as long as its edges can be assigned to multiple communities. ...
Social science concerns issues on individuals, relationships, and the whole society. The complexity of research topics in social science makes it an amalgamation of multiple disciplines, such as economics, political science, and sociology. For centuries, scientists have conducted many studies to understand the mechanisms of society. However, due to the limitations of traditional research methods, many critical social issues remain to be explored. To address those issues, computational social science has emerged, driven by the rapid advancement of computation technologies and profound studies on social science. With the aid of advanced research techniques, various kinds of data from diverse areas can now be acquired, and they can help us look into social problems from a new perspective. As a result, utilizing various data to reveal issues in the computational social science area has attracted more and more attention. In this paper, to the best of our knowledge, we present the first survey on data-driven computational social science, which primarily focuses on reviewing application domains involving human dynamics. The state-of-the-art research on human dynamics is reviewed from three aspects: individuals, relationships, and collectives. Specifically, the research methodologies used to address research challenges in the aforementioned application domains are summarized. In addition, some important open challenges with respect to both emerging research topics and research methods are discussed.