Article

A Local Search Approach to Efficient (k,p)-Core Maintenance


Abstract

The (k,p)-core model was recently proposed to capture engagement dynamics by considering both intra-community interactions (i.e., the k-core structure) and inter-community interactions (i.e., the p-fraction property). It refines the classic k-core by introducing an extra parameter p that customizes engagement within a community at a finer granularity. In this paper, we study the problem of maintaining all (k,p)-cores (essentially, maintaining the p-numbers of all vertices) in dynamic graphs. The existing Global approach conducts a global peeling, almost from scratch, over all vertices whose old p-numbers fall within a computed range $[p^-, p^+]$, and is thus inefficient. We propose a new Local approach, which conducts local searches starting from the two endpoints of the newly inserted or deleted edge and then iteratively expands the search frontier by including their neighbors. Our algorithm builds on several fundamental properties, proved in this paper, that characterize the necessary condition for a vertex's p-number to change. Compared to Global, our Local approach implicitly obtains the optimal affected p-number range $[p^-_*, p^+_*] \subseteq [p^-, p^+]$ and further skips many vertices whose p-numbers are within this range. Experimental results show that Local is on average two orders of magnitude faster than Global.
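The abstract does not spell out the model itself. As a minimal sketch, assuming the commonly used definition in which a (k,p)-core is the maximal subgraph where every vertex keeps at least k neighbors and at least a fraction p of its original neighbors, a static peeling baseline could look as follows. The function name `kp_core` and the adjacency-dict representation are illustrative choices, not taken from the paper, and this is neither the Global nor the Local maintenance algorithm.

```python
from collections import deque


def kp_core(adj, k, p):
    """Return the vertex set of the (k,p)-core (assumed definition, see lead-in).

    adj: dict mapping each vertex to the set of its neighbours (undirected graph).
    """
    total = {v: len(ns) for v, ns in adj.items()}  # degree in the full graph
    deg = dict(total)                              # degree among surviving vertices
    alive = set(adj)

    def violates(v):
        # Too few surviving neighbours, or too small a surviving fraction.
        return deg[v] < k or deg[v] < p * total[v]

    queue = deque(v for v in adj if violates(v))
    queued = set(queue)
    while queue:
        v = queue.popleft()
        alive.discard(v)
        for u in adj[v]:
            if u in alive:
                deg[u] -= 1
                if u not in queued and violates(u):
                    queued.add(u)
                    queue.append(u)
    return alive


# Triangle a-b-c, with c also attached to three pendant vertices.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"},
    "c": {"a", "b", "d", "e", "f"},
    "d": {"c"}, "e": {"c"}, "f": {"c"},
}
print(kp_core(adj, 2, 0.25))
print(kp_core(adj, 2, 0.5))
```

In this toy graph, vertex c keeps only 2 of its 5 neighbors once the pendants are peeled, so it survives for p = 0.25 but not for p = 0.5, and its removal then unravels the whole triangle. Under the same assumed definition, the p-number of a vertex for a fixed k would be the largest p for which it still belongs to the (k,p)-core.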

Article
Full-text available
In this work, we performed an analysis of the networks of interactions between drugs and their targets to assess how connected the compounds are. For our purpose, the interactions were downloaded from the DrugBank database, and we considered all drugs approved by the FDA. Based on topological analysis of this interaction network, we obtained information on degree, clustering coefficient, connected components, and centrality of these interactions. We identified that this drug-target interaction network cannot be divided into two disjoint and independent sets, i.e., it is not bipartite. In addition, the connectivity or associations between every pair of nodes identified that the drug-target network is constituted of 165 connected components, where one giant component contains 4376 interactions that represent 89.99% of all the elements. In this regard, the histamine H1 receptor, which belongs to the family of rhodopsin-like G-protein-coupled receptors and is activated by the biogenic amine histamine, was found to be the most important node in the centrality of input-degrees. In the case of centrality of output-degrees, fostamatinib was found to be the most important node, as this drug interacts with 300 different targets, including arachidonate 5-lipoxygenase or ALOX5, expressed on cells primarily involved in regulation of immune responses. The top 10 hubs interacted with 33% of the target genes. Fostamatinib stands out because it is used for the treatment of chronic immune thrombocytopenia in adults. Finally, 187 highly connected sets of nodes, structured in communities, were also identified. Indeed, the largest communities have more than 400 elements and are related to metabolic diseases, psychiatric disorders and cancer. Our results demonstrate the possibilities to explore these compounds and their targets to improve drug repositioning and contend against emergent diseases.
Article
Full-text available
Sex steroid hormones have been shown to alter regional brain activity, but the extent to which they modulate connectivity within and between large-scale functional brain networks over time has yet to be characterized. Here, we applied dynamic community detection techniques to data from a highly sampled female with 30 consecutive days of brain imaging and venipuncture measurements to characterize changes in resting-state community structure across the menstrual cycle. Four stable functional communities were identified, consisting of nodes from visual, default mode, frontal control, and somatomotor networks. Limbic, subcortical, and attention networks exhibited higher than expected levels of nodal flexibility, a hallmark of between-network integration and transient functional reorganization. The most striking reorganization occurred in a default mode subnetwork localized to regions of the prefrontal cortex, coincident with peaks in serum levels of estradiol, luteinizing hormone, and follicle stimulating hormone. Nodes from these regions exhibited strong intranetwork increases in functional connectivity, leading to a split in the stable default mode core community and the transient formation of a new functional community. Probing the spatiotemporal basis of human brain–hormone interactions with dynamic community detection suggests that hormonal changes during the menstrual cycle result in temporary, localized patterns of brain network reorganization.
Article
Full-text available
With the rapid development of information technologies, various big graphs are prevalent in many real applications (e.g., social media and knowledge bases). An important component of these graphs is the network community. Essentially, a community is a group of vertices which are densely connected internally. Community retrieval can be used in many real applications, such as event organization, friend recommendation, and so on. Consequently, how to efficiently find high-quality communities from big graphs is an important research topic in the era of big data. Recently, a large group of research works, called community search, have been proposed. They aim to provide efficient solutions for searching high-quality communities from large networks in real time. Nevertheless, these works focus on different types of graphs and formulate communities in different manners, and thus, it is desirable to have a comprehensive review of these works. In this survey, we conduct a thorough review of existing community search works. Moreover, we analyze and compare the quality of communities under their models, and the performance of different solutions. Furthermore, we point out new research directions. This survey does not only help researchers to have better understanding of existing community search solutions, but also provides practitioners a better judgment on choosing the proper solutions.
Article
Full-text available
We consider the community search problem defined upon a large graph G: given a query vertex q in G, to find as output all the densely connected subgraphs of G, each of which contains the query vertex q. As an online, query-dependent variant of the well-known community detection problem, community search enables personalized community discovery that has found widely varying applications in real-world, large-scale graphs. In this paper, we study the community search problem in the truss-based model aimed at discovering all dense and cohesive k-truss communities to which the query vertex q belongs. We introduce a novel equivalence relation, k-truss equivalence, to model the intrinsic density and cohesiveness of edges in k-truss communities. Consequently, all the edges of G can be partitioned into a series of k-truss equivalence classes that constitute a space-efficient, truss-preserving index structure, EquiTruss. Community search can be henceforth addressed directly upon EquiTruss without repeated, time-demanding accesses to the original graph, G, which proves to be theoretically optimal. In addition, EquiTruss can be efficiently updated in a dynamic fashion when G evolves with edge insertion and deletion. Experimental studies in real-world, large-scale graphs validate the efficiency and effectiveness of EquiTruss, which has achieved at least an order of magnitude speedup in community search over the state-of-the-art method, TCP-Index.
Article
Full-text available
Graphs have been widely used in many applications such as social networks, collaboration networks, and biological networks. One important graph analytics is to explore cohesive subgraphs in a large graph. Among several cohesive subgraphs studied, k-core is one that can be computed in linear time for a static graph. Since graphs are evolving in real applications, in this paper, we study core maintenance which is to reduce the computational cost to compute k-cores for a graph when graphs are updated from time to time dynamically. We identify drawbacks of the existing efficient algorithm, which needs a large search space to find the vertices that need to be updated, and has high overhead to maintain the index built, when a graph is updated. We propose a new order-based approach to maintain an order, called k-order, among vertices, while a graph is updated. Our new algorithm can significantly outperform the state-of-the-art algorithm up to 3 orders of magnitude for the 11 large real graphs tested. We report our findings in this paper.
Article
Full-text available
Common asset holding by financial institutions (portfolio overlap) is nowadays regarded as an important channel for financial contagion with the potential to trigger fire sales and severe losses at the systemic level. We propose a method to assess the statistical significance of the overlap between heterogeneously diversified portfolios, which we use to build a validated network of financial institutions where links indicate potential contagion channels. The method is implemented on a historical database of institutional holdings ranging from 1999 to the end of 2013, but can be applied to any bipartite network. We find that the proportion of validated links (i.e. of significant overlaps) increased steadily before the 2007–2008 financial crisis and reached a maximum when the crisis occurred. We argue that the nature of this measure implies that systemic risk from fire sales liquidation was maximal at that time. After a sharp drop in 2008, systemic risk resumed its growth in 2009, with a notable acceleration in 2013. We finally show that market trends tend to be amplified in the portfolios identified by the algorithm, such that it is possible to have an informative signal about institutions that are about to suffer (enjoy) the most significant losses (gains).
Book
Full-text available
User engagement refers to the quality of the user experience that emphasizes the positive aspects of interacting with an online application and, in particular, the desire to use that application longer and repeatedly. User engagement is a key concept in the design of online applications (whether for desktop, tablet or mobile), motivated by the observation that successful applications are not just used, but are engaged with. Users invest time, attention, and emotion in their use of technology, and seek to satisfy pragmatic and hedonic needs. Measurement is critical for evaluating whether online applications are able to successfully engage users, and may inform the design of and use of applications. User engagement is a multifaceted, complex phenomenon; this gives rise to a number of potential measurement approaches. Common ways to evaluate user engagement include using self-report measures, e.g., questionnaires; observational methods, e.g., facial expression analysis, speech analysis; neuro-physiological signal processing methods, e.g., respiratory and cardiovascular accelerations and decelerations, muscle spasms; and web analytics, e.g., number of site visits, click depth. These methods represent various trade-offs in terms of the setting (laboratory versus "in the wild"), object of measurement (user behaviour, affect or cognition) and scale of data collected. For instance, small-scale user studies are deep and rich, but limited in terms of generalizability, whereas large-scale web analytic studies are powerful but negate users' motivation and context. The focus of this book is how user engagement is currently being measured and various considerations for its measurement. Our goal is to leave readers with an appreciation of the various ways in which to measure user engagement, and their associated strengths and weaknesses.
We emphasize the multifaceted nature of user engagement and the unique contextual constraints that come to bear upon attempts to measure engagement in different settings, and across different user groups and web domains. At the same time, this book advocates for the development of "good" measures and good measurement practices that will advance the study of user engagement and improve our understanding of this construct, which has become so vital in our wired world.
Conference Paper
Full-text available
Community detection, which discovers densely connected structures in a network, has been studied a lot. In this paper, we study on-line community search which is practically useful but less studied in the literature. Given a query vertex in a graph, the problem is to find meaningful communities that the vertex belongs to in an online manner. We propose a novel community model based on the k-truss concept, which brings nice structural and computational properties. We design a compact and elegant index structure which supports the efficient search of k-truss communities with a linear cost with respect to the community size. In addition, we investigate the k-truss community search problem in a dynamic graph setting with frequent insertions and deletions of graph vertices and edges. Extensive experiments on large real-world networks demonstrate the effectiveness and efficiency of our community model and search algorithms.
Conference Paper
Full-text available
We consider a model of user engagement in social networks, where each player incurs a cost to remain engaged but derives a benefit proportional to the number of engaged neighbors. The natural equilibrium of this model corresponds to the k-core of the social network — the maximal induced subgraph with minimum degree at least k. We study the problem of “anchoring” a small number of vertices to maximize the size of the corresponding anchored k-core — the maximal induced subgraph in which every non-anchored vertex has degree at least k. This problem corresponds to preventing “unraveling” — a cascade of iterated withdrawals. We provide polynomial-time algorithms for general graphs with k = 2, and for bounded-treewidth graphs with arbitrary k. We prove strong inapproximability results for general graphs and k ≥ 3.
Conference Paper
Full-text available
Vibrant online communities are in constant flux. As members join and depart, the interactional norms evolve, stimulating further changes to the membership and its social dynamics. Linguistic change (in the sense of innovation that becomes accepted as the norm) is essential to this dynamic process: it both facilitates individual expression and fosters the emergence of a collective identity. We propose a framework for tracking linguistic change as it happens and for understanding how specific users react to these evolving norms. By applying this framework to two large online communities we show that users follow a determined two-stage lifecycle with respect to their susceptibility to linguistic change: a linguistically innovative learning phase in which users adopt the language of the community followed by a conservative phase in which users stop changing and the evolving community norms pass them by. Building on this observation, we show how this framework can be used to detect, early in a user's career, how long she will stay active in the community. Thus, this work has practical significance for those who design and maintain online communities. It also yields new theoretical insights into the evolution of linguistic norms and the complex interplay between community-level and individual-level linguistic change.
Article
Full-text available
We empirically analyze five online communities: Friendster, Livejournal, Facebook, Orkut, Myspace, to identify causes for the decline of social networks. We define social resilience as the ability of a community to withstand changes. We do not argue about the cause of such changes, but concentrate on their impact. Changes may cause users to leave, which may trigger further leaves of others who lost connection to their friends. This may lead to cascades of users leaving. A social network is said to be resilient if the size of such cascades can be limited. To quantify resilience, we use the k-core analysis, to identify subsets of the network in which all users have at least k friends. These connections generate benefits (b) for each user, which have to outweigh the costs (c) of being a member of the network. If this difference is not positive, users leave. After all cascades, the remaining network is the k-core of the original network determined by the cost-to-benefit c/b ratio. By analysing the cumulative distribution of k-cores we are able to calculate the number of users remaining in each community. This allows us to infer the impact of the c/b ratio on the resilience of these online communities. We find that the different online communities have different k-core distributions. Consequently, similar changes in the c/b ratio have a different impact on the amount of active users. As a case study, we focus on the evolution of Friendster. We identify time periods when new users entering the network observed an insufficient c/b ratio. This measure can be seen as a precursor of the later collapse of the community. Our analysis can be applied to estimate the impact of changes in the user interface, which may temporarily increase the c/b ratio, thus posing a threat for the community to shrink, or even to collapse.
Article
Full-text available
User engagement is a key concept in designing user-centred web applications. It refers to the quality of the user experience that emphasises the positive aspects of the interaction, and in particular the phenomena associated with being captivated by technology. This definition is motivated by the observation that successful technologies are not just used, but they are engaged with. Numerous methods have been proposed in the literature to measure engagement, however, little has been done to validate and relate these measures and so provide a firm basis for assessing the quality of the user experience. Engagement is heavily influenced, for example, by the user interface and its associated process flow, the user's context, value system and incentives. In this paper we propose an approach to relating and developing unified measures of user engagement. Our ultimate aim is to define a framework in which user engagement can be studied, measured, and explained, leading to recommendations and guidelines for user interface and interaction design for front-end web technology. Towards this aim, in this paper, we consider how existing user engagement metrics, web analytics, information retrieval metrics, and measures from immersion in gaming can bring new perspective to defining, measuring and explaining user engagement.
Article
Full-text available
The concept of contagion has steadily expanded from its original grounding in epidemic disease to describe a vast array of processes that spread across networks, notably social phenomena such as fads, political opinions, the adoption of new technologies, and financial decisions. Traditional models of social contagion have been based on physical analogies with biological contagion, in which the probability that an individual is affected by the contagion grows monotonically with the size of his or her "contact neighborhood"--the number of affected individuals with whom he or she is in contact. Whereas this contact neighborhood hypothesis has formed the underpinning of essentially all current models, it has been challenging to evaluate it due to the difficulty in obtaining detailed data on individual network neighborhoods during the course of a large-scale contagion process. Here we study this question by analyzing the growth of Facebook, a rare example of a social process with genuinely global adoption. We find that the probability of contagion is tightly controlled by the number of connected components in an individual's contact neighborhood, rather than by the actual size of the neighborhood. Surprisingly, once this "structural diversity" is controlled for, the size of the contact neighborhood is in fact generally a negative predictor of contagion. More broadly, our analysis shows how data at the size and resolution of the Facebook network make possible the identification of subtle structural signals that go undetected at smaller scales yet hold pivotal predictive roles for the outcomes of social processes.
Conference Paper
Full-text available
A lot of research in graph mining has been devoted to the discovery of communities. Most of the work has focused on the scenario where communities need to be discovered with only reference to the input graph. However, for many interesting applications one is interested in finding the community formed by a given set of nodes. In this paper we study a query-dependent variant of the community-detection problem, which we call the community-search problem: given a graph G, and a set of query nodes in the graph, we seek to find a subgraph of G that contains the query nodes and is densely connected. We motivate a measure of density based on minimum degree and distance constraints, and we develop an optimum greedy algorithm for this measure. We proceed by characterizing a class of monotone constraints and we generalize our algorithm to compute optimum solutions satisfying any set of monotone constraints. Finally we modify the greedy algorithm and we present two heuristic algorithms that find communities of size no greater than a specified upper bound. Our experimental evaluation on real datasets demonstrates the efficiency of the proposed algorithms and the quality of the solutions we obtain.
Conference Paper
Full-text available
Social influence determines to a large extent what we adopt and when we adopt it. This is just as true in the digital domain as it is in real life, and has become of increasing importance due to the deluge of user-created content on the Internet. In this paper, we present an empirical study of user-to-user content transfer occurring in the context of a time-evolving social network in Second Life, a massively multiplayer virtual world. We identify and model social influence based on the change in adoption rate following the actions of one's friends and find that the social network plays a significant role in the adoption of content. Adoption rates quicken as the number of friends adopting increases and this effect varies with the connectivity of a particular user. We further find that sharing among friends occurs more rapidly than sharing among strangers, but that content that diffuses primarily through social influence tends to have a more limited audience. Finally, we examine the role of individuals, finding that some play a more active role in distributing content than others, but that these influencers are distinct from the early adopters.
Article
Full-text available
The structure of large networks can be revealed by partitioning them into smaller parts, which are easier to handle. One such decomposition is based on k-cores, proposed in 1983 by Seidman. In this paper, an efficient O(m) algorithm, where m is the number of lines, for determining the cores decomposition of a given simple network is presented. An application to the authors' collaboration network in computational geometry is presented. The paper was published as http://www.springerlink.com/content/c6472216637p57w4/
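To make the notion of a cores decomposition concrete, here is a minimal peeling sketch that assigns every vertex its core number. It is a simplified illustration using a lazy-deletion heap, so it runs in O(m log n) rather than the O(m) bound mentioned above; the function name `core_numbers` and the adjacency-dict input are assumptions made for this example.

```python
import heapq


def core_numbers(adj):
    """Core number of every vertex of an undirected simple graph.

    adj: dict mapping each vertex to the set of its neighbours.
    Vertices must be mutually comparable (used for tie-breaking in the heap).
    """
    deg = {v: len(ns) for v, ns in adj.items()}
    heap = [(d, v) for v, d in deg.items()]
    heapq.heapify(heap)
    core = {}
    k = 0
    while heap:
        d, v = heapq.heappop(heap)
        if v in core or d != deg[v]:
            continue  # stale heap entry, a fresher one exists or v is done
        k = max(k, d)  # the peeling level never decreases
        core[v] = k
        for u in adj[v]:
            if u not in core:
                deg[u] -= 1
                heapq.heappush(heap, (deg[u], u))
    return core


# Triangle a-b-c with a pendant vertex d attached to c.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(core_numbers(graph))
```

The vertices with core number at least k induce the k-core, so a single pass yields the whole hierarchy.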
Article
Community search studies the retrieval of certain community structures containing query vertices, which has received lots of attention recently. k-truss is a fundamental community structure where each edge is contained in at least k - 2 triangles. Triangle-connected k-truss community (k-TTC) is a widely-used variant of k-truss, which is a maximal k-truss where edges can reach each other via a series of edge-adjacent triangles. Although existing works have provided indexes and query algorithms for k-TTC search, the cohesiveness of a k-TTC (diameter upper bound) has not been theoretically analyzed and the triangle connectivity has not been efficiently captured. Thus, we revisit the k-TTC search problem in dynamic graphs, aiming to achieve a deeper understanding of k-TTC. First, we prove that the diameter of a k-TTC with n vertices is bounded by [EQUATION]. Then, we encapsulate triangle connectivity with two novel concepts, partial class and truss-precedence, based on which we build our compact index, EquiTree, to support the efficient k-TTC search. We also provide efficient index construction and maintenance algorithms for the dynamic change of graphs. Compared with the state-of-the-art methods, our extensive experiments show that EquiTree can boost search efficiency up to two orders of magnitude at a small cost of index construction and maintenance.
Article
As a fundamental problem in graph analysis, core decomposition aims to compute the core numbers of vertices in a given graph. It is a powerful tool for mining important graph structures. For dynamic graphs with real-time updates of vertices/edges, core maintenance has been utilized to update the core numbers of vertices. The previous approaches to core maintenance face challenges in terms of storage and efficiency. In this paper, we investigate distributed approaches to core maintenance on a pregel-like system, which is a famous graph computing system. We first design a core decomposition algorithm to obtain core numbers of vertices in a given graph. Based on it, a distributed batch-stream combined algorithm (DBCA) is devised to efficiently maintain the core numbers when vertex/edge updates happen. In particular, we introduce a new task assignment strategy to DBCA based on diversity of the edge-cores of updated edges. To ensure that DBCA can accurately process core maintenance, we develop a message interaction protocol to resolve the problem of crosstalk among different tasks. Comprehensive experiments have been conducted on real/synthetic graphs, more specifically, in two typical distributed environments built on Supercomputing Center and Alibaba Cloud. The experiment results demonstrate that our proposed algorithms are efficient and scalable.
Article
The model of k-core and its decomposition have been applied in various areas, such as social networks, the world wide web, and biology. A graph can be decomposed into an elegant k-core hierarchy to facilitate cohesive subgraph discovery and network analysis. As many real-life graphs are fast evolving, existing works proposed efficient algorithms to maintain the coreness value of every vertex against structure changes. However, the maintenance of the k-core hierarchy in existing studies is not complete because the connections among different k-cores in the hierarchy are not considered. In this paper, we study hierarchical core maintenance which is to compute the k-core hierarchy incrementally against graph dynamics. The problem is challenging because the change of hierarchy may be large and complex even for a slight graph update. In order to precisely locate the area affected by graph dynamics, we conduct in-depth analyses on the structural properties of the hierarchy, and propose well-designed local update techniques. Our algorithms significantly outperform the baselines on runtime by up to 3 orders of magnitude, as demonstrated on 10 real-world large graphs.
Chapter
Cohesive subgraphs are applied in various fields. Mining cohesive components such as the k-truss has attracted a lot of effort to improve time efficiency in large-scale graphs. The k-truss is a subgraph where each edge is contained in at least k - 2 triangles, and the problem of truss decomposition is computing the k-trusses of a graph for all k. However, most graphs in real scenarios are usually changing over time. The previous studies take static graphs as input, and truss maintenance in dynamic graphs receives little attention. This paper focuses on distributed algorithms for truss maintenance. We present a distributed model underlying the real distributed processing model Pregel. Based on the model, we propose truss decomposition and truss maintenance algorithms. To confirm the effectiveness and efficiency of the proposed algorithms, we conduct extensive experiments over both real-world and synthetic graphs.
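The k-truss condition stated above (every edge lies in at least k - 2 triangles) can be illustrated with a small single-machine sketch. This is a naive repeated-scan peeling written for clarity, not the distributed Pregel-style algorithm of the chapter; the function name `k_truss` and the edge-list input are assumptions made for this example, and the graph is assumed to have no self-loops.

```python
def k_truss(edges, k):
    """Edges of the maximal k-truss of an undirected simple graph.

    edges: iterable of (u, v) pairs. Returns a set of frozenset({u, v}).
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    changed = True
    while changed:  # repeat until no edge violates the support condition
        changed = False
        for u in list(adj):
            for v in list(adj[u]):
                # Support of edge (u, v) = number of common neighbours.
                if len(adj[u] & adj[v]) < k - 2:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True
    return {frozenset((u, v)) for u in adj for v in adj[u]}


# A 4-clique on vertices 1..4 plus a dangling edge (4, 5).
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
clique = {frozenset(e) for e in edges[:6]}
print(k_truss(edges, 4) == clique)
```

Running the function for increasing k until the result is empty gives a (slow) truss decomposition of a small graph.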
Conference Paper
Due to the ubiquity of graphs, graph analytics has attracted much attention from both research and industry communities. The notion of k-truss is widely used in graph analytics. Since graphs are continuously evolving in real applications and it is costly to compute trusses from scratch, we study the problem of truss maintenance which aims at designing efficient incremental algorithms to update trusses when graphs are updated with changes. An incremental algorithm is desired to be bounded; that is, its cost is $O(f(\|\texttt{CHANGED}\|_c))$ for some polynomial function $f$ and some positive integer $c$, where $\texttt{CHANGED}$ comprises the changes to both the graph and the result and $\|\texttt{CHANGED}\|_c$ is the size of the c-hop neighborhood of $\texttt{CHANGED}$. An incremental problem is bounded if it has a bounded incremental algorithm and is unbounded otherwise. Under the model of locally persistent algorithms, we prove that truss maintenance is bounded under edge removals but is unbounded even for unit edge insertions. To address the unboundedness, we formulate a new notion $\texttt{AFF}^{\preceq}$ which, as a practically effective alternative to $\texttt{CHANGED}$, represents a set of edges affected by the changes to the graph, and devise an insertion algorithm that is bounded with respect to $\texttt{AFF}^{\preceq}$, while retaining the boundedness for edge removals. More specifically, our insertion algorithm runs in $O(f(\|\texttt{AFF}^{\preceq}\|_c))$ time for some polynomial function $f$ and some positive integer $c$, with $\|\texttt{AFF}^{\preceq}\|_c$ being the size of the c-hop neighborhood of $\texttt{AFF}^{\preceq}$. Our extensive performance studies show that our new algorithms can significantly outperform the state-of-the-art by up to 3 orders of magnitude for the 12 large real graphs tested and are more efficient than computing trusses from scratch even for changes of non-trivial size. We report our findings in this paper.
Article
In this paper, we study the problem of the anchored k-core. Given a graph G, an integer k and a budget b, we aim to identify b vertices in G so that we can determine the largest induced subgraph J in which every vertex, except the b vertices, has at least k neighbors in J. This problem was introduced by Bhawalkar and Kleinberg et al. in the context of user engagement in social networks, where a user may leave a community if he/she has less than k friends engaged. The problem has been shown to be NP-hard and inapproximable. A polynomial-time algorithm for graphs with bounded tree-width has been proposed. However, this assumption usually does not hold in real-life graphs, and their techniques cannot be extended to handle general graphs. Motivated by this, we propose an efficient algorithm, namely onion-layer based anchored k-core (OLAK), for the anchored k-core problem on large scale graphs. To facilitate computation of the anchored k-core, we design an onion layer structure, which is generated by a simple onion-peeling-like algorithm against a small set of vertices in the graph. We show that computation of the best anchor can simply be conducted upon the vertices on the onion layers, which significantly reduces the search space. Based on the well-organized layer structure, we develop efficient candidates exploration, early termination and pruning techniques to further speed up computation. Comprehensive experiments on 10 real-life graphs demonstrate the effectiveness and efficiency of our proposed methods.
Article
In this paper, we investigate the problem of (k,r)-core, which aims to find cohesive subgraphs on social networks considering both user engagement and similarity perspectives. In particular, we adopt the popular concept of k-core to guarantee the engagement of the users (vertices) in a group (subgraph) where each vertex in a (k,r)-core connects to at least k other vertices. Meanwhile, we also consider the pairwise similarity between users based on their profiles. For a given similarity metric and a similarity threshold r, the similarity between any two vertices in a (k,r)-core is guaranteed to be at least r. Efficient algorithms are proposed to enumerate all maximal (k,r)-cores and find the maximum (k,r)-core, where both problems are shown to be NP-hard. Effective pruning techniques significantly reduce the search space of the two algorithms, and a novel (k,k')-core based (k,r)-core size upper bound enhances the performance of the maximum (k,r)-core computation. We also devise effective search orders to accommodate the different nature of the two mining algorithms. Comprehensive experiments on real-life data demonstrate that the maximal/maximum (k,r)-cores enable us to find interesting cohesive subgraphs, and that the performance of the two mining algorithms is significantly improved by the proposed techniques.
Article
Given a graph G and a vertex q ∈ G, the community search query returns a subgraph of G that contains vertices related to q. Communities, which are prevalent in attributed graphs such as social networks and knowledge bases, can be used in emerging applications such as product advertisement and setting up of social events. In this paper, we investigate the attributed community query (or ACQ), which returns an attributed community (AC) for an attributed graph. The AC is a subgraph of G, which satisfies both structure cohesiveness (i.e., its vertices are tightly connected) and keyword cohesiveness (i.e., its vertices share common keywords). The AC enables a better understanding of how and why a community is formed (e.g., members of an AC have a common interest in music, because they all have the same keyword "music"). An AC can be "personalized"; for example, an ACQ user may specify that an AC returned should be related to some specific keywords like "research" and "sports". To enable efficient AC search, we develop the CL-tree index structure and three algorithms based on it. We evaluate our solutions on four large graphs, namely Flickr, DBLP, Tencent, and DBpedia. Our results show that ACs are more effective and efficient than existing community retrieval approaches. Moreover, an AC contains more precise and personalized information than that of existing community search and detection methods.
Article
Each player in an infinite population interacts strategically with a finite subset of that population. Suppose each player's binary choice in each period is a best response to the population choices of the previous period. When can behaviour that is initially played by only a finite set of players spread to the whole population? This paper characterizes when such contagion is possible for arbitrary local interaction systems. Maximal contagion occurs when local interaction is sufficiently uniform and there is low neighbour growth, i.e. the number of players who can be reached in k steps does not grow exponentially in k.
Article
Although analyzing user behavior within individual communities is an active and rich research domain, people usually interact with multiple communities both on- and off-line. How do users act in such multi-community environments? Although there are a host of intriguing aspects to this question, it has received much less attention in the research community in comparison to the intra-community case. In this paper, we examine three aspects of multi-community engagement: the sequence of communities that users post to, the language that users employ in those communities, and the feedback that users receive, using longitudinal posting behavior on Reddit as our main data source, and DBLP for auxiliary experiments. We also demonstrate the effectiveness of features drawn from these aspects in predicting users' future level of activity. One might expect that a user's trajectory mimics the "settling-down" process in real life: an initial exploration of sub-communities before settling down into a few niches. However, we find that the users in our data continually post in new communities; moreover, as time goes on, they post increasingly evenly among a more diverse set of smaller communities. Interestingly, it seems that users that eventually leave the community are "destined" to do so from the very beginning, in the sense of showing significantly different "wandering" patterns very early on in their trajectories; this finding has potentially important design implications for community maintainers. Our multi-community perspective also allows us to investigate the "situation vs. personality" debate from language usage across different communities.
Conference Paper
Given a large social graph, how can we model the engagement properties of nodes? Can we quantify engagement both at node level as well as at graph level? Typically, engagement refers to the degree that an individual participates (or is encouraged to participate) in a community and is closely related to the important property of nodes' departure dynamics, i.e., the tendency of individuals to leave the community. In this paper, we build upon recent work in the field of game theory, where the behavior of individuals (nodes) is modeled by a technology adoption game. That is, the decision of a node to remain engaged in the graph is affected by the decision of its neighbors, and the "best practice" for each individual is captured by its core number - as arises from the k-core decomposition. After modeling and defining the engagement dynamics at node and graph level, we examine whether they depend on structural and topological features of the graph. We perform experiments on a multitude of real graphs, observing interesting connections with other graph characteristics, as well as a clear deviation from the corresponding behavior of random graphs. Furthermore, similar to the well known results about the robustness of real graphs under random and targeted node removals, we discuss the implications of our findings on a special case of robustness - regarding random and targeted node departures based on their engagement level.
Article
A k-core of a graph is a maximal connected subgraph in which every vertex is connected to at least k vertices in the subgraph. k-core decomposition is often used in large-scale network analysis, such as community detection, protein function prediction, visualization, and solving NP-Hard problems on real networks efficiently, like maximal clique finding. In many real-world applications, networks change over time. As a result, it is essential to develop efficient incremental algorithms for streaming graph data. In this paper, we propose the first incremental k-core decomposition algorithms for streaming graph data. These algorithms locate a small subgraph that is guaranteed to contain the list of vertices whose maximum k-core values have to be updated, and efficiently process this subgraph to update the k-core decomposition. Our results show a significant reduction in run-time compared to non-incremental alternatives. We show the efficiency of our algorithms on different types of real and synthetic graphs, at different scales. For a graph of 16 million vertices, we observe speedups reaching a million times, relative to the non-incremental algorithms.
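As a point of reference, the non-incremental baseline mentioned above (recomputing every core number from scratch) can be sketched as a peeling loop. This is only an illustration under stated assumptions, not the incremental algorithm of the abstract: the function name `core_numbers` is hypothetical, the graph is assumed to be undirected and given as an edge list, and the minimum-degree vertex is located by a linear scan instead of the bucket queue a linear-time implementation would use.

```python
from collections import defaultdict

def core_numbers(edges):
    """Return {vertex: core number} by repeatedly peeling a minimum-degree vertex."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)  # linear scan, for clarity only
        k = max(k, degree[v])  # core numbers never decrease along the peeling order
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core
```

An incremental method avoids rerunning this loop over the whole graph after each edge update and instead restricts the work to a small subgraph around the changed edge.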
Article
A common way to evaluate the time complexity of an algorithm is to use asymptotic worst-case analysis and to express the cost of the computation as a function of the size of the input. However, for an incremental algorithm this kind of analysis is sometimes not very informative. (By an “incremental algorithm”, we mean an algorithm for a dynamic problem.) When the cost of the computation is expressed as a function of the size of the (current) input, several incremental algorithms that have been proposed run in time asymptotically no better, in the worst-case, than the time required to perform the computation from scratch. Unfortunately, this kind of information is not very helpful if one wishes to compare different incremental algorithms for a given problem.
Conference Paper
As the distribution of the video over the Internet becomes mainstream and its consumption moves from the computer to the TV screen, user expectation for high quality is constantly increasing. In this context, it is crucial for content providers to understand if and how video quality affects user engagement and how to best invest their resources to optimize video quality. This paper is a first step towards addressing these questions. We use a unique dataset that spans different content types, including short video on demand (VoD), long VoD, and live content from popular video content providers. Using client-side instrumentation, we measure quality metrics such as the join time, buffering ratio, average bitrate, rendering quality, and rate of buffering events. We quantify user engagement both at a per-video (or view) level and a per-user (or viewer) level. In particular, we find that the percentage of time spent in buffering (buffering ratio) has the largest impact on the user engagement across all types of content. However, the magnitude of this impact depends on the content type, with live content being the most impacted. For example, a 1% increase in buffering ratio can reduce user engagement by more than three minutes for a 90-minute live video event. We also see that the average bitrate plays a significantly more important role in the case of live content than VoD content.
Social media engagement theory: Exploring the influence of user engagement on social media usage
  • Paul M. Di Gangi
  • Molly M. Wasko
Why Engagement Matters: Cross-Disciplinary Perspectives and Innovations on User Engagement with Digital Media
  • Heather O'Brien
  • Paul Cairns