Conference Paper

Finding Research Community in Collaboration Network with Expertise Profiling

Authors:

Abstract

As a new task of expertise retrieval, finding research communities for scientific guidance and research cooperation has become increasingly important. However, existing community discovery algorithms consider only graph structure and ignore context such as knowledge characteristics, so research community detection cannot be addressed simply by applying existing methods directly. In this paper, we propose a hierarchical discovery strategy that rapidly locates the core of a research community and then incrementally extends it. In particular, when expanding a local community, it selects each new node by considering both its connection strength and its expertise divergence with respect to the candidate community, which prevents intellectually irrelevant nodes from spilling into the current community. Experiments on the ACL Anthology Network show that our method is effective.
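The abstract does not give the scoring rule used during expansion. The following Python sketch is a hypothetical illustration under stated assumptions, not the authors' method. It assumes that each author has an expertise profile given as a topic distribution (for example from LDA), and it approximates the core as the seed author plus their direct collaborators. Connection strength is taken to be the share of a candidate's collaboration weight that falls inside the community. Expertise divergence is the Jensen-Shannon divergence between the candidate's profile and the community's average profile. The names `expand_community`, `lam` and `min_score` are invented for this example.

```python
import math
from typing import Dict, List, Set

Graph = Dict[str, Dict[str, float]]  # author -> {co-author: collaboration weight}


def js_divergence(p: List[float], q: List[float]) -> float:
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: List[float], y: List[float]) -> float:
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def community_profile(members: Set[str], expertise: Dict[str, List[float]]) -> List[float]:
    """Average expertise (topic) distribution of the current members."""
    n_topics = len(expertise[next(iter(members))])
    return [sum(expertise[v][t] for v in members) / len(members) for t in range(n_topics)]


def expand_community(graph: Graph, expertise: Dict[str, List[float]], seed: str,
                     lam: float = 1.0, min_score: float = 0.1) -> Set[str]:
    # Hypothetical core: the seed author plus all direct collaborators.
    community = {seed} | set(graph[seed])
    while True:
        profile = community_profile(community, expertise)
        frontier = {u for v in community for u in graph[v]} - community
        best, best_score = None, min_score
        for u in sorted(frontier):  # sorted for deterministic tie-breaking
            inside = sum(w for v, w in graph[u].items() if v in community)
            strength = inside / sum(graph[u].values())
            score = strength - lam * js_divergence(expertise[u], profile)
            if score > best_score:
                best, best_score = u, score
        if best is None:  # no candidate clears the threshold
            return community
        community.add(best)
```

With `lam` set to 0 the rule reduces to a purely structural greedy expansion. Raising `lam` makes collaborators with distant expertise harder to absorb, which is the spill-in effect the abstract describes.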

... In this scenario, particular interest is devoted to those real-world networks that exhibit a double structural nature: they are globally sparse but locally dense, i.e., they contain clusters of highly connected nodes (also called communities in social network analysis) that are loosely connected to each other (see, e.g., [2], [3], [4]). Examples include social and financial networks [5], [6], [7], [8], as well as biological and information networks [9], [10]. The visualization of networks of this type through a unique node-link diagram is sometimes unsatisfactory, due to the visual clutter caused by the high number of edges in the dense portions of the network (see, e.g., Fig. 1(a)). ...
Article
Hybrid visualizations combine different metaphors into a single network layout, in order to help humans find the “right way” of displaying the different portions of the network, especially when it is globally sparse and locally dense. We investigate hybrid visualizations in two complementary directions: (i) we evaluate the effectiveness of different hybrid visualization models through a comparative user study; (ii) we estimate the usefulness of an interactive visualization that integrates all the considered hybrid models. The results of our study provide some hints about the usefulness of the different hybrid visualizations for specific analysis tasks and indicate that integrating different hybrid models into a single visualization may offer a valuable analysis tool.
... Many real-world networks, in a variety of application domains, exhibit a heterogeneous structure with a double nature: they are globally sparse but locally dense, i.e., they contain clusters of highly connected nodes (also called communities in social network analysis) that are loosely connected to each other (see, e.g., [29,31,50]). Examples include social and financial networks [16,26,49,58], as well as biological and information networks [28,43]. ...
Preprint
Hybrid visualizations mix different metaphors in a single layout of a network. In particular, the popular NodeTrix model, introduced by Henry, Fekete, and McGuffin in 2007, combines node-link diagrams and matrix-based representations to support the analysis of real-world networks that are globally sparse but locally dense. That idea inspired a series of works proposing variants of or alternatives to NodeTrix. We present a user study that compares the classical node-link model and three hybrid visualization models designed to work on the same types of networks. The results of our study provide interesting indications about the advantages and drawbacks of the considered models when performing classical analysis tasks. At the same time, our experiment has some limitations and opens up further research on the subject.
... In particular, many networks in a variety of application domains are globally sparse but locally dense, i.e., they contain communities (or clusters) of highly connected nodes, and such communities are loosely connected to each other (see, e.g., [16,19,33]). Typical examples are social networks such as collaboration and financial networks [5,11,32,41]. Other examples include biological networks (e.g., metabolic and protein-protein interaction networks) and information networks; see, e.g., [15,23,30]. ...
Preprint
Many real-world networks are globally sparse but locally dense. Typical examples are social networks, biological networks, and information networks. This double structural nature makes it difficult to adopt a homogeneous visualization model that clearly conveys an overview of the network and the internal structure of its communities at the same time. As a consequence, the use of hybrid visualizations has been proposed. For instance, NodeTrix combines node-link and matrix-based representations (Henry et al., 2007). In this paper we describe ChordLink, a hybrid visualization model that embeds chord diagrams, used to represent dense subgraphs, into a node-link diagram, which shows the global network structure. The visualization is intuitive and makes it possible to interactively highlight the structure of a community while keeping the rest of the layout stable. We discuss the intriguing algorithmic challenges behind the ChordLink model, present a prototype system, and illustrate case studies on real-world networks.
... We obtained a total of 4,397 distinct candidate words. Using a naive approach with TF-IDF as an indicator [21], we identified keywords from the set of candidate words to construct the paper-word graph. If the TF-IDF of a word was greater than a threshold of 0.03, the word was selected as a keyword. ...
... Finally, a total of 4,397 distinct word tokens are obtained. To identify keywords from all paper texts, a naive TF-IDF based method is adopted as an indicator [30]: if a word's TF-IDF value was greater than a threshold, the word was selected as a keyword. In our experiment, with a TF-IDF threshold of 0.03, 3,704 distinct keywords were identified for the set of 13,929 papers. ...
Article
Many state-of-the-art citation recommendation methods have been proposed for finding a list of reference papers for a given manuscript. Among them, graph-based methods have gained particular attention due to their flexibility in incorporating various information that embodies users’ preferences. To achieve more synthetic, accurate, and personalized recommendations than previous graph-based methods, this paper proposes a new graph-based recommendation framework. The framework comprehensively exploits diversified link information in a bibliographic network and concise query information that embodies the user's specific requirements. It not only applies mutual reinforcement rules to all available types of relations in a multi-layered graph but also incorporates the query information into the multi-layered mutual reinforcement schema, yielding a Multi-layered Mutually Reinforced Query-focused (MMRQ) citation recommendation approach. Extensive experiments have been conducted on a subset of the AAN data set. The Recall and NDCG results and a case study all demonstrate that our MMRQ method achieves superior citation recommendation performance.
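The excerpts above describe keyword selection with a TF-IDF threshold of 0.03. They do not specify the TF-IDF variant or whether the threshold applies per paper. The following Python sketch assumes the common variant tf = count / document length and idf = log(N / df), applied to each paper separately. `tfidf_keywords` is a hypothetical helper, and other variants would select different keyword sets than the cited counts.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def tfidf_keywords(docs: Dict[str, List[str]], threshold: float = 0.03) -> Dict[str, Set[str]]:
    """For each paper, return the words whose TF-IDF exceeds `threshold`.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of papers containing each word at least once.
    df = Counter(w for tokens in docs.values() for w in set(tokens))
    keywords: Dict[str, Set[str]] = {}
    for paper, tokens in docs.items():
        counts = Counter(tokens)
        keywords[paper] = {
            w for w, c in counts.items()
            if (c / len(tokens)) * math.log(n_docs / df[w]) > threshold
        }
    return keywords
```

A word that occurs in every paper gets idf = 0 under this variant and is never selected, however frequent it is.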
... We obtained a total of 4,397 distinct candidate words. To construct the paper-word network, we identified keywords from the set of candidate words using a naive method with TF-IDF as an indicator [30]: if the TF-IDF of a word was greater than a threshold, this word was selected as a keyword. A total of 3,704 distinct keywords were identified for the set of 13,929 papers at a TF-IDF threshold of 0.03. ...
Article
In the era of big scholarly data, citation recommendation is playing an increasingly significant role as it solves information overload issues by automatically suggesting relevant references that align with researchers’ interests. Many state-of-the-art models have been utilized for citation recommendation, among which graph-based models have garnered significant attention due to their flexibility in integrating rich information that influences users’ preferences. Co-authorship is one of the key relations in citation recommendation, but it is usually regarded as a binary relation in current graph-based models. This binary modeling of co-authorship is likely to result in information loss such as the loss of strong or weak relationships between specific research topics. To address this issue, we present a fine-grained method for co-authorship modeling that incorporates the co-author network structure and the topics of their published articles. Then, we design a three-layered graph-based recommendation model that integrates fine-grained co-authorship as well as author-paper, paper-citation and paper-keyword relations. Our model effectively generates query-oriented recommendations using a simple random walk algorithm. Extensive experiments conducted on a subset of the AAN dataset for performance evaluation demonstrate that our method outperforms other models in terms of both Recall and NDCG.
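The abstracts above describe query-focused recommendation over a multi-layered bibliographic graph but do not give the update equations. A common building block for such models is random walk with restart (personalized PageRank). The Python sketch below is an assumption, not the MMRQ or three-layered model itself. It flattens all layers into one weighted, undirected graph whose node names (`q`, `p1`, `p2`) are made up, and the walk jumps back to the query nodes with probability `restart` at each step.

```python
from typing import Dict, List

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: edge weight}


def random_walk_with_restart(graph: Graph, query: List[str], restart: float = 0.15,
                             iters: int = 100) -> Dict[str, float]:
    """Power iteration for r = restart * e_q + (1 - restart) * P^T r.

    P is the row-normalized weight matrix; e_q spreads unit mass over the query nodes.
    """
    e = {v: (1.0 / len(query) if v in query else 0.0) for v in graph}
    r = dict(e)
    for _ in range(iters):
        nxt = {v: restart * e[v] for v in graph}
        for v, nbrs in graph.items():
            out = sum(nbrs.values())
            if out == 0:
                continue  # dangling node: its mass is simply dropped in this sketch
            for u, w in nbrs.items():
                nxt[u] += (1 - restart) * r[v] * w / out
        r = nxt
    return r
```

Ranking the non-query nodes by their score gives a recommendation list. A multi-layered model could additionally weight each relation type separately before normalization.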
Chapter
Due to the outbreak of COVID-19 in early 2020, a flood of information and rumors about the epidemic filled the internet, causing panic in people’s lives. During the early period of the epidemic, positive public welfare information played a key role in influencing online public opinion, alleviating public anxiety and mobilizing the entire society to fight the epidemic. Therefore, analyzing the characteristics of public welfare communication in this early period can help us develop better public welfare communication strategies for the post-epidemic era. In China, Sina Weibo is a widely used microblog platform based on user relationships. In this paper, we take the public welfare microblogs released by the Weibo public welfare account “@微公益” (Micro public welfare) in the early period of the epidemic as the research object. First, we collected a total of 1863 blog posts from this account from January to April 2020 and divided them into four stages based on the Life Cycle Theory. Then, the top 10 keywords of each stage were extracted using word frequency statistics. Finally, the LDA topic model was used to identify the topics of each stage, and the characteristics of public welfare communication in each stage were analyzed in detail.
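The stage-wise keyword step described above can be sketched as a frequency count. This Python sketch assumes that the Chinese posts have already been word-segmented and that stop words are supplied as a list. `top_keywords_by_stage` is a hypothetical helper, and the English tokens in the test are placeholders.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def top_keywords_by_stage(stages: Dict[str, List[List[str]]], k: int = 10,
                          stopwords: Iterable[str] = ()) -> Dict[str, List[Tuple[str, int]]]:
    """Return the k most frequent tokens per stage.

    `stages` maps a stage name to its posts, each post already segmented into tokens.
    """
    stop = set(stopwords)
    result = {}
    for stage, posts in stages.items():
        counts = Counter(tok for post in posts for tok in post if tok not in stop)
        result[stage] = counts.most_common(k)  # ties keep first-seen order
    return result
```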
Article
Many real-world networks are globally sparse but locally dense. Typical examples are social networks, biological networks, and information networks. This double structural nature makes it difficult to adopt a homogeneous visualization model that clearly conveys both an overview of the network and the internal structure of its communities at the same time. As a consequence, the use of hybrid visualizations has been proposed. For instance, NodeTrix combines node-link and matrix-based representations (Henry et al., 2007). In this article we describe ChordLink, a hybrid visualization model that embeds chord diagrams, used to represent dense subgraphs, into a node-link diagram, which shows the global network structure. The visualization makes it possible to interactively highlight the structure of a community while keeping the rest of the layout stable. We discuss the intriguing algorithmic challenges behind the ChordLink model, present a prototype system that implements it, and illustrate case studies on real-world networks.
Article
The data overload problem and the specific nature of experts’ knowledge can hinder many users from finding experts with the expertise they require. Several expert finding systems aim to solve the data overload problem and recommend experts who can fulfil users’ information needs. This study conducted a Systematic Literature Review of state-of-the-art expert finding systems and expertise seeking studies published between 2010 and 2019. We used a systematic process to select ninety-six articles, consisting of 57 journal articles, 34 conference proceedings, three book chapters, and one thesis. This study analyses the domains of expert finding systems, expertise sources, methods, and datasets. It also discusses the differences between expertise retrieval and expertise seeking. Moreover, it identifies the contextual factors that have been incorporated into expert finding systems. Finally, it identifies five gaps in expert finding systems for future research. This review indicates that ≈65% of expert finding systems are used in the academic domain and forms a basis for future expert finding systems research.
Conference Paper
We introduce the ACL Anthology Network (AAN), a manually curated networked database of citations, collaborations, and summaries in the field of Computational Linguistics. We also present a number of statistics about the network, including the most cited authors, the most central collaborators, and network statistics about the paper citation, author citation, and author collaboration networks.
Conference Paper
Extracting information from very large collections of structured, semi-structured or even unstructured data can be a considerable challenge when much of the hidden information is implicit within relationships among entities in the data. Social networks are such data collections in which relationships play a vital role in the knowledge these networks can convey. A bibliographic database is an essential tool for the research community, yet finding and making use of relationships contained within such a social network is difficult. In this paper we introduce DBconnect, a prototype that exploits the social network coded within the DBLP database by drawing on a new random walk approach to reveal interesting knowledge about the research community and even recommend collaborations.
Conference Paper
We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
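The abstract names LDA's hierarchical levels. These are corpus-level topics, per-document topic proportions, and per-word topic assignments. The minimal Python sketch below illustrates the generative process. It draws θ ~ Dirichlet(α) via normalized Gamma variates, then draws a topic z ~ Multinomial(θ) and a word w ~ Multinomial(β_z) for each word. It covers generation only, not the variational inference or EM estimation the paper presents. It also fixes the document length instead of drawing it from a Poisson distribution as the original model does.

```python
import random
from typing import List, Tuple


def sample_dirichlet(alpha: List[float], rng: random.Random) -> List[float]:
    """Dirichlet draw via independent Gamma(alpha_k, 1) variates, normalized."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(alpha: List[float], beta: List[List[float]], n_words: int,
                      rng: random.Random) -> Tuple[List[float], List[int], List[int]]:
    """Generate one document from LDA.

    alpha: Dirichlet prior over K topics; beta: K x V topic-word probabilities.
    Returns the topic proportions theta, topic assignments z, and word ids w.
    """
    theta = sample_dirichlet(alpha, rng)
    topics = range(len(theta))
    vocab = range(len(beta[0]))
    z, w = [], []
    for _ in range(n_words):
        topic = rng.choices(topics, weights=theta)[0]         # z_n ~ Mult(theta)
        w.append(rng.choices(vocab, weights=beta[topic])[0])  # w_n ~ Mult(beta_z)
        z.append(topic)
    return theta, z, w
```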
Article
The modern science of networks has brought significant advances to our understanding of complex systems. One of the most relevant features of graphs representing real systems is community structure, or clustering, i.e. the organization of vertices in clusters, with many edges joining vertices of the same cluster and comparatively few edges joining vertices of different clusters. Such clusters, or communities, can be considered as fairly independent compartments of a graph, playing a role similar to that of, e.g., the tissues or the organs in the human body. Detecting communities is of great importance in sociology, biology and computer science, disciplines where systems are often represented as graphs. This problem is very hard and not yet satisfactorily solved, despite the huge effort of a large interdisciplinary community of scientists working on it over the past few years. We will attempt a thorough exposition of the topic, from the definition of the main elements of the problem, to the presentation of most methods developed, with a special focus on techniques designed by statistical physicists, from the discussion of crucial issues like the significance of clustering and how methods should be tested and compared against each other, to the description of applications to real networks. Comment: Review article. 103 pages, 42 figures, 2 tables. Two sections expanded + minor modifications. Three figures + one table + references added. Final version published in Physics Reports
Article
We survey some of the concepts, methods, and applications of community detection, which has become an increasingly important area of network science. To help ease newcomers into the field, we provide a guide to available methodology and open problems, and discuss why scientists from diverse backgrounds are interested in these problems. As a running theme, we emphasize the connections of community detection to problems in statistical physics and computational optimization. Comment: survey/review article on community structure in networks; published version is available at http://people.maths.ox.ac.uk/~porterm/papers/comnotices.pdf
Article
Potential energy landscapes can be represented as a network of minima linked by transition states. The community structure of such networks has been obtained for a series of small Lennard-Jones (LJ) clusters. This community structure is compared to the concept of funnels in the potential energy landscape. Two existing algorithms have been used to find community structure, one involving removing edges with high betweenness, the other involving optimization of the modularity. The definition of the modularity has been refined, making it more appropriate for networks such as these where multiple edges and self-connections are not included. The optimization algorithm has also been improved, using Monte Carlo methods with simulated annealing and basin hopping, both often used successfully in other optimization problems. In addition to the small clusters, two examples with known heterogeneous landscapes, the 13-atom cluster (LJ13) with one labeled atom and the 38-atom cluster (LJ38), were studied with this approach. The network methods found communities that are comparable to those expected from landscape analyses. This is particularly interesting since the network model does not take any barrier heights or energies of minima into account. For comparison, the network associated with a two-dimensional hexagonal lattice is also studied and is found to have high modularity, thus raising some questions about the interpretation of the community structure associated with such partitions.
Article
We propose a method to find the community structure in complex networks based on an extremal optimization of the value of modularity. The method outperforms the optimal modularity found by the existing algorithms in the literature giving a better understanding of the community structure. We present the results of the algorithm for computer-simulated and real networks and compare them with other approaches. The efficiency and accuracy of the method make it feasible to be used for the accurate identification of community structure in large complex networks.
Article
Detecting community structure is fundamental for uncovering the links between structure and function in complex networks and for practical applications in many disciplines such as biology and sociology. A popular method now widely used relies on the optimization of a quantity called modularity, which is a quality index for a partition of a network into communities. We find that modularity optimization may fail to identify modules smaller than a scale which depends on the total size of the network and on the degree of interconnectedness of the modules, even in cases where modules are unambiguously defined. This finding is confirmed through several examples, both in artificial and in real social, biological, and technological networks, where we show that modularity optimization indeed does not resolve a large number of modules. A check of the modules obtained through modularity optimization is thus necessary, and we provide here key elements for the assessment of the reliability of this community detection method. Keywords: complex networks, modular structure, metabolic networks, social networks
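The resolution limit can be checked numerically using modularity in the form Q = Σ_s [l_s / L - (d_s / 2L)²]. Here L is the number of edges, l_s is the number of edges inside module s, and d_s is the total degree of its nodes. The Python sketch below uses hypothetical helper names and builds a ring of n triangles joined by single edges. For this graph Q_single = 3/4 - 1/n and Q_pairs = 7/8 - 2/n. Merging adjacent triangles therefore scores higher whenever n > 8, even though each triangle is an unambiguous module.

```python
from typing import List, Sequence, Set, Tuple

Edge = Tuple[int, int]


def modularity(edges: Sequence[Edge], communities: List[Set[int]]) -> float:
    """Q = sum_s [ l_s / L - (d_s / 2L)^2 ] for an undirected, unweighted graph."""
    L = len(edges)
    label = {v: i for i, comm in enumerate(communities) for v in comm}
    internal = [0] * len(communities)
    degree = [0] * len(communities)
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(l / L - (d / (2 * L)) ** 2 for l, d in zip(internal, degree))


def ring_of_cliques(n: int, m: int) -> List[Edge]:
    """n cliques K_m in a ring; consecutive cliques are joined by a single edge."""
    edges: List[Edge] = []
    for c in range(n):
        base = c * m
        edges += [(base + i, base + j) for i in range(m) for j in range(i + 1, m)]
        edges.append((base + m - 1, ((c + 1) % n) * m))  # link to the next clique
    return edges
```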
Conference Paper
Network clustering (or graph partitioning) is an important task for the discovery of underlying structures in networks. Many algorithms find clusters by maximizing the number of intra-cluster edges. While such algorithms find useful and interesting structures, they tend to fail to identify and isolate two kinds of vertices that play special roles: vertices that bridge clusters (hubs) and vertices that are marginally connected to clusters (outliers). Identifying hubs is useful for applications such as viral marketing and epidemiology since hubs are responsible for spreading ideas or disease. In contrast, outliers have little or no influence, and may be isolated as noise in the data. In this paper, we propose a novel algorithm called SCAN (Structural Clustering Algorithm for Networks), which detects clusters, hubs and outliers in networks. It clusters vertices based on a structural similarity measure. The algorithm is fast and efficient, visiting each vertex only once. An empirical evaluation of the method using both synthetic and real datasets demonstrates superior performance over other methods such as the modularity-based algorithms.
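A compact Python sketch of SCAN's main definitions follows. It uses the structural similarity σ(v, w) = |Γ(v) ∩ Γ(w)| / sqrt(|Γ(v)| |Γ(w)|), where Γ(v) is v's neighborhood including v itself. It also uses the parameters ε (similarity threshold) and μ (minimum size of the ε-neighborhood for a vertex to count as a core). The sketch is simplified: clusters are grown by breadth-first search from core vertices. An unclustered vertex is labeled a hub if its neighbors lie in two or more clusters and an outlier otherwise.

```python
import math
from collections import deque
from typing import Dict, List, Set, Tuple

Adj = Dict[int, Set[int]]


def scan(adj: Adj, eps: float = 0.7, mu: int = 2) -> Tuple[List[Set[int]], Set[int], Set[int]]:
    """Simplified SCAN returning (clusters, hubs, outliers) for an undirected graph."""
    gamma = {v: set(nbrs) | {v} for v, nbrs in adj.items()}  # closed neighborhoods

    def sigma(v: int, w: int) -> float:
        return len(gamma[v] & gamma[w]) / math.sqrt(len(gamma[v]) * len(gamma[w]))

    eps_nbrs = {v: {w for w in gamma[v] if sigma(v, w) >= eps} for v in adj}
    is_core = {v: len(eps_nbrs[v]) >= mu for v in adj}

    label: Dict[int, int] = {}
    clusters: List[Set[int]] = []
    for v in adj:
        if not is_core[v] or v in label:
            continue
        cid, members, queue = len(clusters), set(), deque([v])
        while queue:  # expand through structure-reachable vertices
            x = queue.popleft()
            if x in label:
                continue
            label[x] = cid
            members.add(x)
            if is_core[x]:
                queue.extend(y for y in eps_nbrs[x] if y not in label)
        clusters.append(members)

    hubs: Set[int] = set()
    outliers: Set[int] = set()
    for v in adj:
        if v not in label:
            touching = {label[w] for w in adj[v] if w in label}
            (hubs if len(touching) >= 2 else outliers).add(v)
    return clusters, hubs, outliers
```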
Article
The problem of academic expert finding is concerned with finding the experts in a named research field. It has many real-world applications and has recently attracted much attention. However, existing methods are not versatile enough for the special needs of academic areas, where co-authorship and citation relations play important roles in judging researchers’ achievements. In this paper, we propose and develop a flexible data schema and a topic-sensitive co-PageRank algorithm combined with a topic model to solve this problem. The main idea is to measure the authors’ authority by considering topic bias based on their social networks and citation networks, and then to recommend expert candidates for the questions. To infer the association between authors and topics, we draw a probability model from the latent Dirichlet allocation (LDA) model. We further propose several techniques, such as inferring the topics of interest of the query and integrating ranking metrics to order the practices. Our experiments show that the proposed strategies are all effective in improving retrieval accuracy.
Article
A number of recent studies have focused on the statistical properties of networked systems such as social networks and the Worldwide Web. Researchers have concentrated particularly on a few properties that seem to be common to many networks: the small-world property, power-law degree distributions, and network transitivity. In this article, we highlight another property that is found in many networks, the property of community structure, in which network nodes are joined together in tightly knit groups, between which there are only looser connections. We propose a method for detecting such communities, built around the idea of using centrality indices to find community boundaries. We test our method on computer-generated and real-world graphs whose community structure is already known and find that the method detects this known structure with high sensitivity and reliability. We also apply the method to two networks whose community structure is not well known--a collaboration network and a food web--and find that it detects significant and informative community divisions in both cases.
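The divisive procedure described above repeatedly removes the edge that lies on the most shortest paths. The Python sketch below assumes an unweighted, undirected graph given as an adjacency dict. It computes shortest-path edge betweenness with Brandes-style accumulation and recomputes it after every removal, as the original method does. Unlike the original method, which continues to build the full hierarchy of divisions, the sketch stops at the first split.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set

Adj = Dict[int, Set[int]]


def edge_betweenness(adj: Adj) -> Dict[FrozenSet[int], float]:
    """Shortest-path edge betweenness (Brandes-style) for an unweighted graph."""
    eb: Dict[FrozenSet[int], float] = {}
    for s in adj:
        sigma = {v: 0 for v in adj}  # number of shortest s-v paths
        sigma[s] = 1
        dist = {s: 0}
        preds: Dict[int, List[int]] = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:  # BFS from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):  # accumulate dependencies back towards s
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1 + delta[w])
                e = frozenset((v, w))
                eb[e] = eb.get(e, 0.0) + c
                delta[v] += c
    return {e: b / 2 for e, b in eb.items()}  # each pair was counted from both ends


def components(adj: Adj) -> List[Set[int]]:
    """Connected components, in order of their first vertex."""
    seen: Set[int] = set()
    comps = []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def girvan_newman_split(adj: Adj) -> List[Set[int]]:
    """Remove highest-betweenness edges until the graph splits once more."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start:
        eb = edge_betweenness(adj)
        if not eb:
            break  # no edges left to remove
        u, v = max(eb, key=eb.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)
```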
Article
The discovery and analysis of community structure in networks is a topic of considerable recent interest within the physics community, but most methods proposed so far are unsuitable for very large networks because of their computational cost. Here we present a hierarchical agglomeration algorithm for detecting community structure which is faster than many competing algorithms: its running time on a network with n vertices and m edges is O(md log n), where d is the depth of the dendrogram describing the community structure. Many real-world networks are sparse and hierarchical, with m ~ n and d ~ log n, in which case our algorithm runs in essentially linear time, O(n log² n). As an example of the application of this algorithm, we use it to analyze a network of items for sale on the web site of a large online retailer, items in the network being linked if they are frequently purchased by the same buyer. The network has more than 400,000 vertices and 2 × 10⁶ edges. We show that our algorithm can extract meaningful communities from this network, revealing large-scale patterns present in the purchasing habits of customers.
Article
Although the inference of global community structure in networks has recently become a topic of great interest in the physics community, all such algorithms require that the graph be completely known. Here, we define both a measure of local community structure and an algorithm that infers the hierarchy of communities that enclose a given vertex by exploring the graph one vertex at a time. This algorithm runs in time O(k²d) for general graphs when d is the mean degree and k is the number of vertices to be explored. For graphs where exploring a new vertex is time consuming, the running time is linear, O(k). We show that on computer-generated graphs the average behavior of this technique approximates that of algorithms that require global knowledge. As an application, we use this algorithm to extract meaningful local clustering information in the large recommender network of an online retailer.
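The local method above can be sketched in Python using Clauset's local modularity R = I / T. The boundary B contains community members that have at least one neighbor outside the community. T counts edges with at least one endpoint in B, and I counts those edges that lie entirely inside the community. This sketch recomputes R from scratch for every candidate, so it does not reach the O(k²d) bound, which relies on incremental updates. The convention R = 1 for a community with no boundary is an assumption made here.

```python
from typing import Dict, Set

Adj = Dict[int, Set[int]]


def local_modularity(adj: Adj, community: Set[int]) -> float:
    """Clauset's R = I / T computed on the boundary of `community`."""
    boundary = {v for v in community if adj[v] - community}
    edges = {frozenset((v, w)) for v in boundary for w in adj[v]}
    if not edges:
        return 1.0  # assumed convention: no boundary means nothing leaks out
    inside = sum(1 for e in edges if e <= community)
    return inside / len(edges)


def clauset_local_community(adj: Adj, seed: int, k: int) -> Set[int]:
    """Grow a community around `seed`, one vertex at a time, up to k vertices."""
    community = {seed}
    while len(community) < k:
        frontier = set().union(*(adj[v] for v in community)) - community
        if not frontier:
            break
        # Add the neighbor whose inclusion gives the largest R (ties: smallest id).
        best = max(sorted(frontier), key=lambda u: local_modularity(adj, community | {u}))
        community.add(best)
    return community
```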
Article
We consider the problem of detecting communities or modules in networks, groups of vertices with a higher-than-average density of edges connecting them. Previous work indicates that a robust approach to this problem is the maximization of the benefit function known as "modularity" over possible divisions of a network. Here we show that this maximization process can be written in terms of the eigenspectrum of a matrix we call the modularity matrix, which plays a role in community detection similar to that played by the graph Laplacian in graph partitioning calculations. This result leads us to a number of possible algorithms for detecting community structure, as well as several other results, including a spectral measure of bipartite structure in networks and a centrality measure that identifies vertices that occupy central positions within the communities to which they belong. The algorithms and measures proposed are illustrated with applications to a variety of real-world complex networks.
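The modularity-matrix formulation can be illustrated with a single leading-eigenvector bisection. The Python sketch below assumes NumPy and an undirected adjacency matrix A. It forms B = A - k kᵀ / 2m, assigns each vertex to a side by the sign of the eigenvector for the largest eigenvalue of B, and reports Q = sᵀ B s / 4m. As in Newman's method, a non-positive leading eigenvalue is treated as meaning the network has no useful split. Repeated subdivision and fine-tuning are omitted.

```python
import numpy as np


def leading_eigenvector_split(A):
    """Bisect a network by the signs of the leading eigenvector of B = A - k k^T / 2m.

    Returns (s, Q) where s[i] is +1 or -1 and Q = s^T B s / 4m.
    """
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)                  # degrees
    two_m = k.sum()                    # 2m
    B = A - np.outer(k, k) / two_m     # modularity matrix
    vals, vecs = np.linalg.eigh(B)     # symmetric: eigenvalues in ascending order
    if vals[-1] <= 1e-12:
        return np.ones(len(k)), 0.0    # no positive eigenvalue: treat as indivisible
    u = vecs[:, -1]
    s = np.where(u >= 0, 1.0, -1.0)
    return s, float(s @ B @ s) / (2 * two_m)
```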
Conference Paper
Since research trends can change dynamically, researchers have to keep up with these new trends and undertake new research topics. Therefore, research communities for new research domains are important. In this paper, we propose a method to discover research communities. The key features of our method are a network model of papers and a word assignment technique for the communities obtained. We present a system based on the proposed method and discuss it through case studies and experiments.
Article
We present a new benchmarking procedure that is unambiguous and specific to local community finding methods, allowing one to compare the accuracy of various methods. We apply this to new and existing algorithms. A simple class of synthetic benchmark networks is also developed, capable of testing properties specific to these local methods.
SCAN: A Structural Clustering Algorithm for Networks
  • X. Xu
  • N. Yuruk
  • Z. Feng
  • T. A. J. Schweiger