Conference Paper

ArnetMiner: extraction and mining of academic social networks

DOI: 10.1145/1401890.1402008
In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Source: DBLP
  • ABSTRACT: We study the problem of active learning for networked data, where samples are connected with links and their labels are correlated with each other. We particularly focus on the setting of using the probabilistic graphical model to model the networked data, due to its effectiveness in capturing the dependency between labels of linked samples. We propose a novel idea of connecting the graphical model to the information diffusion process, and precisely define the active learning problem based on the non-progressive diffusion model. We show the NP-hardness of the problem and propose a method called MaxCo to solve it. We derive the lower bound for the optimal solution for the active learning setting, and develop an iterative greedy algorithm with provable approximation guarantees. We also theoretically prove the convergence and correctness of MaxCo. We evaluate MaxCo on four different genres of datasets: Coauthor, Slashdot, Mobile, and Enron. Our experiments show a consistent improvement over other competing approaches.
    Proceedings of the 7th ACM international conference on Web search and data mining; 02/2014
  • ABSTRACT: In this paper, a simple but efficient real-time detection algorithm is proposed for tracking the community structure of dynamic networks. Community structure is intuitively characterized as a division of network nodes into subgroups, within which nodes are densely connected and between which they are sparsely connected. To evaluate the quality of the community structure of a network, a metric called modularity has been proposed, and many algorithms have been developed to optimize it. However, most modularity-based algorithms deal with static networks and cannot be run frequently because of their high computational complexity. In order to track the community structure of dynamic networks in a fine-grained way, we propose a modularity-based algorithm that is incremental and has very low computational complexity. Our algorithm adopts a two-step approach. First, we apply the algorithm of Blondel et al. for detecting static communities to obtain an initial community structure. Then we apply our incremental updating strategies to track the dynamic communities. The performance of our algorithm is measured in terms of modularity. We test the algorithm on tracking the community structure of Enron Email and three other real-world datasets. The experimental results show that our algorithm can keep track of community structure in time and outperforms the well-known CNM algorithm in terms of modularity.
  • ABSTRACT: The Open Access movement and research management can take a new turn if research information is published as Linked Open Data. With Linked Open Data, the management of research information within and across institutions can be facilitated, the quality of the available data can be improved, and its availability to the public is assured. However, it can be difficult for non-expert users to take advantage of the interlinked information offered by Linked Open Data because they lack in-depth knowledge. In this paper, we present a use case of publishing research metadata as Linked Open Data and creating interactive visualizations to support users in analyzing the Flemish research landscape.
    Procedia Computer Science. 01/2014; 33:245–252.
