Conference Paper

ArnetMiner: extraction and mining of academic social networks

DOI: 10.1145/1401890.1402008
Conference: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Source: DBLP
  • Source
    ABSTRACT: Network structure analysis plays an important role in characterizing complex systems. Departing from previous network centrality measures, this article proposes a topological centrality measure that reflects the topological positions of nodes and edges as well as the influence between nodes and edges in a general network. Experiments on different networks show the distinguishing features of topological centrality by comparing it with degree centrality, closeness centrality, betweenness centrality, information centrality, and PageRank. The topological centrality measure is then applied to discover communities and to construct the backbone network. Its characteristics and significance are further shown in e-Science applications.
    Journal of the American Society for Information Science and Technology 01/2010; 61:1824-1841. · 2.01 Impact Factor
  • Source
    ABSTRACT: With the development of Web applications, textual documents are not only getting richer but are also ubiquitously interconnected with users and other objects in various ways, which gives rise to text-rich heterogeneous information networks. Topic models have been proposed and shown to be useful for document analysis, and the interactions among multi-typed objects play a key role in disclosing the rich semantics of the network. However, most topic models consider only the textual information and ignore the network structure, or can integrate only with homogeneous networks. None of them handles heterogeneous information networks well. In this paper, we propose a novel topic model with biased propagation (TMBP) algorithm to directly incorporate a heterogeneous information network into topic modeling in a unified way. The underlying intuition is that multi-typed objects should be treated differently, along with their inherent textual information and the rich semantics of the heterogeneous information network. A simple, unbiased topic propagation across such a heterogeneous network does not make much sense. Consequently, we investigate and develop two biased propagation frameworks for the TMBP algorithm from different perspectives, the biased random walk framework and the biased regularization framework, which can discover latent topics and identify clusters of multi-typed objects simultaneously. We extensively evaluate the proposed approach and compare it to state-of-the-art techniques on several datasets. Experimental results demonstrate that the improvement achieved by our proposed approach is consistent and promising.
    Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, August 21-24, 2011; 01/2011
  • Source
    ABSTRACT: Over the years, frequent subgraphs have been an important type of target pattern in the pattern mining literature, where most work deals with databases holding a number of graph transactions, e.g., chemical structures of compounds. These methods rely heavily on the downward-closure property (DCP) of the support measure to ensure efficient pruning of the candidate patterns. When switching to the emerging scenario of single-graph databases such as Google Knowledge Graph and the Facebook social graph, the traditional support measure turns out to be trivial (either 0 or 1). However, to the best of our knowledge, all attempts to redefine a single-graph support have resulted in measures that either lose DCP or are no longer semantically intuitive. This paper targets mining patterns in the single-graph setting. We resolve the "DCP-intuitiveness" dilemma by shifting the mining target from frequent subgraphs to frequent neighborhoods. A neighborhood is a specific topological pattern in which a vertex is embedded, and the pattern is frequent if it is shared by a large portion (above a given threshold) of vertices. We show that the new patterns not only maintain DCP but also have semantics as significant as those of subgraph patterns. Experiments on real-life datasets demonstrate the feasibility of our algorithms on relatively large graphs, as well as their capability of mining interesting knowledge that was not discovered in prior work.
