Conference Paper

ArnetMiner: Extraction and mining of academic social networks

DOI: 10.1145/1401890.1402008
Conference: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Source: DBLP

ABSTRACT This paper addresses several key issues in the ArnetMiner system, which aims at extracting and mining academic social networks. Specifically, the system focuses on: 1) Extracting researcher profiles automatically from the Web; 2) Integrating the publication data into the network from existing digital libraries; 3) Modeling the entire academic network; and 4) Providing search services for the academic network. So far, 448,470 researcher profiles have been extracted using a unified tagging approach. We integrate publications from online Web databases and propose a probabilistic framework to deal with the name ambiguity problem. Furthermore, we propose a unified modeling approach to simultaneously model topical aspects of papers, authors, and publication venues. Search services such as expertise search and people association search have been provided based on the modeling results. In this paper, we describe the architecture and main features of the system. We also present the empirical evaluation of the proposed methods.

Available from: Zhong Su, Aug 10, 2015
  • Source
    • "The approaches submitted in this area are dealing with finding experts, where the most critical issue is what sources they are going to choose to find experts and create their profiles. The most popular system is Arnetminer, this system is based on finding and creating experts profiles in computer science domain and represents them semantically [8]. Its main source of data acquisition is homepages, it's used essentially for the profiling task, where DBLP [18] publications are used as a bibliographic resource for the expertise computing process. "
    Twenty-Seventh International Conference on Software Engineering and Knowledge Engineering (SEKE 2015); 04/2015
  • Source
    • "In this section, we perform an empirical analysis to highlight some of the key challenges (summarized in introduction section), on AMiner citation network [22]. This is a rich real dataset for bibliography network analysis and mining. "
    ABSTRACT: Understanding the dynamic mechanisms that drive the high-impact scientific work (e.g., research papers, patents) is a long-debating research topic and has many important implications, ranging from personal career development, recruitment search, to the jurisdiction of research resources. Recent advances in characterizing and modeling scientific success have made it possible to forecast the long-term impact of scientific work, where data mining techniques, supervised learning in particular, play an essential role. Despite much progress has been made, several key algorithmic challenges in relation to predicting long-term scientific impact have largely remained open. In this paper, we propose a joint predictive model to forecast the long-term scientific impact at the early stage, which simultaneously addresses a number of these open challenges, including the scholarly feature design, the non-linearity, the domain-heterogeneity and dynamics. In particular, we formulate it as a regularized optimization problem; and propose effective and scalable algorithms to solve it. We perform extensive empirical evaluations on large, real scholarly data sets to validate the effectiveness and the efficiency of our method.
  • Source
    • "Dataset Description and Preprocessing: We use a dataset downloaded from ArnetMiner (Tang et al. 2008) to construct three domains: a co-author domain in which each tuple has authorID, coauthorID, #papersCoauthored information, a conference domain in which each tuple has authorID, conferenceID, #papersPublished information, and a reference domain in which each tuple has authorID, referenceID, #papersReferenced information. The original dataset has approximately 2 × 10^7 publications and 4 × 10^7 citation relations. "
    ABSTRACT: Recommender systems provide personalized item suggestions by identifying patterns in past user-item preferences. Most existing approaches for recommender systems work on a single domain, i.e., use user preferences from one domain and recommend items from the same domain. Recently, some recommendation models have been proposed to use user preferences from multiple related item source domains to improve recommendation accuracy for a target item domain, an area of research known as cross-domain recommender systems. One typical assumption in these systems is that users, items, and user preferences for items are similar across domains. In this paper, we introduce a new cross-domain recommendation problem which does not meet this typical assumption. For example, for some scientometric datasets, when the objective is to recommend co-authors, conferences, and references, respectively, to authors, although the users are similar across domains, the items and user-item preferences are different. To address this problem, we propose two approaches to aggregate knowledge from multiple domains. Our approaches allow us to control the knowledge transferred between domains. Experimental results on a DBLP subset show that the proposed cross-domain approaches are helpful in improving recommendation accuracy as compared to single domain approaches.
    AAAI Workshop on Scholarly Big Data: AI Perspectives, Challenges, and Ideas, Austin, U.S.A; 01/2015