Conference Paper

ArnetMiner: Extraction and Mining of Academic Social Networks

DOI: 10.1145/1401890.1402008
Conference: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Source: DBLP


This paper addresses several key issues in the ArnetMiner system, which aims at extracting and mining academic social networks. Specifically, the system focuses on: 1) Extracting researcher profiles automatically from the Web; 2) Integrating the publication data into the network from existing digital libraries; 3) Modeling the entire academic network; and 4) Providing search services for the academic network. So far, 448,470 researcher profiles have been extracted using a unified tagging approach. We integrate publications from online Web databases and propose a probabilistic framework to deal with the name ambiguity problem. Furthermore, we propose a unified modeling approach to simultaneously model topical aspects of papers, authors, and publication venues. Search services such as expertise search and people association search have been provided based on the modeling results. In this paper, we describe the architecture and main features of the system. We also present the empirical evaluation of the proposed methods.



Available from: Zhong Su
    • "There are other research appoint agents as nodes (users, entities) who also studied in large-scale social networks as they remain soft systems [21], in combination with stochastic and probabilistic methods creating hybrid methodologies in order to propose better solutions [22]. "
    ABSTRACT: Today, one of the biggest problems in developing and implementing Artificial Neural Networks (ANN) is the lack of a rigorous methodology that ensures the development of an optimal ANN, because network performance is assessed by trial and error, which wastes time training networks that are far from reaching the expected error rate and adequate performance. As part of this investigation, a method is proposed and developed for determining optimal networks for prediction and function approximation with more than one output. In addition, steps from hard-systems methodologies can be applied to analyze the characteristics of the variables required for training the ANN and the correlations between them, reducing the time needed to find the optimal network. A methodology for the construction and development of an ANN is proposed and developed based on the Checkland, Jenkins and Hall methodologies, yielding a 14-step methodology grouped into three stages.
    Procedia Computer Science 12/2015; 46:1827-1834. DOI:10.1016/j.procs.2015.02.142
    • "The approaches submitted in this area are dealing with finding experts, where the most critical issue is what sources they are going to choose to find experts and create their profiles. The most popular system is Arnetminer, this system is based on finding and creating experts profiles in computer science domain and represents them semantically [8]. Its main source of data acquisition is homepages, it's used essentially for the profiling task, where DBLP [18] publications are used as a bibliographic resource for the expertise computing process. "
    Twenty-Seventh International Conference on Software Engineering and Knowledge Engineering (SEKE 2015); 04/2015
    • "In this section, we perform an empirical analysis to highlight some of the key challenges (summarized in introduction section), on AMiner citation network [22]. This is a rich real dataset for bibliography network analysis and mining. "
    ABSTRACT: Understanding the dynamic mechanisms that drive high-impact scientific work (e.g., research papers, patents) is a long-debated research topic with many important implications, ranging from personal career development and recruitment search to the jurisdiction of research resources. Recent advances in characterizing and modeling scientific success have made it possible to forecast the long-term impact of scientific work, where data mining techniques, supervised learning in particular, play an essential role. Although much progress has been made, several key algorithmic challenges in predicting long-term scientific impact remain largely open. In this paper, we propose a joint predictive model to forecast long-term scientific impact at an early stage, which simultaneously addresses a number of these open challenges, including scholarly feature design, non-linearity, domain heterogeneity, and dynamics. In particular, we formulate it as a regularized optimization problem and propose effective and scalable algorithms to solve it. We perform extensive empirical evaluations on large, real scholarly data sets to validate the effectiveness and efficiency of our method.