Conference Paper

ArnetMiner: Extraction and Mining of Academic Social Networks

DOI: 10.1145/1401890.1402008
Conference: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Source: DBLP


This paper addresses several key issues in the ArnetMiner system, which aims at extracting and mining academic social networks. Specifically, the system focuses on: 1) Extracting researcher profiles automatically from the Web; 2) Integrating the publication data into the network from existing digital libraries; 3) Modeling the entire academic network; and 4) Providing search services for the academic network. So far, 448,470 researcher profiles have been extracted using a unified tagging approach. We integrate publications from online Web databases and propose a probabilistic framework to deal with the name ambiguity problem. Furthermore, we propose a unified modeling approach to simultaneously model topical aspects of papers, authors, and publication venues. Search services such as expertise search and people association search have been provided based on the modeling results. In this paper, we describe the architecture and main features of the system. We also present the empirical evaluation of the proposed methods.

Available from: Zhong Su
    • "Source: Own Elaboration. There are other research appoint agents as nodes (users, entities) who also studied in large-scale social networks as they remain soft systems [21], in combination with stochastic and probabilistic methods creating hybrid methodologies in order to propose better solutions [22]. "
    ABSTRACT: One of the biggest problems in developing and implementing Artificial Neural Networks (ANN) today is the lack of a rigorous methodology that ensures the development of an optimal ANN. Performance is currently assessed by trial and error, which wastes time training networks that are far from reaching the expected error rate and adequate performance. As part of the investigation, a method is proposed and developed for determining the optimal networks for prediction and function approximation with more than one output. Steps from hard systems methodologies can also be used to analyze the characteristics of the variables required for training the ANN and the correlations between them, reducing the search time for the optimal network. A methodology for the construction and development of an ANN is proposed and developed based on the Checkland, Jenkins and Hall methodologies, yielding a 14-step methodology grouped into three stages.
    Full-text · Article · Dec 2015 · Procedia Computer Science
    • "Besides creating social network from surveys, which is a standard approach in sociology, person-centric networks have also been extracted (semi-)automatically from diverse types of publicly accessible semi-structured data repositories. These include email archives, e.g., [1], [2], [3], the blogosphere, e.g., [4], digital libraries from which co-authorship networks are derived, e.g., [5], [6], [7], movie databases [8], and many others. For a more comprehensive overview, see the introductory chapters in [9]. "
    ABSTRACT: Most traditional social networks rely on explicitly given relations between users, their friends and followers. In this paper, we go beyond well-structured data repositories and create a person-centric network from unstructured text: the Wikipedia Social Network. To identify persons in Wikipedia, we make use of interwiki links, Wikipedia categories and person-related information available in Wikidata. From the co-occurrences of persons on a Wikipedia page we construct a large-scale person-centric network and provide a weighting scheme for the relationship of two persons based on the distances of their mentions within the text. We extract key characteristics of the network such as centrality, clustering coefficient and component sizes, for which we find values that are typical for social networks. Using state-of-the-art algorithms for community detection in massive networks, we identify interesting communities and evaluate them against Wikipedia categories. The Wikipedia social network developed this way provides an important source for future social analysis tasks.
    Full-text · Conference Paper · Aug 2015
    • "Model-based approaches model users, attributes and relations by the use of modelling methods. For example, Sun and Li [25] use a label distribution algorithm [1] based method on social tagging systems and Tang et al. [26] use an extended LDA-based model. Our proposed approach, SemPostLP, belongs to the third approach using the post network with nodes' attributes and relation attributes to model both user profile and network structure. "
    ABSTRACT: Detecting large communities in online social networks is the subject of a wide range of studies that explore network sub-structure. Most existing studies focus on network topology and give little attention to active communities in large online social networks and social portals, such as forums, that are not based on network topology. Here, a new semantic community detection approach is proposed that focuses on user attributes instead of network topology. In the proposed approach, a network of user activities is established and weighted using semantic data. Furthermore, a consistent extended label propagation algorithm is presented. In doing so, semantic representations of active communities are refined and labelled with user-generated tags available in Web 2.0. The results show that the proposed semantic algorithm significantly improves modularity compared with three previously proposed algorithms.
    Full-text · Article · Jun 2015 · Journal of Information Science