IPHITS: An Incremental Latent Topic Model for Link Structure

DOI: 10.1007/978-3-642-04769-5_21
Source: DBLP

ABSTRACT The structure of linked documents is dynamic and changes continually. Although several methods have been proposed to
exploit link structure for identifying hubs and authorities in a set of linked documents, no existing approach deals
effectively with these changes. This paper explores changes in linked documents and proposes an incremental link probabilistic
framework, which we call IPHITS. The model handles online document streams quickly and scalably, and uses a novel link
updating technique that copes with dynamic changes. Experimental results on two different sources of online information
demonstrate the time savings of our method. We also analyze the stability of the rankings under small perturbations
to the linkage patterns.
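
For readers unfamiliar with the hub and authority notions the abstract refers to, the following is a minimal sketch of the classic (batch, non-probabilistic) HITS power iteration. It is not the IPHITS model, whose update rules are not given in this abstract; the function name `hits`, the `toy_links` graph, and the iteration count are assumptions made purely for illustration.

```python
from math import sqrt


def hits(links, iterations=50):
    """Classic HITS on a dict {page: [pages it links to]} (illustrative only)."""
    # Collect every page that appears as a source or as a link target.
    nodes = set(links) | {t for targets in links.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages that link in.
        auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub score: sum of the authority scores of pages linked to.
        hub = {n: sum(auth[d] for d in links.get(n, [])) for n in nodes}
        # Normalize both vectors to unit Euclidean length.
        for scores in (auth, hub):
            norm = sqrt(sum(v * v for v in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth


# Hypothetical toy graph: "a" and "b" both cite "c"; "a" also cites "d".
toy_links = {"a": ["c", "d"], "b": ["c"], "c": [], "d": []}
hub, auth = hits(toy_links)
```

In this toy graph, "c" receives the most in-links from good hubs and so ranks as the top authority, while "a" links to both cited pages and ranks as the top hub. A batch method like this recomputes every score whenever a link changes, which is the cost that an incremental approach aims to avoid.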

  • Source
    ABSTRACT: Since the publication of Brin and Page's paper on PageRank, many in the Web community have depended on PageRank for the static (query-independent) ordering of Web pages. We show that we can significantly outperform PageRank using features that are independent of the link structure of the Web. We gain a further boost in accuracy by using data on the frequency at which users visit Web pages. We use RankNet, a ranking machine learning algorithm, to combine these and other static features based on anchor text and domain characteristics. The resulting model achieves a static ranking pairwise accuracy of 67.3% (vs. 56.7% for PageRank or 50% for random).
  • Source
    ABSTRACT: Event-based network data consists of sets of events over time, each of which may involve multiple entities. Examples include email traffic, telephone calls, and research publications (interpreted as co-authorship events). Traditional network analysis techniques, such as social network models, often aggregate the relational information from each event into a single static network. In contrast, in this paper we focus on the temporal nature of such data. In particular, we look at the problems of temporal link prediction and node ranking, and describe new methods that illustrate opportunities for data mining and machine learning techniques in this context. Experimental results are discussed for a large set of co-authorship events measured over multiple years, and a large corporate email data set spanning 21 months.
    SIGKDD Explorations. 01/2005; 7:23-30.
  • Source
    ABSTRACT: The objective of Web forums is to create a shared space for open communication and discussion of specific topics and issues. The tremendous amount of information behind forum sites is not yet fully utilized. Most links between forum pages are automatically created, which means that link-based ranking algorithms cannot be applied efficiently. In this paper, we propose a novel ranking algorithm that introduces content information into link-based methods as implicit links. The basic idea is derived from a more focused random surfer: the surfer is more likely to jump to a page that is similar to what he is currently reading. In this manner, we can introduce content similarities into the link graph as a personalization bias. Our method, named Fine-grained Rank (FGRank), can be efficiently computed based on an automatically generated topic hierarchy. Unlike topic-sensitive PageRank, our method only needs to compute a single PageRank score for each page. Another contribution of this paper is a very efficient algorithm for automatically generating a topic hierarchy and mapping each page in a large-scale collection onto the computed hierarchy. The experimental results show that the proposed method can improve retrieval performance, and reveal that the content-based link graph is also important compared with the hyperlink graph.
    SIGIR 2006: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, Washington, USA, August 6-11, 2006; 01/2006