Towards Visual Exploration of Topic Shifts
ABSTRACT This paper presents two approaches to visually analyzing the topic shift of a pool of documents over a given period of time. The first method is based on a multi-dimensional scaling algorithm, which places vectors representing terms occurring in certain years (period-frequency vectors) in a two-dimensional space. This kind of visualization enables the detection of terms occurring in documents published in particular years, or terms spread over different years. The second method uses a graph-based approach: publishing dates of documents, as well as their terms, are represented by the vertices of a graph, and terms related to a specific publishing year are connected to the vertex of that year by an edge. Using spreading-activation techniques, terms frequently occurring in documents published in particular years can be discovered visually. We tested both approaches with 2431 abstracts of papers published in the IEEE Transactions on SMC-A, SMC-B, and SMC-C in the years 1996 to 2006. Our experiments indicate that a number of interesting terms separate nicely into clusters according to individual years or periods of time. In addition, one can visualize the emergence of specific terms over certain periods of time and how these and other terms later fade away again.
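The spreading-activation idea from the second method can be illustrated with a minimal sketch. The toy data, decay factor, and function names below are illustrative assumptions, not the paper's actual implementation: a year vertex pushes activation to the term vertices it is connected to, attenuated by each edge's share of the total term frequency.

```python
# Minimal sketch of spreading activation over a bipartite term-year graph.
# Edge list, weights, and the decay constant are hypothetical toy values.
from collections import defaultdict

# Hypothetical edges: (year vertex, term vertex, term frequency in that year).
edges = [
    ("1996", "fuzzy", 5), ("1996", "neural", 3),
    ("2006", "ontology", 6), ("2006", "neural", 1),
]

graph = defaultdict(list)
for year, term, weight in edges:
    graph[year].append((term, weight))

def spread(source_year, decay=0.5):
    """Push one unit of activation from a year vertex to its neighbouring
    terms, weighted by edge-frequency share and damped by a decay factor."""
    neighbours = graph[source_year]
    total = sum(w for _, w in neighbours)
    return {term: decay * w / total for term, w in neighbours}

activation = spread("1996")
# Terms receiving high activation are those frequent in that year's documents.
```

Under this sketch, "fuzzy" ends up with more activation than "neural" for 1996 simply because its edge weight is larger, which mirrors how the paper's visualization surfaces year-characteristic terms.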
Conference Proceeding: A Neural Network for Probabilistic Information Retrieval.
ABSTRACT: This paper demonstrates how a neural network, together with learning algorithms and modes of operation, may be constructed to provide retrieval effectiveness similar to that of the probabilistic indexing and retrieval model based on single terms as document components. SIGIR'89, 12th International Conference on Research and Development in Information Retrieval, Cambridge, Massachusetts, USA, June 25-28, 1989, Proceedings; 01/1989
- 01/2003; dpunkt.verlag, ISBN: 978-3-89864-213-2
ABSTRACT: We present a new part-of-speech tagger that demonstrates the following ideas: (i) explicit use of both preceding and following tag contexts via a dependency network representation, (ii) broad use of lexical features, including jointly conditioning on multiple consecutive words, (iii) effective use of priors in conditional log-linear models, and (iv) fine-grained modeling of unknown word features. Using these ideas together, the resulting tagger achieves 97.24% accuracy on the Penn Treebank WSJ, an error reduction of 4.4% over the best previous single automatically learned tagging result. 03/2004
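The conditional log-linear scoring mentioned in point (iii) can be sketched as a softmax over weighted feature sums. The features, tags, and weight values below are illustrative assumptions, not the tagger's trained model:

```python
# Minimal sketch of a conditional log-linear (maximum-entropy) tag scorer:
# P(tag | features) is proportional to exp(sum of feature weights for that tag).
# All (feature, tag) weights here are made-up toy values.
import math

weights = {
    ("word=run", "VB"): 1.2, ("word=run", "NN"): 0.4,
    ("prev_tag=TO", "VB"): 1.5, ("prev_tag=TO", "NN"): -0.3,
}

def tag_distribution(features, tags=("VB", "NN")):
    """Compute P(tag | features) via a softmax over summed feature weights."""
    scores = {t: math.exp(sum(weights.get((f, t), 0.0) for f in features))
              for t in tags}
    z = sum(scores.values())  # normalizing constant
    return {t: s / z for t, s in scores.items()}

dist = tag_distribution(["word=run", "prev_tag=TO"])
```

In this toy setting, the "prev_tag=TO" feature pushes probability mass toward the verb reading of "run", which is the kind of context conditioning the abstract describes (here only a preceding-tag feature is shown; the paper's dependency network also conditions on following tags).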