Newsgroup Exploration with WEBSOM Method and Browsing Interface

Abstract and Figures

The current availability of large collections of full-text documents in electronic form emphasizes the need for intelligent information retrieval techniques. Especially in the rapidly growing World Wide Web, it is important to have methods for exploring miscellaneous document collections automatically. In this report, we introduce the WEBSOM method for this task. Self-Organizing Maps (SOMs) are used to position encoded documents onto a map that provides a general view into the text collection. The general view visualizes similarity relations between the documents on a map display, which can be used to explore the material rather than having to rely on traditional search expressions. Similar documents become mapped close to each other. The potential of the WEBSOM method is demonstrated in a case study where articles from the Usenet newsgroup "comp.ai.neural-nets" are organized. The map is available for exploration at the WWW address http://websom.hut.fi/websom/
[Figure: architecture of the WEBSOM method. Diagram labels: documents, preprocessing, self-organization of word category map, word category map, document encoding, self-organization of document map, document map.]
[Figure: sample word lists from the word category map: think, hope, thought, guess, assume, wonder, imagine, notice, discovered; usa, japan, australia, china, australian, israel, intel; trained, learned, selected, simulated, improved, effective, constructed; machine, unsupervised, reinforcement, supervised, on-line, competitive, hebbian, incremental, nestor, inductive.]
... Researchers have investigated the use of SOM for topic modelling and sentiment analysis. Kohonen himself published extensively on his work with text (Honkela et al. 1996; Ritter and Kohonen 1989; Kohonen 1998). He described how entire documents can be distinguished from each other based on their statistical models. ...
Article
Media architecture community systematically explores the potentials of computation and digital media to intervene in form-finding, fabrication of buildings and urban data collection processes. Combining social media topic modelling techniques with the review of media architecture-related literature, I discuss methods to locate the media architecture community in social media, conduct initial discourse analysis and pursue a deeper investigation of the topics addressed by community. In the literature, media architecture is presented as an interactive set of technologies for a participative public life. And yet, while a dynamic facade increases possibilities for participation and creative expression, it also facilitates reframing participation as a technical problem. I position optimization and efficiency in media architecture discourse as a form of optimism and offer insights into its political implications. I propose to rethink the shortcut between optimism and optimization by tracing conceptual and professional relations that inform media architecture.
... Many variants of the former algorithm have been derived and adapted to specific subfields of ML. Since the dawn of data mining, SOM has been applied to text mining, notably in the well-known WEBSOM (Honkela et al., 1996; Kaski et al., 1998) text navigation interface for automatic keyword generation and text classification, which generated prolific research activity. ...
Chapter
Machine learning techniques applied to data mining face the challenge of time and memory requirements, and should therefore take full advantage of the increase in power that recent multi-core processors bring. When applied to sparse data, it is also sometimes necessary to find an appropriate reformulation of the algorithms, keeping in mind that memory load was and still is an issue. In [1], we presented a mathematical reformulation of the standard and the batch versions of the Self-Organizing Map algorithm for sparse data, proposed a parallel implementation of the batch version, and carried out initial performance evaluation tests. Here we reproduce and extend our experiments on a more powerful hardware architecture and compare the results to our previous ones. A thorough quantitative and qualitative analysis confirms our preceding results.
... ANNs are used in many examples, and one variant of them is the self-organizing map (SOM) model. [6][7][8][9][10] ...
Article
This article aims to give an idea of how Artificial Neural Networks (ANNs), an Artificial Intelligence (AI) technique, can be applied to the traffic problem on the streets of the city of Riobamba, Chimborazo province, Ecuador, using four standard traffic lights. An example illustrates the use of electronic components such as FPGAs (Field Programmable Gate Arrays) and sensors in this field: detecting and counting cars can make traffic flow more smoothly. The road carrying the most cars is given the highest priority for the green light. The learning algorithm of a simple perceptron is used.
... In addition, this paper certainly must mention WEBSOM [1], [2], [3]. The goal of this seminal work is the exploration of document collections by topic. ...
Article
This paper presents the results of an experimental implementation of a document classifier leveraging contextual word embeddings clustered on a self-organizing map. The problem of document categorization is further compounded when there are no predefined categories, or conversely there are too many categories, that documents may be bucketed into. This paper proposes to address these problems by modelling the major themes contained in the document corpus into a cluster-map using a self-organizing neural network. The cluster-map provides a visual representation to explore the corpus, and a near-semantic search interface of the many concepts outlined across the corpus.
Chapter
In this contribution, an approach for document retrieval is presented which groups (pre-processed) documents using a similarity measure. The methods were developed based on self-organising maps to realise interactive associative search and visual exploration of document databases. This helps a user to navigate through similar documents. The navigation, especially the search for the first appropriate document, is supported by conventional keyword search methods.
Conference Paper
Among the large number of applications of the self-organizing map (SOM) algorithm, creating maps of document collections has become commonplace since the introduction of the WEBSOM system. This article presents a novel development in WEBSOM research. The Interactive Two-Level WEBSOM, I2WEBSOM, includes two main components: a map of terms and a dynamic map of documents. The map of terms is used to enable interactive feature selection and weighting. The map of documents is calculated using terminology-based feature vectors whose weights can be changed using the first-level map. In the experimental part, we focus on the application of creating maps of people based on their interest or competence profiles.
Article
The self-organizing map (SOM) has inspired a voluminous body of literature in a number of diverse research domains. We present a synthesis of the pertinent literature and demonstrate, via a case study, how SOM can be applied in clustering accounting databases. The synthesis explicates SOM's theoretical foundations, presents metrics for evaluating its performance, explains the main extensions of SOM, and discusses its main financial applications. The case study illustrates how SOM can identify interesting and meaningful clusters that may exist in accounting databases. The paper extends the relevant literature in that it synthesises and clarifies the salient features of a research area that intersects the domains of SOM, data mining, and accounting.
Conference Paper
Applications of clustering and classification techniques can prove very significant in both digital and physical (paper-based) libraries. The most essential application, document classification and clustering, is crucial for the content that is produced and maintained in digital libraries, repositories, databases, social media, blogs, etc., based on various tags and ontology elements, transcending the traditional library-oriented classification schemes. Other applications with a useful role in the new digital library environment involve document routing, summarization, and query expansion. Paper-based libraries can benefit as well, since classification combined with advanced material characterization techniques such as FTIR (Fourier Transform InfraRed spectroscopy) can be vital for the study and prevention of material deterioration. An improved two-level self-organizing clustering architecture is proposed in order to enhance the discrimination capacity of the learning space prior to classification, yielding promising results when applied to the above-mentioned library tasks.
Article
In this paper, a novel model for monolingual Information Retrieval in English and Spanish language is proposed. This model uses Natural Language Processing techniques (a POS-tagger, a Partial Parser, and an Anaphora Resolver) in order to improve the precision of traditional IR systems, by means of indexing the “entities” and the “relations” between these entities in the documents. This model is evaluated on both the Spanish and English CLEF corpora. For the English queries, there is a maximum increase of 35.11% in the average precision. For the Spanish queries, the maximum increase is 37.18%.
Conference Paper
The availability of large full-text document collections in electronic form has created a need for intelligent information retrieval techniques. This holds especially for the expanding World Wide Web, which calls for methods for systematic exploration of miscellaneous document collections. In this paper we introduce a new method, the WEBSOM, for this task. Self-organizing maps (SOMs) are used to represent documents on a map that provides an insightful view of the text collection. This view visualizes similarity relations between the documents, and the display can be utilized for orderly exploration of the material rather than having to rely on traditional search expressions. The complete WEBSOM method involves a two-level SOM architecture comprising a word category map and a document map, and means for interactive exploration of the database.
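The link between the two levels (word category map, then document map) can be illustrated with a small sketch. It assumes that each document is encoded as a normalized histogram of the word categories its words fall into; the abstract above does not spell out the encoding, so the function name `encode_document`, the `word_to_category` lookup, and the normalization are hypothetical choices made for illustration only.

```python
from collections import Counter


def encode_document(tokens, word_to_category, n_categories):
    """Encode a document as a normalized histogram over word categories.

    Assumption: `word_to_category` maps each known word to the index of the
    word category map unit it was assigned to. Unknown words are skipped.
    """
    counts = Counter(word_to_category[t] for t in tokens if t in word_to_category)
    total = sum(counts.values())
    if total == 0:
        # No known words: return an all-zero vector.
        return [0.0] * n_categories
    return [counts.get(k, 0) / total for k in range(n_categories)]


# Hypothetical category assignments, loosely following the sample word lists.
word_to_category = {"think": 0, "guess": 0, "usa": 1, "japan": 1, "trained": 2}
vector = encode_document(["think", "guess", "usa", "the"], word_to_category, 3)
```

Vectors of this kind would then serve as the inputs from which the second-level document map is organized.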
Article
This paper is concerned with the application of Kohonen's self-organizing map in the area of software reuse. Although software reuse has a long historical tradition in research, what is still missing is an appropriate way to structure software libraries according to the semantic similarities of the stored software components. In this paper we describe an approach to overcome this inconvenience by applying Kohonen's self-organizing map to the description of software components. As a result we obtain a semantically structured software library which is paramount to conventional approaches.
Article
Self-organized formation of topographic maps for abstract data, such as words, is demonstrated in this work. The semantic relationships in the data are reflected by their relative distances in the map. Two different simulations, both based on a neural network model that implements the algorithm of the self-organizing feature maps, are given. For both, an essential, new ingredient is the inclusion of the contexts, in which each symbol appears, into the input data. This enables the network to detect the logical similarity between words from the statistics of their contexts. In the first demonstration, the context simply consists of a set of attribute values that occur in conjunction with the words. In the second demonstration, the context is defined by the sequences in which the words occur, without consideration of any associated attributes. Simple verbal statements consisting of nouns, verbs, and adverbs have been analyzed in this way. Such phrases or clauses involve some of the abstractions that appear in thinking, namely, the most common categories, into which the words are then automatically grouped in both of our simulations. We also argue that a similar process may be at work in the brain.
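The idea that words can be compared through "the statistics of their contexts" can be sketched as follows. This is an illustrative construction, not the procedure from the abstract: the random code vectors, the one-word window on each side, the averaging, and the name `context_vectors` are all assumptions.

```python
import random
from collections import defaultdict


def context_vectors(sentences, dim=8, seed=0):
    """Describe each word by the average code of its left and right neighbours.

    Assumptions: every word gets a random code vector, the context window is
    one word on each side, and a missing neighbour contributes zeros.
    """
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    code = {w: [rng.gauss(0.0, 1.0) for _ in range(dim)] for w in vocab}
    zero = [0.0] * dim
    sums = {w: [0.0] * (2 * dim) for w in vocab}
    counts = defaultdict(int)
    for s in sentences:
        for i, w in enumerate(s):
            left = code[s[i - 1]] if i > 0 else zero
            right = code[s[i + 1]] if i + 1 < len(s) else zero
            # Concatenate left and right context codes and accumulate them.
            for j, v in enumerate(left + right):
                sums[w][j] += v
            counts[w] += 1
    return {w: [v / counts[w] for v in sums[w]] for w in vocab}


sentences = [
    ["mary", "likes", "meat"],
    ["jim", "likes", "meat"],
    ["mary", "eats", "bread"],
    ["jim", "eats", "bread"],
]
vecs = context_vectors(sentences)
```

In this toy corpus "mary" and "jim" occur in exactly the same contexts, so their context vectors coincide, while "likes" and "eats" differ because they are followed by different words. A map trained on such vectors would be expected to place words with similar contexts near each other.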
Article
This work contains a theoretical study and computer simulations of a new self-organizing process. The principal discovery is that in a simple network of adaptive physical elements which receives signals from a primary event space, the signal representations are automatically mapped onto a set of output responses in such a way that the responses acquire the same topological order as that of the primary events. In other words, a principle has been discovered which facilitates the automatic formation of topologically correct maps of features of observable events. The basic self-organizing system is a one- or two-dimensional array of processing units resembling a network of threshold-logic units, and characterized by short-range lateral feedback between neighbouring units. Several types of computer simulations are used to demonstrate the ordering process as well as the conditions under which it fails.
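The ordering process can be illustrated with the simplified algorithmic form of the self-organizing map that is commonly used today. The sketch below is not the lateral-feedback network model described in the abstract; the Gaussian neighbourhood, the linearly decaying learning rate and radius, and the names `train_som` and `best_matching_unit` are assumptions chosen for brevity.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position of the unit whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda u: sum((wi - xi) ** 2 for wi, xi in zip(weights[u], x)),
    )


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a small rectangular map with the online update rule (a sketch)."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    # One weight vector per map unit, initialised at random in [0, 1).
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        samples = list(data)
        rng.shuffle(samples)
        for x in samples:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays to zero
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
            bmu = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                # Move the unit toward the sample, scaled by its grid distance to the BMU.
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


# Two small clusters in the unit square, mapped onto a one-dimensional chain of 5 units.
data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
weights = train_som(data, rows=1, cols=5, epochs=100)
```

After training, `best_matching_unit(weights, x)` gives the map position of any input `x`. Because neighbouring units are updated together, inputs that are close in the data space tend to land on nearby units.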
Book
Much connectionist research in natural language processing has been concerned with isolated aspects of understanding language. Very few researchers have attempted to build comprehensive computational models that are biologically and psychologically plausible and that incorporate the components necessary for modeling and testing various complex high-level cognitive phenomena. Miikkulainen's book is an exception to this trend. Using script understanding as a testbed, he shows how script-based inferences can be learned from experience on the basis of the statistical correlations implicit in the example data. He also shows how episodic memory organization can be automatically formed on the basis of these statistical regularities, and how word semantics can be learned from actual use. In an attempt to overcome some of the limitations of traditional AI symbolic approaches to this problem (the processing architecture, mechanisms, and knowledge are hand-coded, and inferences are based on handcrafted rules), he constructs a distributed neural network model composed solely of artificial neural network components. An important aspect of his system is its ability to address such questions as where performance errors, such as dyslexic errors and semantic slips, come from, how memory can become overloaded, and why certain types of memory confusions can occur in such situations. Constructs such as topological and hierarchical feature maps are introduced to address such issues. Topological maps have the property that complex similarity relationships of some high-dimensional input space become visible on the map. In addition, the maps can be formed by an unsupervised learning process. The hierarchical nature of these maps makes it possible to characterize the input from several graded perspectives: from gross high-level classifications to more specific instantiations of data. Thus a story about Bill eating a lobster pizza appetizer at Biba in Boston could be grossly characterized as a story about a restaurant, or more specifically, a fancy restaurant, or even more specifically about Bill eating lobster pizza.
Conference Paper
This paper is concerned with the application of Kohonen's self-organizing map in the area of software reuse. Although software reuse has a long historical tradition in research, what is still missing is an appropriate way to structure software repositories according to the semantic similarities of the stored software components. In this paper we describe an approach to overcome this inconvenience by applying Kohonen's self-organizing map to the description of software components. As a result we obtain a semantically structured software library.