Learning to Understand -
General Aspects of Using Self-Organizing Maps
in Natural Language Processing
Timo Honkela
Helsinki University of Technology
Neural Networks Research Centre
P.O.Box 2200, FIN-02015 HUT, Finland
tel: +358 0 451 3275, fax: +358 0 451 3277
e-mail: Timo.Honkela@hut.fi
WWW: http://nucleus.hut.fi/tho/
Abstract
The Self-Organizing Map (SOM) is an artificial neural network model based on unsupervised
learning. In this paper, the use of the SOM in natural language processing is considered. The
main emphasis is on natural features of natural language including contextuality of
interpretation, and the communicative and social aspects of natural language learning and
usage. The SOM is introduced as a general method for the analysis and visualization of
complex, multidimensional input data. The approach of how to process natural language input
is presented. Some epistemological underpinnings are outlined, including the creation of
emergent and implicit categories by SOM, intersubjectivity and relativity of interpretation,
and the relation between discrete symbols and continuous variables. Finally, the use of SOM
as a component in an anticipatory system is presented, and the relation between anticipation
and self-organization is discussed.
Keywords: natural language processing, self-organizing maps, semantics, epistemology,
neural networks.
1. Introduction
Traditionally the formal study of language has centered around structural and static aspects.
The automatic analysis and generation of syntactic structures has mainly been based on
explicit, hand-written, and symbolic representations. In semantics the main focus has been on
propositional structures, quantifiers, connectives, and other phenomena that match well the
apparatus of predicate logic. This paper aims at widening the scope to include many more
natural features of natural language including contextuality of interpretation, and the
communicative and social aspects of natural language learning and usage. The principles of
formalizing specific aspects of these phenomena are considered in this paper, including the
following:
- In the traditional study of language, fixed categorizations are normally used. Lexical
items, words and phrases, are positioned into categories such as verbs and nouns, and these
categories are used in abstract rules, e.g., of the type ”S → NP VP”, i.e., a sentence
consists of a noun phrase and a verb phrase. It may seem that the abstract rules are
precise, but when they are applied, discrepancies exist between the rules and the actual
use of language. A rule may be incorrect in various ways. For instance, a rule may be
overly general and should be refined. Refining the rule may be based on adding extra
restrictions on its use, or on creating more fine-grained categories that divide the feature
space into smaller areas. When this refinement process is carried to the extreme, it
may appear that each word has a category of its own. At least, it seems that a natural
grammar has a fractal structure.
- The use and interpretation of language is adaptive and context-sensitive. One can, of
course, find the most usual patterns and make definitions based on them, but in the
actual discussions and writings, words are often used creatively based on the particular
situation. The well-known ambiguity ”problem” highlights the context-sensitivity: there
may be multiple interpretations for a word or a phrase, but in context the desired
interpretation can be understood. Most often, human listeners or readers do not even
notice the potential alternative readings of individual words. The preceding text and the
overall context support an anticipatory process that effectively blocks incorrect
interpretations.
- The context-sensitivity of interpretation is also relevant when one considers the more
fine-grained structure of the semantic and pragmatic levels. The traditional, logic-based
ontology of natural language interpretation is based on the idea that the world consists
of distinct objects, their properties, and the relationships between the objects. Such a
view neglects the fact that the propositional level of sentences does not have a simple
one-to-one counterpart in reality. Reality is highly complex: apparently high-dimensional
at the perceptual level, changing, and consisting of non-linear and continuous
processes. Thus, studying the epistemological level with basically ”names” and
”objects” as referents may be considered far too simplistic. One should, for
instance, take into account the relation between discrete symbols and the continuous
spaces which the symbols refer to.
- Human understanding of natural language is based on long individual experience.
Inevitably, the differences in personal histories cause differences in the way humans
interpret natural language expressions. In the light of the previous discussion, it should
be clear that approaching this kind of phenomenon is difficult using the apparatus of
symbolic logic without considering more refined mathematical tools, such as those of algebra.
The subjectivity of interpretation is apparent when considered thoughtfully. Communication is
enabled by an intersubjectivity based on a learning process in which the
interpretations are adapted to match well enough so that a meaningful exchange of
thoughts becomes possible. The possibility of fine-grained differences in interpretation
becomes understandable when continuous variables and spaces are considered as the
counterparts of the distinct symbols that are used in communication. For instance, if
person A has a prototypical idea of a specific color having the value [0.234 0.004 0.678]
in a color coding scheme, and for person B the corresponding vector is [0.232 0.002
0.677], it is clear that communication based on the symbol is successful in spite of
the small difference, or error, in the interpretation. Actually, it may be still more fruitful to
consider a reference relation as a distribution rather than as a relation between a symbol
and a numerical value or vector. Such an approach greatly enriches the possibility of
studying fine-grained phenomena of natural language interpretation, as opposed to the
model-theoretic approach.
In the following, Kohonen's Self-Organizing Maps (SOMs) (Kohonen, 1982, 1995) are
introduced. The SOMs may provide a sound basis for modeling the general underlying
principles of natural language learning and interpretation. The motivation for such a claim is
presented in the rest of the paper.
2. Self-Organizing Maps
The SOM is a widely-used artificial neural network model in which learning is unsupervised:
no a priori classifications for the input examples are needed. In comparison, the
backpropagation algorithm, for instance, requires the examples to consist of input-output
pairs. The network architecture of the SOM consists of a set of laterally interacting adaptive
processing elements, nodes, usually arranged as a two-dimensional grid called the map. All
the map nodes are connected to a common set of inputs. Any activity pattern on the input
gives rise to excitation of some local group of map nodes. After learning, the spatial positions
of the excited groups specify a mapping of the input onto the map. The learning process is
based on similarity comparisons in a continuous space. The result is a system that maps similar
inputs close to each other on the resulting map. The input may be highly complex
multidimensional data, as in real-life applications such as speech recognition, image analysis,
and process monitoring. (Kohonen, 1995)
2.1. Learning Process in SOM
Assume that some sample data sets have to be mapped onto the array depicted in Fig. 1; a
sample set is described by a real vector x(t) ∈ R^n, where t is the index of the sample, or the
discrete-time coordinate. In setting up the neural-network model called the Self-Organizing
Map, we first assign to each unit in the array a parameter vector m_i(t) ∈ R^n, called the
codebook vector, which has the same number of elements as the input vector x(t). The initial
values of the parameters, the components of m_i(t), can be selected at random.
Fig. 1. The basic architecture of the self-organizing map.
The ”image” of an input item on the map is defined to be the location whose m_i(t)
matches best with x(t) in some metric. The self-organizing algorithm that creates the ordered
mapping can be described as a repetition of the following basic tasks:
1. An input vector x(t) (like one row of Table 1) is compared with all the codebook vectors
m_i(t). The best-matching unit on the map, i.e., the unit whose parameter vector is
most similar to the input vector in some metric, called the winner, is identified.
2. The codebook vectors of the winner and a number of its neighboring units in the array
are changed incrementally according to the learning principle specified below.
The basic idea in the SOM is that, for each input sample vector x(t), the parameters of the
winner and the units in its neighborhood are moved closer to x(t). For different x(t) these
changes may be contradictory, but the net outcome of the process is that ordered values for
the m_i(t) are finally obtained over the array. If the number of input vectors is not large
compared with the number of codebook vectors (map units), the set of input vectors must be
presented reiteratively many times. As mentioned above, the codebook vectors may initially
have random values, but they can also be selected in an ordered way. Adaptation of the
codebook vectors in the learning process takes place according to the following equation:
m_i(t+1) = m_i(t) + α(t)[x(t) − m_i(t)]   for each i ∈ N_c(t),
where t is the discrete-time index of the variables, the factor α(t) ∈ [0,1] is a scalar that
defines the relative size of the learning step, and N_c(t) specifies the neighborhood around the
winner in the map array. At the beginning of the learning process the radius of the
neighborhood is fairly large, but it shrinks during learning. This ensures that the global order
is obtained already at the beginning, whereas towards the end, as the radius gets smaller, the
local corrections of the codebook vectors in the map will be more specific. The factor α(t)
decreases during learning. Details about the selection of parameters, variants of the map, and
thousands of application examples can be found in (Kohonen, 1995). A recent work
describing the SOM in data analysis and exploration is (Kaski, 1997).
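The update rule above is simple enough to sketch directly. The following is a minimal, illustrative Python/NumPy implementation; the grid size, the linear decay schedules for α(t) and the neighborhood radius, and the random initialization are assumptions chosen for clarity, not the exact settings used in the experiments discussed here.

    import numpy as np

    def train_som(data, rows=10, cols=10, epochs=50, alpha0=0.5, radius0=5.0):
        """Train a SOM on data of shape (n_samples, n_dim); returns codebook."""
        n_dim = data.shape[1]
        rng = np.random.default_rng(0)
        codebook = rng.random((rows, cols, n_dim))    # random initial m_i
        # Grid coordinates of each unit, used for neighborhood computations.
        grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                    indexing="ij"), axis=-1)
        t, t_max = 0, epochs * len(data)
        for _ in range(epochs):                       # reiterative presentation
            for x in data:
                # 1. Winner: the unit whose codebook vector is closest to x(t).
                dists = np.linalg.norm(codebook - x, axis=-1)
                winner = np.unravel_index(np.argmin(dists), dists.shape)
                # Learning rate and neighborhood radius both shrink over time.
                frac = t / t_max
                alpha = alpha0 * (1.0 - frac)
                radius = radius0 * (1.0 - frac) + 1.0
                # 2. m_i(t+1) = m_i(t) + alpha(t)[x(t) - m_i(t)], i in N_c(t).
                hood = np.linalg.norm(grid - np.array(winner), axis=-1) <= radius
                codebook[hood] += alpha * (x - codebook[hood])
                t += 1
        return codebook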
2.2. Introductory Example: From a Table to a Map
As a very simple illustrative example we demonstrate how different languages can be
identified from their written forms by the SOM. Consider Table 1, which is an excerpt from a
larger table used in the experiment. It gives the relative frequencies of characters in a small
text corpus, selected from each of the languages considered.
Table 1. Part of a table containing relative frequencies of characters.
Language    'a'    'b'    'c'    'd'    'e'    'f'    'g'    'h'
English     0.078  0.016  0.019  0.053  0.128  0.018  0.024  0.078  ...
Estonian    0.118  0.008  0.000  0.055  0.127  0.002  0.016  0.019  ...
Finnish     0.119  0.000  0.000  0.007  0.083  0.000  0.000  0.022  ...
French      0.069  0.005  0.035  0.042  0.139  0.010  0.015  0.009  ...
Hungarian   0.088  0.020  0.009  0.023  0.102  0.007  0.034  0.021  ...
...         ...    ...    ...    ...    ...    ...    ...    ...
Mutual similarities of the different languages are very hard to deduce from this kind of
table. However, when the row vectors of the table are applied as inputs to a self-organizing
map, an outcome of the learning process is that statistically similar vectors (rows) become
mapped close to each other in the display. In Fig. 2, similar statistics of characters in different
languages are reflected in their "images" being correspondingly close on the map: the nearer
the "images" of two languages (like Danish and Norwegian) are, the more similarities they
have in their written features.
Fig. 2. A self-organized hexagonal map based on the relative frequencies of characters in
some languages. The map tends to preserve the topology of the original input space.
The comparison was based on the character set which was used in the original material
obtained from the Internet. The aim was not to provide a detailed linguistic analysis and, thus,
the input material was randomly selected. Simplifying transliterations of many accented
characters had been used in the original texts. Phonetic resemblances were not considered.
However, although only such a set of crude features derived from a small corpus was used as
input data, some basic relationships can be seen on the map. The Romance and Germanic
languages fall into areas of their own. Finnish and Estonian have been positioned into the
same map node. The dark color denotes a larger distance between the codebook vectors.
Thus, one can also find a meaningful clustering structure on the map, with the previously
mentioned areas separated by ”borderlines”.
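To make the experiment concrete, the following hedged sketch shows how rows like those of Table 1 can be computed and fed to a SOM. The two sample strings are invented stand-ins for the text corpora, and train_som refers to the sketch given in Section 2.1; neither is from the original experiment.

    import string
    import numpy as np

    def char_frequencies(text, alphabet=string.ascii_lowercase):
        """Relative frequencies of the given characters in a text sample."""
        text = text.lower()
        counts = np.array([text.count(c) for c in alphabet], dtype=float)
        return counts / counts.sum()

    # Invented stand-ins for the corpora; accented characters are assumed
    # to have been transliterated, as in the original material.
    samples = {
        "English": "the quick brown fox jumps over the lazy dog",
        "Finnish": "nopea ruskea kettu hyppaa laiskan koiran yli",
    }
    data = np.array([char_frequencies(t) for t in samples.values()])
    codebook = train_som(data)    # rows with similar statistics map close by

    # Label the map: find the best-matching unit for each language.
    for name, x in zip(samples, data):
        dists = np.linalg.norm(codebook - x, axis=-1)
        print(name, np.unravel_index(np.argmin(dists), dists.shape))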
3. Maps of Natural Language
In languages we use signs and conventions to present a kind of terrain. Like maps, language
can open up new worlds: we can innovate through language. We can work out problems,
understand and communicate complex ideas by unpacking them in words. But like maps,
language also distorts. In using language, we necessarily reduce and group and select. This is
exactly what self-organizing maps also do: they generalize and organize information, whether
the input is a speech signal, pictorial images, or symbolic data, provided the necessary
context is present.
In most practical applications of the SOM, the input to the map algorithm is derived from
some measurements, usually after preprocessing. In such cases, the input vectors are supposed
to have metric relations. Interpretation of languages, in their written form, is based on the
processing of sequences of discrete symbols. To create a map of discrete symbols that occur
within the sentences, each symbol must be presented in its due context.
3.1. Basic Principles of Using SOM in Natural Language Interpretation
The Self-Organizing Map, SOM (Kohonen, 1995) is well suited to serve as the central
processing element in modeling natural language interpretation for the following reasons:
- The SOM algorithm modifies its internal representation, i.e., the map node vectors,
according to the external input, which enables adaptation.
- The SOM is able to process natural language input to form "semantic maps" (see Ritter
and Kohonen, 1989). Natural language interpretation using the SOM has further been
examined, e.g., in (Miikkulainen, 1993; Scholtes, 1993; Honkela, 1995).
- Symbols and continuous variables may be combined in the input, and they are
associated by the SOM (see, e.g., Honkela, 1991). Continuous variables may be
quantized, and a symbolic interpretation can be given for each section in the possibly
very high-dimensional space of perceptual variables.
- Because the SOM is based on unsupervised learning, processing external input without
any prior classifications is possible (Kohonen, 1995). The map is an ”individual” model
of the environment and of the relation between the expressions of the language and the
environment.
- The SOM enables creating a model of the relation between the environment and the
expressions of the language used by others. In addition, generalizations of this
relation can be formed (Honkela, 1993).
3.2. Creating Maps of Words
It has earlier been shown that the SOM can be applied to the analysis and visualization of
contextual roles of words, i.e., similarities in their usage in short contexts formed of adjacent
words (Ritter and Kohonen, 1989). In the unsupervised formation of the map of words,
each input x consists of an encoded word and its averaged context. Each word in the
vocabulary is encoded as an n-dimensional random vector. In our experiments (e.g., Honkela et
al. 1995, Kaski et al. 1996, Lagus et al. 1996) n has usually been 90. A more straightforward
way to encode each word would be to reserve one component for each distinct word but then,
especially with a large vocabulary, the dimensionality of the vector would be computationally
intractable. A mathematical analysis of the dimensionality reduction based on random
encoding is presented by Ritter and Kohonen (1989).
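The cited analysis is beyond the scope of a short example, but the intuition behind random encoding can be checked numerically: independent random vectors in a high-dimensional space are nearly orthogonal, so distinct words receive nearly uncorrelated codes even though the 90 dimensions are far fewer than the vocabulary size. The snippet below is an illustrative check, not part of the original study.

    import numpy as np

    rng = np.random.default_rng(0)
    codes = rng.standard_normal((1000, 90))    # 1000 "words", 90 dimensions
    codes /= np.linalg.norm(codes, axis=1, keepdims=True)
    cosines = codes @ codes.T                  # pairwise cosine similarities
    off_diag = cosines[~np.eye(len(cosines), dtype=bool)]
    print(f"mean |cos| = {np.abs(off_diag).mean():.3f}")   # close to 0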
The basic steps in forming a map of words are listed in the following (see also Fig. 3):
1. Create a unique random vector for each word in the vocabulary.
2. Find all the instances of each word to be mapped in the input text collection. Such
words will be called key words in the following. Calculate the average over the contexts
of each key word. The random codes formed in step 1 are used in the calculation. The
context may consist of, e.g., the preceding and the succeeding word, or some other
window over the context. As a result each key word is associated with a contextual
fingerprint. If the original random encoding for a single word is, for instance, 90-
dimensional, the resulting vector of a key word with its context is 270-dimensional when one
neighboring word from each side is included. The classifying information is in the
context. Therefore the key word part of the vector is multiplied by a small scalar, ε. In
our experiments ε has usually been 0.2.
3. Each vector formed in step 2 is input to the SOM. The resulting map is labeled after the
training process by inputting the input vectors once again and by naming the best-
matching neurons according to the key word part of the input vector.
Fig. 3. Creation of input vectors for the word category map.
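The following sketch illustrates steps 1 and 2 on a toy corpus. The dimensionality n = 90, the resulting 270-dimensional vectors, and ε = 0.2 follow the text above; the corpus itself and the skipping of words without a full left-right context are assumptions made for the illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    corpus = "the dog ran and the cat ran and the dog slept".split()
    vocab = sorted(set(corpus))

    n = 90                                      # dimensionality from the text
    eps = 0.2                                   # scaling of the key word part
    codes = {w: rng.standard_normal(n) for w in vocab}   # step 1: random codes

    fingerprints = {}                           # step 2: averaged contexts
    for key in vocab:
        positions = [i for i, w in enumerate(corpus)
                     if w == key and 0 < i < len(corpus) - 1]
        if not positions:                       # assumption: skip words lacking
            continue                            # a full left-right context
        left = np.mean([codes[corpus[i - 1]] for i in positions], axis=0)
        right = np.mean([codes[corpus[i + 1]] for i in positions], axis=0)
        # The classifying information is in the context, so the key word
        # part is scaled down by eps; the result is 270-dimensional.
        fingerprints[key] = np.concatenate([left, eps * codes[key], right])

    # Step 3 would input these vectors to the SOM and afterwards label the
    # best-matching units according to the key word part.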
3.3. Comparisons of Artificial Neural Network Models
Artificial neural-network models may be classified according to their specific features, such
as structure, dynamics, and type of adaptation. The nature and source of the input material and
its preprocessing should also be specified. The input data material may be symbolic,
numerical, or a combination of both. The speech signal is a typical example of
continuous-valued, non-symbolic input material. On the other hand, written texts are
symbolic, although they may be transformed into vectorial form in some way. One method of
such a transformation was presented in the previous section.
The appearance of natural-language expressions, written or spoken, is sequential. This
inherent property raises the question of how to handle time. Elman (1990) presents different
ways of representing time. One possibility is to concatenate subsequent ”moments” into
single input vectors, as in the present study. Another possibility is to use networks with
recurrent connections to make them operate sequentially. The learning principle of the SOM,
however, is not suitable for such recursive operations. On the other hand, the self-organizing
map is the only neural-network architecture in which the spatial order between representations
emerges automatically. Such a self-organized order is also one of the main properties of the
neural structures in the brain. Neurophysiological studies supporting the existence of SOM-like
processing principles in the brain have been reviewed in Kohonen (1995).
The basic learning strategies of adaptive systems can be categorized into supervised,
reinforced, and unsupervised. In supervised learning, the system is given input-output pairs:
for each input there must also exist the "right answer" to be enforced at the output. The
system then learns these input-output pairs. The task is not trivial, however, and after the
learning period the network is also able to deal with inputs that were not present in the
learning phase. This property ensues from the generalizing capabilities of the network. The
drawback of supervised learning is the need for correct output in each input example. In some
cases, obtaining the output for each input case is a very laborious task, especially if a large
source material is used.
Whereas supervised learning models are suitable for classification, unsupervised learning can
be used for abstraction. The self-organizing map, considered in this article, enables
autonomous processing of linguistic input. The map forms a structured statistical description
of the material, allowing very nonlinear dependencies between the items. Application of the
self-organizing maps to natural language processing has been described, e.g., by Ritter and
Kohonen (1989), Scholtes (1993), Miikkulainen (1993), Honkela et al. (1995).
4. Epistemological Considerations
In the following, the problem areas presented in the introduction are tied together with the
methodological tools provided by the self-organizing maps and their underlying principles.
4.1. Emergent Categories
Conceptually interrelated words tend to fall into the same or neighboring nodes in the word
category map (see, e.g., Kaski et al., 1996; Kohonen et al., 1996). Fig. 4 shows the results of a
study in which texts from the Usenet newsgroup sci.lang were used as input for the SOM
(Lagus et al., 1996). The overall organization of a word category map reflects the syntactic
categorization of the words. In the study by Honkela, Pulkki, and Kohonen (1995) the input
for the map was the English translation of Grimm fairy tales. In the resulting map, which
included the 150 most common words of the tales, the verbs formed an area of their own at
the top of the map, whereas the nouns could be found in the opposite corner. The modal verbs
were in an area of their own. A semantically oriented ordering could also be found: for instance, the
inanimate and animate nouns formed separate clusters.
Fig. 4. An illustration of a word category map based on the texts of the Usenet newsgroup
sci.lang. The map consists of 15 x 21 nodes most of which contain several words.
An important consideration is that in the experiments the input for the SOM did not contain
any predetermined classifications. The results indicate that the text input as such, with the
statistical properties of the contextual relations, is sufficient for the automatic creation of
meaningful implicit categories. The categories emerge during the learning process. The
symbol grounding task would, of course, be more realistic and complete if it were possible
to provide other modalities, such as pictorial images, as part of the context information.
4.2. Intersubjectivity and Relativity: Relation Between Discrete and Continuous
Subjectivity is inherent in human natural language interpretation. The nature and the level of
this subjectivity have been the subject of several debates. For instance, the Chomskian tradition
of linguistics, as well as the philosophy of language based on predicate logic, seems clearly to
downplay the subjective component of language processing. In them, the relation between
the ”names” of the language and the ”objects” or ”entities” may be taken for granted and
considered unproblematic.
Consider now that we are about to denote an interval of a single continuous parameter using a
limited number of symbols. These symbols are then used in the communication between two
subjects (human or artificial). In a trivial case, the two subjects would have the same denotations
for the symbols, i.e., the limits of the intervals corresponding to each symbol would be identical.
If the ”experience” of the subjects is acquired from differing sources, the conceptualizations
may very well differ. This kind of difference in the density patterns is illustrated on the left of
Fig. 5. On the right of Fig. 5, the interval from x_0 to x_4 is divided into smaller intervals
according to the patterns. The first subject uses two symbols, A and B, whereas the second
subject also utilizes a third one, namely C, for the interval between x_1 and x_3. Thus, if the
context (the parameter value in this simplified illustration) is not present, a communicated
symbol may lead to an erroneous interpretation.
Fig. 5. Illustration of symbols associated with a continuous parameter by two subjects
(A and B). On the left, the varying density patterns show the motivation for different
conceptualizations of the same phenomenon. The conceptualization, i.e., the naming of the
space, is presented on the right.
One may then ask how to deal with these kinds of discrepancies. A propositional level is not
sufficient. The key idea is to provide the means for a system to associate continuous-valued
parameter spaces to sets of symbols, and furthermore, to ”be aware” of the differences in this
association and to learn those differences explicitly. These kinds of abilities are especially
required by highly autonomous systems that need to communicate using an open set of
symbols or constructs of a natural language. This kind of association of a set of symbols with a
set of continuous parameters is a natural extension or modification of the word category maps
(see Honkela, 1993; Honkela and Vepsäläinen, 1991). An augmented input consists of three
main parts: the encoded symbol, the context (which is the parameter vector in this case), and an
identification of the utterer or source of the symbol being used. The map nodes associate
symbols with the continuous parameters. One node corresponds to an area in the
multidimensional space, i.e., a Voronoi tessellation determined by the codebook vector
associated with the map node and its neighboring nodes. The relation is one-to-many: one
symbol is associated with an infinite number of points.
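As a rough illustration of Fig. 5 and of the augmented input idea, the sketch below lets two simulated subjects learn different symbol boundaries for the same continuous parameter from differing samples. The sample distributions and the simple one-dimensional prototype fitting (a stand-in for the SOM quantization) are assumptions.

    import numpy as np

    rng = np.random.default_rng(1)

    def fit_prototypes(samples, n_symbols):
        """Simple 1-D k-means; each prototype defines a Voronoi interval."""
        protos = np.quantile(samples, np.linspace(0.1, 0.9, n_symbols))
        for _ in range(20):
            labels = np.argmin(np.abs(samples[:, None] - protos[None, :]), axis=1)
            protos = np.array([samples[labels == k].mean()
                               if np.any(labels == k) else protos[k]
                               for k in range(n_symbols)])
        return protos

    # Two subjects whose "experience" of the same parameter differs.
    subject_1 = fit_prototypes(rng.uniform(0.0, 1.0, 500), n_symbols=2)  # A, B
    subject_2 = fit_prototypes(rng.beta(2.0, 5.0, 500), n_symbols=3)     # A, B, C

    def interpret(x, protos, names):
        """Name of the interval into which the parameter value x falls."""
        return names[int(np.argmin(np.abs(protos - x)))]

    x = 0.45
    print(interpret(x, subject_1, "AB"), interpret(x, subject_2, "ABC"))
    # Without the parameter value as context, the same communicated symbol
    # can thus denote different intervals for the two subjects.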
In this kind of mapping, error (cf. Rosen, 1985), or a kind of relativity, is a necessity in
communication. One can define the exact reference of a symbol in a continuous space only to
a limit that is restricted by several issues, for instance, the limited time available for the
communication in which the level of intersubjectivity is raised. A common source of context
is often not available either. Von Foerster (1972b) has outlined the very basic epistemological
questions that are closely related to the topics of the present discussion. He states, among
other things, that by keeping track of the computational pathways for establishing
equivalence, ”objects” and ”events” emerge as consequences of branches of computation
which are identified as the processes of abstraction and memorization. In the realm of
symbolic logic, invariance and change are paradoxical: ”the distinct being the same”, and
”the same being distinct”. In a model that includes both the symbolic description and its
continuous counterpart, there is no paradox, and the relationship may be computed, e.g.,
by the self-organizing map.
The previously presented framework also provides a means to consider the relationship
between language and thoughts. In the case of colors, one may hypothesize that the perceptual
input in the human experience is overwhelming when compared with the symbolic
descriptions. Thus, the ”color map” is based on the physical properties of the input. On the
other hand, abstract concepts are based on the cultural ”agreements” and they are
communicated symbolically so that the relation to external, physical counterparts is less
straightforward. A natural result would be that such concepts are much more prone to
subjective differences based on the cultural environment. Even if the original perceptual input
is available, when it is constantly associated with a systematic classifying symbolic input, the
result deviates strongly from the case in which the latter information is not
available. Von Foerster (1972a) has described the phenomenon and its consequences in the
following way: ”We seem to be brought up in a world seen through descriptions by others
rather than through our own perceptions. This has the consequence that instead of using
language as a tool with which to express thoughts and experience, we accept language as a
tool that determines our thoughts and experience.” In linguistics this kind of idea is referred to
as the Sapir-Whorf hypothesis.
4.3. Anticipatory Systems
Rosen (1985) has described anticipatory behavior as behavior in which a change of state in the
present occurs as a function of some predicted future state. In other words, an anticipatory
system contains a predictive model of itself and/or its environment, which allows it to change
state at an instant in accord with the model’s predictions pertaining to a later instant.
Music involves the expectation and anticipation of situations on the one hand, and
confirmation or disconfirmation of them on the other. Kaipainen (1994) has studied the use of
SOMs in modeling musical perception. In addition to the basic SOM, Kaipainen uses a list
of lateral connections that records the transition probabilities from one map node to another.
The model is based on the specific use of the SOM in which the time dynamics of a process
are characterized by the trajectory path on the map. This aspect has been important already in
the first application area of the SOMs, namely speech recognition, and more recently in
process monitoring. The model of musical perception was tested in three modes called
”Gibsonian”, ”autistic”, and ”Neisserian”. The Gibsonian and autistic are the two extremes
regarding the use of anticipation recorded in the trajectory memory: the first model was
designed so that it did not use the trajectory information at all. The result was that continuity
from one variation to another could not be maintained. On the other extreme, the autistic
model was parameterized so that it developed a deterministically strong schematic drive. It
began to use its internal representational states, eventually becoming ignorant of the input
flow of musical patterns. The intermediate model, denoted as Neisserian, was the one that
performed best in musical terms. It was open to the input while at the same time having an
internal schematic drive, anticipation, which intentionally actualized musical situations rather
than just recognized them as given.
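A minimal sketch may clarify the lateral-connection idea: record transition counts between successive winner nodes along the trajectory, and normalize them into probabilities that can be used for anticipating the next node. The flattened node indexing and the helper functions are assumptions, not Kaipainen's actual implementation.

    import numpy as np

    def winner_index(codebook, x):
        """Flattened index of the best-matching unit for input x."""
        flat = codebook.reshape(-1, codebook.shape[-1])
        return int(np.argmin(np.linalg.norm(flat - x, axis=1)))

    def transition_matrix(codebook, sequence):
        """Row-normalized transition probabilities between winner nodes."""
        n_units = codebook.shape[0] * codebook.shape[1]
        counts = np.zeros((n_units, n_units))
        path = [winner_index(codebook, x) for x in sequence]  # trajectory
        for a, b in zip(path, path[1:]):
            counts[a, b] += 1.0
        row_sums = counts.sum(axis=1, keepdims=True)
        return np.divide(counts, row_sums,
                         out=np.zeros_like(counts), where=row_sums > 0)

    # Anticipation: given the current winner node, the most probable next
    # node is np.argmax(probs[current]) -- a predictive model of the input
    # flow in Rosen's sense.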
When the relationship between Rosen’s formulations and Kohonen’s self-organizing maps is
considered, the following quote may be of interest (Rosen, 1985): "Briefly, we believe that
one of the primary functions of the mind is precisely to organize percepts. That is, the mind is
not merely a passive receiver of perceptual images, but rather takes an active role in
processing them and ultimately in responding to them through effector mechanism. The
organization of percepts means precisely the establishment of relations between them. But we
then must admit such relations reflect the properties of the active mind as much as they do the
percepts which the mind organizes." It seems that the SOM concretizes this idea. In general,
Rosen’s point of view may be characterized as physical and biological whereas Kohonen’s
main results are related to computational, epistemological, neurophysiological, and cognitive
aspects. Many basic issues are interrelated, though, including those of error, order and
disorder, similarity, and encoding.
5. Conclusions
The SOM-based approach seems to be well in line with the systemic and holistic principles
widely adopted in the cybernetic research community: it concentrates on the interaction
between processing elements and studies the effects of those interactions, it is especially
suited to the study of nonlinear phenomena with mutually dependent variables, and the
validation of the results is based on comparing the behavior of the model with reality.
In natural language processing, promising new application areas are arising. It may be
concluded that we are gradually learning to understand how we learn to understand.
References
Elman Jeffrey (1990). Finding Structure in Time. Cognitive Science, 14, pp. 179-211.
von Foerster Heinz (1972a). Perception of the Future and the Future of Perception.
Instructional Science 1, 1, R.W. Smith, and G.F. Brieske (eds.), Elsevier/North-Holland, New
York/Amsterdam, pp. 31-43. (Also appeared in von Foerster, H.: Observing Systems,
Intersystems Publications, Seaside, CA, 1981, pp. 189-204.)
von Foerster Heinz (1972b). Notes on an Epistemology for Living Things. BCL Report
No. 9.3, Biological Computer Laboratory, Department of Electrical Engineering, University
of Illinois, Urbana, 22 p. (Also appeared in von Foerster, H.: Observing Systems,
Intersystems Publications, Seaside, CA, 1981, pp. 258-271.)
Honkela Timo and Vepsäläinen Ari M. (1991). Interpreting Imprecise Expressions:
Experiments with Kohonen's Self-Organizing Maps and Associative Memory. Artificial
Neural Networks, T. Kohonen and K. Mäkisara (eds.), vol. I, pp. 897-902.
Honkela Timo (1993). Neural Nets that Discuss: A General Model of Communication Based
on Self-Organizing Maps. Proc. ICANN'93, Int. Conf. on Artificial Neural Networks, S.
Gielen and B. Kappen (eds.), Springer, London, pp. 408-411.
Honkela Timo, Pulkki Ville, and Kohonen Teuvo (1995). Contextual relations of words in
Grimm tales analyzed by self-organizing map. In F. Fogelman-Soulie and P. Gallinari (eds.)
ICANN-95, Proceedings of International Conference on Artificial Neural Networks, vol. 2,
pp. 3-7. EC2 et Cie, Paris.
Honkela Timo, Kaski Samuel, Lagus Krista, and Kohonen Teuvo (1996). Newsgroup
Exploration with WEBSOM Method and Browsing Interface. Report A32, Helsinki University
of Technology, Laboratory of Computer and Information Science, January 1996.
Honkela Timo (1997). Self-Organizing Maps of Words for Natural Language Processing
Applications. Proceedings of Soft Computing '97, in print, September 17-19, 1997, 7 p.
Honkela Timo (1997). Emerging categories and adaptive prototypes: Self-organizing maps
for cognitive linguistics. Extended abstract, accepted to be presented in the International
Cognitive Linguistics Conference, Amsterdam, July 14-19, 1997.
Kaipainen Mauri (1994). Dynamics of Musical Knowledge Ecology - Knowing-What and
Knowing-How in the World of Sounds. PhD thesis, University of Helsinki, Helsinki, Finland,
Acta Musicologica Fennica 19.
Kaski Samuel (1997). Data Exploration Using Self-Organizing Maps. Dr.Tech thesis.
Helsinki University of Technology, Espoo, Finland, Acta Polytechnica Scandinavica, no. 82.
Kaski Samuel, Honkela Timo, Lagus Krista, and Kohonen Teuvo (1996). Creating an order in
digital libraries with self-organizing maps. Proceedings of WCNN'96, World Congress on
Neural Networks, Lawrence Erlbaum and INNS Press, Mahwah, NJ, pp. 814-817.
Kohonen Teuvo (1982). Self-organized formation of topologically correct feature maps.
Biological Cybernetics, 43, pp. 59-69.
Kohonen Teuvo (1995). Self-Organizing Maps. Springer-Verlag.
Kohonen Teuvo, Kaski Samuel, Lagus Krista, and Honkela Timo (1996). Very large two-
level SOM for the browsing of newsgroups. Proceedings of ICANN'96, International
Conference on Artificial Neural Networks.
Lagus Krista, Honkela Timo, Kaski Samuel, and Kohonen Teuvo (1996). Self-organizing
maps of document collections: a new approach to interactive exploration. E. Simoudis, J. Han,
and U. Fayyad (eds.), Proceedings of the Second International Conference on Knowledge
Discovery & Data Mining, AAAI Press, Menlo Park, CA, pp. 238-243.
Miikkulainen Risto (1993). Subsymbolic Natural Language Processing: An Integrated Model
of Scripts, Lexicon, and Memory. MIT Press, Cambridge, MA.
Ritter Helge and Kohonen Teuvo (1989). Self-organizing semantic maps. Biological
Cybernetics, vol. 61, no. 4, pp. 241-254.
Rosen Robert (1985). Anticipatory Systems. Pergamon Press.
Scholtes Jan C. (1993). Neural Networks in Natural Language Processing and Information
Retrieval. PhD thesis, University of Amsterdam, Amsterdam.