
Timo Honkela
- PhD
- Professor (Full) at University of Helsinki
About
154 Publications
34,591 Reads
3,109 Citations
Publications (154)
This paper outlines a pilot study on multi-dimensional and multilingual sentiment analysis of social media content. We use parallel corpora of movie subtitles as a proxy for colloquial language in social media channels and a multilingual emotion lexicon for fine-grained sentiment analyses. Parallel data sets make it possible to study the preservati...
The poster for CoLing 2016: Challenges in Multidimensional Sentiment Analysis Across Languages
In this article, automatically generated and manually crafted semantic representations are compared. The comparison takes place under the assumption that neither of these has a primary status over the other. While linguistic resources can be used to evaluate the results of automated processes, data-driven methods are useful in assessing the quality...
We present a novel Bayesian reinforcement learning algorithm that addresses model bias and exploration overhead issues. The algorithm combines different aspects of several state-of-the-art reinforcement learning methods that use Gaussian Processes model-based approaches to increase the use of the online data samples. The algorithm uses a smooth rew...
Surveys are widely conducted as a means to obtain information on thoughts, opinions and feelings of people. The representativeness of a sample is a major concern in using surveys. In this article, we consider meaning variation which is another potentially remarkable but less studied source of problems. We use Grounded Intersubjective Concept Analys...
In this paper, we study how to analyze and improve the quality of a large historical newspaper collection. The National Library of Finland has digitized millions of newspaper pages. The quality of the outcome of the OCR process is limited especially with regard to the oldest parts of the collection. Approaches such as crowdsourcing have been used in...
Sentiment analysis has become a widely used approach to assess the emotional content of written documents such as customer feedback. In positive psychology research, the typical one-dimensional analysis framework has been extended to include five dimensions. This five-dimensional model, PERMA, enables a fine-grained analysis of written texts. We pr...
Wikipedia Animal Dataset is a dataset created during December 2010 and January 2011 with data retrieved from Wikipedia. It is available for research purposes.
Statistics
-----------
This dataset is made up of 498 unique URLs corresponding to articles about animals. For each animal the article was collected in English, Finnish and Spanish, fulfilli...
Emotional semantic image retrieval systems aim at incorporating the user’s affective states for responding adequately to the user’s interests. One challenge is to select features specific to image affect detection. Another challenge is to build effective learning models or classifiers to bridge the so-called “affective gap”. In this work, we study...
In this article, we consider how semantics of action verbs can be grounded on motion tracking data. We present the basic principles and requirements for grounding of verbs through case studies related to human movement. The data includes high-dimensional movement patterns and linguistic expressions that people have used to name these movements. We...
Statistical machine learning methods can provide help when developing preventative services and tools that support the empowerment of individuals. We explore how the self-organizing map could be utilized as a tool for analyzing, visualizing and browsing heterogeneous survey data on wellbeing that contains both quantitative (numeric) and qualitative...
An ideal verbally controlled virtual actor would allow the same interaction as instructing a real actor with a few words. Our goal is to create virtual actors that can be controlled with natural language instead of a predefined set of commands. In this paper, we present results related to a questionnaire where people described videos of human locom...
We present an approach for comparing human-made and automatically generated semantic representations with an assumption that neither of these has a primary status over the other. In the experimental part, we compare the results gained by using independent component analysis and the self-organizing map algorithm on word context analysis with a seman...
On the web, a huge variety of text collections contain knowledge in different expertise domains, such as technology or medicine. The texts are written for different uses and thus for people having different levels of expertise on the domain. Texts intended for professionals may not be understandable at all by a lay person, and texts for lay people...
Mobile proximity information provides a rich and detailed view into the social interactions of mobile phone users, allowing novel empirical studies of human behavior and context-aware applications. In this study, we apply a statistical anomaly detection method based on multivariate binomial mixture models to mobile proximity data from 106 users. Th...
In this article, we explore an application in an area of research called wellbeing informatics. More specifically, we consider how to build a system that could be used for searching stories that relate to the interest of the user (content relevance), and help the user in his or her developmental process by providing encouragement, useful experience...
We propose a probabilistic model class for the analysis of three-way count data, motivated by studying the subjectivity of language. Our models are applicable for instance to a data tensor of how many times each subject used each term in each context, thus revealing individual variation in natural language use. As our main goal is exploratory ana...
It is generally accepted that there are cross-linguistic universal tendencies in the naming of colours. This is due in large part to the findings of Berlin and Kay. Recently, however, these universalist findings have been challenged, on both methodological and substantive grounds. Nisbett’s research on cultural cognition offers another interesting...
A substantial amount of subjectivity is involved in how people use language and conceptualize the world. Computational methods and formal representations of knowledge usually neglect this kind of individual variation. We have developed a novel method, Grounded Intersubjective Concept Analysis (GICA), for the analysis and visualization of individual...
Speech-to-speech machine translation is in some ways the peak of natural language processing, in that it deals directly with our original, oral mode of communication (as opposed to derived written language). As such, it presents challenges that are not to be taken lightly. Although existing technology covers each of the steps in the process, from...
We present a methodology for learning a taxonomy from a set of text documents that each describes one concept. The taxonomy is obtained by clustering the concept definition documents with a hierarchical approach to the Self-Organizing Map. In this study, we compare three different feature extraction approaches with varying degree of language indepe...
In this paper, we study fundamental properties of the Self-Organizing Map (SOM) and the Generative Topographic Mapping (GTM), ramifications of the initialization of the algorithms and properties of the algorithms in the presence of missing data. We show that the commonly used principal component analysis (PCA) initialization of the GTM does not guarante...
In this article, we introduce the concept of pathways of wellbeing and examine how such paths can be discovered from large data sets using the self-organizing map. Data sets used in the illustrative experiments include measurements of physical fitness and subjective assessments related to diagnosing work stress.
In document clustering, semantically similar documents are grouped together. The dimensionality of document collections is often very large, thousands or tens of thousands of terms. Thus, it is common to reduce the original dimensionality before clustering for computational reasons. Cosine distance is widely seen as the best choice for measuring th...
In this work, we study people’s emotions evoked by viewing abstract art images based on traditional low-level image features within a binary classification framework. Abstract art is used here instead of artistic or photographic images because those contain contextual information that influences the emotional assessment in a highly individual manne...
In this article, we present an analysis of the impact of nutrition and lifestyle on health at a global level. We have used the Self-Organizing Map (SOM) algorithm as the analysis technique. SOM enables us to visualize the relative position of each country against a set of the variables related to nutrition, lifestyle and health. The positioning of the...
We present a selection of results produced in a project called Media Map. The project aims at developing an intuitive user interface to a library information system containing data on projects and publications. The user interface is a two-dimensional visual display created with the Self-Organizing Map algorithm. The map has been computed using the...
In this article, we introduce a method to make visible the differences among people regarding how they conceptualize the world. The Grounded Intersubjective Concept Analysis (GICA) method first employs a conceptual survey designed to elicit particular ways in which concepts are used among participants, aiming to exclude the level of opinions and va...
In this review and tutorial article, new developments towards extended use of information and communications technologies in science are discussed. The focus is in human and social sciences, specifically in linguistics and economics. Some challenging epistemological issues are handled in detail including the subjective and intersubjective nature of...
We study the combination of symbol frequency analysis and negative selection for anomaly detection of discrete sequences where conventional negative selection algorithms are not practical due to data sparsity. Theoretical analysis on ergodic Markov chains is used to outline the properties of the presented anomaly detection algorithm and to predict...
In this paper, we explore the possibility of applying a text mining method on a large qualitative source material concerning the history of information technology in one nation. This data was collected in the Swedish documentation project “From Computing Machines to IT.” We apply text mining on the interview transcripts of this Swedish documentatio...
This paper presents a methodology for learning taxonomic relations from a set of documents that each explain one of the concepts. Three different feature extraction approaches with varying degree of language independence are compared in this study. The first feature extraction scheme is a language-independent approach based on statistical keyphrase...
In this paper, we consider how to represent world knowledge using the self-organizing map (SOM), how to use a simple recurrent network (SRN) to devise sentence comprehension, and how to use the SOM output space to represent situations and facilitate grounded logical reasoning.
In this article, we study the scale-dependent dimensionality properties and overall structure of text data with a method that measures correlation dimension in different scales. As experimental results, we present the analysis of text data sets with the Reuters and Europarl corpora, which are also compared to artificially generated point sets. A co...
In this article, we model adjectives using a vector space model. We further employ three different dimension reduction methods, the Principal Component Analysis (PCA), the Self-Organizing Map (SOM), and the Neighbor Retrieval Visualizer (NeRV) in the projection and visualization task, using an antonym test for evaluation. The results show tha...
Our aim is to find syntactic and semantic relationships and roles of words based on the analysis of corpora. We study three methods for analyzing words in contexts as potential methods for solving this task. The methods are latent semantic analysis, self-organizing map and independent component analysis. Latent semantic analysis is a simple method...
In this paper, we propose a tensor-based Maximum Margin Criterion algorithm (TMMC) for supervised dimensionality reduction. In TMMC, an image object is encoded as an nth-order tensor, and its 2-D representation is directly treated as a matrix. Meanwhile, ...
The article provides an introduction to and a demonstration of the self-organizing map (SOM) method for organizational researchers interested in the use of qualitative data. The SOM is a versatile quantitative method very commonly used across many disciplines to analyze large data sets. The outcome of the SOM analysis is a map in which entities are...
The self-organizing map (SOM) is related to the classical vector quantization (VQ). Like in the VQ, the SOM represents a distribution of input data vectors using a finite set of models. In both methods, the quantization error (QE) of an input vector can be expressed, e.g., as the Euclidean norm of the difference of the input vector and the best-mat...
In this paper, we discuss problems related to the basic Semantic Web methodologies that are based on predicate logic and related formalisms. We discuss complementary and alternative approaches. In particular, we suggest how the Self-Organizing Map can be a basis for making the Semantic Web more semantic.
The complex phenomena of political science are typically studied using a qualitative approach, potentially supported by hypothesis-driven statistical analysis of some numerical data. In this article, we present a complementary method based on data mining and specifically on the use of the self-organizing map. The idea in data mining is to explore th...
In this article, we consider contemporary theories of concepts, and Bayesian and self-organizing models of concept formation. After introducing the different models, we present our own experiment. It utilizes a multi-agent simulation framework, in which the emergence of a common vocabulary can be studied. In the experiment, we use jointly the self...
In time series prediction, one often does not know the properties of the underlying system generating the time series. For example, is it a closed system that is generating the time series or are there any external factors influencing the system? As a result of this, one often does not know beforehand whether a time series is stationary or nonstation...
The purpose of the present article is to examine the implications of the pragmatic web for the research and development of educational technology. It is argued that, beyond knowledge acquisition and social participation, technology-mediated learning environments based on a semantic and pragmatic web have the potential for facilitating creation and...
Finding ways in which communities of experts can benefit from each other is a question shared by the machine learning community and social sciences alike. Considerable research in machine learning methods has shown that communities of experts can provide consistently better classifications and decisions than single experts in various tasks and doma...
Latent semantic analysis (LSA) can be used to create an implicit semantic vectorial representation for words. Independent component analysis (ICA) can be derived as an extension to LSA that rotates the latent semantic space so that it becomes explicit, that is, the features correspond more with those resulting from human cognitive activity. Thi...
This paper presents a method for creating interlingual word-to-word or phrase-to-phrase mappings between any two languages using the self-organizing map algorithm. The method can be used as a component in a statistical machine translation system. The conceptual space created by the self-organizing map serves as a kind of interlingual representation...
We propose a theoretical framework for modeling communication between agents that have different conceptual models of their current context. We describe how the emergence of subjective models of the world can be simulated and what the role of language and communication in that process is. We consider, in particular, the role of unsupervised learnin...
We propose a method for inferring semantic information from textual data in content-based multimedia retrieval. Training examples of images and videos belonging to a specific semantic class are associated with their low-level visual and aural descriptors augmented with textual features such as frequencies of significant words. A fuzzy mapping of a...
Biological systems have been an inspiration in the development of prototype-based clustering and vector quantization algorithms. The two dominant paradigms in biologically motivated clustering schemes are neural networks and, more recently, biological immune systems. These two biological paradigms are discussed regarding their benefits and shortcom...
In this article, we are studying the differences between the European Union languages using statistical and unsupervised methods. The analysis is conducted in the different levels of language: the lexical, morphological and syntactic. Our premise is that the difficulty of the translation could be perceived as differences or similarities in differen...
We present Likey, a language-independent keyphrase extraction method based on statistical analysis and the use of a reference corpus. Likey has a very light-weight preprocessing phase and no parameters to be tuned. Thus, it is not restricted to any single language or language family. We test Likey having exactly the same configuration with...
In this article we approach neural networks as computational templates that travel across various sciences. Traditionally, it has been thought that models are primarily models of some target systems: they are assumed to represent partially or completely their target systems. We argue, instead, that many computational models cannot easily be conceiv...
Serious efforts to develop computerized systems for natural language understanding and machine translation have taken place for more than half a century. Some successful systems that translate texts in limited domains such as weather forecasts have been implemented. However, the more general the domain or complex the style of the text the more diff...
We present a probabilistic approach for detecting and analyzing changes in natural language motivated by biological immune systems. Contrary to traditional methods based on message-digest algorithms and line-by-line comparisons of two files, the proposed algorithm employs an implicit negative representation of text segments in the form of detector...
We show that independent component analysis (ICA) can be used to find distributed representations for words that can be further processed by thresholding to produce sparse representations. The applicability of the thresholded ICA representation is compared to singular value decomposition (SVD) in a multiple choice vocabulary task with three data se...
We present the results of an analysis of a text corpus of 129,000 abstracts of NSF-sponsored basic research projects between years 1990 and 2003. The methods used in the analysis include term extraction based on a reference corpus and an entropy measure, and the Self-Organizing Map algorithm for the formation of a term map and a document map. Metho...
A symbol as such is disassociated from the world. In addition, as a discrete entity a symbol does not mirror all the details of the portion of the world that it is meant to refer to. Humans establish the association between the symbols and the referenced domain – the words and the world – through a long learning process in a community. This paper s...
We propose a method of content-based multimedia retrieval of objects with visual, aural and textual properties. In our method, training examples of objects belonging to a specific semantic class are associated with their low-level visual descriptors (such as MPEG-7) and textual features such as frequencies of significant keywords. A fuzzy mappi...
An art installation was on display in the Centre Pompidou National Museum of Modern Art in Paris, where visitors could contribute with their own personal objects, adding keyword descriptions and quantified semantic features such as age or hardness. The data was projected in real-time onto a Self-Organizing Map (SOM) which was shown in the gallery....
A vital mechanism of high‐level natural cognitive systems is the anticipatory capability of making decisions based on predicted events in the future. While in some cases the performance of computational cognitive systems can be improved by modeling anticipatory behavior, it has been shown that for many cognitive tasks anticipation is mandatory. In...
Quality of Internet health information is essential because it has the potential to benefit or harm a large number of people and it is therefore essential to provide consumers with some tools to aid them in assessing the nature of the information they are accessing and how they should use it without jeopardizing their relationship with their doctor...
In this article, we study the emergence of associations between words and concepts using the self-organizing map. In particular, we explore the meaning negotiations among communicating agents. The self-organizing map is used as a model of an agent's conceptual memory. The concepts are not explicitly given but they are learned by the agent in an uns...
In this article, we are studying the differences between the European languages using statistical and unsupervised methods. The analysis is conducted in different levels of language: lexical, morphological and syntactic. Our premise is that the difficulty of the translation could be perceived as differences or similarities in different levels of...
In this position paper, we discuss some problems related to those semantic web methodologies that are straightforwardly based on predicate logic and related formalisms. We also discuss complementary and alternative approaches and provide some examples of such.
We study how independent component analysis can be used to create automatically syntactic and semantic features based on analyzing words in contexts.
Purpose
Studies aspects of Heinz von Foerster's work that are of particular importance for cognitive science and artificial intelligence.
Design/methodology/approach
Kohonen's self‐organizing map is presented as one method that may be useful in implementing some of Von Foerster's ideas. The main foci are the distinction between trivial and non‐tri...
According to a connectionist view, mental states consist of the activations of neural units in a connectionist network. We consider the similarity of representations that emerge in unsupervised, self-organization process of neural lattices when exposed to color spectrum stimuli. Self-organizing maps (SOM) are trained with color spectrum input, usin...
Our aim is to find syntactic and semantic relationships of words based on the analysis of corpora. We propose the application of independent component analysis, which seems to have clear advantages over two classic methods: latent semantic analysis and self-organizing maps. Latent semantic analysis is a simple method for automatic generation of con...
The WEBSOM is a method for analyzing and visualizing large document collections. In the WEBSOM method, the self-organizing map algorithm is used to automatically organize collections of documents onto a two-dimensional map to enable easy exploration and search of the collection. Map regions that are close to each other contain similar items. GS Te...
We study written language as if it were a multidimensional signal rather than a stream of symbols. We show that it is possible to find emergent features by independent component analysis from word contexts. The closeness of match between the learned features and traditional linguistic word categories is examined. It is shown that independent compon...
Our aim is to find syntactic and semantic relationships of words based on the analysis of corpora. We propose the application of independent component analysis, which seems to have clear advantages over two classic methods: latent semantic analysis and self-organizing maps. Latent semantic analysis is a simple method for automatic generation of con...
In this paper, we assume that word co-occurrence statistics can be used to extract meaningful features, exhibiting syntactic and semantic behavior, from text data. Independent component analysis (ICA), an unsupervised statistical method, is applied to word usage statistics, calculated from a natural language corpus, to extract a number of feat...
This article presents empirical evidence for the hypothesis that persons consider counterintuitive representations more likely to be religious than other kinds of beliefs. In three studies the subjects were asked to rate the probable religiousness of various kinds of imaginary beliefs. The results show that counterintuitive representations in gener...
We develop a framework for discussing the degree of conceptual autonomy of natural and artificial agents. We claim that aspects related to learning and communication necessitate adaptive agents that are partially autonomous. We demonstrate how partial conceptual autonomy can be obtained through a self-organization process. The input for the agents...
In this article, we present a model of a cognitive system, or an agent, with the following properties: it can perceive its environment, it can move in its environment, it can perform some simple actions, and it can send and receive messages. The main components of its internal structure include a working memory, a semantic memory, and a decision...
Fuzzy logic, artificial neural network models and evolutionary computing are the main methodological tools of the soft computing area. This article provides an overview on two of them, namely neural networks and evolutionary models. The largest number of applications that combine these two is based on the idea that a genetic algorithm is used to op...
Kohonen’s Self-Organizing Map (SOM) is a means for automatically arranging high-dimensional statistical data. The map attempts to represent all the input with optimal accuracy using a restricted set of models or prototypes. The prototypes also become ordered on the map grid so that similar prototypes are close to each other and dissimilar prototype...
WEBSOM is a novel method for organizing document collections onto map displays to enhance the interactive browsing and retrieval of the documents. The map is organized automatically according to the contents of the full-text documents by the Self-Organizing Map algorithm. The map display provides a visual overview of the whole document collection....
The current availability of large collections of full-text documents in electronic form emphasizes the need for intelligent information retrieval techniques. Especially in the rapidly growing World Wide Web it is important to have methods for exploring miscellaneous document collections automatically. In the report, we introduce the WEBSOM method f...
In this article, the use of the self-organizing map (SOM) is approached on the basis of current theories of learning. Possibilities of computer and networked platforms that aim at helping human learning are also inspected. It is shown how the SOM can be considered a model of constructive learning. The area of constructive learning is outlined and t...
Questions
Question (1)
Our collaborator has developed an agent-based simulation of the emergence of money (see below). Are you aware of related research? The current publication is a technical report, but the result seems to deserve wider dissemination. Which journals have computational economics on their agenda?
Risto Linturi: Social simulation of networked barter economy with emergent money.