Timo Honkela

University of Helsinki | HY · Department of Modern Languages

PhD

About

154
Publications
28,956
Reads
2,922
Citations
Citations since 2017
0 Research Items
546 Citations
[Chart: citations per year, 2017 to 2023]

Publications (154)
Conference Paper
Full-text available
This paper outlines a pilot study on multi-dimensional and multilingual sentiment analysis of social media content. We use parallel corpora of movie subtitles as a proxy for colloquial language in social media channels and a multilingual emotion lexicon for fine-grained sentiment analyses. Parallel data sets make it possible to study the preservati...
Poster
Full-text available
The poster for CoLing 2016: Challenges in Multidimensional Sentiment Analysis Across Languages
Article
Full-text available
In this article, automatically generated and manually crafted semantic representations are compared. The comparison takes place under the assumption that neither of these has a primary status over the other. While linguistic resources can be used to evaluate the results of automated processes, data-driven methods are useful in assessing the quality...
Conference Paper
We present a novel Bayesian reinforcement learning algorithm that addresses model bias and exploration overhead issues. The algorithm combines different aspects of several state-of-the-art reinforcement learning methods that use Gaussian Processes model-based approaches to increase the use of the online data samples. The algorithm uses a smooth rew...
Conference Paper
Surveys are widely conducted as a means to obtain information on the thoughts, opinions and feelings of people. The representativeness of a sample is a major concern in using surveys. In this article, we consider meaning variation, which is another potentially significant but less studied source of problems. We use Grounded Intersubjective Concept Analys...
Conference Paper
Full-text available
In this paper, we study how to analyze and improve the quality of a large historical newspaper collection. The National Library of Finland has digitized millions of newspaper pages. The quality of the outcome of the OCR process is limited, especially with regard to the oldest parts of the collection. Approaches such as crowdsourcing have been used in...
Conference Paper
Full-text available
Sentiment analysis has become a widely used approach to assess the emotional content of written documents such as customer feedback. In positive psychology research, the typical one-dimensional analysis framework has been extended to include five dimensions. This five-dimensional model, PERMA, enables a fine-grained analysis of written texts. We pr...
Data
Wikipedia Animal Dataset is a dataset created during December 2010 and January 2011 with data retrieved from Wikipedia. It is available for research purposes. Statistics: the dataset is made up of 498 unique URLs corresponding to articles about animals. For each animal the article was collected in English, Finnish and Spanish, fulfilli...
Conference Paper
Full-text available
Emotional semantic image retrieval systems aim at incorporating the user’s affective states for responding adequately to the user’s interests. One challenge is to select features specific to image affect detection. Another challenge is to build effective learning models or classifiers to bridge the so-called “affective gap”. In this work, we study...
Conference Paper
Full-text available
In this article, we consider how semantics of action verbs can be grounded on motion tracking data. We present the basic principles and requirements for grounding of verbs through case studies related to human movement. The data includes high-dimensional movement patterns and linguistic expressions that people have used to name these movements. We...
Conference Paper
Full-text available
Statistical machine learning methods can provide help when developing preventative services and tools that support the empowerment of individuals. We explore how the self-organizing map could be utilized as a tool for analyzing, visualizing and browsing heterogeneous survey data on wellbeing that contains both quantitative (numeric) and qualitative...
Conference Paper
Full-text available
An ideal verbally controlled virtual actor would allow the same interaction as instructing a real actor with a few words. Our goal is to create virtual actors that can be controlled with natural language instead of a predefined set of commands. In this paper, we present results related to a questionnaire where people described videos of human locom...
Conference Paper
We present an approach for comparing human-made and automatically generated semantic representations with an assumption that neither of these has a primary status over the other. In the experimental part, we compare the results gained by using independent component analysis and the self-organizing map algorithm on word context analysis with a seman...
Article
On the web, a huge variety of text collections contain knowledge in different expertise domains, such as technology or medicine. The texts are written for different uses and thus for people having different levels of expertise on the domain. Texts intended for professionals may not be understandable at all by a lay person, and texts for lay people...
Conference Paper
Mobile proximity information provides a rich and detailed view into the social interactions of mobile phone users, allowing novel empirical studies of human behavior and context-aware applications. In this study, we apply a statistical anomaly detection method based on multivariate binomial mixture models to mobile proximity data from 106 users. Th...
Conference Paper
In this article, we explore an application in an area of research called wellbeing informatics. More specifically, we consider how to build a system that could be used for searching stories that relate to the interest of the user (content relevance), and help the user in his or her developmental process by providing encouragement, useful experience...
Conference Paper
Full-text available
We propose a probabilistic model class for the analysis of three-way count data, motivated by studying the subjectivity of language. Our models are applicable, for instance, to a data tensor of how many times each subject used each term in each context, thus revealing individual variation in natural language use. As our main goal is exploratory ana...
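The three-way data described above can be pictured as a subject × term × context array of counts. The following is a minimal sketch of building such a tensor from observed (subject, term, context) triples; the triple format, the function name `count_tensor` and the example values are assumptions for illustration, and the probabilistic models themselves are not shown.

```python
from collections import Counter

def count_tensor(observations):
    """Build a dense subject x term x context count tensor from (subject, term, context) triples."""
    subjects = sorted({s for s, _, _ in observations})
    terms = sorted({t for _, t, _ in observations})
    contexts = sorted({c for _, _, c in observations})
    counts = Counter(observations)  # how often each exact triple was observed
    tensor = [[[counts[(s, t, c)] for c in contexts] for t in terms] for s in subjects]
    return tensor, subjects, terms, contexts

# Hypothetical observations: which colour term each participant used for which object.
obs = [
    ("p1", "red", "apple"),
    ("p1", "red", "apple"),
    ("p2", "crimson", "apple"),
    ("p2", "red", "car"),
]
tensor, subjects, terms, contexts = count_tensor(obs)
print(subjects, terms, contexts)
print(tensor)
```

Keeping the sorted axis labels alongside the nested lists makes it possible to map any cell back to its subject, term and context.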
Conference Paper
It is generally accepted that there are cross-linguistic universal tendencies in the naming of colours. This is due in large part to the findings of Berlin and Kay. Recently, however, these universalist findings have been challenged, on both methodological and substantive grounds. Nisbett’s research on cultural cognition offers another interesting...
Conference Paper
A substantial amount of subjectivity is involved in how people use language and conceptualize the world. Computational methods and formal representations of knowledge usually neglect this kind of individual variation. We have developed a novel method, Grounded Intersubjective Concept Analysis (GICA), for the analysis and visualization of individual...
Article
Full-text available
Speech-to-speech machine translation is in some ways the peak of natural language processing, in that it deals directly with our original, oral mode of communication (as opposed to derived written language). As such, it presents challenges that are not to be taken lightly. Although existing technology covers each of the steps in the process, from...
Article
We present a methodology for learning a taxonomy from a set of text documents that each describes one concept. The taxonomy is obtained by clustering the concept definition documents with a hierarchical approach to the Self-Organizing Map. In this study, we compare three different feature extraction approaches with varying degree of language indepe...
Conference Paper
Full-text available
In this paper, we study fundamental properties of the Self-Organizing Map (SOM) and the Generative Topographic Mapping (GTM), the ramifications of the initialization of the algorithms, and the properties of the algorithms in the presence of missing data. We show that the commonly used principal component analysis (PCA) initialization of the GTM does not guarante...
Conference Paper
Full-text available
In this article, we introduce the concept of pathways of wellbeing and examine how such paths can be discovered from large data sets using the self-organizing map. Data sets used in the illustrative experiments include measurements of physical fitness and subjective assessments related to diagnosing work stress.
Conference Paper
Full-text available
In document clustering, semantically similar documents are grouped together. The dimensionality of document collections is often very large, thousands or tens of thousands of terms. Thus, it is common to reduce the original dimensionality before clustering for computational reasons. Cosine distance is widely seen as the best choice for measuring th...
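The cosine measure mentioned above can be illustrated on bag-of-words vectors. The following is a minimal sketch, assuming whitespace tokenization and raw term counts stored as dictionaries (no term weighting or dimensionality reduction, which the paper may apply); the example sentences are invented for illustration.

```python
import math
from collections import Counter

def cosine_distance(u, v):
    """Return 1 - cosine similarity between two sparse term-count vectors (dicts)."""
    dot = sum(count * v.get(term, 0) for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for an all-zero vector")
    return 1.0 - dot / (norm_u * norm_v)

doc_a = Counter("the map organizes the documents".split())
doc_b = Counter("the documents are organized by the map".split())
doc_c = Counter("stock prices fell sharply".split())
print(round(cosine_distance(doc_a, doc_b), 3))  # 0.244
print(round(cosine_distance(doc_a, doc_c), 3))  # 1.0 (no shared terms)
```

Because the measure depends only on the angle between vectors, a long and a short document with the same term proportions are at distance zero, which is one reason it is popular for text.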
Conference Paper
Full-text available
In this work, we study people’s emotions evoked by viewing abstract art images based on traditional low-level image features within a binary classification framework. Abstract art is used here instead of artistic or photographic images because those contain contextual information that influences the emotional assessment in a highly individual manne...
Conference Paper
In this article, we present an analysis of the impact of nutrition and lifestyle on health at a global level. We have used the Self-Organizing Map (SOM) algorithm as the analysis technique. The SOM enables us to visualize the relative position of each country against a set of variables related to nutrition, lifestyle and health. The positioning of the...
Conference Paper
Full-text available
We present a selection of results produced in a project called Media Map. The project aims at developing an intuitive user interface to a library information system containing data on projects and publications. The user interface is a two-dimensional visual display created with the Self-Organizing Map algorithm. The map has been computed using the...
Article
In this article, we introduce a method to make visible the differences among people regarding how they conceptualize the world. The Grounded Intersubjective Concept Analysis (GICA) method first employs a conceptual survey designed to elicit particular ways in which concepts are used among participants, aiming to exclude the level of opinions and va...
Article
Full-text available
In this review and tutorial article, new developments towards extended use of information and communications technologies in science are discussed. The focus is on human and social sciences, specifically linguistics and economics. Some challenging epistemological issues are handled in detail, including the subjective and intersubjective nature of...
Article
Full-text available
We study the combination of symbol frequency analysis and negative selection for anomaly detection of discrete sequences where conventional negative selection algorithms are not practical due to data sparsity. Theoretical analysis of ergodic Markov chains is used to outline the properties of the presented anomaly detection algorithm and to predict...
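For orientation, here is a sketch of conventional negative selection on discrete sequences, the baseline the abstract contrasts with, not the combined algorithm of the paper. It assumes fixed-length windows (n-grams) as the unit of comparison and an exact-match rule, and it enumerates every candidate detector instead of sampling candidates at random, which is only feasible for tiny alphabets because the candidate space grows as |alphabet|^n. All function names are hypothetical.

```python
from itertools import product

def windows(seq, n):
    """All length-n windows (as tuples) occurring in a sequence."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}

def generate_detectors(self_sequences, alphabet, n):
    """Negative selection: keep every candidate window that matches no 'self' window."""
    self_windows = set()
    for s in self_sequences:
        self_windows |= windows(s, n)
    # Exhaustive candidate generation: |alphabet| ** n candidates in total.
    return {cand for cand in product(alphabet, repeat=n) if cand not in self_windows}

def anomalous_windows(seq, detectors, n):
    """Windows of seq matched exactly by some detector, i.e. flagged as non-self."""
    return windows(seq, n) & detectors

detectors = generate_detectors(["abcabcabc"], alphabet="abc", n=2)
print(sorted(anomalous_windows("abcacb", detectors, n=2)))  # [('a', 'c'), ('c', 'b')]
```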
Conference Paper
Full-text available
In this paper, we explore the possibility of applying a text mining method on a large qualitative source material concerning the history of information technology in one nation. This data was collected in the Swedish documentation project “From Computing Machines to IT.” We apply text mining on the interview transcripts of this Swedish documentatio...
Conference Paper
Full-text available
This paper presents a methodology for learning taxonomic relations from a set of documents that each explain one of the concepts. Three different feature extraction approaches with varying degree of language independence are compared in this study. The first feature extraction scheme is a language-independent approach based on statistical keyphrase...
Conference Paper
In this paper, we consider how to represent world knowledge using the self-organizing map (SOM), how to use a simple recurrent network (SRN) to implement sentence comprehension, and how to use the SOM output space to represent situations and facilitate grounded logical reasoning.
Conference Paper
Full-text available
In this article, we study the scale-dependent dimensionality properties and overall structure of text data with a method that measures correlation dimension at different scales. As experimental results, we present the analysis of text data sets from the Reuters and Europarl corpora, which are also compared to artificially generated point sets. A co...
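Correlation dimension is commonly estimated from the correlation sum C(r), the fraction of point pairs closer than r; the slope of log C(r) against log r between two scales gives a scale-dependent estimate. The sketch below assumes this standard Grassberger-Procaccia style estimator with Euclidean distance; the paper's exact estimator and its text representation may differ, and the synthetic point sets only echo the abstract's comparison with artificially generated data.

```python
import math
import random
from itertools import combinations

def correlation_sum(points, r):
    """Fraction of distinct point pairs whose Euclidean distance is below r."""
    pairs = list(combinations(points, 2))
    close = sum(1 for p, q in pairs if math.dist(p, q) < r)
    return close / len(pairs)

def correlation_dimension(points, r1, r2):
    """Slope of log C(r) between two scales r1 < r2: a scale-dependent dimension estimate."""
    c1, c2 = correlation_sum(points, r1), correlation_sum(points, r2)
    if c1 == 0 or c2 == 0:
        raise ValueError("no pairs within the smaller radius; choose larger scales")
    return (math.log(c2) - math.log(c1)) / (math.log(r2) - math.log(r1))

rng = random.Random(0)
line = [(rng.random(), 0.0) for _ in range(500)]             # points on a 1-D segment
square = [(rng.random(), rng.random()) for _ in range(500)]  # points filling a 2-D square
print(correlation_dimension(line, 0.05, 0.2))    # close to 1
print(correlation_dimension(square, 0.05, 0.2))  # close to 2
```

Repeating the estimate over several (r1, r2) windows is what turns it into a profile of dimensionality across scales.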
Conference Paper
Full-text available
In this article, we model adjectives using a vector space model. We further employ three different dimension reduction methods, Principal Component Analysis (PCA), the Self-Organizing Map (SOM), and the Neighbor Retrieval Visualizer (NeRV), in the projection and visualization task, using an antonym test for evaluation. The results show tha...
Article
Full-text available
Our aim is to find syntactic and semantic relationships and roles of words based on the analysis of corpora. We study three methods for analyzing words in contexts as potential methods for solving this task. The methods are latent semantic analysis, the self-organizing map and independent component analysis. Latent semantic analysis is a simple method...
Article
In this paper, we propose a tensor-based Maximum Margin Criterion algorithm (TMMC) for supervised dimensionality reduction. In TMMC, an image object is encoded as an nth-order tensor, and its 2-D representation is directly treated as a matrix. Meanwhile, ...
Article
Full-text available
The article provides an introduction to and a demonstration of the self-organizing map (SOM) method for organizational researchers interested in the use of qualitative data. The SOM is a versatile quantitative method very commonly used across many disciplines to analyze large data sets. The outcome of the SOM analysis is a map in which entities are...
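For readers new to the method, the core of SOM training is: find the best-matching model for an input, then move that model and its grid neighbours toward the input. The following is a generic textbook-style sketch, not the procedure used in the article; the 1-D chain of units, the Gaussian neighbourhood, the linearly decaying learning rate and width, and all parameter defaults are assumptions.

```python
import math
import random

def train_som(data, n_units, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a 1-D self-organizing map (a chain of n_units model vectors) online."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialize the models from randomly chosen data samples.
    models = [list(rng.choice(data)) for _ in range(n_units)]
    sigma0 = sigma0 if sigma0 is not None else n_units / 2
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # one shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1 - frac), 0.5)  # shrinking neighbourhood width
            # Best-matching unit: the model closest to x in Euclidean distance.
            bmu = min(range(n_units), key=lambda i: math.dist(models[i], x))
            for i in range(n_units):
                # Gaussian neighbourhood measured along the 1-D grid of units.
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                for d in range(dim):
                    models[i][d] += lr * h * (x[d] - models[i][d])
            step += 1
    return models

rng = random.Random(1)
data = [(rng.random(), 0.0) for _ in range(200)]  # samples along a 1-D segment
models = train_som(data, n_units=10)
print(sorted(round(m[0], 2) for m in models))
```

The shrinking neighbourhood first orders the map globally and then lets individual units specialize, which is what produces the topology-preserving layout SOM visualizations rely on.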
Conference Paper
The self-organizing map (SOM) is related to classical vector quantization (VQ). As in VQ, the SOM represents a distribution of input data vectors using a finite set of models. In both methods, the quantization error (QE) of an input vector can be expressed, e.g., as the Euclidean norm of the difference of the input vector and the best-mat...
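The QE definition given in this abstract translates directly into code. A minimal sketch, assuming Euclidean distance and models stored as plain tuples; the function names and the example vectors are hypothetical.

```python
import math

def best_matching_unit(x, models):
    """Index of the model vector closest to x in Euclidean distance."""
    return min(range(len(models)), key=lambda i: math.dist(x, models[i]))

def quantization_error(x, models):
    """QE of one input: Euclidean norm of x minus its best-matching model."""
    return math.dist(x, models[best_matching_unit(x, models)])

def mean_quantization_error(data, models):
    """Average QE over a data set, a common summary of how well the models fit."""
    return sum(quantization_error(x, models) for x in data) / len(data)

# Hypothetical example: three 2-D models and one input vector.
models = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
x = (0.9, 0.2)
print(best_matching_unit(x, models))  # 1
print(quantization_error(x, models))  # sqrt(0.05), about 0.2236
```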
Conference Paper
In this paper, we discuss problems related to the basic Semantic Web methodologies that are based on predicate logic and related formalisms. We discuss complementary and alternative approaches. In particular, we suggest how the Self-Organizing Map can be a basis for making the Semantic Web more semantic.
Conference Paper
Full-text available
The complex phenomena of political science are typically studied using a qualitative approach, potentially supported by hypothesis-driven statistical analysis of some numerical data. In this article, we present a complementary method based on data mining and specifically on the use of the self-organizing map. The idea in data mining is to explore th...
Article
Full-text available
In this article, we consider contemporary theories of concepts, and Bayesian and self-organizing models of concept formation. After introducing the different models, we present our own experiment. It utilizes a multi-agent simulation framework, in which the emergence of a common vocabulary can be studied. In the experiment, we use jointly the self...
Conference Paper
Full-text available
In time series prediction, one often does not know the properties of the underlying system generating the time series. For example, is it a closed system that is generating the time series, or are there external factors influencing the system? As a result, one often does not know beforehand whether a time series is stationary or nonstation...
Article
Full-text available
The purpose of the present article is to examine the implications of the pragmatic web for the research and development of educational technology. It is argued that, beyond knowledge acquisition and social participation, technology-mediated learning environments based on a semantic and pragmatic web have the potential for facilitating creation and...
Article
Full-text available
Finding ways in which communities of experts can benefit from each other is a question shared by the machine learning community and social sciences alike. Considerable research in machine learning methods has shown that communities of experts can provide consistently better classifications and decisions than single experts in various tasks and doma...
Article
Full-text available
Latent semantic analysis (LSA) can be used to create an implicit semantic vectorial representation for words. Independent component analysis (ICA) can be derived as an extension to LSA that rotates the latent semantic space so that it becomes explicit, that is, the features correspond more with those resulting from human cognitive activity. Thi...
Conference Paper
Full-text available
This paper presents a method for creating interlingual word-to-word or phrase-to-phrase mappings between any two languages using the self-organizing map algorithm. The method can be used as a component in a statistical machine translation system. The conceptual space created by the self-organizing map serves as a kind of interlingual representation...
Article
We propose a theoretical framework for modeling communication between agents that have different conceptual models of their current context. We describe how the emergence of subjective models of the world can be simulated and what the role of language and communication in that process is. We consider, in particular, the role of unsupervised learnin...
Article
We propose a method for inferring semantic information from textual data in content-based multimedia retrieval. Training examples of images and videos belonging to a specific semantic class are associated with their low-level visual and aural descriptors augmented with textual features such as frequencies of significant words. A fuzzy mapping of a...
Chapter
Biological systems have been an inspiration in the development of prototype-based clustering and vector quantization algorithms. The two dominant paradigms in biologically motivated clustering schemes are neural networks and, more recently, biological immune systems. These two biological paradigms are discussed regarding their benefits and shortcom...
Article
Full-text available
In this article, we study the differences between the European Union languages using statistical and unsupervised methods. The analysis is conducted at the different levels of language: the lexical, morphological and syntactic. Our premise is that the difficulty of the translation could be perceived as differences or similarities in differen...
Conference Paper
Full-text available
We present Likey, a language-independent keyphrase extraction method based on statistical analysis and the use of a reference corpus. Likey has a very light-weight preprocessing phase and no parameters to be tuned. Thus, it is not restricted to any single language or language family. We test Likey having exactly the same configuration with...
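One simple way a reference corpus can make keyphrase extraction language-independent is to compare how highly a phrase ranks in the document with how highly it ranks in general text. The sketch below scores phrases by the ratio of their frequency rank in the document to their frequency rank in the reference corpus; this rank-ratio formula, the n-gram range, the tie-breaking and the rank given to unseen phrases are assumptions for illustration and are not claimed to be Likey's exact definition. The function names are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-token phrases, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def frequency_ranks(phrases):
    """Map each phrase to its rank by descending frequency (1 = most frequent)."""
    counts = Counter(phrases)
    ordered = sorted(counts, key=lambda p: (-counts[p], p))  # ties broken alphabetically
    return {p: rank for rank, p in enumerate(ordered, start=1)}

def rank_ratio_keyphrases(doc_tokens, ref_tokens, max_n=2, top_k=5):
    """Score phrases by document rank / reference rank; lower = more document-specific."""
    scores = {}
    for n in range(1, max_n + 1):
        doc_ranks = frequency_ranks(ngrams(doc_tokens, n))
        ref_ranks = frequency_ranks(ngrams(ref_tokens, n))
        unseen_rank = len(ref_ranks) + 1  # phrases absent from the reference get the worst rank
        for phrase, rank in doc_ranks.items():
            scores[phrase] = rank / ref_ranks.get(phrase, unseen_rank)
    return sorted(scores, key=lambda p: (scores[p], p))[:top_k]

reference = "the a of and the a of the".split()
document = "the map of the map".split()
print(rank_ratio_keyphrases(document, reference, max_n=1, top_k=3))  # ['map', 'of', 'the']
```

Nothing in the scoring depends on a stemmer, stop-word list or tagger, so the same code runs unchanged on any whitespace-tokenized language.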
Conference Paper
Full-text available
In this article we approach neural networks as computational templates that travel across various sciences. Traditionally, it has been thought that models are primarily models of some target systems: they are assumed to represent partially or completely their target systems. We argue, instead, that many computational models cannot easily be conceiv...
Conference Paper
Full-text available
Serious efforts to develop computerized systems for natural language understanding and machine translation have taken place for more than half a century. Some successful systems that translate texts in limited domains such as weather forecasts have been implemented. However, the more general the domain or complex the style of the text the more diff...
Conference Paper
We present a probabilistic approach for detecting and analyzing changes in natural language motivated by biological immune systems. Contrary to traditional methods based on message-digest algorithms and line-by-line comparisons of two files, the proposed algorithm employs an implicit negative representation of text segments in the form of detector...
Conference Paper
Full-text available
We show that independent component analysis (ICA) can be used to find distributed representations for words that can be further processed by thresholding to produce sparse representations. The applicability of the thresholded ICA representation is compared to singular value decomposition (SVD) in a multiple choice vocabulary task with three data se...
Article
Full-text available
We present the results of an analysis of a text corpus of 129,000 abstracts of NSF-sponsored basic research projects between years 1990 and 2003. The methods used in the analysis include term extraction based on a reference corpus and an entropy measure, and the Self-Organizing Map algorithm for the formation of a term map and a document map. Metho...
Conference Paper
Full-text available
A symbol as such is disassociated from the world. In addition, as a discrete entity a symbol does not mirror all the details of the portion of the world that it is meant to refer to. Humans establish the association between the symbols and the referenced domain – the words and the world – through a long learning process in a community. This paper s...
Conference Paper
We propose a method of content-based multimedia retrieval of objects with visual, aural and textual properties. In our method, training examples of objects belonging to a specific semantic class are associated with their low-level visual descriptors (such as MPEG-7) and textual features such as frequencies of significant keywords. A fuzzy mappi...
Conference Paper
An art installation was on display in the Centre Pompidou National Museum of Modern Art in Paris, where visitors could contribute with their own personal objects, adding keyword descriptions and quantified semantic features such as age or hardness. The data was projected in real-time onto a Self-Organizing Map (SOM) which was shown in the gallery....
Article
A vital mechanism of high‐level natural cognitive systems is the anticipatory capability of making decisions based on predicted events in the future. While in some cases the performance of computational cognitive systems can be improved by modeling anticipatory behavior, it has been shown that for many cognitive tasks anticipation is mandatory. In...
Article
Full-text available
The quality of Internet health information is essential because it has the potential to benefit or harm a large number of people. It is therefore important to provide consumers with tools to help them assess the nature of the information they are accessing and how they should use it without jeopardizing their relationship with their doctor...
Conference Paper
Full-text available
In this article, we study the emergence of associations between words and concepts using the self-organizing map. In particular, we explore the meaning negotiations among communicating agents. The self-organizing map is used as a model of an agent's conceptual memory. The concepts are not explicitly given but they are learned by the agent in an uns...
Conference Paper
Full-text available
In this article, we study the differences between the European languages using statistical and unsupervised methods. The analysis is conducted at different levels of language: lexical, morphological and syntactic. Our premise is that the difficulty of the translation could be perceived as differences or similarities in different levels of...
Article
Full-text available
In this position paper, we discuss some problems related to those semantic web methodologies that are straightforwardly based on predicate logic and related formalisms. We also discuss complementary and alternative approaches and provide some examples of such.