Kalervo Järvelin's research while affiliated with Tampere University and other places

Publications (197)

Article
Full-text available
The paper analyses Library and Information Science (LIS) articles published in leading international LIS journals based on their authors’ disciplinary backgrounds. The study combines content analysis of articles with authors’ affiliation analysis. The main research question is: Are authors’ disciplinary backgrounds associated with choice of researc...
Article
Full-text available
The study analyses contributions to Library and Information Science (LIS) by researchers representing various disciplines. How are such contributions associated with the choice of research topics and methodology? The study employs a quantitative content analysis of articles published in 31 scholarly LIS journals in 2015. Each article is seen as a c...
Article
Purpose This paper analyses the research in Library and Information Science (LIS) and reports on (1) the status of LIS research in 2015 and (2) the evolution of LIS research longitudinally from 1965 to 2015. Design/methodology/approach The study employs a quantitative intellectual content analysis of articles published in 30+ scholarly LIS jour...
Chapter
This chapter will give an overview of how human languages differ from each other and how those differences are relevant to the development of human language understanding technology for the purposes of information access. It formulates what requirements information access technology poses (and might pose) to language technology. We also discuss a n...
Article
This article is a lightly edited text of the author's Salton Award Keynote: Information Interaction in Context, presented at the 41st SIGIR conference in Ann Arbor, July 9th, 2018. It first gives some personal background and then discusses some important areas of information seeking and IR in the author's research work. These include task-based inf...
Article
Interactive Information Seeking, Behaviour and Retrieval, edited by Ian Ruthven
Article
Interactive Information Seeking, Behaviour and Retrieval, edited by Ian Ruthven, December 2013
Chapter
The Laboratory Model of Information Retrieval (IR) has dominated IR research for half a century. The focus of this system-driven research is IR algorithms and their evaluation. Algorithms are evaluated for their capability of finding topically relevant documents.
Conference Paper
Typing is a common form of query input for search engines and other information retrieval systems; we therefore investigate the relationship between typing behavior and search interactions. The search process is interactive and typically requires entering one or more queries, and assessing both summaries from Search Engine Result Pages and the unde...
Conference Paper
Full-text available
This paper investigates if Information Foraging Theory can be used to understand differences in user behavior when searching on mobile and desktop web search systems. Two groups of thirty-six participants were recruited to carry out six identical web search tasks on desktop or on mobile. The search tasks were prepared with a different number and di...
Article
This paper proposes evaluation methods based on the use of non-dichotomous relevance judgements in IR experiments. It is argued that evaluation methods should credit IR methods for their ability to retrieve highly relevant documents. This is desirable from the user point of view in modern large IR environments. The proposed methods are (1) a novel a...
Article
Full-text available
A searcher’s interaction with a retrieval system consists of actions such as query formulation, search result list interaction and document interaction. The simulation of searcher interaction has recently gained momentum in the analysis and evaluation of interactive information retrieval (IIR). However, a key issue that has not yet been adequately...
Article
Information seeking research often reports about types of information resources, ways of acquiring them and opinions on their importance in various professions. Based on self-reporting, these findings are affected by human memory and rationalisation. This article proposes a new way of studying information resource use – based on dwell time in the c...
Article
Information searching in practice is seldom an end in itself. In work, work task (WT) performance forms the context, which information searching should serve. Therefore, information retrieval (IR) systems development/evaluation should take the WT context into account. The present paper analyzes how WT features, namely task complexity and task types, affec...
Article
Purpose The purpose of this paper is to investigate information retrieval (IR) in the context of authentic work tasks (WTs), as compared to traditional experimental IR study designs. Design/methodology/approach The participants were 22 professionals working in municipal administration, university research and education, and commercial companies. T...
Conference Paper
Divergence From Randomness (DFR) ranking models assume that informative terms are distributed in a corpus differently than non-informative terms. Different statistical models (e.g. Poisson, geometric) are used to model the distribution of non-informative terms, producing different DFR models. An informative term is then detected by measuring the di...
Article
Full-text available
The Dagstuhl Seminar on "Reproducibility of Data-Oriented Experiments in e-Science", held on 24-29 January 2016, focused on the core issues and approaches to reproducibility of experiments from a multidisciplinary point of view, sharing the experience coming from several fields of computer science. In this paper, we discuss, summarize, and adapt th...
Article
Full-text available
We present a novel measure for ranking evaluation, called Twist (τ). It is a measure for informational intents, which handles both binary and graded relevance. τ stems from the observation that searching is currently taken for granted and it is natural for users to assume that search engines are available and work well...
Conference Paper
Searching naturally involves stopping points, both at a query level (how far down the ranked list should I go?) and at a session level (how many queries should I issue?). Understanding when searchers stop has been of much interest to the community because it is fundamental to how we evaluate search behaviour and performance. Research has shown that...
Conference Paper
Studies in interactive information retrieval (IIR) indicate that expert searchers differ from novices in many ways. In the present paper, we identify a number of behavioral dimensions along which searchers differ (e.g. cost, gain and the accuracy of relevance assessment). We quantify these differences using simulated, multi-query search sessions. W...
Conference Paper
Most models, measures and simulations assume that a searcher will stop at a predetermined place in a ranked list of results. However, during the course of a search session, real-world searchers will vary and adapt their interactions with a ranked list. These interactions depend upon a variety of factors, including the content and quality of t...
Conference Paper
Full-text available
In this paper, we address the question “what is the influence of user search behaviour on the effectiveness of personalized query suggestion?”. We implemented a method for query suggestion that generates candidate follow-up queries from the documents clicked by the user. This is a potentially effective method for query suggestion, but it heavily de...
Conference Paper
Full-text available
Digitization of cultural heritage is a huge ongoing effort in many countries. In digitized historical documents, words may occur in different surface forms due to three types of variation: morphological variation, historical variation, and errors in optical character recognition (OCR). Because individual documents may differ significantly from each...
Article
Full-text available
Evaluation is central in research and development of information retrieval (IR). In addition to designing and implementing new retrieval mechanisms, one must also show through rigorous evaluation that they are effective. A major focus in IR is IR mechanisms’ capability of ranking relevant documents optimally for the users, given a query. Searching...
Article
The main target of this paper was to study the influence of training data quality on the text document classification performance of machine learning methods. A graded relevance corpus of ten classes and 957 text documents was classified with Self-Organising Maps (SOMs), learning vector quantisation, k-nearest neighbours searching, naïve Bayes and...
Article
Preprocessing of data is a vital part of any task involving machine learning. In the classification of text documents, the most important aspect of preprocessing is usually the dimensionality reduction of data vectors. This paper focuses on the use of a recent scatter method in the dimensionality reduction of text documents. The effectiveness of th...
Article
Relevance feedback (RF) has been studied under laboratory conditions using test collections and either test persons or simple simulation. These studies have given mixed results. Automatic (or pseudo) RF and intellectual RF, both leading to query reformulation, are the main approaches to explicit RF. In the present study we perform RF with the help...
Conference Paper
In real life, information retrieval consists of sessions of one or more query iterations. Each iteration has several subtasks like query formulation, result scanning, document link clicking, document reading and judgment, and stopping. Each of the subtasks has behavioral factors associated with it. These factors include search goals and cost cons...
Article
Editor's Summary Recalling his start in information science studies, 2012 ASIS&T Research Award winner Kalervo Järvelin explained that reading seminal books in the field influenced his academic and research path. A call to devise a curriculum for classification, indexing and information retrieval drew him away from computer science and firmed his c...
Conference Paper
Full-text available
This paper presents results of a generative method for the management of morphological variation of query keywords in Bengali, Gujarati and Marathi. The method is called Frequent Case Generation (FCG). It is based on the skewed distributions of word forms in natural languages and is suitable for languages that have either a fair amount of morphologi...
Chapter
In modern large information retrieval (IR) environments, the number of documents relevant to a request may easily exceed the number of documents a user is willing to examine. Therefore it is desirable to rank highly relevant documents first in search results. To develop retrieval methods for this purpose requires evaluating retrieval methods accord...
Conference Paper
Full-text available
Measuring is a key to scientific progress. This is particularly true for research concerning complex systems. Multilingual and multimedia information access systems, such as search engines, are increasingly complex: they need to satisfy diverse user needs and support challenging tasks. Their development calls for proper evaluation methodologies t...
Article
Real life information retrieval takes place in sessions, where users search by iterating between various cognitive, perceptual and motor subtasks through an interactive interface. The sessions may follow diverse strategies, which, together with the interface characteristics, affect user effort (cost), experience and session effectiveness. In this p...
Conference Paper
The paper discusses briefly user-oriented evaluation in test collections with simulated users and real users, as well as operational systems evaluation. It concludes by a glimpse of issues beyond evaluation. The paper provides pointers to literature where much more thorough discussion of each topic may be found.
Article
The practical goal of information retrieval (IR) research is to create ways to support humans to better access information in order to better carry out their tasks. Because of this, IR research has a primarily technological interest in knowledge creation -- how to interact with information (better)? IR research therefore has a constructive aspect (...
Article
Full-text available
We analyze barriers to task-based information access in molecular medicine, focusing on research tasks, which provide task performance sessions of varying complexity. Molecular medicine is a relevant domain because it offers thousands of digital resources as the information environment. Data were collected through shadowing of real work tasks. Thir...
Article
Full-text available
A novel graph-based language-independent stemming algorithm suitable for information retrieval is proposed in this article. The main features of the algorithm are retrieval effectiveness, generality, and computational efficiency. We test our approach on seven languages (using collections from the TREC, CLEF, and FIRE evaluation platforms) of varyin...
Article
Full-text available
In enterprise information systems (EISs) it is necessary to model, integrate and compute very diverse data. In advanced EISs the stored data often are based both on structured (e.g. relational) and semi-structured (e.g. XML) data models. In addition, the ad hoc information needs of end-users may require the manipulation of data-oriented (structural...
Conference Paper
Much of the research in relevance feedback (RF) has been performed under laboratory conditions using test collections and either test persons or simple simulation. These studies have given mixed results. The design of the present study is unique. First, the initial queries are realistically short queries generated by real end-users. Second, we perf...
Conference Paper
The ultimate goal of information retrieval (IR) research is to create ways to support humans to better access information in order to better carry out their (work) tasks. Because of this, IR research has a primarily technological interest in knowledge creation – how to find information (better)? IR research therefore has a constructive aspect (to c...
Conference Paper
This paper focuses on the use of self-organising maps, also known as Kohonen maps, for the classification task of text documents. The aim is to effectively and automatically classify documents to separate classes based on their topics. The classification with self-organising map was tested with three data sets and the results were then compared to...
Article
Semantic associations are direct or indirect linkages between two entities that are construed from existing associations among entities. In this paper we extend our previous query language approach for discovering semantic associations with an ability to retrieve semantic associations that, besides explicitly stated (base) associations, may contain...
Conference Paper
We present a dictionary- and corpus-independent statistical lemmatizer StaLe that deals with the out-of-vocabulary (OOV) problem of dictionary-based lemmatization by generating candidate lemmas for any inflected word forms. StaLe can be applied with little effort to languages lacking linguistic resources. We show the performance of StaLe both in le...
Article
The Discounted Cumulated Impact (DCI) index has recently been proposed for research evaluation. In the present work an earlier dataset by Cronin and Meho (2007) is reanalyzed, with the aim of exemplifying the salient features of the DCI index. We apply the index on, and compare our results to, the outcomes of the Cronin-Meho (2007) study. Both auth...
Conference Paper
Users of traditional information retrieval (IR) systems encounter the problems of vocabulary mismatch and fuzzy search goals. This is due to the number of ways the search concepts may be expressed in texts. The effects of the vocabulary mismatch can be alleviated via query expansion and document annotation through thesauri, tags or ontologies. On t...
Article
Full-text available
All search in the real world is inherently interactive. Information retrieval (IR) has a firm tradition of using simulation to evaluate IR systems as embodied by the Cranfield paradigm. However, to a large extent, such system evaluations ignore user interaction. Simulations provide a way to go beyond this limitation. With an increasing number of re...
Conference Paper
There is overwhelming evidence suggesting that the real users of IR systems often prefer using extremely short queries (one or two individual words) but they try out several queries if needed. Such behavior is fundamentally different from the process modeled in the traditional test collection-based IR evaluation based on using more verbose queries...
Conference Paper
Full-text available
This research deals with the use of self-organising maps for the classification of text documents. The aim was to classify documents to separate classes according to their topics. We therefore constructed self-organising maps that were effective for this task and tested them with German newspaper documents. We compared the results gained to those o...
Article
Full-text available
There are numerous approaches for integrating data from heterogeneous data sources. A common background assumption is that the data sources remain quite stable and are known in advance. Hence an integration system can be built to manipulate them. In practice there is, however, often a demand for supporting ad hoc information needs concerning unexpe...
Conference Paper
The paper makes three points of significance for IR research: (1) The Cranfield paradigm of IR evaluation seems to lose power when one looks at human instead of system performance. (2) Searchers using IR systems in real-life use rather short queries, which individually often have poor performance. However, when used in sessions, they may be surpris...
Article
Full-text available
In the context of creating large scale test collections, the present paper discusses methods of constructing a patent test collection for evaluation of prior art search. In particular, it addresses criteria for topic selection and identification of recall bases. These issues arose while organizing the CLEF-IP evaluation track and were the subject o...
Article
The Laboratory Model of information retrieval (IR) evaluation has been challenged by progress in research related to relevance and information seeking as well as by the growing need for accounting for interaction in evaluation. Real human users introduce non-binary, subjective and dynamic relevance judgments into IR processes and affect these pro...
Article
Purpose – The aim of this paper is to explore the possibility of retrieving information with Kohonen self-organising maps, which are known to be effective at grouping objects according to their similarity or dissimilarity. Design/methodology/approach – After conventional preprocessing, such as transforming into vector space, documents from a German do...
Conference Paper
Research on relevance feedback (RFB) in information retrieval (IR) has given mixed results. Success in RFB seems to depend on the searcher's willingness to provide feedback and ability to identify relevant documents or query keys. The paper is based on simulating many user scenarios regarding the amount and quality of RFB. In addition, we experimen...
Article
Full-text available
An important area of improving access to health information is the study of task-based information access in the health domain. This is a significant challenge towards developing focused information retrieval (IR) systems. Due to the complexities of this context, its study requires multiple and often tedious means of data collection, which yields a...
Article
Full-text available
Kirjastotiede ja informatiikka – tiedon hankinnan tiede Järvelin, Kalervo; Vakkari, Pertti, Kirjastotiede ja informatiikka – tiedon hankinnan tiede (Library and information science – a science of information seeking). Kirjastotiede ja informatiikka 7 (1): 18–32, 1988. The nature of library and information science (LIS) is discussed. The aim is to...
Article
The article by K. Järvelin & O. Persson published in JASIST 59(9), “The DCI-Index: Discounted Cumulated Impact-Based Research Evaluation,” (pp. 1433–1440) contains an unfortunate error in one of its formulas, Equation 3. The present paper gives the correction and an example of impact analysis based on the corrected formula.
Article
The article demonstrates the Laboratory perspective on information retrieval and how it is contained in the Integrated Cognitive Research perspective. First, the underlying assumptions and well-known drawbacks and limitations of the Laboratory perspective are discussed. Next, information interaction is discussed from an Integrated Cognitive Perspective. ’Ultra-light...
Article
Full-text available
CLIR resources, such as dictionaries and parallel corpora, are scarce for special domains. Obtaining comparable corpora automatically for such domains could be an answer to this problem. The Web, with its vast volumes of data, offers a natural source for this. We experimented with focused crawling as a means to acquire comparable corpora in the...
Article
Full-text available
Introduction. Describes and analyses the information environment of research work in molecular medicine. We presume an interdependence between the information environment, the research process and the related work tasks. Method. This is a qualitative case study using mixed methods. Empirical data were gathered using two surveys and six semi-structu...
Conference Paper
Full-text available