Krisztian Balog's research while affiliated with Google Inc. and other places
What is this page?
This page lists the scientific contributions of an author, who either does not have a ResearchGate profile, or has not yet added these contributions to their profile.
It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.
If you're a ResearchGate member, you can follow this page to keep up with this author's work.
If you are this author, and you don't want us to display this page anymore, please let us know.
Publications (55)
Research on conversational search has so far mostly focused on query rewriting and multi-stage passage retrieval. However, synthesizing the top retrieved passages into a complete, relevant, and concise response is still an open challenge. Having snippet-level annotations of relevant passages would enable both (1) the training of response generation...
Traditional recommender systems leverage users' item preference history to recommend novel content that users may like. However, modern dialog interfaces that allow users to express language-based preferences offer a fundamentally different modality for preference input. Inspired by recent successes of prompting paradigms for large language models...
Web search is an experience that naturally lends itself to recommendations, including query suggestions and related entities. In this article, we propose to recommend specific tasks to users, based on their search queries, such as planning a holiday trip or organizing a party. Specifically, we introduce the problem of query-based task recommendatio...
Conversational systems can be particularly effective in supporting complex information seeking scenarios with evolving information needs. Finding the right products on an e-commerce platform is one such scenario, where a conversational agent would need to be able to provide search capabilities over the item catalog, understand and make recommendati...
This paper presents an ecosystem for personal knowledge graphs (PKG), commonly defined as resources of structured information about entities related to an individual, their attributes, and the relations between them. PKGs are a key enabler of secure and sophisticated personal data management and personalized services. However, there are challenges...
Despite the potential impact of explanations on decision making, there is a lack of research on quantifying their effect on users' choices. This paper presents an experimental protocol for measuring the degree to which positively or negatively biased explanations can lead to users choosing suboptimal recommendations. Key elements of this protocol i...
This paper reports on an effort of reproducing the organizers’ baseline as well as the top performing participant submission at the 2021 edition of the TREC Conversational Assistance track. TREC systems are commonly regarded as reference points for effectiveness comparison. Yet, the papers accompanying them have less strict requirements than peer-r...
Users in consumption domains, like music, are often able to more efficiently provide preferences over a set of items (e.g. a playlist or radio) than over single items (e.g. songs). Unfortunately, this is an underexplored area of research, with most existing recommendation systems limited to understanding preferences over single items. Curating an i...
Explainable recommenders are systems that explain why an item is recommended, in addition to suggesting relevant items to the users of the system. Although explanations are known to be able to significantly affect a user's decision-making process, significant gaps remain concerning methodologies to evaluate them. This hinders cross-comparison betwe...
The term personal knowledge graph (PKG) has been broadly used to refer to structured representation of information about a given user, primarily in the form of entities that are personally related to the user. The potential of personal knowledge graphs as a means of managing and organizing personal data, as well as a source of background knowledge...
The 44th European Conference on Information Retrieval (ECIR'22) was held in Stavanger, Norway. It represents a landmark, not only for being the northernmost ECIR ever, but also for being the first major IR conference in a hybrid format. This article reports on ECIR'22 from the organizers' perspective, with a particular emphasis on elements of the h...
Conversational recommendation systems (CRSs) enable users to use natural language feedback to control their recommendations, overcoming many of the challenges of traditional recommendation systems. However, the practical adoption of CRSs remains limited due to a lack of rich and diverse conversational training data that pairs user utterances with r...
DAGFiNN is a conversational conference assistant that can be made available for a given conference both as a chatbot on the website and as a Furhat robot physically exhibited at the conference venue. Conference participants can interact with the assistant to get advice on various questions, ranging from where to eat in the city or how to get to the...
Knowledge graph question answering (KGQA) facilitates information access by leveraging structured data without requiring formal query language expertise from the user. Instead, users can express their information needs by simply asking their questions in natural language (NL). Datasets used to train KGQA models that would provide such a service are...
Natural interaction with recommendation and personalized search systems has received tremendous attention in recent years. We focus on the challenge of supporting people's understanding and control of these systems and explore a fundamentally new way of thinking about representation of knowledge in recommendation and personalization systems. Specif...
User simulation has been a cost-effective technique for evaluating conversational recommender systems. However, building a human-like simulator is still an open challenge. In this work, we focus on how users reformulate their utterances when a conversational agent fails to understand them. First, we perform a user study, involving five conversation...
Automated reviewer recommendation for scientific conferences currently relies on the assumption that the program committee has the necessary expertise to handle all submissions. However, topical discrepancies between received submissions and reviewer candidates might lead to unreliable reviews or overburdening of reviewers, and may result in the re...
Simulation is used as a low-cost and repeatable means of experimentation. As Information Retrieval (IR) researchers, we are no strangers to the idea of using simulation within our own field, such as the traditional means of IR system evaluation as manifested through the Cranfield paradigm. While simulation has been used in other areas of IR resear...
A key distinguishing feature of conversational recommender systems over traditional recommender systems is their ability to elicit user preferences using natural language. Currently, the predominant approach to preference elicitation is to ask questions directly about items or item attributes. These strategies do not perform well in cases where the...
This paper summarizes our participation in the SMART Task of the ISWC 2020 Challenge. A particular question we are interested in answering is how well neural methods, and specifically transformer models, such as BERT, perform on the answer type prediction task compared to traditional approaches. Our main finding is that coarse-grained answer types...
This paper presents a test collection for contextual point of interest (POI) recommendation in a narrative-driven scenario. There, user history is not available, instead, user requests are described in natural language. The requests in our collection are manually collected from social sharing websites, and are annotated with various types of metada...
We address how to robustly interpret natural language refinements (or critiques) in recommender systems. In particular, in human-human recommendation settings people frequently use soft attributes to express preferences about items, including concepts like the originality of a movie plot, the noisiness of a venue, or the complexity of a recipe. Whi...
Evaluation is crucial in the development process of task-oriented dialogue systems. As an evaluation method, user simulation allows us to tackle issues such as scalability and cost-efficiency, making it a viable choice for large-scale automatic evaluation. To help build a human-like user simulator that can measure the quality of a dialogue, we prop...
Providing personalized recommendations that are also accompanied by explanations as to why an item is recommended is a research area of growing importance. At the same time, progress is limited by the availability of open evaluation resources. In this work, we address the task of scientific literature recommendation. We present arXivDigest, which i...
Synthetic data generation is important to training and evaluating neural models for question answering over knowledge graphs. The quality of the data and the partitioning of the datasets into training, validation and test splits impact the performance of the models trained on this data. If the synthetic data generation depends on templates, as is t...
Conversational recommender systems support users in accomplishing recommendation-related goals via multi-turn conversations. To better model dynamically changing user preferences and provide the community with a reusable development framework, we introduce IAI MovieBot, a conversational recommender system for movies. It features a task-specific dia...
Conversational information access is an emerging research area. Currently, human evaluation is used for end-to-end system evaluation, which is both very time and resource intensive at scale, and thus becomes a bottleneck of progress. As an alternative, we propose automated evaluation by means of simulating users. Our user simulator aims to generate...
Tabular data provide answers to a significant portion of search queries. However, reciting an entire result table is impractical in conversational search systems. We propose to generate natural language summaries as answers to describe the complex information contained in a table. Through crowdsourcing experiments, we build a new conversation-orien...
Tables are a powerful and popular tool for organizing and manipulating data. A vast number of tables can be found on the Web, which represents a valuable knowledge resource. The objective of this survey is to synthesize and present two decades of research on web tables. In particular, we organize existing literature into six main categories of info...
This paper discusses the potential for creating academic resources (tools, data, and evaluation approaches) to support research in conversational search, by focusing on realistic information needs and conversational interactions. Specifically, we propose to develop and operate a prototype conversational search system for scholarly activities. This...
Knowledge graphs, organizing structured information about entities, and their attributes and relationships, are ubiquitous today. Entities, in this context, are usually taken to be anyone or anything considered to be globally important. This, however, rules out many entities people interact with on a daily basis. In this position paper, we present...
Most recommender systems base their recommendations on implicit or explicit item-level feedback provided by users. These item ratings are combined into a complex user model, which then predicts the suitability of other items. While effective, such methods have limited scrutability and transparency. For instance, if a user's interests change, then m...
It is held as a truism that deep neural networks require large datasets to train effective models. However, large datasets, especially with high-quality labels, can be expensive to obtain. This study sets out to investigate (i) how large a dataset must be to train well-performing models, and (ii) what impact can be shown from fractional changes to...
Thousands of complex natural language questions are submitted to community question answering websites on a daily basis, rendering them as one of the most important information sources these days. However, oftentimes submitted questions are unclear and cannot be answered without further clarification questions by expert community members. This stud...
A/B testing is currently being increasingly adopted for the evaluation of commercial information access systems with a large user base since it provides the advantage of observing the efficiency and effectiveness of information access systems under real conditions. Unfortunately, unless university-based researchers closely collaborate with industry...
Citations
... Further, we are unaware of any datasets that capture a user's detailed preferences in natural language, and attempt to rate recommendations on unseen items. Existing datasets such as [2,7] tend to rely on much simpler characterizations. ...
... Azzopardi et al. (2018) note that their list attempts to represent the main actions previously observed and discussed in the literature, but it is non-exhaustive and is meant to be taken as a starting point. For example, Bernard and Balog (2023) expand a selected set of communicative functions from ISO 24617-2 with intents from in order to characterize multi-goal conversations in an e-commerce setting. ...
... Explanations in RS can help users to decide about the relevance of recommendations (Tintarev and Masthoff, 2015; Millecamp et al., 2022) and therefore have an influence on usage (Herlocker et al., 2000; Tintarev and Masthoff, 2012; Zhang et al., 2014; Zhao et al., 2019; Millecamp et al., 2019; Tran et al., 2021; Guesmi et al., 2021; Xian et al., 2021; Balog et al., 2023). It is well known that explanations can significantly increase usage (Click-Through Rate (CTR)). ...
... In addition to those, facilitating multi-modal interactions would be especially pertinent in a conversational setting. While past research has primarily focused on uni-modal (i.e., natural language) interactions, other modalities as input (e.g., speech, pointing/clicking, body gestures) and output (e.g., speech and multimedia elements) have also begun to receive increasing attention (Liao et al., 2021; Kostric et al., 2022; Deldjoo et al., 2021; Hauptmann et al., 2020). Modeling the action space around these and selecting the appropriate modality for a given user action/context are interesting challenges to be addressed in future simulators. ...
... However, it differs in the task of focus, creating long-form narrative queries for NDR. Finally, our work also builds on the recent perspective of Radlinski et al. [33] who make a case for natural language user profiles driving recommenders; narrative requests tie closely to natural language user profiles. Our work presents a step toward these systems. ...
... In the case of literally no user data, an initial preference elicitation step is carried out with interactive recommendation approaches. These approaches can apply several channels for the acquisition of user knowledge, such as demographic filtering [3], presenting a set of features or contexts to be selected [9,24], providing a set of example recommendations to be labeled [34], and question-based preference elicitation [7,25] or intent detection via intelligent dialogue systems [28]. An initial interaction phase with the user supplies data that enable the execution of subsequent recommendation models. ...
... We empirically evaluate Mint in a publicly available test collection for point of interest recommendation: pointrec [1]. To train our NDR models, we generate synthetic training data based on user-item interaction datasets from Yelp. ...
... User simulators have also been successfully applied as a reinforcement learning environment for dialogue policy optimization [53]. However, user simulators as evaluation methods for TDSs are still under-explored [3]. ...
... We then had a group of annotators assess the binary relevance of each of these 100 queries to the 50 restaurants in our candidate set. Each review was labeled by 5 annotators and the annotations showed a kappa agreement score of 0.528, demonstrating moderate agreement according to Landis and Koch [15], which is expected given the subjectivity of the queries in this task [1]. There is a total number of 29k reviews in this dataset. ...
... .] semantic graphs that [contain] interdependencies between attributes". Regarding knowledge graphs, Linjordet and Balog [26] discuss the use of templates for knowledge graph generation and associated problems due to leakage across data splits, which informed our decision for a random train-test split on the generated knowledge graphs. We rely on the findings of Libes et al. [27], who explore challenges and suggest desirable features for the generation of synthetic data for manufacturing scenarios. ...