
João Magalhães
Universidade NOVA de Lisboa | NOVA · Department of Informatics (DI)
Professor
Publications: 149
Reads: 28,770
Citations: 994
Publications (149)
Conversational systems must be robust to user interactions that naturally exhibit diverse conversational traits. Capturing and simulating these diverse traits coherently and efficiently presents a complex challenge. This paper introduces Multi-Trait Adaptive Decoding (mTAD), a method that generates diverse user profiles at decoding-time by sampling...
Guiding users through complex procedural plans is an inherently multimodal task in which having visually illustrated plan steps is crucial to deliver an effective plan guidance. However, existing works on plan-following language models (LMs) often are not capable of multimodal input and output. In this work, we present MM-PlanLLM, the first multimo...
Training Large Language Models (LLMs) to follow user instructions has been shown to supply the LLM with ample capacity to converse fluently while being aligned with humans. Yet, it is not completely clear how an LLM can lead a plan-grounded conversation in mixed-initiative settings where instructions flow in both directions of the conversation, i.e...
Dialogue systems need to deal with the unpredictability of user intents to track dialogue state and the heterogeneity of slots to understand user preferences. In this paper we investigate the hypothesis that solving these challenges as one unified model will allow the transfer of parameter support data across the different tasks. The proposed princ...
With the growing interest in recreating live and realistic outside experiences within the confines of our homes, the online shopping industry has also been impacted. However, traditional modes of interaction with online storefronts have remained mainly unchanged. This paper studies the factors influencing user experience and interaction in 3D virtu...
For task-oriented dialog agents, the tone of voice mediates user-agent interactions, playing a central role in the flow of a conversation. Distinct from domain-agnostic politeness constructs, in specific domains such as online stores, booking platforms, and others, agents need to be capable of adopting highly specific vocabulary, with significant i...
This paper describes the vision, scientific contributions, and technical details of the Task Wizard (TWIZ) team's participation in the Alexa TaskBot Challenge 2021. Our bot design envisions the support of an engaging experience, where users are guided through multimodal conversations, towards the successful completion of the selected task. This is...
Face authentication and biometrics are becoming a commodity in many situations of our society. As their application becomes widespread, vulnerability to attacks becomes a challenge that needs to be tackled. In this paper, we propose a non-intrusive, on-the-fly liveness detection system, based on 1D convolutional neural networks, that given pulse signa...
On the quest of providing a more natural interaction between users and search systems, open-domain conversational search assistants have emerged, by assisting users in answering questions about open topics in a conversational manner. In this work, we show how the Transformer architecture achieves state-of-the-art results in key IR tasks, leveraging...
Creating a cohesive, high-quality, relevant, media story is a challenge that news media editors face on a daily basis. This challenge is aggravated by the flood of highly relevant information that is constantly pouring onto the newsroom. To assist news media editors in this daunting task, this paper proposes a framework to organize news content int...
Queuing at airport border controls is one of the bottlenecks in the flow of passengers, which results in a poor travel experience and in serious health risks, like COVID-19, due to the concentration of people and contact surfaces [4]. To address this problem, biometrics-on-the-move removes physical barriers for passengers, while preserving security...
The conversational search paradigm introduces a step change over the traditional search paradigm by allowing users to interact with search agents in a multi-turn and natural fashion. The conversation flows naturally and is usually centered around a target field of knowledge. In this work, we propose a knowledge-driven answer generation approach for...
The use of conversational assistants to search for information is becoming increasingly popular among the general public, pushing research towards more advanced and sophisticated techniques. In the last few years, in particular, interest in conversational search has been increasing, not only because of the generalization of conversational as...
Open-domain conversational search assistants aim at answering user questions about open topics in a conversational manner. In this paper we show how the Transformer architecture [30] achieves state-of-the-art results in key IR tasks, leveraging the creation of conversational assistants that engage in open-domain conversational search with single, y...
Open-domain conversational search assistants aim at answering user questions about open topics in a conversational manner. In this paper we show how the Transformer architecture achieves state-of-the-art results in key IR tasks, leveraging the creation of conversational assistants that engage in open-domain conversational search with single, yet in...
In order to develop computer tools for speech therapy that reliably classify speech productions, there is a need for speech production corpora that characterize the target population in terms of age, gender, and native language. Apart from including correct speech productions, in order to characterize the target population, the corpora should also...
Many children with speech sound disorders cannot pronounce the sibilant consonants correctly. We have developed a serious game, which is controlled by the children's voices in real time, with the purpose of helping children practice the production of European Portuguese (EP) sibilant consonants. For this, the game uses a sibilant consonant cla...
The development of reliable speech therapy computer tools that automatically classify speech productions depends on the quality of the speech data set used to train the classification algorithms. The data set should characterize the population in terms of age, gender and native language, but it should also have other important properties that chara...
This two-volume set LNCS 12035 and 12036 constitutes the refereed proceedings of the 42nd European Conference on IR Research, ECIR 2020, held in Lisbon, Portugal, in April 2020.
The 55 full papers presented together with 8 reproducibility papers, 46 short papers, 10 demonstration papers, 12 invited CLEF papers, 7 doctoral consortium papers, 4 works...
Cross-modal embeddings, between textual and visual modalities, aim to organise multimodal instances by their semantic correlations. State-of-the-art approaches use maximum-margin methods, based on the hinge-loss, to enforce a constant margin m, to separate projections of multimodal instances from different categories. In this paper, we propose a no...
Understanding the semantic shifts of multimodal information is only possible with models that capture cross-modal interactions over time. Under this paradigm, a new embedding is needed that structures visual-textual interactions according to the temporal dimension, thus, preserving data's original temporal organisation. This paper introduces a nove...
Many children suffering from speech sound disorders cannot pronounce the sibilant consonants correctly. We have developed a serious game that is controlled by the children’s voices in real time and that allows children to practice the European Portuguese sibilant consonants. For this, the game uses a sibilant consonant classifier. Since the game do...
Media editors in the newsroom are constantly pressed to provide a "like-being there" coverage of live events. Social media provides a disorganised collection of images and videos that media professionals need to grasp before publishing their latest news updates. Automated news visual storyline editing with social media content can be very challengi...
Media editors in the newsroom are constantly pressed to provide a "like-being there" coverage of live events. Social media provides a disorganised collection of images and videos that media professionals need to grasp before publishing their latest news updates. Automated news visual storyline editing with social media content can be very challengin...
Distributing multimedia indexes to multiple nodes enables search over very large datasets (i.e., over one billion images and videos), but comes with a set of challenges: how to distribute documents and queries effectively across nodes to support concurrent querying? and how to deal with the increased potential for lack of response from nodes...
Traditional keyword extraction methods make the assumption that corpora are static. However, in social media, information is highly dynamic, with individual words showing a dynamic behaviour. In this paper we propose an unsupervised approach that jointly models words’ temporal behaviour and keywords’ semantic affinity, to address the task of dynamic...
The abundance and ever-growing expansion of user-generated content defines a paradigm in multimedia consumption. While user immersion through audio has gained relevance in recent years due to the growing interest in virtual and augmented reality immersion technologies, the existing user-generated content visualization techniques are still not ma...
Newsworthy events are broadcast through multiple mediums and prompt the crowds to produce comments on social media. In this paper, we propose to leverage these behavioral dynamics to estimate the most relevant time periods for an event (i.e., query). Recent advances have shown how to improve the estimation of the temporal relevance of such topics...
The distortion of sibilant sounds is a common type of speech sound disorder in European Portuguese speaking children. Speech and language pathologists (SLP) use different types of speech production tasks to assess these distortions. One of these tasks consists of the sustained production of isolated sibilants. Using these sound productions, SLPs us...
Combining multiple retrieval functions can lead to notable gains in retrieval performance. Learning to Rank (LETOR) techniques achieve outstanding retrieval results, by learning models with no bounds on model complexity. Often, minor retrieval gains are attained at a significant cost in model complexity. This paper focuses on the research question:...
In this paper we address the task of gender classification on picture sharing social media networks such as Instagram and Flickr. We aim to infer the gender of a user given only a small set of the images shared in their profile. We make the assumption that a user's images contain a collection of visual elements that implicitly encode discriminative pa...
Multimedia information has strong temporal correlations that shape the way modalities co-occur over time. In this paper we study the dynamic nature of multimedia and social-media information, where the temporal dimension emerges as a strong source of evidence for learning the temporal correlations across visual and textual modalities. So far, cros...
News editors need to find the photos that best illustrate a news piece and fulfill news-media quality standards, while being pressed to also find the most recent photos of live events. Recently, it became common to use social-media content in the context of news media for its unique value in terms of immediacy and quality. Consequently, the amount...
In microblog retrieval, query expansion can be essential to obtain good search results due to the short size of queries and posts. Since information in microblogs is highly dynamic, an up-to-date index coupled with pseudo-relevance feedback (PRF) with an external corpus has a higher chance of retrieving more relevant documents and improving ranking...
The rise of large data streams introduces new challenges regarding the delivery of relevant content towards an information need. This need can be seen as a broad topic of information. By identifying sub-streams within a broader data stream, we can retrieve relevant content that matches the multiple facets of the topic; thus summarizing information,...
Effective partitioning of multimedia indexes is key for efficient kNN search. But existing algorithms are based on document similarity, without partition size or redundancy constraints. Our goal is to create an index partitioning algorithm that addresses the specific properties of a distributed system: load balancing across nodes, redundancy in node f...
In this paper we propose a large-scale high-dimensional indexing algorithm based on sparse approximation and inverted indexing. Our goal was to devise a method that smoothly scales to handle databases with over 100 million descriptors on a single machine. To meet this goal, we implemented an inverted index based on a sparsifying dictionary with l...
In recommender systems, the cold-start problem is a common challenge. When a new item has no ratings, it becomes difficult to relate it to other items or users. In this paper, we address the cold-start problem and propose to leverage social-media trends and reputations to improve the recommendation of new items. The proposed framework models the...
Using solely the information retrieved by audio fingerprinting techniques, we propose methods to treat a possibly large dataset of user-generated audio content, that (1) enable the grouping of several audio files that contain a common audio excerpt (i.e., are relative to the same event), and (2) give information about how those files are correlated...
The increase of the quantity of user-generated content experienced in social media has boosted the importance of analysing and organising the content by its quality. Here, we propose a method that uses audio fingerprinting to organise and infer the quality of user-generated audio content. The proposed method detects the overlapping segments between...
In this paper, we propose a collaborative system to let users share their own videos and interact among themselves to collaboratively produce video coverage of live events. Our intention is to motivate users to make positive contributions to the comprehensiveness of available videos about that event. To achieve this we propose a collaborative video fr...
This paper addresses the problem of balanced, redundant indexing of media information. Our goal is to partition and distribute the search index, taking advantage of distributed-system properties: balanced load across nodes, redundancy on node failure, and efficient node usage under concurrent querying. We follow an information compression approach...
3D video is introducing great changes in many health related areas. The realism of such information provides health professionals with strong evidence analysis tools to facilitate clinical decision processes. Speech and language therapy aims to help subjects in correcting several disorders. The assessment of the patient by the speech and language t...
Speech is the main form of human communication. Thus it is important to detect and treat speech sound disorders as early as possible during childhood. When children need to attend speech therapy it is critical to keep them motivated to do the therapy exercises.
Software systems for speech therapy can be a useful tool to keep the child interested...
Affective-interaction in computer games is a novel area with several new challenges, such as detecting players' facial expressions robustly. Many of the existing facial expression datasets are composed of a set of posed face images not captured in a realistic affective-interaction setting. The contribution of this paper is an affective-interaction d...
Traditional speech therapy approaches for speech sound disorders have much to gain from computer-based therapy systems. With speech recognition techniques the motivation elements of these systems can be automated in order to get an interactive environment that motivates the therapy attendee towards better performances. Here we propos...
In this demo we show how we can enhance real-time microblog search by monitoring news sources on Twitter. We improve retrieval through query expansion using pseudo-relevance feedback. However, instead of doing feedback on the original corpus we use a separate Twitter news index. This allows the system to find additional terms associated with the or...
In Twitter, and other microblogging services, the generation of new content by the crowd is often biased towards immediacy: what is happening now. Prompted by the propagation of commentary and information through multiple mediums, users on the Web interact with and produce new posts about newsworthy topics and give rise to trending topics. This pap...