Ricardo Campos
Universidade da Beira Interior | UBI · Department of Computer Science
PhD in Computer Science
Publications: 112
Reads: 34,308
Citations: 1,587
Introduction
Ricardo Campos is a professor at the Polytechnic Institute of Tomar and a member of LIAAD-INESC TEC. He holds a PhD in Computer Science from the University of Porto. His work on Temporal IR led him to win the Fraunhofer Portugal Challenge 2013 and to be distinguished as an “outstanding” researcher by the INESC TEC research lab. He is an editorial board member of the IPM Journal, has co-chaired international conferences in IR, and is a PC member of several international conferences. He has published research in the field of IR and received the best short paper award at ECIR'18, the best demo presentation award at ECIR'19, and the Recognized Reviewer Award in 2019. In 2018 he was also awarded the 1st prize of the Arquivo.pt Award for his project Conta-me Histórias. More at http://www.ccc.ipt.pt/~ricardo
Additional affiliations
October 2003 - March 2017
Publications (112)
Temporal information retrieval has been a topic of great interest in recent years. Its purpose is to improve the effectiveness of information retrieval methods by exploiting temporal information in documents and queries. In this article, we present a survey of the existing literature on temporal information retrieval. In addition to giving an overv...
Despite a clear improvement of search and retrieval temporal applications, current search engines are still mostly unaware of the temporal dimension. Indeed, in most cases, systems are limited to offering the user the chance to restrict the search to a particular time period or to simply rely on an explicitly specified time span. If the user is not...
In the web environment, most of the queries issued by users are implicit by nature. Inferring the different temporal intents of this type of query enhances the overall temporal part of the web search results. Previous works tackling this problem usually focused on news queries, where the retrieval of the most recent results related to the query are...
In this work, we propose a lightweight approach for keyword extraction and ranking based on an unsupervised methodology to select the most important keywords of a single document. To understand the merits of our proposal, we compare it against RAKE, TextRank and SingleRank methods (three well-known unsupervised approaches) and the baseline TF.IDF,...
In this paper, we present YAKE!, a novel feature-based system for multi-lingual keyword extraction from single documents, which supports texts of different sizes, domains or languages. Unlike most systems, YAKE! does not rely on dictionaries or thesauri, nor is it trained against any corpora. Instead, we follow an unsupervised approach which bu...
Event extraction is an NLP task that commonly involves identifying the central word (trigger) for an event and its associated arguments in text. ACE-2005 is widely recognised as the standard corpus in this field. While other corpora, like PropBank, primarily focus on annotating predicate-argument structure, ACE-2005 provides comprehensive informati...
Event extraction is an Information Retrieval task that commonly consists of identifying the central word for the event (trigger) and the event's arguments. This task has been extensively studied for English but lags behind for Portuguese, partly due to the lack of task-specific annotated corpora. This paper proposes a framework in which two separat...
The Seventh International Workshop on Narrative Extraction from Texts (Text2Story'24) was held on March 24th, 2024, in conjunction with the 46th European Conference on Information Retrieval (ECIR 2024) in Glasgow, Scotland. Over the day, more than 50 attendees engaged in discussions and presentations focused on recent advancements in narrative r...
The capabilities of the most recent language models have increased the interest in integrating them into real-world applications. However, the fact that these models generate plausible, yet incorrect text poses a constraint when considering their use in several domains. Healthcare is a prime example of a domain where text-generative trustworthiness...
The Text2Story Workshop series, dedicated to Narrative Extraction from Texts, has been running successfully since 2018. Over the past six years, significant progress, largely propelled by Transformers and Large Language Models, has advanced our understanding of natural language text. Nevertheless, the representation, analysis, generation, and compr...
Interest in the news has been declining and digital news subscriptions are still a hard sell for the average Internet user who is often used to consuming news through social media without any fees. To attract readers and engage with them, digital news outlets are forced to look for and integrate innovative solutions. In this work, we propose Perfil...
The first edition of the International Workshop on Implicit Author Characterization from Texts for Search and Retrieval (IACT'23) was held on July 27th, 2023, in conjunction with the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR) in Taipei, Taiwan. To support both online and in-person particip...
Topics discussed on social media platforms contain a disparate amount of information written in colloquial language, making it difficult to understand the narrative of the topic. In this paper, we take a step forward, towards the resolution of this problem by proposing a framework that performs the automatic extraction of narratives from a document...
The Sixth International Workshop on Narrative Extraction from Texts (Text2Story'23) was held on April 2nd, 2023, in conjunction with the 45th European Conference on Information Retrieval (ECIR 2023) in Dublin, Ireland. Continuing the tradition of past years, the workshop was held as a hybrid event. Online participation was allowed using the Zoom...
Large language models (LLMs) have substantially pushed artificial intelligence (AI) research and applications in the last few years. They are currently able to achieve high effectiveness in different natural language processing (NLP) tasks, such as machine translation, named entity recognition, text classification, question answering, or text summa...
In our data-flooded age, an enormous amount of redundant, but also disparate textual data is collected on a daily basis on a wide variety of topics. Much of this information refers to documents related to the same theme, that is, different versions of the same document, or different documents discussing the same topic. Being aware of such differenc...
Our civilization creates enormous volumes of digital data, a substantial fraction of which is preserved and made publicly available for present and future usage. Additionally, historical born-analog records are progressively being digitized and incorporated into digital document repositories. While professionals often have a clear idea of what they...
This paper presents a new test collection for Legal IR, FALQU: Finding Answers to Legal Questions, where questions and answers were obtained from Law Stack Exchange (LawSE), a Q&A website for legal professionals and others with experience in law. Much in line with Stack Overflow, Law Stack Exchange has a variety of questions on different topics su...
Over these past five years, significant breakthroughs, led by Transformers and large language models, have been made in understanding natural language text. However, the ability to capture contextual nuances in longer texts is still an elusive goal, let alone the understanding of consistent fine-grained narrative structures in text. These unsolved...
In recent years, the amount of information generated, consumed and stored has grown at an astonishing rate, making it difficult for those seeking information to extract knowledge in good time. This has become even more important, as the average reader is not as willing to spare more time out of their already busy schedule as in the past, thus prior...
The rise of social media has brought a great transformation to the way news is discovered and shared. Unlike traditional news sources, social media allows anyone to cover a story. Therefore, sometimes an event is already discussed by people before a journalist turns it into a news article. Twitter is a particularly appealing social network for dis...
Over the past few decades, the amount of information generated turned the Web into the largest knowledge infrastructure existing to date. Web archives have been at the forefront of data preservation, preventing the loss of significant data to humankind. Different snapshots of the web are saved every day, enabling users to surf the past web and to t...
Extracting keywords from textual data is a crucial step for text analysis. One such process may involve a considerable amount of time when done manually. In this paper, we show how keyword extraction techniques can be used to untap texts of political nature. To accomplish this objective, we conduct a case-study on top of 16 Portuguese (PT) politica...
The Fifth International Workshop on Narrative Extraction from Texts (Text2Story'22) was held on April 10th, 2022, in conjunction with the 44th European Conference on Information Retrieval (ECIR 2022) in Stavanger, Norway. Due to the COVID-19 restrictions that are still active in some countries, the workshop was held as a hybrid event, combi...
Temporal information extraction (TIE) has attracted a great deal of interest over the last two decades, leading to the development of a significant number of datasets. Despite its benefits, having access to a large volume of corpora makes it difficult when it comes to benchmark TIE systems. On the one hand, different datasets have different annotat...
Narratives are present in many forms of human expression and can be understood as a fundamental way of communication between people. Computational understanding of the underlying story of a narrative, however, may be a rather complex task for both linguists and computational linguists. Such a task can be approached using natural language processing...
Reasoning about spatial information is fundamental in natural language to fully understand relationships between entities and/or between events. However, the complexity underlying such reasoning makes it hard to formally represent spatial information. Despite the growing interest in this topic, and the development of some frameworks, many problems...
Narrative extraction, understanding, verification, and visualization are currently popular topics for users interested in achieving a deeper understanding of text, researchers who want to develop accurate methods for text mining, and commercial companies that strive to provide efficient tools for that. Information Retrieval (IR), Natural Language P...
Social media platforms are used to discuss current events with very complex narratives that become difficult to understand. In this work, we introduce Tweet2Story, a web app to automatically extract narratives from small texts such as tweets and describe them through annotations. By doing this, we aim to mitigate the difficulties existing on creati...
The Fourth International Workshop on Narrative Extraction from Texts (Text2Story'21) was held on April 1st, 2021, in conjunction with the 43rd European Conference on Information Retrieval (ECIR 2021). Due to the Covid-19 outbreak, the workshop was held online on the Zoom platform. During the course of the day, an average of more than 80 attendee...
Despite significant advances in web archive infrastructures, the problem of exploring the historical heritage preserved by web archives is yet to be solved. Timeline generation emerges in this context as one possible solution for automatically producing summaries of news over time. Thanks to this, users can gain a better sense of reported news even...
Narrative extraction, understanding and visualization is currently a popular topic and an important tool for humans interested in achieving a deeper understanding of text. Information Retrieval (IR), Natural Language Processing (NLP) and Machine Learning (ML) already offer many instruments that aid the exploration of narrative elements in text and...
Many archival collections have been recently digitized and made available to a wide public. The contained documents however tend to have limited attractiveness for ordinary users, since content may appear obsolete and uninteresting. Archival document collections can become more attractive for users if suitable content can be recommended to them. Th...
The rise of social media and the explosion of digital news in the web sphere have created new challenges to extract knowledge and make sense of published information. Automated timeline generation appears in this context as a promising answer to help users dealing with this information overload problem. Formally, Timeline Summarization (TLS) can be...
Over the past few years, the amount of information generated, consumed and stored on the Web has grown exponentially, making it impossible for users to keep up to date. Temporal data representation can help in this process by giving documents a sense of organization. Timelines are a natural way to showcase this data, giving users the chance to get...
Event extraction (EE) is one of the core information extraction tasks, whose purpose is to automatically identify and extract information about incidents and their actors from texts. This may be beneficial to several domains such as knowledge base construction, question answering and summarization tasks, to name a few. The problem of extracting eve...
The Third International Workshop on Narrative Extraction from Texts (Text2Story'20 [https://text2story20.inesctec.pt/]) was held on the 14th of April 2020, in conjunction with the 42nd European Conference on Information Retrieval (ECIR 2020). This year, due to the Covid-19 outbreak, the Text2Story workshop was held online on the Zoom platform. During th...
ECIR 2020 was one of the many conferences affected by the COVID-19 pandemic. The Conference Chairs decided to keep the initially planned dates (April 14-17, 2020) and move to a fully online event. In this report, we describe the experience of organising the ECIR 2020 Workshops in this scenario from two perspectives: the workshop organisers and th...
ECIR 2020 (https://ecir2020.org/) was one of the many conferences affected by the COVID-19 pandemic. The Conference Chairs decided to keep the initially planned dates (April 14-17, 2020) and move to a fully online event. In this report, we describe the experience of organizing the ECIR 2020 Workshops in this scenario from two perspectives: the worksh...
The Third International Workshop on Narrative Extraction from Texts (Text2Story’20) [text2story20.inesctec.pt], held in conjunction with the 42nd European Conference on Information Retrieval (ECIR 2020), gives researchers in IR, NLP and other fields the opportunity to share their recent advances in extraction and formal representation of narratives. T...
Event extraction (EE) is one of the core information extraction tasks, whose purpose is to automatically identify and extract information about incidents and their actors from texts. This may be beneficial to several domains such as knowledge bases, question answering, information retrieval and summarization tasks, to name a few. The problem of ext...
Users tend to search over the Internet to get the most updated news when an event occurs. Search engines should then be capable of effectively retrieving relevant documents for event-related queries. As the previous studies have shown, different retrieval models are needed for different types of events. Therefore, the first step for improving effec...
As the amount of generated information grows, reading and summarizing texts of large collections turns into a challenging task. Many documents do not come with descriptive terms, thus requiring humans to generate keywords on-the-fly. The need to automate this kind of task demands the development of keyword extraction systems with the ability to aut...
Old documents tend to be difficult to analyze and understand, not only for average users but oftentimes for professionals as well. This is due to the context shift, vocabulary evolution and, in general, the lack of precise knowledge about the writing styles in the past. We propose a concept of positioning a document in the context of its time, an...
The Second International Workshop on Narrative Extraction from Texts (Text2Story'19 [http://text2story19.inesctec.pt/]) was held on the 14th of April 2019, in conjunction with the 41st European Conference on Information Retrieval (ECIR 2019) in Cologne, Germany. The workshop provided a platform for researchers in IR, NLP, and design and visualizat...
As video games are developing fast, many users issue queries related to video games on a daily basis. While there were a few attempts to understand their behavior, little is known about how video game-related searches are done. Digesting and analyzing this search behavior may thus be seen as an important contribution for search engines to provi...
In this demo, we present a tool that automatically generates temporal summaries of news collections. Conta-me Histórias (Tell me stories) is a friendly user interface that enables users to explore and revisit events in the past. To select relevant stories and temporal periods, we rely on a key-phrase extraction algorithm developed by o...
Building upon the success of the first edition, we organize the second edition of the Text2Story Workshop on Narrative Extraction from Texts in conjunction with the 41st European Conference on Information Retrieval (ECIR 2019) on April 14, 2019. Our objective is to further consolidate the efforts of the community and reflect upon the progress made...
The 2nd workshop on User Interfaces for Spatial-Temporal Data Analysis (UISTDA2019) took place in conjunction with the 24th Annual Meeting of the Intelligent Interfaces community (ACM IUI2019) in Los Angeles, USA on March 20, 2019. The goal of this workshop is to share latest progress and developments, current challenges and potential applications...
Human language constantly evolves due to the changing world and the need for easier forms of expression and communication. Our knowledge of language evolution is however still fragmentary despite significant interest of both researchers as well as wider public in the evolution of language. In this paper, we present an interactive framework that pe...
The 1st International Workshop on Narrative Extraction from Texts (Text2Story 2018) was held in conjunction with the 40th European Conference on Information Retrieval, ECIR 2018, Grenoble on the 26th March 2018. The workshop aimed to help foster the collaboration of researchers on a wide range of multidisciplinary issues related to the text-to-nar...
First prize of the Arquivo.pt 2018 award with the project "Conta-me Histórias" available at: http://contamehistorias.pt
More information available at:
http://sobre.arquivo.pt/en/arquivo-pt-2018-award-winners/
Over the last few years, an increasing number of users and enterprises on the internet has generated a global marketplace for both employers and job seekers. Despite the fact that online job search is now preferred over traditional methods - leading to better matches between the job seekers and the employers' intents - there is still little...
Users perform millions of web searches every day. Many of these web searches are related to events, social occasions that attract society's attention. Events may happen multiple times on cyclic or non-periodic occasions. These are known as spiky events. When these events occur, multiple spikes can be observed in query logs trigger...
The development of information retrieval algorithms and temporal information retrieval ones has been extensively carried out over the last few years. While several studies have been conducted, most of this research relates to English, leading to a lack of knowledge in several other important languages. This includes Persian. In this work,...
ECIR 2018 Best Short Paper Award for the paper entitled "A Text Feature Based Automatic Keyword Extraction Method for Single Documents"
The increasing availability of text information in the form of news articles, comments or posts poses new challenges for those who aim to understand the storyline of an event. Although understanding natural language text has improved over the last couple of years with several research works emerging on the grounds of information extraction and tex...
Recently, many historical texts have become digitized and made accessible for search and browsing. Professionals who work with collections of such texts often need to verify the correctness of documents’ key metadata - their creation dates. In this paper, we demonstrate an interactive system for estimating the age of documents. It may be useful not...
Time has a strong influence on web search. The temporal intent of the searcher adds an important dimension to the relevance judgments of web queries. However, a lack of understanding of their temporal requirements increases the ambiguity of the queries, turning retrieval effectiveness improvements into a complex task. In this paper, we propose an approach...
Many user information needs are strongly influenced by time. Some of these intents are expressed by users in queries issued indistinctively over time. Others follow a seasonal pattern. Examples of the latter are the queries “Golden Globe Award”, “September 11th” or “Halloween”, which refer to seasonal events that occur or have occurred at a specifi...
Automatic topic detection in document collections is an important tool for various tasks. In particular, it is valuable for studying and understanding socio-political phenomena. A currently relevant example is the automatic analysis of streams of posts issued by different activist groups in the current Brazilian turmoil, through the analysis of the...
The news industry has gone through seismic shifts in the past decade with digital content and social media completely redefining how people consume news. Readers check for accurate fresh news from multiple sources throughout the day using dedicated apps or social media on their smartphones and tablets. At the same time, news publishers rely more an...
The Special Issue of Information Processing and Management includes research papers on the intersection between time and information retrieval. In 'Evaluating Document Filtering Systems over Time', Tom Kenter and Krisztian Balog propose a time-aware way of measuring a system's performance at filtering documents. Manika Kar, Sérgio Nunes an...
Temporal information retrieval has been a topic of great interest in recent years. Despite the efforts that have been conducted so far, most popular search engines remain underdeveloped when it comes to explicitly considering the use of temporal information in their search process. In this paper we present GTE-Rank, an online searching tool that ta...
In this paper, we present GTE-Cluster, an online temporal search interface which consistently allows searching for topics in a temporal perspective by clustering relevant temporal Web search results. GTE-Cluster is designed to improve user experience by augmenting document relevance with temporal relevance. The rationale is that offering the user a...
How can we run large-scale, community-wide evaluations of information retrieval systems if we lack the ability to distribute the document collection on which the task is based? This was the challenge we faced in the TREC Microblog tracks over the past ...
With the growing popularity of research in Temporal Information Retrieval (T-IR), a large amount of temporal data is ready to be exploited. The ability to exploit this information can be potentially useful for several tasks. For example, when querying "Football World Cup Germany", it would be interesting to have two separate clusters {1974,2006} co...