ABSTRACT: When developing a conversational agent, there is often an urgent need to have a prototype available in order to test the application with real users. A Wizard of Oz setup is one possibility, but sometimes the agent should simply be deployed in the environment where it will be used. There, the agent should be able to capture as many interactions as possible and to understand how people react to failure. In this paper, we focus on the rapid development of a natural language understanding module by non-experts. Our approach follows the learning paradigm and treats the process of understanding natural language as a classification problem. We test our module with a conversational agent that answers questions in the art domain. Moreover, we show how our approach can be used by a natural language interface to a cinema database.
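As a concrete illustration of this classification view of language understanding, the sketch below trains a text classifier over intent-labelled utterances. It assumes scikit-learn; the intent labels and example utterances are invented for the example and are not from the paper.

```python
# Minimal sketch: natural language understanding as text classification.
# Assumes scikit-learn; intents and utterances below are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

utterances = [
    "who painted this picture",
    "when was this painting made",
    "what time does the last session start",
    "which movies are playing tonight",
]
intents = ["art.author", "art.date", "cinema.schedule", "cinema.listing"]

# Each utterance maps to exactly one intent label, so "understanding"
# reduces to supervised text classification.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(utterances, intents)

print(model.predict(["who is the author of this painting"]))
```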
ABSTRACT: The task of expert finding has been getting increasing attention in the information retrieval literature. However, the current state of the art is still lacking in principled approaches for combining different sources of evidence in an optimal way. This paper explores the usage of learning to rank methods as a principled approach for combining multiple estimators of expertise, derived from the textual contents, from the graph structure of the citation patterns for the community of experts, and from profile information about the experts. Experiments made over a dataset of academic publications, for the area of Computer Science, attest to the adequacy of the proposed approach.
ABSTRACT: Expert finding is an information retrieval task concerned with the search for the most knowledgeable people with respect to a specific topic, where the search is based on documents that describe people's activities. The task involves taking a user query as input and returning a list of people sorted by their level of expertise with respect to that query. Despite recent interest in the area, the current state-of-the-art techniques lack principled approaches for optimally combining different sources of evidence. This article proposes two frameworks for combining multiple estimators of expertise. These estimators are derived from the textual contents, from the graph structure of the citation patterns for the community of experts, and from profile information about the experts. More specifically, this article explores the use of supervised learning to rank methods, as well as rank aggregation approaches, for combining all of the estimators of expertise. Several supervised learning algorithms, representative of the pointwise, pairwise and listwise approaches, were tested, and various state-of-the-art data fusion techniques were also explored for the rank aggregation framework. Experiments performed on a dataset of academic publications from the Computer Science domain attest to the adequacy of the proposed approaches.
Expert Systems 01/2013 (in press).
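As a rough sketch of the learning-to-rank framework explored in these papers, the example below trains a pointwise model over a few estimators of expertise and uses it to sort candidates. The feature values, relevance labels and candidate names are invented, and scikit-learn's gradient boosting stands in for the actual learning-to-rank implementations that were tested.

```python
# Hypothetical pointwise learning-to-rank sketch for expert finding.
# Each row holds estimators of expertise for one (query, candidate) pair:
# [textual score, citation-graph score, profile score]; values are made up.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

X_train = np.array([
    [0.9, 0.7, 0.3],
    [0.2, 0.1, 0.8],
    [0.6, 0.9, 0.5],
    [0.1, 0.2, 0.1],
])
y_train = np.array([3.0, 1.0, 2.0, 0.0])  # graded expertise judgments

model = GradientBoostingRegressor(n_estimators=50).fit(X_train, y_train)

# At query time, candidate experts are sorted by the learned scoring function.
candidates = {"alice": [0.8, 0.6, 0.4], "bob": [0.3, 0.2, 0.9]}
scores = {name: float(model.predict(np.array([feats]))[0])
          for name, feats in candidates.items()}
print(sorted(scores, key=scores.get, reverse=True))
```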
ABSTRACT: Literary reading is an important activity for individuals and can be a long-term commitment, making book choice an important task for book lovers and public library users. In this paper, we present a hybrid recommendation system to help readers decide which book to read next. We study book and author recommendations in a hybrid recommendation setting and test our algorithm on the LitRec data set. Our hybrid method combines two item-based collaborative filtering algorithms to predict books and authors that the user will like. Author predictions are expanded into a booklist that is subsequently aggregated with the former book predictions. Finally, the resulting booklist is used to yield the top-n book recommendations. By means of various experiments, we demonstrate that author recommendation can improve overall book recommendation.
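To make the aggregation step concrete, here is a small sketch of how book predictions and expanded author predictions might be merged into a final top-n list. The scores, the book/author data and the weighting scheme are simplified stand-ins for the item-based collaborative filtering algorithms evaluated on LitRec.

```python
# Hypothetical sketch of the hybrid aggregation step: predictions from a
# book recommender are merged with books expanded from predicted authors.
# All scores, weights and identifiers below are illustrative.
from collections import defaultdict

book_predictions = {"book_a": 0.9, "book_b": 0.6}        # item-based CF over books
author_predictions = {"author_x": 0.8, "author_y": 0.5}  # item-based CF over authors
books_by_author = {"author_x": ["book_b", "book_c"], "author_y": ["book_d"]}

ALPHA = 0.7  # weight given to the direct book recommender (assumed value)

scores = defaultdict(float)
for book, score in book_predictions.items():
    scores[book] += ALPHA * score
# Expand each predicted author into a booklist and aggregate it with the
# booklist produced by the direct book recommender.
for author, score in author_predictions.items():
    for book in books_by_author[author]:
        scores[book] += (1 - ALPHA) * score

top_n = sorted(scores, key=scores.get, reverse=True)[:3]
print(top_n)
```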
ABSTRACT: The task of expert finding has been getting increasing attention in the information retrieval literature. However, the current state of the art is still lacking in principled approaches for combining different sources of evidence. This paper explores the usage of unsupervised rank aggregation methods as a principled approach for combining multiple estimators of expertise, derived from the textual contents, from the graph structure of the citation patterns for the community of experts, and from profile information about the experts. We specifically experimented with two unsupervised rank aggregation approaches well known in the information retrieval literature, namely CombSUM and CombMNZ. Experiments made over a dataset of academic publications for the area of Computer Science attest to the adequacy of these methods.
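CombSUM and CombMNZ are simple to state: CombSUM sums the (normalized) scores that a candidate receives from the individual rankers, while CombMNZ further multiplies that sum by the number of rankers that retrieved the candidate. A minimal sketch follows, with invented scores assumed to be already normalized.

```python
# CombSUM and CombMNZ data fusion over three hypothetical expertise rankers.
# Scores are assumed to be normalized to a common scale beforehand.
rankers = [
    {"alice": 0.9, "bob": 0.4},
    {"alice": 0.7, "carol": 0.6},
    {"bob": 0.8, "carol": 0.5},
]

def comb_sum(candidate):
    return sum(r.get(candidate, 0.0) for r in rankers)

def comb_mnz(candidate):
    hits = sum(1 for r in rankers if candidate in r)  # rankers retrieving it
    return hits * comb_sum(candidate)

candidates = {name for r in rankers for name in r}
for name in sorted(candidates, key=comb_mnz, reverse=True):
    print(name, round(comb_sum(name), 2), round(comb_mnz(name), 2))
```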
ABSTRACT: This paper describes an approach for performing recognition and resolution of place names mentioned in the descriptive metadata records of typical digital libraries. Our approach exploits evidence provided by the existing structured attributes within the metadata records to support place name recognition and resolution, in order to achieve better results than by just using lexical evidence from the textual values of these attributes. In metadata records, lexical evidence is very often insufficient for this task, since short sentences and simple expressions are predominant. Our implementation uses a dictionary-based technique for the recognition of place names (with names provided by Geonames), and machine learning for reasoning over the evidence and choosing a possible resolution candidate. The evaluation of our approach was performed on data sets with a metadata schema rich in Dublin Core elements. Two evaluation methods were used. First, we used cross-validation, which showed that our solution is able to achieve a very high precision of 0.99 at 0.55 recall, or a recall of 0.79 at 0.86 precision. Second, we used a comparative evaluation against an existing commercial service, where our solution performed better at every confidence level (p
Proceedings of the 2011 Joint International Conference on Digital Libraries, JCDL 2011, Ottawa, ON, Canada, June 13-17, 2011; 01/2011
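A rough sketch of the two steps described above: recognition as dictionary lookup against a gazetteer, and resolution as a learned decision over evidence from the structured attributes. The tiny gazetteer, the record and the hand-rolled score below are illustrative; the actual system uses Geonames data and a trained model over Dublin Core evidence.

```python
# Hypothetical sketch: dictionary-based place name recognition plus
# evidence-based resolution. Gazetteer entries and the record are made up.
gazetteer = {
    "paris": [
        {"id": "paris-fr", "country": "FR", "population": 2_100_000},
        {"id": "paris-us", "country": "US", "population": 25_000},
    ],
}

record = {"title": "Street scene in Paris", "coverage": "France"}

def recognize(text):
    # Dictionary-based recognition: flag tokens that name gazetteer entries.
    return [tok for tok in text.lower().split() if tok in gazetteer]

def score(candidate, record):
    # A trained classifier would weigh this kind of evidence; here a
    # hand-rolled score stands in: agreement with the structured coverage
    # attribute (hard-coded for the example) plus a population prior.
    agreement = 1.0 if candidate["country"] == "FR" and "France" in record.get("coverage", "") else 0.0
    return agreement + candidate["population"] / 10_000_000

for name in recognize(record["title"]):
    best = max(gazetteer[name], key=lambda c: score(c, record))
    print(name, "->", best["id"])
```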
ABSTRACT: This paper presents a novel approach for detecting duplicate records in the context of digital gazetteers, using state-of-the-art machine learning techniques. It reports a thorough evaluation of alternative machine learning approaches designed for the task of classifying pairs of gazetteer records as either duplicates or not, built by using support vector machines or alternating decision trees with different combinations of similarity scores for the feature vectors. Experimental results show that using feature vectors that combine multiple similarity scores, derived from place names, semantic relationships, place types and geospatial footprints, leads to an increase in accuracy. The paper also discusses how the proposed duplicate detection approach can scale to large collections, through the usage of filtering or blocking techniques.
GeoSpatial Semantics - 4th International Conference, GeoS 2011, Brest, France, May 12-13, 2011. Proceedings; 01/2011
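The classification setup above can be sketched by building a feature vector of similarity scores for each pair of gazetteer records and feeding labelled pairs to a support vector machine. The records, similarity functions and labels below are toy stand-ins for the combinations evaluated in the paper.

```python
# Hypothetical sketch: pairs of gazetteer records are turned into similarity
# feature vectors and classified as duplicate or not with an SVM.
import math
from sklearn.svm import SVC

def name_similarity(a, b):
    # Jaccard similarity over character trigrams (one possible name score).
    grams = lambda s: {s[i:i + 3] for i in range(len(s) - 2)}
    ga, gb = grams(a["name"].lower()), grams(b["name"].lower())
    return len(ga & gb) / len(ga | gb)

def footprint_distance(a, b):
    # Plain Euclidean distance; a real system would use geodesic distance.
    return math.hypot(a["lat"] - b["lat"], a["lon"] - b["lon"])

def pair_features(a, b):
    type_match = 1.0 if a["type"] == b["type"] else 0.0
    return [name_similarity(a, b), type_match, footprint_distance(a, b)]

r1 = {"name": "Lisbon", "type": "city", "lat": 38.72, "lon": -9.14}
r2 = {"name": "Lisboa", "type": "city", "lat": 38.71, "lon": -9.14}
r3 = {"name": "Porto", "type": "city", "lat": 41.15, "lon": -8.61}

X = [pair_features(r1, r2), pair_features(r1, r3)]
y = [1, 0]  # toy labels: (r1, r2) are duplicates, (r1, r3) are not

clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict([pair_features(r2, r3)]))
```

Blocking, mentioned at the end of the abstract, would restrict which pairs are ever compared (e.g. only records whose names share a trigram), which is what lets pairwise classification scale to large gazetteers.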
ABSTRACT: Geo-temporal information is pervasive in textual documents, since most of them contain references to particular locations, calendar dates, clock times or duration periods. An important text analytics problem is therefore resolving the place names and the temporal expressions referenced in texts, i.e. linking the character strings in the documents that correspond to either locations or temporal instances to the specific geospatial coordinates or time intervals that they refer to. However, geo-temporal reference resolution presents several non-trivial problems to the area of text mining, due to the inherent ambiguity and contextual assumptions of natural language discourse.
19th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, ACM-GIS 2011, November 1-4, 2011, Chicago, IL, USA, Proceedings; 01/2011
ABSTRACT: This paper presents a machine learning method for resolving place references in text, i.e. linking character strings in documents to locations on the surface of the Earth. This is a fundamental task in the area of Geographic Information Retrieval, supporting access through geography to large document collections. The proposed method is an instance of stacked learning, in which a first learner based on a Hidden Markov Model is used to annotate place references, and a second learner implementing regression through a Support Vector Machine is then used to rank the possible disambiguations for the references that were initially annotated. The proposed method was evaluated on gold-standard document collections in three different languages, with place references annotated by humans. Results show that the proposed method compares favorably against commercial state-of-the-art systems such as the Metacarta geo-tagger and Yahoo! Placemaker.
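The second stage of this stacked arrangement can be sketched as support vector regression over features of each candidate disambiguation; the first-stage HMM annotator is abstracted away here, and the features, training targets and candidate names are invented.

```python
# Hypothetical sketch of the second-stage learner: Support Vector Regression
# scores candidate disambiguations for place references that a first-stage
# sequence model (an HMM in the paper) has already annotated.
from sklearn.svm import SVR

# Features per candidate: [scaled population, gazetteer-type prior,
# contextual agreement with other places mentioned in the document].
X_train = [
    [0.9, 1.0, 0.8],
    [0.1, 0.5, 0.2],
    [0.4, 1.0, 0.1],
]
y_train = [1.0, 0.0, 0.3]  # how good each training candidate turned out to be

ranker = SVR(kernel="rbf").fit(X_train, y_train)

# Rank the candidate interpretations of one annotated reference, best first.
candidates = {
    "Lisbon, Portugal": [0.8, 1.0, 0.9],
    "Lisbon, Ohio": [0.05, 0.5, 0.1],
}
ranked = sorted(candidates,
                key=lambda c: ranker.predict([candidates[c]])[0],
                reverse=True)
print(ranked)
```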
ABSTRACT: The task of learning to rank is currently getting increasing attention, providing a sound methodology for combining different sources of evidence. The goal is to design and apply machine learning methods to automatically learn, from training data, a function that can sort documents according to their relevance. Geographic information retrieval has also emerged as an active and growing research area, addressing the retrieval of textual documents according to geographic criteria of relevance. In this paper, we explore the usage of a learning to rank approach for geographic information retrieval, leveraging the datasets made available in the context of the previous GeoCLEF evaluation campaigns. The idea is to combine different metrics of textual and geographic similarity into a single ranking function, through the use of the SVMmap framework. Experimental results show that the proposed approach can outperform baselines based on heuristic combinations of features.
Proceedings of the 6th Workshop on Geographic Information Retrieval, GIR 2010, Zurich, Switzerland, February 18-19, 2010; 01/2010
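The core idea, combining textual and geographic similarity metrics into a single ranking function, can be sketched as a weighted scoring function. In the paper the weights are learned with SVMmap; below they are fixed by hand, and the similarity measures and documents are invented for illustration.

```python
# Hypothetical sketch: one ranking function over textual and geographic
# similarity features. The weights stand in for those learned with SVMmap.
import math

def geographic_similarity(doc_point, query_point):
    # Inverse-distance proxy; a real system might use footprint overlap.
    dist = math.hypot(doc_point[0] - query_point[0], doc_point[1] - query_point[1])
    return 1.0 / (1.0 + dist)

def score(doc, query):
    features = [
        doc["text_score"],                                    # textual similarity
        geographic_similarity(doc["point"], query["point"]),  # geographic similarity
    ]
    weights = [0.8, 0.2]  # hand-picked stand-ins for learned weights
    return sum(w * f for w, f in zip(weights, features))

query = {"point": (38.7, -9.1)}  # e.g. a query about places near Lisbon
docs = [
    {"id": "d1", "text_score": 0.6, "point": (38.7, -9.2)},
    {"id": "d2", "text_score": 0.9, "point": (48.8, 2.3)},
]
for d in sorted(docs, key=lambda d: score(d, query), reverse=True):
    print(d["id"], round(score(d, query), 3))
```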
ABSTRACT: This paper presents an approach for categorizing documents according to their implicit locational relevance. We report a thorough evaluation of several classifiers designed for this task, built by using support vector machines with multiple alternatives for feature vectors. Experimental results show that using feature vectors that combine document terms and URL n-grams, with simple features related to the locality of the document (e.g. total count of place references), leads to high accuracy values. The paper also discusses how the proposed categorization approach can be used to help improve tasks such as document retrieval or online contextual advertisement.
Progress in Artificial Intelligence, 14th Portuguese Conference on Artificial Intelligence, EPIA 2009, Aveiro, Portugal, October 12-15, 2009. Proceedings; 01/2009
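A sketch of the feature combination described above: TF-IDF weighted document terms, character n-grams of the URL, and a simple locality count, stacked into one vector for a linear SVM. The documents, labels and exact feature choices are illustrative.

```python
# Hypothetical sketch: combining document terms, URL n-grams and a locality
# count into one feature vector for SVM-based locational categorization.
import numpy as np
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

docs = [
    {"text": "restaurants and hotels in Lisbon city centre",
     "url": "lisbon-guide.example.com", "place_refs": 2},
    {"text": "proof of the prime number theorem",
     "url": "math.example.org", "place_refs": 0},
]
labels = [1, 0]  # toy labels: locationally relevant or not

term_vec = TfidfVectorizer()
url_vec = TfidfVectorizer(analyzer="char", ngram_range=(3, 4))

X = hstack([
    term_vec.fit_transform([d["text"] for d in docs]),
    url_vec.fit_transform([d["url"] for d in docs]),
    np.array([[d["place_refs"]] for d in docs]),  # total count of place references
])
clf = LinearSVC().fit(X, labels)

new = {"text": "best beaches near Porto", "url": "porto.example.pt", "place_refs": 1}
x = hstack([term_vec.transform([new["text"]]),
            url_vec.transform([new["url"]]),
            np.array([[new["place_refs"]]])])
print(clf.predict(x))
```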
ABSTRACT: This demo presents a user interface for a geo-temporal search service built as a follow-up to the DIGMAP project. DIGMAP was an EU co-funded project on old digitized maps, dealing with resources rich in geographic and temporal information. The search interface followed a mashup approach using existing DIGMAP components: a metadata repository, a text mining tool, a gazetteer, and a service to generate geographic contextual thumbnails. The Google Maps API is used to provide a friendly and interactive user interface. This demo presents the resulting geo-temporal search engine functionalities, whose interface uses Web 2.0 capabilities to provide contextualization in time and space, as well as text clustering.
Research and Advanced Technology for Digital Libraries, 13th European Conference, ECDL 2009, Corfu, Greece, September 27 - October 2, 2009. Proceedings; 01/2009
Universal Access in Human-Computer Interaction. Intelligent and Ubiquitous Interaction Environments, 5th International Conference, UAHCI 2009, Held as Part of HCI International 2009, San Diego, CA, USA, July 19-24, 2009. Proceedings, Part II; 01/2009
ABSTRACT: DIGMAP is a digital library specialized in searching and browsing services for old maps and related resources. The service reuses metadata from national libraries and other relevant third-party metadata sources, providing added-value services by aggregating all the data into comprehensive collections, browsing indexes and search functions. The services are based on a set of specialized tools, comprising namely a catalogue, an image feature indexer, a metadata repository, a geographic gazetteer and a geo-parser. The extraction of relevant visual features from images of digitized maps is another focus of the project. The architecture and the technology also give it the ability to easily interoperate with other complementary external services.
ABSTRACT: We describe our participation in the TEL@CLEF task of the CLEF 2009 ad-hoc track, where we measured the retrieval performance of LGTE, an index engine for geo-temporal collections which is mostly based on Lucene, together with extensions for query expansion and multinomial language modelling. We experimented with an n-gram stemming model to improve on our experiments from the previous year, which consisted of combinations of query expansion, Lucene's off-the-shelf ranking scheme and a ranking scheme based on multinomial language modelling. The n-gram stemming model was based on a linear combination of n-grams, with n between 2 and 5, using weight factors obtained by learning from the previous year's topics and assessments. The Rocchio ranking function was also adapted to implement this n-gram model. Results show that this stemming technique, together with query expansion and multinomial language modelling, results in increased performance.
Multilingual Information Access Evaluation I. Text Retrieval Experiments, 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, Corfu, Greece, September 30 - October 2, 2009, Revised Selected Papers; 01/2009
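The n-gram stemming idea lends itself to a short sketch: query and index terms are matched through their shared character n-grams for n between 2 and 5, and the per-n overlaps are linearly combined. The weights below are invented; in the paper they were learned from the previous year's topics and assessments.

```python
# Hypothetical sketch of n-gram "stemming": morphological variants match
# through shared character n-grams instead of language-specific rules.
WEIGHTS = {2: 0.1, 3: 0.4, 4: 0.3, 5: 0.2}  # invented, not the learned factors

def ngrams(term, n):
    return {term[i:i + n] for i in range(len(term) - n + 1)}

def ngram_match(query_term, index_term):
    # Linear combination of per-n Jaccard overlaps, for n in 2..5.
    score = 0.0
    for n, w in WEIGHTS.items():
        q, t = ngrams(query_term, n), ngrams(index_term, n)
        if q and t:
            score += w * len(q & t) / len(q | t)
    return score

print(ngram_match("bibliotecas", "biblioteca"))  # high: shared stem
print(ngram_match("bibliotecas", "mapa"))        # zero: no shared n-grams
```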
ABSTRACT: In this paper, we propose a universal solution for web search and web browsing on handheld devices for visually impaired people. For this purpose, we propose (1) to automatically cluster web page results and (2) to summarize all the information in web pages, so that speech-to-speech interaction is used efficiently to access information.
Proceedings of the Twenty-Second International Florida Artificial Intelligence Research Society Conference, May 19-21, 2009, Sanibel Island, Florida, USA; 01/2009
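Step (1) can be sketched with off-the-shelf clustering of result snippets, so that a speech interface can present a few groups instead of reading out a long flat list. The snippets and number of clusters below are invented, and scikit-learn stands in for whatever clustering method the paper actually used.

```python
# Hypothetical sketch: clustering web search results by snippet similarity
# so they can be presented as a small number of spoken groups.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

snippets = [
    "cheap flights to lisbon and porto",
    "lisbon flight deals and airfares",
    "history of the university of lisbon",
    "university of lisbon admissions office",
]
X = TfidfVectorizer().fit_transform(snippets)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
for snippet, cluster in zip(snippets, km.labels_):
    print(cluster, snippet)
```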
ABSTRACT: DIGMAP aims to become the main international resource discovery service for digitized old maps existing in libraries. The service reuses metadata from European national libraries and other relevant third-party metadata sources. The gathered metadata is enhanced locally with geographical indexing and with record linking/clustering, leveraging geographic gazetteers and authority files. When available, the images of the maps are also processed to extract potentially relevant features. This made it possible to develop a rich integrated environment for searching and browsing services with four perspectives: image features, textual, geographic and temporal information.
ACM/IEEE Joint Conference on Digital Libraries, JCDL 2008, Pittsburgh, PA, USA, June 16-20, 2008; 01/2008