Bruno Martins

Instituto Técnico y Cultural, Santa Clara de Portugal, Michoacán, Mexico

Publications (59) · 0.77 Total Impact

  • Source
    ABSTRACT: When developing a conversational agent, there is often an urgent need to have a prototype available in order to test the application with real users. A Wizard of Oz setup is one possibility, but sometimes the agent should simply be deployed in the environment where it will be used. There, the agent should be able to capture as many interactions as possible and to understand how people react to failure. In this paper, we focus on the rapid development of a natural language understanding module by non-experts. Our approach follows the learning paradigm and sees the process of understanding natural language as a classification problem. We test our module with a conversational agent that answers questions in the art domain. Moreover, we show how our approach can be used by a natural language interface to a cinema database.
    02/2013;
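The classification view of language understanding described in this abstract can be sketched as a tiny intent classifier. A multinomial Naive Bayes over bags of words is used here as a plausible stand-in; the intents, utterances, and add-one smoothing are all invented for illustration, not taken from the paper:

```python
from collections import Counter, defaultdict
import math

# Hypothetical training data: utterances labelled with intents, mimicking
# the paper's view of understanding as a classification problem.
TRAIN = [
    ("who painted this picture", "ask_author"),
    ("who is the author of this work", "ask_author"),
    ("when was this painted", "ask_date"),
    ("what year is this painting from", "ask_date"),
    ("what movies are playing tonight", "list_movies"),
    ("which films can I see today", "list_movies"),
]

def train_nb(examples):
    """Count words per intent to build a multinomial Naive Bayes model."""
    word_counts = defaultdict(Counter)  # intent -> word frequencies
    intent_counts = Counter()           # intent -> number of examples
    vocab = set()
    for text, intent in examples:
        intent_counts[intent] += 1
        for w in text.split():
            word_counts[intent][w] += 1
            vocab.add(w)
    return word_counts, intent_counts, vocab

def classify(text, model):
    """Return the intent maximizing log prior + smoothed log likelihood."""
    word_counts, intent_counts, vocab = model
    total = sum(intent_counts.values())
    best, best_score = None, float("-inf")
    for intent in intent_counts:
        score = math.log(intent_counts[intent] / total)
        denom = sum(word_counts[intent].values()) + len(vocab)
        for w in text.split():
            score += math.log((word_counts[intent][w] + 1) / denom)
        if score > best_score:
            best, best_score = intent, score
    return best

model = train_nb(TRAIN)
```

A non-expert can extend such a module simply by adding labelled utterances, which matches the rapid-development goal described in the abstract.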
  • Source
    ABSTRACT: The task of expert finding has been getting increasing attention in the information retrieval literature. However, the current state-of-the-art is still lacking in principled approaches for combining different sources of evidence in an optimal way. This paper explores the usage of learning to rank methods as a principled approach for combining multiple estimators of expertise, derived from the textual contents, from the graph structure of the citation patterns for the community of experts, and from profile information about the experts. Experiments made over a dataset of academic publications, for the area of Computer Science, attest to the adequacy of the proposed approaches.
    02/2013;
  • ABSTRACT: Expert finding is an information retrieval task concerned with the search for the most knowledgeable people with respect to a specific topic, where the search is based on documents that describe people's activities. The task involves taking a user query as input and returning a list of people sorted by their level of expertise with respect to that query. Despite recent interest in the area, the current state-of-the-art techniques lack principled approaches for optimally combining different sources of evidence. This article proposes two frameworks for combining multiple estimators of expertise. These estimators are derived from the textual contents, from the graph structure of the citation patterns for the community of experts, and from profile information about the experts. More specifically, this article explores the use of supervised learning to rank methods, as well as rank aggregation approaches, for combining all of the estimators of expertise. Several supervised learning algorithms, representative of the pointwise, pairwise and listwise approaches, were tested, and various state-of-the-art data fusion techniques were also explored for the rank aggregation framework. Experiments performed on a dataset of academic publications from the Computer Science domain attest to the adequacy of the proposed approaches.
    Expert Systems 01/2013; (in press). · 0.77 Impact Factor
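The pairwise approach mentioned in this abstract can be illustrated with a minimal sketch: a perceptron trained on score differences, so that a linear function learns to order experts. The feature vectors and preference pairs below are invented; the paper itself evaluates established pointwise, pairwise and listwise rankers, not this toy learner:

```python
def train_pairwise(pairs, dim, epochs=50):
    """Perceptron on feature differences: the pairwise reduction that
    underlies many learning-to-rank methods. Each pair says 'the first
    expert should outrank the second' for some query."""
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            # Update only when the current weights mis-order the pair.
            if sum(wi * di for wi, di in zip(w, diff)) <= 0:
                w = [wi + di for wi, di in zip(w, diff)]
    return w

def score(w, x):
    """Linear expertise score for a feature vector x."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Invented features: [text match, citation-graph score, profile match]
pairs = [([0.9, 0.8, 0.7], [0.2, 0.1, 0.3]),
         ([0.8, 0.9, 0.6], [0.3, 0.2, 0.1])]
w = train_pairwise(pairs, dim=3)
```

At query time, candidates are simply sorted by `score(w, features)`, which is how the learned combination of expertise estimators produces a ranking.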
  • ABSTRACT: Reading is an important activity for individuals. Content-based recommendation systems are typically used to recommend scientific papers or news, where search is driven by topic. Literary reading, or reading for leisure, differs from scientific reading, because users search for books not only by topic but also by author or writing style. Choosing a new book to read can be tricky, and recommendation systems can make it easier by selecting books that the user will like. In this paper we study recommendation through writing style and the influence of negative examples on user preferences. Our experiments were conducted in a hybrid set-up that combines a collaborative filtering algorithm with stylometric relevance feedback. Using the LitRec data set, we demonstrate that writing style influences book selection; that book content, characterized by writing style, can be used to improve collaborative filtering results; and that negative examples do not improve final predictions.
    Proceedings of the fifth ACM workshop on Research advances in large digital book repositories and complementary media; 10/2012
  • ABSTRACT: Literary reading is an important activity for individuals and can be a long-term commitment, making book choice an important task for book lovers and public library users. In this paper, we present a hybrid recommendation system to help readers decide which book to read next. We study book and author recommendations in a hybrid recommendation setting and test our algorithm on the LitRec data set. Our hybrid method combines two item-based collaborative filtering algorithms to predict books and authors that the user will like. Author predictions are expanded into a book list that is subsequently aggregated with the former book predictions. Finally, the resulting book list is used to yield the top-n book recommendations. By means of various experiments, we demonstrate that author recommendation can improve overall book recommendation.
    JCDL 2012; 06/2012
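The aggregation step described above (expanding author predictions into books, then blending them with direct book predictions) might be sketched as follows; the catalog, the scores, and the `alpha` blending weight are all invented for illustration, not taken from the paper:

```python
def expand_authors(author_scores, catalog):
    """Expand predicted authors into a scored book list using a
    (hypothetical) author -> books catalog."""
    book_scores = {}
    for author, score in author_scores.items():
        for book in catalog.get(author, []):
            book_scores[book] = max(book_scores.get(book, 0.0), score)
    return book_scores

def aggregate(book_cf, author_books, alpha=0.5, n=3):
    """Blend direct book predictions with the author-derived list and
    return the top-n recommended books."""
    items = set(book_cf) | set(author_books)
    blended = {b: alpha * book_cf.get(b, 0.0)
                  + (1 - alpha) * author_books.get(b, 0.0)
               for b in items}
    return sorted(blended, key=blended.get, reverse=True)[:n]

# Invented example: direct book predictions plus one predicted author.
catalog = {"tolkien": ["hobbit", "lotr"], "herbert": ["dune"]}
book_cf = {"lotr": 0.8, "dune": 0.3}
author_books = expand_authors({"tolkien": 0.9}, catalog)
top3 = aggregate(book_cf, author_books)
```

Note how a strongly predicted author can surface a book ("hobbit") that the direct book predictor never scored, which is the intuition behind the reported improvement.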
  • André Nunes, Pável Calado, Bruno Martins
    ABSTRACT: This paper describes an approach for resolving user identifiers in the context of social networks, using techniques from the area of duplicate record detection [1]. We reduce the user identity resolution problem to a binary classification task, where the goal is to classify pairs of identifiers as either belonging to the same person or not. The pairs are represented as feature vectors that combine multiple sources of similarity (e.g. similarity between profile information, descriptions of people's interests, and people's friend lists). We report on a thorough evaluation of different machine learning algorithms and different feature sets, concluding that user identities can be resolved with high accuracy.
    03/2012;
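A minimal sketch of the feature-vector construction described in this abstract, combining name, interest, and friend-list similarities for a pair of profiles; the profiles and the specific similarity functions (edit-based name similarity, Jaccard over sets) are illustrative assumptions, not necessarily the paper's exact choices:

```python
from difflib import SequenceMatcher

def jaccard(a, b):
    """Jaccard similarity between two sets (e.g. friend lists)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def pair_features(p1, p2):
    """Turn a pair of profiles into a feature vector that combines
    multiple similarity signals, as in the paper's formulation."""
    return [
        SequenceMatcher(None, p1["name"], p2["name"]).ratio(),  # names
        jaccard(p1["interests"], p2["interests"]),              # interests
        jaccard(p1["friends"], p2["friends"]),                  # friends
    ]

# Invented profiles: two identifiers for the same person, plus a stranger.
alice_a = {"name": "Alice Silva", "interests": {"jazz", "hiking"},
           "friends": {"bob", "carol", "dave"}}
alice_b = {"name": "A. Silva", "interests": {"jazz", "hiking", "film"},
           "friends": {"bob", "carol"}}
stranger = {"name": "Bob Costa", "interests": {"chess"},
            "friends": {"eve"}}
```

Any binary classifier (the paper evaluates several) then consumes such vectors, labelled as same-person or different-person pairs.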
  • Source
    ABSTRACT: Literary reading is an important activity for individuals, and choosing to read a book can be a long-term commitment, making book choice an important task for book lovers and public library users. In this paper we present a hybrid recommendation system to help readers decide which book to read next. We study book and author recommendation in a hybrid recommendation setting and test our approach on the LitRec data set. Our proposed hybrid book recommendation approach combines two item-based collaborative filtering algorithms to predict books and authors that the user will like. Author predictions are expanded into a book list that is subsequently aggregated with the former list generated by the initial collaborative recommender. Finally, the resulting book list is used to yield the top-n book recommendations. By means of various experiments, we demonstrate that author recommendation can improve overall book recommendation.
    03/2012;
  • Bruno Martins
    ABSTRACT: This paper presents a novel approach for detecting duplicate records in the context of digital gazetteers, using state-of-the-art machine learning techniques. It reports a thorough evaluation of alternative machine learning approaches designed for the task of classifying pairs of gazetteer records as either duplicates or not, built by using support vector machines or alternating decision trees with different combinations of similarity scores for the feature vectors. Experimental results show that using feature vectors that combine multiple similarity scores, derived from place names, semantic relationships, place types and geospatial footprints, leads to an increase in accuracy. The paper also discusses how the proposed duplicate detection approach can scale to large collections, through the usage of filtering or blocking techniques.
    GeoSpatial Semantics - 4th International Conference, GeoS 2011, Brest, France, May 12-13, 2011. Proceedings; 01/2011
  • Vitor Loureiro, Ivo Anastácio, Bruno Martins
    ABSTRACT: Geo-temporal information is pervasive over textual documents, since most of them contain references to particular locations, calendar dates, clock times or duration periods. An important text analytics problem is therefore related to resolving the place names and the temporal expressions referenced in the texts, i.e. linking the character strings in the documents that correspond to either locations or temporal instances, to the specific geospatial coordinates or the time intervals that they refer to. However, geo-temporal reference resolution presents several non-trivial problems to the area of text mining, due to the inherent ambiguity and contextual assumptions of natural language discourse.
    19th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, ACM-GIS 2011, November 1-4, 2011, Chicago, IL, USA, Proceedings; 01/2011
  • Rui Candeias, Bruno Martins
    ABSTRACT: The association of illustrative photos to textual contents is a challenging cross-media retrieval problem with many practical applications. For instance, associating photos with specific parts of travelogues, i.e. textual descriptions of travel experiences, may lead to a better usage of these documents. Despite the huge number of high-quality photos on websites like Flickr, these photos are currently not being properly explored in cross-media retrieval tasks.
    19th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, ACM-GIS 2011, November 1-4, 2011, Chicago, IL, USA, Proceedings; 01/2011
  • Source
    ABSTRACT: The task of expert finding has been getting increasing attention in the information retrieval literature. However, the current state-of-the-art is still lacking in principled approaches for combining different sources of evidence. This paper explores the usage of unsupervised rank aggregation methods as a principled approach for combining multiple estimators of expertise, derived from the textual contents, from the graph structure of the citation patterns for the community of experts, and from profile information about the experts. We specifically experimented with two unsupervised rank aggregation approaches well known in the information retrieval literature, namely CombSUM and CombMNZ. Experiments made over a dataset of academic publications for the area of Computer Science attest to the adequacy of these methods.
    01/2011;
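CombSUM and CombMNZ are simple to state: CombSUM adds up each candidate's normalized scores across the input rankings, and CombMNZ additionally multiplies by the number of rankings in which the candidate appears, rewarding consensus. A sketch, with invented expertise scores for the three evidence sources named in the abstract:

```python
def combsum(rankings):
    """CombSUM: sum each candidate's (normalized) scores across rankers."""
    scores = {}
    for ranking in rankings:
        for expert, s in ranking.items():
            scores[expert] = scores.get(expert, 0.0) + s
    return scores

def combmnz(rankings):
    """CombMNZ: CombSUM score times the number of rankers that
    retrieved the candidate."""
    summed = combsum(rankings)
    hits = {e: sum(1 for r in rankings if e in r) for e in summed}
    return {e: summed[e] * hits[e] for e in summed}

# Invented per-evidence expertise scores, already normalized to [0, 1]:
text_scores    = {"ana": 0.9, "rui": 0.4}
graph_scores   = {"ana": 0.6, "rui": 0.7, "eva": 0.8}
profile_scores = {"rui": 0.5}

fused = combmnz([text_scores, graph_scores, profile_scores])
```

Here "rui" wins under CombMNZ despite "ana" having the single highest score, because all three evidence sources agree on him, which is exactly the behaviour that distinguishes CombMNZ from CombSUM.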
  • ABSTRACT: This paper describes an approach for performing recognition and resolution of place names mentioned in the descriptive metadata records of typical digital libraries. Our approach exploits evidence provided by the existing structured attributes within the metadata records to support place name recognition and resolution, in order to achieve better results than by just using lexical evidence from the textual values of these attributes. In metadata records, lexical evidence is very often insufficient for this task, since short sentences and simple expressions are predominant. Our implementation uses a dictionary-based technique for the recognition of place names (with names provided by Geonames), and machine learning for reasoning over the evidence and choosing a possible resolution candidate. The evaluation of our approach was performed on data sets with a metadata schema rich in Dublin Core elements. Two evaluation methods were used. First, we used cross-validation, which showed that our solution is able to achieve a very high precision of 0.99 at 0.55 recall, or a recall of 0.79 at 0.86 precision. Second, we used a comparative evaluation against an existing commercial service, where our solution performed better at every confidence level (p
    Proceedings of the 2011 Joint International Conference on Digital Libraries, JCDL 2011, Ottawa, ON, Canada, June 13-17, 2011; 01/2011
  • Bruno Martins, Ivo Anastácio, Pável Calado
    ABSTRACT: This paper presents a machine learning method for resolving place references in text, i.e. linking character strings in documents to locations on the surface of the Earth. This is a fundamental task in the area of Geographic Information Retrieval, supporting access through geography to large document collections. The proposed method is an instance of stacked learning, in which a first learner based on a Hidden Markov Model is used to annotate place references, and then a second learner implementing a regression through a Support Vector Machine is used to rank the possible disambiguations for the references that were initially annotated. The proposed method was evaluated on gold-standard document collections in three different languages, with place references annotated by humans. Results show that the proposed method compares favorably against commercial state-of-the-art systems such as the Metacarta geo-tagger and Yahoo! Placemaker.
    07/2010: pages 221-236;
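The two-stage structure described above can be caricatured in a few lines. The dictionary tagger and the population heuristic below are deliberately simple stand-ins for the paper's HMM annotator and SVM regression ranker, and the gazetteer entries are invented:

```python
# Invented gazetteer: an ambiguous name with two candidate locations.
GAZETTEER = {
    "Paris": [{"coords": (48.86, 2.35), "population": 2_100_000},
              {"coords": (33.66, -95.56), "population": 25_000}],
}

def annotate(text):
    """Stage 1 (stand-in for the HMM tagger): mark tokens that match
    gazetteer entries as candidate place references."""
    return [tok for tok in text.split() if tok in GAZETTEER]

def resolve(reference):
    """Stage 2 (stand-in for the SVM ranker): score the candidate
    locations for a reference and return the best one; here the score
    is simply population."""
    return max(GAZETTEER[reference], key=lambda c: c["population"])

refs = annotate("flights from Paris leave daily")
best = resolve(refs[0])
```

The stacking idea is that stage 2 consumes stage 1's output, so errors and ambiguities left by the annotator are arbitrated by the ranker rather than by the annotator itself.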
  • Source
    Bruno Martins, Pável Calado
    ABSTRACT: The task of Learning to Rank is currently getting increasing attention, providing a sound methodology for combining different sources of evidence. The goal is to design and apply machine learning methods to automatically learn, from training data, a function that can sort documents according to their relevance. Geographic information retrieval has also emerged as an active and growing research area, addressing the retrieval of textual documents according to geographic criteria of relevance. In this paper, we explore the usage of a learning to rank approach for geographic information retrieval, leveraging the datasets made available in the context of the previous GeoCLEF evaluation campaigns. The idea is to combine different metrics of textual and geographic similarity into a single ranking function, through the use of the SVMmap framework. Experimental results show that the proposed approach can outperform baselines based on heuristic combinations of features.
    Proceedings of the 6th Workshop on Geographic Information Retrieval, GIR 2010, Zurich, Switzerland, February 18-19, 2010; 01/2010
  • Source
    Ivo Anastácio, Bruno Martins, Pável Calado
    ABSTRACT: Geotargeting is a specialization of contextual advertising where the objective is to target ads to website visitors concentrated in well-defined areas. Current approaches involve targeting ads based on the physical location of the visitors, estimated through their IP addresses. However, there are many situations where it would be more interesting to target ads based on the geographic scope of the target pages, i.e., on the general area implied by the locations mentioned in the textual contents of the pages. Our proposal applies techniques from the area of geographic information retrieval to the problem of geotargeting. We address the task through a pipeline of processing stages, which involves (i) determining the geographic scope of target pages, (ii) classifying target pages according to locational relevance, and (iii) retrieving ads relevant to the target page, using both textual contents and geographic scopes. Experimental results attest to the adequacy of the proposed methods in each of the individual processing stages.
    Proceedings of the 6th Workshop on Geographic Information Retrieval, GIR 2010, Zurich, Switzerland, February 18-19, 2010; 01/2010
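The three-stage pipeline might look roughly like this; the gazetteer, the centroid-based scope estimator, and the mention-count relevance test are all simplifying assumptions for illustration, not the paper's actual models:

```python
import math

# Invented gazetteer: place name -> (lat, lon)
GAZETTEER = {"Lisbon": (38.72, -9.14), "Porto": (41.15, -8.61),
             "Berlin": (52.52, 13.40)}

def geographic_scope(text):
    """Stage (i): estimate a page's geographic scope as the centroid
    of the locations mentioned in its text."""
    points = [GAZETTEER[w] for w in text.split() if w in GAZETTEER]
    if not points:
        return None
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))

def is_local(text, min_mentions=2):
    """Stage (ii): a crude locational-relevance test based on how many
    place references the page contains."""
    return sum(w in GAZETTEER for w in text.split()) >= min_mentions

def rank_ads(scope, ads):
    """Stage (iii): rank ads by proximity of their target location to
    the page's scope (plain Euclidean distance in lat/lon for brevity)."""
    def dist(ad):
        (la, lo), (pa, po) = ad["location"], scope
        return math.hypot(la - pa, lo - po)
    return sorted(ads, key=dist)

page = "weekend guide to Lisbon and Porto restaurants"
ads = [{"name": "berlin_hotels", "location": GAZETTEER["Berlin"]},
       {"name": "porto_wine_tours", "location": GAZETTEER["Porto"]}]
if is_local(page):
    best = rank_ads(geographic_scope(page), ads)[0]["name"]
```

Stage (ii) acts as a gate: pages without enough locational relevance fall back to ordinary, non-geographic ad matching.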
  • Source
    ABSTRACT: In this paper, we propose a universal solution to web search and web browsing on handheld devices for visually impaired people. For this purpose, we propose (1) to automatically cluster web page results and (2) to summarize all the information in web pages, so that speech-to-speech interaction is used efficiently to access information.
    Proceedings of the Twenty-Second International Florida Artificial Intelligence Research Society Conference, May 19-21, 2009, Sanibel Island, Florida, USA; 01/2009
  • Source
    ABSTRACT: DIGMAP is a digital library specialized in searching and browsing services for old maps and related resources. The service reuses metadata from national libraries and other relevant third-party metadata sources, providing added-value services by aggregating all the data into comprehensive collections, browsing indexes and search functions. The services are based on a set of specialized tools, comprising a catalogue, an image feature indexer, a metadata repository, a geographic gazetteer and a geo-parser. The extraction of relevant visual features from images of digitized maps is another focus of the project. The architecture and the technology also give it the ability to easily interoperate with other complementary external services.
    01/2009; 4:1-8.
  • Source
    Ivo Anastácio, Bruno Martins, Pável Calado
    ABSTRACT: This paper presents an approach for categorizing documents according to their implicit locational relevance. We report a thorough evaluation of several classifiers designed for this task, built by using support vector machines with multiple alternatives for feature vectors. Experimental results show that using feature vectors that combine document terms and URL n-grams, with simple features related to the locality of the document (e.g. total count of place references) leads to high accuracy values. The paper also discusses how the proposed categorization approach can be used to help improve tasks such as document retrieval or online contextual advertisement.
    Progress in Artificial Intelligence, 14th Portuguese Conference on Artificial Intelligence, EPIA 2009, Aveiro, Portugal, October 12-15, 2009. Proceedings; 01/2009
  • ABSTRACT: We describe our participation in the TEL@CLEF task of the CLEF 2009 ad-hoc track, where we measured the retrieval performance of LGTE, an index engine for geo-temporal collections which is mostly based on Lucene, together with extensions for query expansion and multinomial language modelling. We experiment with an N-gram stemming model to improve on our last year's experiments, which consisted of combinations of query expansion, Lucene's off-the-shelf ranking scheme and a ranking scheme based on multinomial language modeling. The N-gram stemming model was based on a linear combination of N-grams, with n between 2 and 5, using weight factors obtained by learning from last year's topics and assessments. The Rocchio ranking function was also adapted to implement this N-gram model. Results show that this stemming technique, together with query expansion and multinomial language modeling, results in increased performance.
    Multilingual Information Access Evaluation I. Text Retrieval Experiments, 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, Corfu, Greece, September 30 - October 2, 2009, Revised Selected Papers; 01/2009

Publication Stats

443 Citations
0.77 Total Impact Points

Institutions

  • 2009–2013
    • Instituto Técnico y Cultural
      Santa Clara de Portugal, Michoacán, Mexico
    • Universidade da Beira Interior
      Covilhã, Castelo Branco, Portugal
  • 2012
    • Inesc-ID
      Lisboa, Lisbon, Portugal
  • 2008–2009
    • Technical University of Lisbon
      • Departamento de Engenharia Informática (DEI)
      Lisbon, Lisbon, Portugal
  • 2007
    • Instituto Superior de Contabilidade e Administração de Lisboa
      Lisboa, Lisbon, Portugal
  • 2004–2005
    • University of Lisbon
      • Departamento de Informática
      • Faculdade de Ciências
      Lisbon, Lisbon, Portugal
    • Faculdade Campo Grande
      Campo Grande, Estado de Mato Grosso do Sul, Brazil