Conference Proceeding

MIRACLE at Ad-Hoc CLEF 2005: Merging and Combining Without Using a Single Approach.

01/2005; DOI:10.1007/11878773_4 In proceedings of: Accessing Multilingual Information Repositories, 6th Workshop of the Cross-Language Evaluation Forum, CLEF 2005, Vienna, Austria, 21-23 September, 2005, Revised Selected Papers
Source: DBLP

ABSTRACT This paper presents the MIRACLE team's 2005 approach to the Ad-Hoc Information Retrieval tasks. The goal for the experiments
this year was twofold: to continue testing the effect of combination approaches on information retrieval tasks, and to improve
our basic processing and indexing tools, adapting them to new languages with unusual encoding schemes. The starting point
was a set of basic components: stemming, transforming, filtering, proper noun extraction, paragraph extraction, and pseudo-relevance
feedback. Some of these basic components were used in different combinations and orders of application for document indexing
and for query processing. Second-order combinations were also tested, by averaging or selectively combining the documents
retrieved by different approaches for a particular query. In the multilingual track, we concentrated on merging
the results of monolingual runs to obtain the overall multilingual result, relying on available translations. In both
cross-lingual tracks, we used available translation resources, and in some cases a combination approach.
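
The second-order combination by averaging mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how scores are normalized or how a document missing from one run is treated, so the min-max normalization, the zero score for missing documents, and the names `min_max_normalize` and `average_runs` are all assumptions made for illustration, not the MIRACLE implementation.

```python
from collections import defaultdict


def min_max_normalize(run):
    """Rescale one run's scores to [0, 1] (assumed normalization)."""
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        # All scores equal: treat every document as equally relevant.
        return {doc: 1.0 for doc in run}
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}


def average_runs(runs):
    """Average normalized scores over all runs for a single query.

    Each run maps a document id to a retrieval score. A document that
    is absent from a run contributes 0 for that run (an assumption).
    Returns (doc, averaged score) pairs, best first.
    """
    totals = defaultdict(float)
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            totals[doc] += score
    averaged = {doc: total / len(runs) for doc, total in totals.items()}
    # Sort by descending score, breaking ties by document id.
    return sorted(averaged.items(), key=lambda item: (-item[1], item[0]))
```

Calling `average_runs` with the per-query results of, say, a stemmed run and a proper-noun run yields a single ranking in which documents retrieved by both approaches tend to rise to the top.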

  • Source
    ABSTRACT: This paper describes the participation of the MIRACLE research consortium in the Query Parsing task of GeoCLEF 2007. Our system is composed of three main modules. The first is the Named Geo-entity Identifier, which performs geo-entity identification and tagging, i.e., it extracts the “where” component of the geographical query, if there is one. Then, the Query Analyzer parses the tagged query to identify the “what” and “geo-relation” components by means of a rule-based grammar. Finally, a two-level multiclassifier first decides whether the query is indeed a geographical query and, if so, determines the query type according to the kind of information the user is presumably looking for: map, yellow page or information.
  • Source
    ABSTRACT: This paper presents the MIRACLE team's 2006 approach to the AdHoc Information Retrieval track. The experiments for this campaign continue testing our IR approach. First, a baseline set of runs is obtained using standard components: stemming, transforming, filtering, entity detection and extraction, and others. Then, an extended set of runs is obtained using several types of combinations of these baseline runs. Few improvements were introduced for this campaign: we integrated a prototype entity recognition and indexing tool into our tokenizing scheme, and we ran more combination experiments for the robust multilingual case than in previous campaigns. However, no significant improvements were achieved. For this campaign, runs were submitted for the following tracks and languages: monolingual (Bulgarian, French, Hungarian, and Portuguese); bilingual (English to Bulgarian, French, Hungarian, and Portuguese; Spanish to French and Portuguese; and French to Portuguese); robust monolingual (German, English, Spanish, French, Italian, and Dutch); robust bilingual (English to German, Italian to Spanish, and French to Dutch); and robust multilingual (English to the robust monolingual languages). We still need to improve some aspects of our processing scheme, the most important of which, to our knowledge, is entity recognition and normalization.
    01/2006;
  • ABSTRACT: This paper describes the MIRACLE approach to WebCLEF. A set of independent indexes was constructed for each top-level domain of the EuroGOV collection. Each index contains information extracted from the document, such as the URL, title, keywords, detected named entities or HTML headers. These indexes are queried to obtain partial document rankings, which are combined with various relative weights to test the value of each index. The final aim is to identify which index (or combination of indexes) is most relevant for a retrieval task, avoiding the construction of a full-text index.
    Accessing Multilingual Information Repositories, 6th Workshop of the Cross-Language Evaluation Forum, CLEF 2005, Vienna, Austria, 21-23 September, 2005, Revised Selected Papers; 01/2005
