C. Peters et al. (Eds.): CLEF 2005, LNCS 4022, pp. 44 – 53, 2006.
© Springer-Verlag Berlin Heidelberg 2006
MIRACLE at Ad-Hoc CLEF 2005: Merging and
Combining Without Using a Single Approach
José M. Goñi-Menoyo1, José C. González-Cristóbal1,3, and Julio Villena-Román2,3
1 Universidad Politécnica de Madrid
2 Universidad Carlos III de Madrid
3 DAEDALUS - Data, Decisions and Language, S.A.
Abstract. This paper presents the 2005 MIRACLE team's approach to the Ad-Hoc Information Retrieval tasks. The goal for the experiments this year was twofold: to continue testing the effect of combination approaches on information retrieval tasks, and to improve our basic processing and indexing tools, adapting them to new languages with unfamiliar encoding schemes. The starting point was a set of basic components: stemming, transforming, filtering, proper noun extraction, paragraph extraction, and pseudo-relevance feedback. Some of these basic components were used in different combinations and orders of application for document indexing and for query processing. Second-order combinations were also tested, by averaging or selective combination of the documents retrieved by different approaches for a particular query. In the multilingual track, we concentrated our work on the process of merging the results of monolingual runs to get the overall multilingual result, relying on available translations. In both cross-lingual tracks, we used available translation resources, and in some cases a combination approach.
1 Introduction
The MIRACLE team is made up of three university research groups located in Madrid (UPM, UC3M and UAM) along with DAEDALUS, a company founded in 1998 as a spin-off of two of these groups. DAEDALUS is a leading company in linguistic technologies in Spain and is the coordinator of the MIRACLE team. This is our third participation in CLEF, after 2003 and 2004. As well as the bilingual, monolingual and cross-lingual tasks, the team has participated this year in the ImageCLEF, Q&A, WebCLEF and GeoCLEF tracks.
The starting point was a set of basic components: stemming, transformation (transliteration, elimination of diacritics and conversion to lowercase), filtering (elimination of stopwords and frequent words), extraction of proper nouns, extraction of paragraphs, and pseudo-relevance feedback. Some of these basic components are used in different combinations and orders of application for document indexing and for query processing. Second-order combinations were also tested, mainly by averaging or by selective combination of the documents retrieved by different approaches for a particular query. When there is evidence of better precision of one system at one extreme of the recall range (i.e. 1), complemented by better precision of another system at the other extreme (i.e. 0), both are combined to benefit from their complementary behavior.
Additionally, during the last year our group has been improving an indexing system based on the trie data structure, which we reported on last year. Tries have been successfully used by the MIRACLE team for years as an efficient technique for the storage and retrieval of huge lexical resources, combined with a continuation-based approach to morphological treatment. However, adapting these structures to manage document indexing and retrieval efficiently for IR applications has been a hard task, mainly regarding the performance of index construction. Thus, this year we used only our trie-based indexing system, and so the Xapian indexing system used in previous CLEF editions was no longer needed. In fact, we were able to carry out more experiments than in the previous year, since this improvement in indexing efficiency gave us more computing time.
For the 2005 bilingual track, runs were submitted for the following language pairs:
English to Bulgarian, French, Hungarian and Portuguese; and Spanish to French and
Portuguese. For the multilingual track, runs were submitted using as source language
English, French, and Spanish. Finally, in the monolingual case runs were submitted
for Bulgarian, French, Hungarian, and Portuguese.
2 Description of the MIRACLE Toolbox
Document collections were pre-processed before indexing, using different combinations of elementary processes, each one oriented towards a particular experiment. For each of these, topic queries were also processed using the same combination of processes (although some variants have been used, as will be described later). The baseline approach to document and topic query processing is made up of a combination of the following steps:
− Extraction: The raw text from the different document collections or topic files is extracted with ad-hoc scripts that select the contents of the desired XML elements. All those permitted for automatic runs were used (depending on the collection, all of the existing TEXT, TITLE, LEAD1, TX, LD, TI, or ST elements for document collections, and the contents of the TITLE, DESC, and NARR elements for topic queries). The contents of these tags were concatenated, without further distinction, to feed subsequent processing steps. This extraction treatment has a special filter for topic queries when the narrative field is used: some patterns obtained from the topics of past campaigns are eliminated, since they are recurrent and misleading in the retrieval process; for example, for English, “… are not relevant.” or “… are to be excluded.”. All sentences that contain these patterns are filtered out.
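The pattern-based sentence filter can be illustrated with a minimal sketch (the real pattern lists were mined from past campaigns; the literals below are just the examples quoted in the text):

```python
def filter_narrative(sentences, patterns):
    """Drop whole sentences from the narrative field that contain any
    of the misleading recurring patterns mined from past topics."""
    return [s for s in sentences if not any(p in s for p in patterns)]
```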
− Paragraph extraction: In some experiments, we indexed paragraphs instead of documents. (Paragraphs are either marked by the <P> tag in the original XML document or are separated from each other by two carriage returns, so they are easily detected.) The subsequent retrieval process then returned document paragraphs, so we needed to combine the relevance measures of all paragraphs retrieved for the same document. We tested several approaches for this combination, for example counting the number of paragraphs, adding relevance measures, or using the maximum of the relevance figures of the paragraphs retrieved. Experimentally, we got the best results using the following formula for document relevance:

rel_N = ξ · rel_mN + (1 − ξ) · Σ_{j=1..n, j≠m} rel_jN

where n is the number of paragraphs retrieved for document N, rel_jN is the relevance measure obtained for the j-th paragraph of document N, and m refers to the paragraph with maximum relevance. The coefficient ξ was adjusted experimentally to 0.75. The idea behind this formula is to give paramount importance to the maximum paragraph relevance, while taking the rest of the relevant paragraphs into account to a lesser extent. Paragraph extraction was not used for topic processing.
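As a sketch, the combination rule reads as follows in code. The exact formula is reconstructed from the prose (the maximum paragraph relevance weighted by ξ, the remaining paragraphs by 1 − ξ), so treat it as an illustration rather than the paper's verbatim definition:

```python
def document_relevance(paragraph_rels, xi=0.75):
    """Combine per-paragraph relevances into one document relevance:
    the maximum paragraph dominates (weight xi) and the remaining
    paragraphs contribute with weight (1 - xi)."""
    if not paragraph_rels:
        return 0.0
    best = max(paragraph_rels)
    rest = sum(paragraph_rels) - best
    return xi * best + (1.0 - xi) * rest
```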
− Tokenization: This process extracts basic text components, detecting and isolating punctuation symbols. Some basic entities are also treated, such as numbers, initials, abbreviations, and years. For now, we do not treat compounds, proper nouns, acronyms or other entities. The outcomes of this process are only single words and years that appear as numbers in the text (e.g. 1995, 2004, etc.).
− Filtering: All words recognized as stopwords are filtered out. Stopwords in the target languages were initially obtained from , but were extended using several other sources and our own knowledge and resources. We also used other lists of words to exclude from the indexing and querying processes, which were obtained from the topics of past CLEF editions. We consider that such words have no semantics in the type of queries used in CLEF; for example, in the English list: appear, relevant, document, report, etc.
− Transformation: The items that resulted from tokenization were normalized by converting all uppercase letters to lowercase and eliminating accents. This process is usually carried out after stemming, although it can be done before; the resulting lexemes are then different. We had to do it before stemming for Bulgarian and Hungarian, since those stemmers did not work well with uppercase letters. Note that the accent removal process is not applicable to Bulgarian.
− Stemming: This process is applied to each one of the words to be indexed or used for retrieval. We used standard stemmers from Porter for most languages, except for Hungarian, where we used a stemmer from Neuchatel.
− Proper noun extraction: In some experiments, we try to detect and extract proper nouns in the text. The detection is very simple: any chunk that results from the tokenization process is considered a proper noun provided that its first letter is uppercase, unless the word is included in the stopword list or in a specifically built list of words that are not suitable as proper nouns (mainly verbs and adverbs). We opted for this simple strategy since we did not have huge lists of proper nouns available. In the experiments that used this process, only the proper nouns extracted from the topics fed a query to an index of documents of normal words, where neither proper nouns were extracted nor stemming was carried out.
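The heuristic described above amounts to a simple per-token test; `stopwords` and `not_proper` stand for the hand-built exclusion lists mentioned in the text:

```python
def extract_proper_nouns(tokens, stopwords, not_proper):
    """Naive proper-noun detection: keep a token if it starts with an
    uppercase letter and is neither a stopword nor in a list of words
    known not to be proper nouns (mainly verbs and adverbs).
    Multi-word proper nouns are not handled."""
    return [t for t in tokens
            if t[:1].isupper()
            and t.lower() not in stopwords
            and t.lower() not in not_proper]
```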
− Linguistic processing: In the Multi-8 track, and only in the case of Spanish as
topic language, we tested an approach consisting in pre-processing the topics with
2 Note that multi-word proper nouns cannot be treated this way.
a high-quality morphological analysis tool, STILUS. STILUS not only recognizes closed words, but also expressions (prepositional, adverbial, etc.). In this case, STILUS is simply used to discard closed words and expressions from the topics and to obtain the main form of their component words (in most cases, singular masculine or feminine for nouns and adjectives, and infinitive for verbs). The queries are thus transformed into a simple list of words that is passed to the automatic translators (one word per line).
− Translation: For the cross-lingual tracks, popular on-line translation systems or available dictionary resources were used to translate topic queries into the target languages: ATRANS was used for the pairs EsFr and EsPt; Bultra and Webtrance for EnBg; MoBiCAT for EnHu; and SYSTRAN for the language pairs EnFr, EsFr, and EnPt. However, for multilingual runs having English as topic language, we avoided working on the translation problem for some runs. In this case, we used the provided translations for topic queries, testing Savoy's approach to translation concatenation. Two cases were considered: concatenating all available translations, and concatenating selected translations. Table 1 shows the translations used for both cases.
In the Multi-8 track we also used automatic translation systems: for Spanish and French as topic languages, ATRANS was used for the pairs EsFr and EsPt; WorldLingo for EsDe, EsIt, and EsNl; InterTrans for EsFi, EsSv, FrFi, and FrSv; and SYSTRAN for all the other language pairs. Only one translator was used for each pair.
− Final use:
• Indexing: When all the documents processed through a combination of the former steps are ready for indexing, they are fed into our trie indexing engine to build the document collection index.
• Retrieval: When the texts processed by a combination of the aforementioned steps are topic queries, they are fed to an ad-hoc front-end of the retrieval trie engine to search the previously built document collection index. In the 2005 experiments, only OR combinations of the search terms were used. The retrieval model used is Robertson's well-known Okapi BM25 formula for the probabilistic retrieval model, without relevance feedback.
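For reference, a textbook sketch of the Okapi BM25 score named above; the function signature and the k1 and b defaults are illustrative assumptions, since the paper does not list its parameter settings:

```python
import math

def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, df, n_docs,
               k1=1.2, b=0.75):
    """Textbook Okapi BM25 score for one document.
    doc_tf: term -> frequency in the document; df: term -> document
    frequency in the collection; n_docs: collection size.
    k1 and b are assumed defaults, not the paper's settings."""
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue
        # Robertson-Sparck Jones inverse document frequency
        idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5))
        # term-frequency saturation with document-length normalization
        norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
        score += idf * norm
    return score
```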
After retrieval, some other special processes were used to define additional experiments:
− Pseudo-relevance feedback: We used this technique in some experiments. After a first retrieval step, we processed the first retrieved document to get its indexing terms which, after standard processing (see below), are fed back to a second retrieval step, whose result is used.
3 STILUS® is a trademark of DAEDALUS - Data, Decisions and Language, S.A. It is the core of the company's Spanish-processing tools, which include spell, grammar and style checkers, fuzzy search engines, semantic processing, etc.
4 In the case of Bulgarian, an average combination of the results from the translations with the Webtrance and Bultra systems from English to Bulgarian has also been used.
5 Both retrieval processes can be independent of each other: we could have used two different treatments for the queries and documents, thus using different indexes for each of the retrievals. In our case, only standard treatments were used for both retrieval steps.
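The two-step retrieval loop of the pseudo-relevance feedback experiments can be sketched as follows; `search` and `index_terms` are hypothetical stand-ins for the trie engine's query and index-access operations, not the paper's API:

```python
def pseudo_relevance_feedback(query_terms, search, index_terms, top_k=1):
    """Two-step retrieval sketch: run the query, expand it with the
    indexing terms of the top retrieved document(s), then run the
    expanded query and keep its result. top_k=1 follows the text
    ("the first retrieved document")."""
    first_pass = search(query_terms)
    expanded = list(query_terms)
    for doc_id in first_pass[:top_k]:
        for term in index_terms(doc_id):
            if term not in expanded:
                expanded.append(term)
    return search(expanded)
```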
Table 1. Available automatic translations used for concatenation. Abbreviations: ALT for Babelfish Altavista; BA1, BA2, and BA3 for Babylon; FRE for FreeTranslation; GOO for Google Language Tools; INT for InterTrans; LIN for WorldLingo; REV for Reverso; and SYS for Systran. The entries in the table contain A (for ALL) if a translation is available for English to the topic language shown in the heading row of a column and is used for the concatenation of all available translations, and H if a translation is used for the selected concatenation of translations.
− Combination: The results from some basic experiments were combined in different ways. The underlying hypothesis is that, to some extent, the documents with a good score in almost all experiments are more likely to be relevant than other documents that have a good score in one experiment but a bad one in others. Two strategies were followed for combining experiments:
• Average: The relevance figures obtained using the probabilistic retrieval in all the experiments to be combined for a particular document in a given query are added. This approach combines the relevance figures of the experiments without highlighting a particular experiment.
• Asymmetric WDX combination: In this type of combination, two experiments are combined in the following way: the relevance of the first D documents for each query of the first experiment is preserved in the resulting combined relevance, whereas the relevances of the remaining documents in both experiments are combined using weights W and X. We have only run experiments labeled “011”, that is, the ones that keep the most relevant documents from the first basic experiment and take all the remaining documents retrieved from the second basic experiment, re-sorting all the results using the original relevance measures.
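A minimal sketch of the asymmetric combination for a single query, assuming each run is a (document, relevance) list sorted by decreasing relevance; reading the three digits of “011” as (W, D, X) = (0, 1, 1) is our interpretation of the label:

```python
def wdx_combine(run_a, run_b, w=0.0, d=1, x=1.0):
    """Asymmetric WDX combination for one query. The top d documents
    of run_a keep their original relevance; every other document gets
    w * rel_a + x * rel_b. With (w, d, x) = (0, 1, 1), the best
    document of the first run is kept and the rest of the ranking
    comes from the second run."""
    rel_a, rel_b = dict(run_a), dict(run_b)
    preserved = {doc for doc, _ in run_a[:d]}
    combined = {}
    for doc in set(rel_a) | set(rel_b):
        if doc in preserved:
            combined[doc] = rel_a[doc]
        else:
            combined[doc] = w * rel_a.get(doc, 0.0) + x * rel_b.get(doc, 0.0)
    # re-sort all results by the combined relevance measures
    return sorted(combined.items(), key=lambda p: p[1], reverse=True)
```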
− Merging: In the multilingual case, the approach used requires that the monolingual result lists for each of the target languages be merged. The results obtained are very sensitive to the merging approach for the relevance measures.
6 The digit after BA shows how many words are used from the translation of a word, provided
that it returns more than one.
The probabilistic BM25 formula used for monolingual retrieval gives relevance measures that depend heavily on parameters that are too dependent on the monolingual collection, so it is not very good for this type of multilingual merging, since relevance measures are not comparable between collections. In spite of this, we carried out merging experiments using the relevance figures obtained from each monolingual retrieval process, considering three cases:
• Using the original relevance measures for each document as obtained from the monolingual retrieval process. The results are made up of the documents with the greatest relevance measures.
• Normalizing relevance measures with respect to the maximum relevance measure obtained for each topic query i (standard normalization):

rel'_ij = rel_ij / max_k(rel_ik)

Then, the results are made up of the documents with the greatest normalized relevance measures.
• Normalizing relevance measures with respect to the maximum and minimum relevance measures obtained for each topic query i (alternate normalization):

rel'_ij = (rel_ij − min_k(rel_ik)) / (max_k(rel_ik) − min_k(rel_ik))

Then, the results are made up of the documents with the greatest alternate normalized relevance measures.
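The two normalizations and the subsequent merge can be sketched as follows; `merge_runs` and its signature are our illustration, not the paper's code:

```python
def standard_norm(rels):
    """Standard normalization: divide by the per-query maximum."""
    mx = max(rels)
    return [r / mx for r in rels]

def alternate_norm(rels):
    """Alternate normalization: rescale to [0, 1] using the per-query
    minimum and maximum (min-max normalization)."""
    mn, mx = min(rels), max(rels)
    if mx == mn:  # degenerate case: all relevances equal
        return [0.0 for _ in rels]
    return [(r - mn) / (mx - mn) for r in rels]

def merge_runs(runs, normalize, limit=1000):
    """Merge per-language result lists for one topic: normalize each
    language's relevances independently, pool everything, and keep
    the best `limit` documents overall."""
    pool = []
    for lang, results in runs.items():
        rels = [r for _, r in results]
        for (doc, _), nr in zip(results, normalize(rels)):
            pool.append((doc, nr))
    pool.sort(key=lambda p: p[1], reverse=True)
    return pool[:limit]
```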
In addition to all this, we tried a different approach to merging. Considering that the most relevant documents for each topic are usually the first ones in the results list, we select from each monolingual results file a variable number of documents, proportional to the average relevance of its first N documents. Thus, if we need 1,000 documents for a given topic query, we get more documents from the languages where the average relevance of the first N retrieved documents is greater. We did this both from non-normalized runs (normalizing after the merging process is carried out, with standard and alternate normalization) and from runs normalized with alternate normalization. We tested several cases using results from baseline runs, with several values of N: 1, 10, 50, 125, 250, and 1,000.
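A sketch of the proportional selection just described, assuming each language's quota is its average top-N relevance divided by the sum of those averages over all languages; the exact proportionality rule is our reading of the prose:

```python
def proportional_quota(runs, n=10, total=1000):
    """Assign each language a share of the final `total` documents
    proportional to the average relevance of its first n results.
    runs: language -> list of (doc_id, relevance), best first."""
    avg = {lang: sum(r for _, r in res[:n]) / min(n, len(res))
           for lang, res in runs.items()}
    weight_sum = sum(avg.values())
    return {lang: int(round(total * a / weight_sum))
            for lang, a in avg.items()}
```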
3 Description of the Experiments
For this campaign we designed several experiments in which the documents for indexing and the topic queries for retrieval are processed using a particular combination of some of the steps described in the previous section. A detailed inventory of the experiments, the processes used for each one, and their encoding in the name of the experiment can be found in the papers submitted to the CLEF 2005 Workshop. Details of the document collections and the tasks can be found in the introduction and track overview papers.
Footnote 7: Round-robin merging of the results of each monolingual collection has not been used.
50 J.M. Goñi-Menoyo, J.C. González-Cristóbal, and J. Villena-Román
Several hundred experiments were run, and the criterion for choosing which ones
to submit was the best results obtained using the topic queries and qrels sets
from the 2004 campaign. Except for Portuguese, the best results this year came
from runs that were not submitted. We think this behavior can be explained by
the fact that the results depend to a great extent on the particular topics
selected each year. It is worth noting that in all cases we obtained the best
results using the narrative field of the topic queries, as well as the standard
processing approach.
We expected better results from combinations of proper noun indexing with
standard runs, as the results of the 2004 campaign seemed to suggest, but this
has not been the case. It is clear that the quality of the tokenization step is
of paramount importance for precise document processing. We still think that
high-quality entity recognition (proper nouns or acronyms for people, companies,
countries, locations, and so on) could improve the precision and recall figures
of the overall retrieval, as could correct recognition and normalization of
dates, times, numbers, etc. Pseudo-relevance feedback did not perform well, but
we ran too few experiments of this type to draw general conclusions. On the
other hand, these runs had many query terms, which made them very slow.
Regarding the basic experiments, the general conclusions were known in advance:
retrieval performance can be improved by using stemming, filtering of frequent words
and appropriate weighting.
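As an example of such weighting, a minimal Okapi BM25 term weight (in the spirit
of [9]; the parameter values below are the usual textbook defaults, not
necessarily those used in our runs) can be written as:

```python
import math

def bm25_weight(tf, df, num_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    # Okapi BM25: reward term frequency, penalize common terms (high df),
    # and normalize by document length relative to the collection average.
    idf = math.log(1.0 + (num_docs - df + 0.5) / (df + 0.5))
    tf_part = tf * (k1 + 1.0) / (tf + k1 * (1.0 - b + b * doc_len / avg_doc_len))
    return idf * tf_part
```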
Regarding cross-lingual experiments, the MIRACLE team has worked on the merging
and combining aspects, setting the translation aspects aside. Combining
approaches seem to improve results in some cases. For example, the average
combining approach yields better results when combining the translation results
for Bulgarian than either the Bultra or the Webtrance system alone. In
multilingual experiments, combining (concatenating) translations yields better
results, as reported previously, when good translations are available. Regarding
the merging aspects, our approach did not obtain better results than standard
merging, whether normalized or not. Alternate normalization seems to behave
better than standard normalization, and the latter behaves better than no
normalization; this also holds when normalization is used in our own approach
to merging.
Regarding the approach of preprocessing queries in the source topic language
with high-quality tools to extract content words before translation, the results
were good in the case of Spanish (with our tool STILUS). This approach achieved
the best precision figures at the 0 and 1 recall extremes, although a worse
average precision than other runs.
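A crude sketch of such content-word extraction (STILUS itself performs full
linguistic analysis; the stopword list here is only a tiny illustrative sample):

```python
SPANISH_STOPWORDS = {"el", "la", "los", "las", "de", "del", "en",
                     "y", "que", "un", "una", "se", "por", "con"}

def content_words(topic_text):
    # Strip punctuation, lowercase, and drop stopwords, keeping only
    # the content words worth sending to the translation step.
    words = (token.strip(".,;:¿?¡!\"()") for token in topic_text.lower().split())
    return [w for w in words if w and w not in SPANISH_STOPWORDS]
```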
In the appendix we have included two figures that summarize these results.
Figure 1 shows a comparison of the results obtained in the best runs in the monolin-
gual experiments for each target language. The best results are obtained for French
and Portuguese, and the worst for Bulgarian. Figure 2 shows the results obtained in
the best runs in the cross-lingual experiments for bilingual and multilingual runs, con-
sidering all source languages used.
4 Conclusions and Future Work
Future work of the MIRACLE team in these tasks will be directed along several
lines of research: (a) tuning our trie-based indexing and retrieval engine to
get even better performance in the indexing and retrieval phases, and
(b) improving the tokenization step, which in our opinion is one of the most
critical processing steps and can improve the overall results of the IR process.
Good entity recognition and normalization is still missing from our processing
scheme for these tasks. We need better performance from the retrieval system to
run efficiently when the query has several hundred terms, as occurs when using
pseudo-relevance feedback. We also need to explore further the combination
schemes together with these enhancements of the basic processing.
Regarding cross-lingual tasks, future work will center on the merging aspects of
the monolingual results. The translation aspects of this process are of no
interest to us, since our research focuses elsewhere: we will only use available
translation resources, and we will try to combine them to get better results.
On the other hand, the process of merging the monolingual results is very
sensitive to the way it is done, and there are further techniques to be
explored. In addition, a different way of measuring relevance may be needed for
monolingual retrieval when multilingual merging has to be carried out. Such a
measure should be independent of the collection, so that monolingual relevance
measures would be comparable.
Acknowledgements

This work has been partially supported by the Spanish R+D National Plan, by
means of the project RIMMEL (Multilingual and Multimedia Information Retrieval).
Special mention should be made of our colleagues on the MIRACLE team (in
alphabetical order): Ana María García-Serrano, Ana González-Ledesma, José Mª
Guirao-Miras, Sara Lana-Serrano, José Luis Martínez-Fernández, Paloma
Martínez-Fernández, Ángel Martínez-González, Antonio Moreno-Sandoval and César
de Pablo-Sánchez.

References
1. Aoe, J.-I., Morimoto, K., and Sato, T.: An Efficient Implementation of Trie Structures.
Software Practice and Experience 22(9): 695-721 (1992)
2. CLEF 2005 Multilingual Information Retrieval resources page. On line
http://www.computing.dcu.ie/~gjones/CLEF2005/Multi-8/ [Visited 11/08/2005].
3. González, J.C., Goñi-Menoyo, J.M., and Villena-Román, J.: MIRACLE’s 2005 Approach
to Cross-lingual Information Retrieval. Working Notes for the CLEF 2005 Workshop.
Vienna, Austria (2005) On line http://clef.isti.cnr.it/2005/working_notes/workingnotes2005/
gonzalez05.pdf [Visited 05/11/2005].
4. Goñi-Menoyo, J. M., González-Cristóbal, J. C., and Fombella-Mourelle, J.: An optimised
trie index for natural language processing lexicons. MIRACLE Technical Report. Univer-
sidad Politécnica de Madrid (2004)
5. Goñi-Menoyo, J. M., González, J. C., and Villena-Román, J.: MIRACLE’s 2005 Approach
to Monolingual Information Retrieval. Working Notes for the CLEF 2005 Workshop.
Vienna, Austria (2005) On line http://clef.isti.cnr.it/2005/working_notes/workingnotes2005/
menoyo05.pdf [Visited 05/11/2005].
6. Di Nunzio, G. M., Ferro, N., and Jones, G. J. F.: CLEF 2005: Ad Hoc Multilingual Track
Overview. Proceedings of the Cross Language Evaluation Forum 2005, Springer Lecture
Notes in Computer Science, 2006 (in this volume).
7. Peters, C.: What happened in CLEF 2005. Proceedings of the Cross Language Evaluation
Forum 2005, Springer Lecture Notes in Computer Science, 2006 (in this volume).
8. Porter, M.: Snowball stemmers and resources page. On line
http://www.snowball.tartarus.org [Visited 13/07/2005].
9. Robertson, S.E. et al.: Okapi at TREC-3. In Overview of the Third Text REtrieval Confer-
ence (TREC-3). D.K. Harman (Ed.). Gaithersburg, MD: NIST (1995)
10. Savoy, J.: Report on CLEF-2003 Multilingual Tracks. Comparative Evaluation of Multi-
lingual Information Access Systems (Peters, C; Gonzalo, J.; Brascher, M.; and Kluck, M.,
Eds.). Lecture Notes in Computer Science, vol. 3237, pp. 64-73. Springer. (2004)
11. University of Neuchatel. Page of resources for CLEF (Stopwords, transliteration, stem-
mers…) On line http://www.unine.ch/info/clef [Visited 13/07/2005].
12. Xapian: an Open Source Probabilistic Information Retrieval library. On line http://www.
xapian.org [Visited 13/07/2005].
[Figure omitted: "Best monolingual experiments", precision scale 0–1]
Fig. 1. Comparison of results from the best monolingual experiments

[Figure omitted: "Best cross-lingual experiments", precision scale 0–1]
Fig. 2. Comparison of results from the best cross-lingual experiments