Conference Paper

Overview of the Second Text Retrieval Conference (TREC-2).

... The HD-tree is a RAM/disk-based tree that incorporates and extends indexing strategies of digital trees and B-trees, taking advantage of their strengths in search performance and index capability. The HD-tree is compared with the Prefix B-tree using real textual data from Text REtrieval Conference (TREC) collections [37]. Queries are generated with different cluster levels to study the effectiveness of the HD-tree. ...
... The performance of the HD-tree on prefix searches is tested using real textual data. A sample database, WSJ1, is generated from the Wall Street Journal (entire year of 1991), which is a part of the Text REtrieval Conference (TREC) collection [37]. Markup tags are removed, texts are split into segments of 5MB each, and unique prefixes of the suffix strings at word boundaries are extracted for every segment. ...
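As an aside, the segment-and-extract preprocessing described in the snippet above can be sketched as follows; this is only an illustrative approximation, not the HD-tree or Prefix B-tree themselves, and the segment size and tokenization are assumptions.

```python
# Illustrative only: collect the word-boundary suffixes of each text segment, the raw
# material a prefix index such as the HD-tree or Prefix B-tree would index. A real
# index would keep only the shortest prefixes that distinguish these suffixes.

import re

def word_boundary_suffixes(text, segment_size=5 * 1024 * 1024):
    """Yield, per segment, the set of suffix strings starting at word boundaries."""
    for start in range(0, len(text), segment_size):
        segment = text[start:start + segment_size]
        yield {segment[m.start():] for m in re.finditer(r"\b\w", segment)}

sample = "stock markets rallied while stock indexes rose"
for suffixes in word_boundary_suffixes(sample, segment_size=1024):
    print(sorted(s[:12] for s in suffixes))
```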
... In our study on expert profiling, test queries (i.e., "information needs") are readily available: they are potentially all experts from the knowledge-intensive organization being considered. At the TREC ad hoc tracks, test queries with too few or too many relevant documents are sometimes rejected (Harman, 1995; Voorhees & Harman, 2000). Harman (1995) reports that for all created queries, a trial run on a sample of documents from the complete collection yielded between 25 (narrow query) and 100 (broad query) relevant documents. Zobel (1998) notes that selecting queries based on the number of relevant documents may introduce a bias. In our experiments, we retain all test queries (i.e. ...
Article
Expertise retrieval has attracted significant interest in the field of information retrieval. Expert finding has been studied extensively, with less attention going to the complementary task of expert profiling, that is, automatically identifying topics about which a person is knowledgeable. We describe a test collection for expert profiling in which expert users have self‐selected their knowledge areas. Motivated by the sparseness of this set of knowledge areas, we report on an assessment experiment in which academic experts judge a profile that has been automatically generated by state‐of‐the‐art expert‐profiling algorithms; optionally, experts can indicate a level of expertise for relevant areas. Experts may also give feedback on the quality of the system‐generated knowledge areas. We report on a content analysis of these comments and gain insights into what aspects of profiles matter to experts. We provide an error analysis of the system‐generated profiles, identifying factors that help explain why certain experts may be harder to profile than others. We also analyze the impact on evaluating expert‐profiling systems of using self‐selected versus judged system‐generated knowledge areas as ground truth; they rank systems somewhat differently but detect about the same amount of pairwise significant differences despite the fact that the judged system‐generated assessments are more sparse.
... There are good reasons to believe that the conventional methods, e.g. those applied in TREC-1 to TREC-3 [1,2,3], do not entirely fulfil these requirements [4, p. 22]. IR experiments have been carried out for nearly forty years. ...
... In spite of this long IR research tradition, there has not been much innovation regarding the methods used. For instance, the original Cranfield methodology [5] is still employed today in TREC [1, 2]. The criticisms of the conventional methods are, for example: ...
Article
Full-text available
The paper describes the ideas and assumptions underlying the development of a new method for the evaluation and testing of interactive information retrieval (IR) systems, and reports on the initial tests of the proposed method. The method is designed to collect different types of empirical data, i.e. cognitive data as well as traditional systems performance data. The method is based on the novel concept of a ‘simulated work task situation’ or scenario and the involvement of real end users. The method is also based on a mixture of simulated and real information needs, and involves a group of test persons as well as assessments made by individual panel members. The relevance assessments are made with reference to the concepts of topical as well as situational relevance. The method takes into account the dynamic nature of information needs which are assumed to develop over time for the same user, a variability which is presumed to be strongly connected to the processes of relevance assessment.
... The Associated Press (AP) is a sub-collection of newswire articles from the TREC corpus [Harman, 1995]. Seven collections of newswire articles are developed using AP data (Full details of these collections are given in Section 3.1.1). ...
... The AP10k and AP100k collections are drawn from the AP sub-collection of TREC data [Harman, 1995], while AP500k data contains documents from not only AP, but also WSJ and SJM (these two collections also consist of newswire articles). The seven candidates, the same as selected in the AP7 data, are used as the target authors to keep consistency with the other AA investigations in this thesis. ...
... (1) where Q is the number of judged queries and AP_q (Average Precision) is the average precision for a query q, obtained from Equation 2 (Harman, 1994). ...
... When graded relevance judgments are not available, this measure correlates very strongly with average precision (Sakai, 2007), so it is not considered a recall-oriented evaluation measure. Another measure that may seem suited to recall-oriented retrieval is precision at rank R (R-Prec) (Harman, 1994), where R is the number of relevant documents in the collection for a given query. This measure remains a variant of plain precision at a fixed rank in the result list, so it is not suited to a recall-oriented setting. ...
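For illustration, the measures discussed above (average precision, its mean over judged queries, and R-precision) can be computed as in the following sketch; the ranked lists and relevance judgments are toy data, and the AP variant shown is the standard TREC definition.

```python
# Illustrative implementations of AP, MAP (Equation 1 in the cited text), and R-Prec.

def average_precision(ranked, relevant):
    """AP: mean of precision values at the ranks of retrieved relevant documents."""
    hits, precisions = 0, []
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(relevant) if relevant else 0.0

def r_precision(ranked, relevant):
    """R-Prec: precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    return sum(1 for doc in ranked[:r] if doc in relevant) / r if r else 0.0

def mean_average_precision(runs):
    """MAP over (ranked list, relevant set) pairs, one per judged query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

run = (["d3", "d1", "d7", "d2"], {"d1", "d2", "d9"})
print(average_precision(*run), r_precision(*run))
```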
Article
Most evaluation metrics in information retrieval give more weight to precision than to recall, which is consistent with some contexts, such as web search, where users only look at the first documents at the top of the result list. These evaluation metrics cannot predict how satisfied a user will be with a system when the goal is to find as many relevant documents as possible, as in patent retrieval or retrieval in a medical context. In this article, we define the main requirements for evaluating and comparing information retrieval systems in a recall-oriented context. We show that the measures previously proposed in the literature do not meet these requirements, which leads us to propose the MOR measure, which makes it possible to correctly compare the recall of information retrieval systems.
... For settings where labels cannot be imputed reliably, precision can be estimated by labeling predicted positives, and recall can be estimated with respect to a non-exhaustive set of known positives (Harman, 1992; Ji et al., 2011). Using the transitive closure increases the total number of positives from 149,263 to 228,548, so this adds many positives but does not overwhelm the original data. ...
... Another variant of precision is non-interpolated average precision, which "corresponds to the area under the superlative (non-interpolated) recall/precision curve" [5]. This metric is measured by "computing the precision after every retrieved relevant document and then averaging these precisions over the total number of retrieved relevant documents" for a given query. There will, in general, be a different average precision for each query. ...
Article
Full-text available
Information Retrieval (IR) is concerned with storing and representing knowledge and with retrieving information relevant to a specific user query. A Multilingual Information Retrieval (MLIR) system helps users to pose a query in one language and retrieve documents in more than one language. One of the basic performance measures of IR systems is precision. While this measure works well in monolingual web retrieval, it is not suitable for CLIR (Cross-Lingual Information Retrieval) or MLIR, where two or more languages are involved. This paper proposes a metric that measures precision at K, the proportion of relevant documents in the first K positions, when more than one document language is involved in the retrieval system, i.e. MLIR. Experimental results demonstrate that the proposed metric is effective in systems where more than one document language is involved in the retrieval.
... To examine such effects, local document collections were built in the following manner. One hundred queries were taken from the Ad-Hoc task of TREC-6 (Voorhees and Harman, 1997) and TREC-7 (Voorhees and Harman, 1998), namely, ...
... Distributed as part of the TREC project, it does not contain duplicates. The highly skew trec dataset is the complete set of word occurrences, with duplicates, from the first of five TREC CDs (Harman 1995). The url dataset, also extracted from TREC web data, consists of complete URLs with duplicates. ...
Conference Paper
Full-text available
Tries are the fastest tree-based data structures for managing strings in-memory, but are space-intensive. The burst-trie is almost as fast but reduces space by collapsing trie-chains into buckets. This is not however, a cache-conscious approach and can lead to poor performance on current processors. In this paper, we introduce the HAT-trie, a cache-conscious trie-based data structure that is formed by carefully combining existing components. We evaluate performance using several real-world datasets and against other high-performance data structures. We show strong improvements in both time and space; in most cases approaching that of the cache-conscious hash table. Our HAT-trie is shown to be the most efficient trie-based data structure for managing variable-length strings in-memory while maintaining sort order.
... The best-known QA effectiveness competition mentioned in the previous section is the American TREC [2]. Its Japanese equivalent is called NTCIR [7], and most of our group decided to participate in its QAC [1] task for the first time, though only one debutante had a QA background. ...
Conference Paper
Full-text available
This paper is a report on collective participation in the NTCIR-5 Question Answering Challenge by researchers from Mie University, Hokkaido University and Otaru University of Commerce. Although our results were not impressive, we would like to share our experiences with everyone who thinks about participating in the challenge but is afraid of his or her lack of experience in the field. Understanding the problems of QA from the practical side was very instructive and gave us a stronger base for future trials. We briefly introduce our preparations and participation, then conclude with an analysis of what can be done simply with freely available tools.
... Alternatively, they suggest using much more than six hidden layers in their model; however, they found that the classification performance increased only slightly. Schütze, Hull, and Pedersen (1995) evaluated linear discriminant analysis (LDA), logistic regression, and neural networks on the TREC-2 and TREC-3 (Harman, 1995a, 1995b) collections. They selected the 200 most relevant features using the χ² statistic. ...
Article
Widespread digitization of information in today’s internet age has intensified the need for effective textual document classification algorithms. Most real life classification problems, including text classification, genetic classification, medical classification, and others, are complex in nature and are characterized by high dimensionality. Current solution strategies include Naïve Bayes (NB), Neural Network (NN), Linear Least Squares Fit (LLSF), k-Nearest-Neighbor (kNN), and Support Vector Machines (SVM); with SVMs showing better results in most cases. In this paper we introduce a new approach called dynamic architecture for artificial neural networks (DAN2) as an alternative for solving textual document classification problems. DAN2 is a scalable algorithm that does not require parameter settings or network architecture configuration. To show DAN2 as an effective and scalable alternative for text classification, we present comparative results for the Reuters-21578 benchmark dataset. Our results show DAN2 to perform very well against the current leading solutions (kNN and SVM) using established classification metrics.
... As a background model of news document text, we used newswire documents from Associated Press, Wall Street Journal, and the Federal Register from TREC disks 1 and 2 [1]. Term document frequency statistics were gathered and stored to an auxiliary vocabulary file. ...
Article
This is RMIT's first year of participation in the TDT evaluation. Our system uses a linear classifier to track topics and an approach based on our previous work in document routing. We aimed this year to develop a baseline system, and to then test selected variations, including adaptive tracking. Our contribution this year has been to implement an efficient system, that is, one that maximises tracking document throughput and minimises system overheads.
... All four of the groups used the INQUERY retrieval engine [54]. All four groups had the same retrieval/routing task, namely the selection of relevant documents from a subset of a TREC2 test collection [131,127] given two search topics (automobile recalls and tobacco advertising and the young). The key difference between each group was the availability of relevance feedback and the extent to which it could be manipulated by the user, ranging from a system offering no relevance feedback at all (the control group), to a 'penetrable' relevance feedback interface that supported user interaction (i.e. the ability to edit system selected expansion terms). ...
... In order to assess the comparative value of the new approach vs. the old one (i.e., the non-incremental OCAT approach), we used examples derived by analyzing almost 3000 text documents from the TIPSTER collection of documents [3,4]. For this purpose, we used the document surrogate concept as introduced by Salton [5] in order to represent text documents as binary vectors. ...
Article
Full-text available
This paper introduces an incremental algorithm for learning a Boolean function from examples. The functions are constructed in the disjunctive normal form (DNF) or the conjunctive normal form (CNF), and emphasis is placed on inferring functions with as few clauses as possible. This incremental algorithm can be combined with any existing algorithm that infers a Boolean function from examples. In this paper it is combined with the one clause at a time (OCAT) approach (Comput. Oper. Res. 21(2) (1994) 185) and (J. Global Optim. 5(1) (1994) 64), which is a non-incremental learning approach. An extensive computational study was undertaken to assess the performance characteristics of the new approach. As examples, we used binary vectors that represent text documents from the TIPSTER collection of documents. The new approach is more efficient and it derives more accurate Boolean functions. As was anticipated, the Boolean functions (in DNF or CNF form) derived by the new algorithm are comprised of more clauses than the functions derived by the non-incremental approach. Scope and purpose: There is a growing need for methods that can analyze information and infer patterns in a way that can be useful to the analyst. This is the core of data mining and knowledge discovery from databases. Such methods often infer a Boolean function from observations that belong to different classes.
... Test collections are used to evaluate and compare different retrieval systems [35]. We use the large test collections built as part of the TREC initiative [11], represented in the form of topics that describe the information need at different levels. Each topic consists of three fields: "title", "description", and "narrative". ...
Article
Abstract: Text retrieval systems store a great variety of documents, from abstracts, newspaper articles, and web pages to journal articles, books, court transcripts, and legislation. Collections of diverse types of documents expose shortcomings in current approaches to ranking. Use of short fragments of documents, called passages, instead of whole documents can overcome these shortcomings: passage ranking provides convenient units of text to return to the user, can avoid the difficulties of comparing documents of different length, and enables identification of short blocks of relevant material amongst otherwise irrelevant text. In this paper, we compare several kinds of passage in an extensive series of experiments. We introduce a new type of passage, overlapping fragments of either fixed or variable length. We show that ranking with these arbitrary passages gives substantial improvements in retrieval effectiveness over traditional document ranking schemes, particularly for queries on collections of long documents. Ranking with arbitrary passages shows consistent improvements compared to ranking with whole documents, and to ranking with previous passage types that depend on document structure or topic shifts in documents. Keywords: passage retrieval, document retrieval, effective ranking, similarity measures, pivoted
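The "arbitrary passage" idea summarized above, fixed-length overlapping word windows cut from a document, can be illustrated with a short sketch; the window length and overlap below are arbitrary illustrative values, not those evaluated in the article.

```python
# Minimal sketch: cut a document into overlapping fixed-length word windows that can
# then be ranked in place of the whole document.

def overlapping_passages(text, length=150, step=75):
    """Return overlapping windows of `length` words, advancing by `step` words."""
    words = text.split()
    if len(words) <= length:
        return [" ".join(words)]
    return [" ".join(words[i:i + length])
            for i in range(0, len(words) - length + 1, step)]

doc = " ".join(f"w{i}" for i in range(400))
passages = overlapping_passages(doc, length=150, step=75)
print(len(passages), len(passages[0].split()))
```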
... Jansen, Spink, and Saracevic (1999) conducted an analysis of Web relevance feedback usage using data from Excite. Relevance feedback is a classic IR technique reported to be successful with many IR systems (Harman, 1992). The researchers concluded that only about 3% of the queries (1,597) could have been generated by Excite's relevance feedback option. ...
... Furthermore, most of the IR research and development (R&D) efforts concentrating on improvement of effectiveness in automatic representation and searching treated IR systems and processes as static rather than dynamic or interactive (Saracevic, 1995). Such research, carried out now for over thirty years, has reached a certain maturity, as evidenced by the Text Retrieval Conference (TREC) experiments (Harman, 1995). In contrast, research on interactive aspects of IR has not reached such maturity; it may even be said that it is barely emerging out of infancy. ...
Article
Full-text available
The purpose is to critically examine traditional and interactive models that have emerged in IR, and to propose an interactive IR model based on different levels in the interactive processes. The traditional IR model was explicitly or implicitly adapted in majority of algorithmic works on IR, and in evaluation studies. Strengths and weaknesses of the traditional model are examined. Several interactive IR models have been developed over the years; the cognitive model by Ingwersen and the episodes model by Belkin are examined. A stratified interactive IR model is proposed with suggestion that it has a potential to account for a variety of aspects in the processes involved in IR interaction. In this model IR interaction is decomposed into several levels that subtly affect each other. The paper concludes with general remarks on the state of IR interaction research.
... To evaluate the performance of the proposed query refinement approach, we performed experiments several times. The test data was drawn from the "TREC" conferences [6]. We used part of these data sets, crawled in 1997 [7] for TREC 9 and 10, which have sets of ten topics and accompanying relevance determinations. For user input queries, we used the title field from each TREC topic; moreover, we used the Wilcoxon signed rank test to evaluate the significance of the effectiveness of the results [8]. ...
Article
Effective information gathering and retrieval of the most relevant web documents on the topic of interest is difficult due to the large amount of information that exists in various formats. Current information gathering and retrieval techniques are unable to exploit semantic knowledge within documents in the "big data" environment; therefore, they cannot provide precise answers to specific questions. Existing commercial big data analytic platforms are restricted to a single data type; moreover, different big data analytic platforms are effective at processing different data types. Therefore, the development of a common big data platform that is suitable for efficiently processing various data types is needed. Furthermore, users often possess more than one intelligent device. It is therefore important to find an efficient preference profile construction approach to record the user context and personalized applications. In this way, user needs can be tailored according to the user's dynamic interests by tracking all devices owned by the user.
... [pseudocode omitted: kernel density estimation (BoxKDE) and inverse-CDF steps of the algorithm] Note that unlike the standard news corpora in NLP or the SEC-mandated financial reports, transcripts of earnings calls are a very special genre of text. For example, the length of WSJ documents is typically one to three hundred (Harman, 1995), but the average document length of our three earnings call datasets is 7,677. Depending on the amount of interaction in the question answering session, the complexities of the calls vary. ...
Conference Paper
An earnings call summarizes the financial performance of a company, and it is an important indicator of the future financial risks of the company. We quantitatively study how earnings calls are correlated with the financial risks, with a special focus on the financial crisis of 2009. In particular, we perform a text regression task: given the transcript of an earnings call, we predict the volatility of stock prices from the week after the call is made. We propose the use of copulas: a powerful statistical framework that separately models the uniform marginals and their complex multivariate stochastic dependencies, while not requiring any prior assumptions on the distributions of the covariates and the dependent variable. By performing the probability integral transform, our approach moves beyond the standard count-based bag-of-words models in NLP, and improves previous work on text regression by incorporating the correlation among local features in the form of a semiparametric Gaussian copula. In experiments, we show that our model significantly outperforms strong linear and non-linear discriminative baselines on three datasets under various settings.
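A hedged sketch of the core transformation behind the semiparametric Gaussian copula approach described above: each variable is mapped to uniform marginals via its empirical CDF (the probability integral transform) and then to normal scores, whose correlation captures the dependence. The data and variable names below are synthetic stand-ins, not the earnings-call features used in the paper.

```python
# Probability integral transform followed by Gaussian quantiles, then a correlation
# estimate on the latent Gaussian scale; a toy illustration of the copula idea only.

import numpy as np
from scipy.stats import norm, rankdata

def normal_scores(x):
    """Empirical-CDF transform to (0, 1), then the Gaussian quantile function."""
    u = rankdata(x) / (len(x) + 1.0)
    return norm.ppf(u)

rng = np.random.default_rng(0)
feature = rng.lognormal(size=200)             # e.g., a heavy-tailed text covariate
volatility = 0.3 * feature + rng.normal(size=200)

z = np.column_stack([normal_scores(feature), normal_scores(volatility)])
copula_corr = np.corrcoef(z, rowvar=False)    # dependence on the latent Gaussian scale
print(copula_corr)
```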
... Experiments and results: The experiments were carried out on documents from different collections, and the approaches are tested with queries from TREC-1 [Har93] and TREC-2 [Har95], using a system based on the probabilistic model [RJ88]. ...
Article
Information retrieval on structured documents attempts to answer a user request precisely by providing only those elements of documents (doxels) that satisfy the need for information. This thesis investigates the characterization of relations (structural and non-structural) between parts of structured documents in this context. We model structured document indexing using the structure and relations between doxels, and we characterize these relations by relative exhaustivity and specificity values. The querying process based on these structured documents generates virtual documents as results, indicating the relevant links between doxels. The model is validated on the INEX 2007 evaluation campaign data (660,000 Wikipedia documents, 100 queries), and the results show an improvement of 24% in average precision with the vector space model.
... A perfect ranking would place all the positive pairs (for which the entailment holds) before all the negative pairs. This task was evaluated using the Average precision measure, which is a common evaluation measure for ranking (e.g. in information retrieval) (Voorhees and Harman, 1999). ...
Data
Full-text available
... The INQUERY system has a weighting scheme based on the rank of the feedback term in the list of possible feedback terms (the lower ranks get the highest discounts). Harman (1995) experimented with term re-weighting in query expansion, feedback term selection techniques and sort orders, and the effectiveness of performing multiple iterations of relevance feedback, and determined that using only selected terms for relevance feedback produced better results than using all the terms from the relevant documents. This researcher also found that using a feedback term sort order that incorporated information about the frequency of the term in the document, as well as the term's overall collection frequency, produced improved results. ...
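The kind of feedback-term selection and re-weighting discussed above can be sketched roughly as follows, in the spirit of Rocchio-style expansion; the scoring function and cut-off are illustrative choices, not Harman's exact formulation.

```python
# Score candidate expansion terms by frequency in the relevant documents combined with
# an inverse collection frequency, and keep only the top-ranked terms for the query.

import math
from collections import Counter

def select_feedback_terms(relevant_docs, doc_freq, num_docs, top_k=10):
    """Rank candidate expansion terms drawn from the relevant documents."""
    tf = Counter(term for doc in relevant_docs for term in doc)
    scored = {t: tf[t] * math.log(num_docs / doc_freq.get(t, 1)) for t in tf}
    return sorted(scored, key=scored.get, reverse=True)[:top_k]

relevant = [["oil", "prices", "opec"], ["oil", "output", "opec", "quota"]]
df = {"oil": 120, "prices": 300, "opec": 15, "output": 200, "quota": 40}
print(select_feedback_terms(relevant, df, num_docs=1000, top_k=3))
```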
Thesis
Full-text available
The Web is comprised of a vast quantity of text. Modern search engines struggle to index it independent of the structure of queries and type of Web data, and commonly use indexing based on the Web's graph structure to identify high-quality relevant pages. However, despite the apparent widespread use of these algorithms, Web indexing based on human feedback and document content is controversial. There are many fundamental questions that need to be addressed, including: How many types of domains/websites are there in the Web? What type of data is in each type of domain? For each type, which segments/HTML fields in the documents are most useful? What are the relationships between the segments? How can web content be indexed efficiently in all forms of document configurations? Our investigation of these questions has led to a novel way to use Wikipedia to find the relationships between the query structures and document configurations throughout the document indexing process and to use them to build an efficient index that allows fast indexing and searching, and optimizes the retrieval of highly relevant results. We consider the top page on the ranked list to be highly important in determining the types of queries. Our aim is to design a powerful search engine with a strong focus on how to make the first page highly relevant to the user, and on how to retrieve other pages based on that first page. Through processing the user query using the Wikipedia index and determining the type of the query, our approach could trace the path of a query in our index, and retrieve specific results for each type. We use two kinds of data to increase the relevancy and efficiency of the ranked results: offline and real-time. Traditional search engines find it difficult to use these two kinds of data together, because building a real-time index from social data and integrating it with the index for the offline data is difficult in a traditional distributed index. As a source of offline data, we use data from the Text Retrieval Conference (TREC) evaluation campaign. The web track at TREC offers researchers the chance to investigate different retrieval approaches for web indexing and searching. The crawled offline dataset makes it possible to design powerful search engines that extend current methods and to evaluate and compare them. We propose a new indexing method, based on the structures of the queries and the content of documents. Our search engine uses a core index for offline data and a hash index for real-time data, which leads to improved performance. The TREC Web track evaluation of our experiments showed that our approach can be successfully employed for different types of queries. We evaluated our search engine on different sets of queries from the TREC 2010, 2011 and 2012 Web tracks. Our approach achieved very good results in the TREC 2010 training queries. In the TREC 2011 testing queries, our approach was one of the six best compared to all other approaches (including those that used a very large corpus of 500 million documents), and it was second best when compared to approaches that used only part of the corpus (50 million documents), as ours did. In the TREC 2012 testing queries, our approach was second best if compared to all the approaches, and first if compared only to systems that used the subset of 50 million documents.
... On the other hand, researchers have achieved some success in improving retrieval effectiveness by combining different query and document representations such as [129] and [7]. Recently, this method has been extensively used by participants of TREC web track to improve web search performance [26] [55]. Most recently, Ogilvie and Callan [102] analyzed the conditions for successful combination of different document representations based on TREC web collection. ...
... TREC organizers used a pooling method that took the documents from the top of the ranking of all algorithms for a given search topic and merged them into one set (removing repetitions). The relevance of these pooled documents to the search topic was then evaluated by a single expert [60,61]. ...
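The pooling procedure described above is simple to sketch; the pool depth used here is an assumed parameter for illustration.

```python
# Merge the top-k documents of each system's ranking for a topic into a single
# duplicate-free pool that is then judged by an assessor.

def build_pool(rankings, depth=100):
    """Merge the top-`depth` documents of each run into one judging pool."""
    pool = set()
    for run in rankings:
        pool.update(run[:depth])
    return sorted(pool)

run_a = ["d4", "d2", "d9", "d1"]
run_b = ["d2", "d7", "d4", "d3"]
print(build_pool([run_a, run_b], depth=3))   # ['d2', 'd4', 'd7', 'd9']
```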
Chapter
Full-text available
This chapter creates a groundwork and conceptual foundation for the rest of the book. The chapter introduces a definition of information credibility and of Web content credibility evaluation support. Methods of measuring Web content credibility are discussed, along with available datasets, including fake news datasets. The subject of bias and subjectivity of Web content credibility evaluations is discussed.
... For the experimental study, we employed two data sets: TREC-2 (10,000 terms, pruned) and Meneame (5,780 users, pruned). [59] provides details about the Meneame data set and [60] describes the TREC-2 data collection. The terms and users were selected (pruned) to avoid effects from sparse or noisy data. ...
Article
Full-text available
In many applications, independence of event occurrences is assumed, even if there is evidence for dependence. Capturing dependence leads to complex models, and even if the complex models were superior, they fail to beat the simplicity and scalability of the independence assumption. Therefore, many models assume independence and apply heuristics to improve results. Theoretical explanations of the heuristics are seldom given or generalizable. This paper reports that some of these heuristics can be explained as encoding dependence in an exponent based on the generalized harmonic sum. Unlike independence, where the probability of subsequent occurrences of an event is the product of the single event probability, harmony is based on a product with decaying exponent. For independence, the sequence probability is $p^{1+1+ \cdots +1}=p^n$, whereas for harmony, it is $p^{1+1/2+ \cdots +1/n}$. The generalized harmonic sum leads to a spectrum of harmony assumptions. This paper shows that harmony assumptions naturally extend probability theory. An experimental evaluation for information retrieval (IR; term occurrences) and social networks (SN's; user interactions) shows that assuming harmony is more suitable than assuming independence. The potential impact of harmony assumptions lies beyond IR and SN's, since many applications rely on probability theory and apply heuristics to compensate the independence assumption. Given the concept of harmony assumptions, the dependence between multiple occurrences of an event can be reflected in an intuitive and effective way.
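The contrast stated in the abstract between the independence and harmony assumptions can be made concrete with a few lines of code; the probability value is arbitrary.

```python
# Under independence the probability of n occurrences is p**n; under the harmony
# assumption the exponent is the harmonic sum 1 + 1/2 + ... + 1/n, which decays the
# penalty for repeated occurrences of the same event.

def independence_prob(p, n):
    return p ** n

def harmony_prob(p, n):
    harmonic = sum(1.0 / k for k in range(1, n + 1))
    return p ** harmonic

p = 0.1
for n in (1, 2, 5):
    print(n, independence_prob(p, n), harmony_prob(p, n))
```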
... In (Zhao and Zobel, 2005), the authors use data extracted from the "TREC" corpus (Harman, 1994). This corpus contains different newswire articles that cover different writing styles and different information domains. ...
Thesis
Full-text available
Author profiling and identification are two areas of data-driven computational linguistics that have gained a lot of relevance due to their potential applications in, e.g., forensic linguistic studies, marketing analysis, and historic/literary authorship verification. Author profiling aims to identify demographic traits of the authors, while author identification aims to identify the authors themselves by searching for distinctive linguistic patterns that distinguish them. The majority of approaches in the related work tends to focus on the content of the texts. We argue that focusing on structure rather than content can be more effective. The main focus of the thesis is thus on feature engineering, the development, evaluation and application of the feature set in the context of machine learning techniques to author profiling and identification. We prove the profiling potential of syntactic and discourse features, which achieve state-of-the-art performance in many different scenarios, especially when combined with other features.
... Three datasets have been used for assessing topic models (see Table 1). The first dataset we used is the TREC AP corpus [14]. This corpus was previously used (see http://www.cs.princeton.edu/ ...
... So alternative approaches to complete manual checking have to be devised in a domain-dependent manner. See, for instance, the ingenious approach taken by the TREC text retrieval contest [22] that exploits the fact that many different answers (from different retrieval systems) are available for the same set of queries. ...
Article
Full-text available
Background. Data from software version archives and defect databases can be used for defect insertion circumstance analysis and defect prediction. The first step in such analyses is identifying defect-correcting changes in the version archive (bugfix commits) and enriching them with additional metadata by establishing bugfix links to corresponding entries in the defect database. Candidate bugfix commits are typically identified via heuristic string matching on the commit message. Research Questions. Which filters could be used to obtain a set of bugfix links? How to tune their parameters? What accuracy is achieved? Method. We analyze a modular set of seven independent filters, including new ones that make use of reverse links, and evaluate visual heuristics for setting cutoff parameters. For a commercial repository, a product expert manually verifies over 2500 links to validate the results with unprecedented accuracy. Results. The heuristics pick a very good parameter value for five filters and a reasonably good one for the sixth. The combined filtering, called bflinks, provides 93% precision and only 7% results loss. Conclusion. Bflinks can provide high-quality results and adapts to repositories with different properties.
... Last but not least, it is important to note that the bioCADDIE test query set is extremely small, with just 15 queries. Typical IR challenges such as the TREC challenges include at least 50 test queries (66-68). This may be considered a low number of test queries for providing consistent and robust evaluation results. ...
Article
Full-text available
In the context of the bioCADDIE challenge addressing information retrieval of biomedical datasets, we propose a method for retrieval of biomedical data sets with heterogeneous schemas through query reformulation. In particular, the method proposed transforms the initial query into a multi-field query that is then enriched with terms that are likely to occur in the relevant datasets. We compare and evaluate two query expansion strategies, one based on the Rocchio method and another based on a biomedical lexicon. We then perform a comprehensive comparative evaluation of our method on the bioCADDIE dataset collection for biomedical retrieval. We demonstrate the effectiveness of our multi-field query method compared to two baselines, with MAP improved from 0.2171 and 0.2669 to 0.2996. We also show the benefits of query expansion, where the Rocchio expansion method improves the MAP for our two baselines from 0.2171 and 0.2669 to 0.335. We show that the Rocchio query expansion method slightly outperforms the one based on the biomedical lexicon as a source of terms, with an improvement of roughly 3% for MAP. However, the query expansion method based on the biomedical lexicon is much less resource intensive since it does not require computation of any relevance feedback set or any initial execution of the query. Hence, in terms of the trade-off between efficiency, execution time and retrieval accuracy, we argue that the query expansion method based on the biomedical lexicon offers the best performance for a prototype biomedical data search engine intended to be used at a large scale. In the official bioCADDIE challenge results, although our approach is ranked seventh in terms of the infNDCG evaluation metric, it ranks second in terms of P@10 and NDCG. Hence, the method proposed here provides overall good retrieval performance in relation to the approaches of other competitors. Consequently, the observations made in this paper should benefit the development of a Data Discovery Index prototype or the improvement of the existing one.
Article
Full-text available
A key element in modern text retrieval systems is the weighting of individual words for importance. Early in the development of document retrieval methods it was recognized that performance could be improved if weights were based at least in part on the frequencies of individual terms in the database. This observation led investigators to propose inverse document frequency weighting, which has become the most commonly used approach. Inverse document frequency weighting can be given some justification based on probabilistic arguments. However, many different formulas have been tried and it is difficult to distinguish between these on a purely theoretical basis. Witten, Moffat and Bell have proposed a monotonicity condition as fundamental: ‘a term that appears in many documents should not be regarded as more important than a term that appears in a few’. Based on this monotonicity assumption and probabilistic arguments we show here how the TREC data can be used to learn ideal global weights. Using cross-validation we show that these weights are a modest but statistically significant improvement over IDF weights. One conclusion is that IDF weights are close to optimal within the probabilistic assumptions that are commonly made.
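For reference, a common form of the inverse document frequency weight discussed above is sketched below; many variants exist, and this particular formula is an illustrative choice rather than the learned global weights reported in the article.

```python
# One standard IDF variant: log of collection size over document frequency.

import math

def idf(num_docs, doc_freq):
    """IDF of a term appearing in `doc_freq` of `num_docs` documents."""
    return math.log(num_docs / doc_freq)

# The weight is monotonically non-increasing in document frequency, matching the
# Witten-Moffat-Bell monotonicity condition quoted above.
print(idf(1_000_000, 10), idf(1_000_000, 100_000))
```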
Article
Full-text available
We compared the information retrieval performances of some popular search engines (namely, Google, Yahoo, AlltheWeb, Gigablast, Zworks, AltaVista and Bing/MSN) in response to a list of ten queries, varying in complexity. These queries were run on each search engine and the precision and response time of the retrieved results were recorded. The first ten documents on each retrieval output were evaluated as being ‘relevant’ or ‘non-relevant’ for evaluation of the search engine’s precision. To evaluate response time, normalised recall ratios were calculated at various cut-off points for each query and search engine. This study shows that Google appears to be the best search engine in terms of both average precision (70%) and average response time (2 s). Gigablast and AlltheWeb performed the worst overall in this study.
Conference Paper
Full-text available
In this paper, some new indexing methodologies and applications in Information Retrieval (IR) are presented. Several new algorithms with broad coverage of IR applications are introduced. The main strategy is to introduce and evaluate basic Information Retrieval applications and modulation. Future directions in IR methodologies and evaluation are further focuses of this paper.
Article
We examine index representation techniques for document-based inverted files, and present a mechanism for compressing them using word-aligned binary codes. The new approach allows extremely fast decoding of inverted lists during query processing, while providing compression rates better than other high-throughput representations. Results are given for several large text collections in support of these claims, both for compression effectiveness and query efficiency.
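A much simplified illustration of the word-aligned coding idea described above: each 32-bit word records a bit width and a count, followed by that many equal-width values. The selector layout here is an assumption for illustration (values are assumed to fit in 22 bits) and is not the exact code evaluated in the article.

```python
# Toy word-aligned packing of small non-negative integers (e.g. inverted-list d-gaps).

WORD_BITS, WIDTH_BITS, COUNT_BITS = 32, 5, 5
PAYLOAD_BITS = WORD_BITS - WIDTH_BITS - COUNT_BITS  # 22 payload bits per word

def encode(values):
    """Pack values into 32-bit words, one common bit width per word."""
    words, i = [], 0
    while i < len(values):
        width, count = max(1, values[i].bit_length()), 1
        # Greedily extend the run while the next value still fits at a common width.
        while i + count < len(values):
            new_width = max(width, values[i + count].bit_length())
            if (count + 1) * new_width > PAYLOAD_BITS:
                break
            width, count = new_width, count + 1
        word = (width << (PAYLOAD_BITS + COUNT_BITS)) | (count << PAYLOAD_BITS)
        for j in range(count):
            word |= values[i + j] << (j * width)
        words.append(word)
        i += count
    return words

def decode(words):
    """Recover the original values from the packed words."""
    out = []
    for word in words:
        width = word >> (PAYLOAD_BITS + COUNT_BITS)
        count = (word >> PAYLOAD_BITS) & ((1 << COUNT_BITS) - 1)
        payload = word & ((1 << PAYLOAD_BITS) - 1)
        out.extend((payload >> (j * width)) & ((1 << width) - 1) for j in range(count))
    return out

gaps = [1, 3, 2, 7, 120, 4, 1, 1, 9]   # d-gaps of a toy inverted list
assert decode(encode(gaps)) == gaps
```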
Article
Full-text available
This paper describes our participation in the TREC 2005 Genomics track. We took part in the ad hoc retrieval task and aimed at integrating thesauri in the retrieval model. We developed three thesauri-based methods, two of which made use of the existing MeSH thesaurus and terms. One method uses blind relevance feedback on MeSH terms, the other uses an index of the MeSH thesaurus for query expansion. The third method makes use of a dynamically generated lookup list, by which gene acronyms and synonyms could be inferred. We show that, despite the relatively minor improvements in retrieval performance of individually applied methods, a combination works best and is able to deliver significant improvements over the baseline.
Data
Full-text available
This thesis devises a novel methodology based on probability theory, suitable for the construction of term-weighting models of Information Retrieval. Our term-weighting functions are created within a general framework made up of three components. Each of the three components is built independently from the others. We obtain the term-weighting functions from the general model in a purely theoretic way, instantiating each component with different probability distribution forms. The thesis begins by investigating the nature of the statistical inference involved in Information Retrieval. We explore the estimation problem underlying the process of sampling. De Finetti’s theorem is used to show how to convert the frequentist approach into Bayesian inference, and we display and employ the derived estimation techniques in the context of Information Retrieval. We initially pay great attention to the construction of the basic sample spaces of Information Retrieval. The notion of single or multiple sampling from different populations in the context of Information Retrieval is extensively discussed and used throughout the thesis. The language modelling approach and the standard probabilistic model are studied under the same foundational view and are experimentally compared to the divergence-from-randomness approach. In revisiting the main information retrieval models in the literature, we show that even the language modelling approach can be exploited to assign term-frequency normalization to the models of divergence from randomness. We finally introduce a novel framework for query expansion. This framework is based on the models of divergence from randomness and it can be applied to arbitrary models of IR, divergence-based, language modelling and probabilistic models included. We have carried out a very large number of experiments, and the results show that the framework generates highly effective Information Retrieval models.
Article
Full-text available
This paper presents a working voice-activated web-based Mandarin Chinese spoken document retrieval system. This system has integrated technologies of both spoken document retrieval and voice-activated WWW browser. The target database to be retrieved consists of tens of hours of radio and television Mandarin Chinese broadcast news. Extensive experiments have been conducted and a prototype system has been successfully implemented.
Article
Information in its various forms, both scientific and humanistic, and in particular the intercultural process of translation, is analyzed as a set of milestones which, integrated into an analytical model, help clarify the worldwide process of globalization. It is not possible to think about global development without considering the reception and production by Western societies of the informational and translational process. Moreover, translations, as Itamar Even-Zohar pointed out, are natural indicators of the process of reception and interconnection between different cultures; hence the need to build specialized retrieval models instead of non-specialized search engines.
Article
With the amount and variety of information available on digital repositories, answering complex user needs and personalizing information access became a hard task. Putting the user in the retrieval loop has emerged as a reasonable alternative to enhance search effectiveness and consequently the user experience. Due to the great advances on machine learning techniques, optimizing search engines according to user preferences has attracted great attention from the research and industry communities. Interactively learning-to-rank has greatly evolved over the last decade but it still faces great theoretical and practical obstacles. This paper describes basic concepts and reviews state-of-the-art methods on the several research fields that complementarily support the creation of interactive information retrieval (IIR) systems. By revisiting ground concepts and gathering recent advances, this article also intends to foster new research activities on IIR by highlighting great challenges and promising directions. The aggregated knowledge provided here is intended to work as a comprehensive introduction to those interested in IIR development, while also providing important insights on the vast opportunities of novel research.
Article
Search for information is no longer exclusively limited within the native language of the user, but is more and more extended to other languages. This gives rise to the problem of cross-language information retrieval (CLIR), whose goal is to find relevant information written in a different language to a query. In addition to the problems of monolingual information retrieval (IR), translation is the key problem in CLIR: one should translate either the query or the documents from a language to another. However, this translation problem is not identical to full-text machine translation (MT): the goal is not to produce a human-readable translation, but a translation suitable for finding relevant documents. Specific translation methods are thus required. The goal of this book is to provide a comprehensive description of the specific problems arising in CLIR, the solutions proposed in this area, as well as the remaining problems. The book starts with a general description of the monolingual IR and CLIR problems. Different classes of approaches to translation are then presented: approaches using an MT system, dictionary-based translation and approaches based on parallel and comparable corpora. In addition, the typical retrieval effectiveness using different approaches is compared. It will be shown that translation approaches specifically designed for CLIR can rival and outperform high-quality MT systems. Finally, the book offers a look into the future that draws a strong parallel between query expansion in monolingual IR and query translation in CLIR, suggesting that many approaches developed in monolingual IR can be adapted to CLIR. The book can be used as an introduction to CLIR. Advanced readers can also find more technical details and discussions about the remaining research challenges in the future. It is suitable to new researchers who intend to carry out research on CLIR.
Thesis
Full-text available
Information retrieval methods, especially considering multimedia data, have evolved towards the integration of multiple sources of evidence in the analysis of the relevance of items for a given user search task. In this context, for attenuating the semantic gap between low-level features extracted from the content of the digital objects and high-level semantic concepts (objects, categories, etc.) and making the systems adaptive to different user needs, interactive models have brought the user closer to the retrieval loop, allowing user-system interaction mainly through implicit or explicit relevance feedback. Analogously, diversity promotion has emerged as an alternative for tackling ambiguous or underspecified queries. Additionally, several works have addressed the issue of minimizing the required user effort in providing relevance assessments while keeping an acceptable overall effectiveness. This thesis discusses, proposes, and experimentally analyzes multimodal and interactive diversity-oriented information retrieval methods. This work comprehensively covers the interactive information retrieval literature and also discusses recent advances, the great research challenges, and promising research opportunities. We have proposed and evaluated two relevance-diversity trade-off enhancement workflows, which integrate multiple information from images, such as: visual features, textual metadata, geographic information, and user credibility descriptors. In turn, as an integration of interactive retrieval and diversity promotion techniques, for maximizing the coverage of multiple query interpretations/aspects and speeding up the information transfer between the user and the system, we have proposed and evaluated a multimodal learning-to-rank method trained with relevance feedback over diversified results. Our experimental analysis shows that the joint usage of multiple information sources positively impacted the relevance-diversity balancing algorithms. Our results also suggest that the integration of multimodal-relevance-based filtering and reranking is effective in improving result relevance and also boosts diversity promotion methods. Beyond that, with a thorough experimental analysis we have investigated several research questions related to the possibility of improving result diversity and keeping or even improving relevance in interactive search sessions. Moreover, we analyze how much the diversification effort affects overall search session results and how different diversification approaches behave for the different data modalities. By analyzing the overall and per-feedback-iteration effectiveness, we show that introducing diversity may harm initial results, whereas it significantly enhances the overall session effectiveness not only considering the relevance and diversity, but also how early the user is exposed to the same amount of relevant items and diversity.
Conference Paper
Full-text available
In this chapter we present the main data structures and algorithms for searching large text collections. We emphasize inverted files, the most used index, but also review suffix arrays, which are useful in a number of specialized applications. We also cover parallel and distributed implementations of these two structures. As an example, we show how mechanisms based upon inverted files can be used to index and search the Web.
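A minimal sketch of the inverted file structure emphasized above, with a toy conjunctive (Boolean AND) query; real systems add compression, skipping, and ranked retrieval.

```python
# Map each term to a postings list of (document id, term frequency) pairs and answer
# a conjunctive query by intersecting the postings of the query terms.

from collections import Counter, defaultdict

def build_index(docs):
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term, tf in Counter(text.lower().split()).items():
            index[term].append((doc_id, tf))
    return index

def conjunctive_search(index, query):
    """Return ids of documents containing every query term."""
    postings = [set(d for d, _ in index.get(t, [])) for t in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []

docs = ["text retrieval conference", "web search engines", "text search evaluation"]
idx = build_index(docs)
print(conjunctive_search(idx, "text search"))   # [2]
```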
Article
Full-text available
Combining multiple information retrieval (IR) systems has been shown to improve performance over individual systems. However, it remains a challenging problem to determine when and how a set of individual systems should be combined. In this paper, we investigate these issues using combinatorial fusion analysis and five data sets provided by TREC 2, 3, 4, 5, and 6. In particular, we compare the performance of combining six IR systems selected by random choice vs. by performance measurement from these five TREC data sets. Two experiments are conducted, which include: (1) combination of two systems and their performance outcome in terms of performance ratio and cognitive diversity, and (2) combinatorial fusion of t systems, t = 2 to 6, using both score and rank combinations and exploration of the effect of diversity on the performance outcome. It is demonstrated in both experiments that combination of two or more systems improves the performance more significantly when the systems are selected by performance evaluation than when they are selected by random choice. Our work provides a distinctive method of system selection for the combination of multiple retrieval systems.
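The score and rank combinations mentioned above can be illustrated with two standard fusion rules, CombSUM over min-max normalized scores and a Borda-style rank sum; these are generic examples rather than the exact fusion functions used in the article.

```python
# Two common fusion rules over per-system result lists.

def comb_sum(runs):
    """runs: list of dicts doc -> score. Returns doc -> summed normalized score."""
    fused = {}
    for run in runs:
        lo, hi = min(run.values()), max(run.values())
        for doc, s in run.items():
            norm = (s - lo) / (hi - lo) if hi > lo else 1.0
            fused[doc] = fused.get(doc, 0.0) + norm
    return dict(sorted(fused.items(), key=lambda kv: -kv[1]))

def borda(runs):
    """Rank combination: a document gets (list length - rank) points from each run."""
    fused = {}
    for run in runs:
        ranking = sorted(run, key=run.get, reverse=True)
        for rank, doc in enumerate(ranking):
            fused[doc] = fused.get(doc, 0) + (len(ranking) - rank)
    return dict(sorted(fused.items(), key=lambda kv: -kv[1]))

sys_a = {"d1": 2.3, "d2": 1.1, "d3": 0.4}
sys_b = {"d2": 9.0, "d4": 7.5, "d1": 2.0}
print(comb_sum([sys_a, sys_b]))
print(borda([sys_a, sys_b]))
```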
Article
Music Information Retrieval (MIR) evaluation has traditionally focused on system-centered approaches where components of MIR systems are evaluated against predefined data sets and golden answers (i.e., ground truth). There are two major limitations of such system-centered evaluation approaches: (a) The evaluation focuses on subtasks in music information retrieval, but not on entire systems and (b) users and their interactions with MIR systems are largely excluded. This article describes the first implementation of a holistic user-experience evaluation in MIR, the MIREX Grand Challenge, where complete MIR systems are evaluated, with user experience being the single overarching goal. It is the first time that complete MIR systems have been evaluated with end users in a realistic scenario. We present the design of the evaluation task, the evaluation criteria and a novel evaluation interface, and the data-collection platform. This is followed by an analysis of the results, reflection on the experience and lessons learned, and plans for future directions.
Article
This paper describes a methodology for end-to-end evaluation of Arabic document image processing software. The methodology can be easily tailored to other languages, and to other document formats (e.g., audio and video). Real-world documents often involve complexities such as multiple languages, handwriting, logos, signatures, pictures, and noise introduced by document aging, reproduction, or exposure to environment factors. Information retrieval systems that implement algorithms to account for such factors are maturing. The proposed methodology is vital for measuring system performance and comparing relative merits.
Article
Ranking in information retrieval has been traditionally approached as a pursuit of relevant information, under the assumption that the users' information needs are unambiguously conveyed by their submitted queries. Nevertheless, as an inherently limited representation of a more complex information need, every query can arguably be considered ambiguous to some extent. In order to tackle query ambiguity, search result diversification approaches have recently been proposed to produce rankings aimed to satisfy the multiple possible information needs underlying a query. In this survey, we review the published literature on search result diversification. In particular, we discuss the motivations for diversifying the search results for an ambiguous query and provide a formal definition of the search result diversification problem. In addition, we describe the most successful approaches in the literature for producing and evaluating diversity in multiple search domains. Finally, we also discuss recent advances as well as open research directions in the field of search result diversification.
Article
In information retrieval (IR), improving effectiveness often sacrifices the stability of an IR system. To evaluate stability, many risk-sensitive metrics have been proposed. Owing to theoretical limitations, current works study effectiveness and stability separately, and have not explored the effectiveness–stability tradeoff. In this paper, we propose a Bias–Variance Tradeoff Evaluation (BV-Test) framework, based on the bias–variance decomposition of the mean squared error, to measure the overall performance (considering both effectiveness and stability) and the tradeoff between effectiveness and stability of a system. In this framework, we define generalized bias–variance metrics, based on the Cranfield-style experiment set-up where the document collection is fixed (across topics) or the set-up where the document collection is a sample (per-topic). Compared with risk-sensitive evaluation methods, our work not only measures the effectiveness–stability tradeoff of a system, but also effectively tracks the source of system instability. Experiments on the TREC Ad-hoc track (1993–1999) and Web track (2010–2014) show a clear effectiveness–stability tradeoff across topics and per-topic, and that topic grouping and max–min normalization can effectively reduce the bias–variance tradeoff. Experimental results on the TREC Session track (2010–2012) also show that query reformulation and an increase of user data are beneficial to both effectiveness and stability simultaneously.
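For reference, the standard bias-variance decomposition of mean squared error that such a framework builds on is, in generic notation (not the article's own),

$$\mathbb{E}\big[(\hat{\theta}-\theta)^2\big] \;=\; \big(\mathbb{E}[\hat{\theta}]-\theta\big)^2 \;+\; \mathbb{E}\big[(\hat{\theta}-\mathbb{E}[\hat{\theta}])^2\big] \;=\; \text{bias}^2 + \text{variance},$$

where $\hat{\theta}$ denotes a per-topic performance estimate and $\theta$ the quantity it estimates.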
Report on the Need for and Provision of an "Ideal" Information Retrieval Test Collection
  • K Sparck Jones
  • C Van Rijsbergen
Sparck Jones K. and Van Rijsbergen C. (1975). Report on the Need for and Provision of an "Ideal" Information Retrieval Test Collection. British Library Research and Development Report 5266, Computer Laboratory, University of Cambridge.
A Study of the Overlap among Document Representations
  • P Gupta
Gupta P. (1982). A Study of the Overlap among Document Representations. Information Technology: Research and Development, 1(2), 261-274.
The First Text REtrieval Conference (TREC-1). National Institute of Standards and Technology Special Publication
  • D Harman
Harman D. (1993) (Ed.). The First Text REtrieval Conference (TREC-1). National Institute of Standards and Technology Special Publication 500-207.