Keyphrase extraction-based query expansion in digital libraries

Conference Paper · January 2006
DOI: 10.1145/1141753.1141800 · Source: DBLP
Conference: ACM/IEEE Joint Conference on Digital Libraries, JCDL 2006, Chapel Hill, NC, USA, June 11-15, 2006, Proceedings
Abstract
In pseudo-relevance feedback, the two factors that most affect retrieval performance are the source from which expansion terms are generated and the method used to rank those terms. In this paper, we present a novel unsupervised query expansion technique that utilizes keyphrases and POS phrase categorization. The keyphrases are extracted from the retrieved documents and weighted with an algorithm based on information gain and co-occurrence of phrases. The selected keyphrases are translated into Disjunctive Normal Form (DNF) based on the POS phrase categorization technique for better query reformulation. Furthermore, we study whether ontologies such as WordNet and MeSH improve the retrieval performance in conjunction with the keyphrases. We test our techniques on TREC 5, 6, and 7 as well as a MEDLINE collection. The experimental results show that the use of keyphrases with POS phrase categorization produces the best average precision.
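
As a rough illustration of the pipeline the abstract outlines, the sketch below ranks candidate keyphrases by information gain and joins the top ones into a DNF query. It is not the paper's algorithm: the function names (information_gain, rank_phrases, to_dnf) are hypothetical, the co-occurrence weighting and POS phrase categorization are left out, phrase matching is plain substring matching, and the rule "AND within a phrase, OR across phrases" is an assumption made only to show what a DNF query looks like.

```python
import math


def entropy(pos, neg):
    """Binary entropy (in bits) of a pos/neg split; 0.0 for an empty split."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(phrase, relevant_docs, other_docs):
    """Information gain of phrase presence for separating the
    pseudo-relevant documents from the remaining ones.
    Assumption: a document "contains" a phrase if it is a substring."""
    n = len(relevant_docs) + len(other_docs)
    if n == 0:
        return 0.0
    rel_with = sum(phrase in d for d in relevant_docs)
    oth_with = sum(phrase in d for d in other_docs)
    rel_without = len(relevant_docs) - rel_with
    oth_without = len(other_docs) - oth_with
    base = entropy(len(relevant_docs), len(other_docs))
    conditional = (
        (rel_with + oth_with) / n * entropy(rel_with, oth_with)
        + (rel_without + oth_without) / n * entropy(rel_without, oth_without)
    )
    return base - conditional


def rank_phrases(candidates, relevant_docs, other_docs, k):
    """Return the k candidate phrases with the highest information gain."""
    scored = sorted(
        candidates,
        key=lambda p: information_gain(p, relevant_docs, other_docs),
        reverse=True,
    )
    return scored[:k]


def to_dnf(phrases):
    """Assumed DNF form: words of one phrase are AND-ed, phrases are OR-ed."""
    clauses = ["(" + " AND ".join(p.split()) + ")" for p in phrases]
    return " OR ".join(clauses)


if __name__ == "__main__":
    # Toy data: top-ranked (pseudo-relevant) documents versus the rest.
    relevant = [
        "query expansion with keyphrase extraction",
        "keyphrase extraction for digital libraries",
    ]
    others = ["football match report", "weather forecast today"]
    candidates = ["keyphrase extraction", "digital libraries"]
    top = rank_phrases(candidates, relevant, others, k=2)
    print(to_dnf(top))
```

Candidates here are drawn only from the pseudo-relevant documents, since a phrase that appears solely in the other documents would also score a high information gain without being a useful expansion term.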

    • "Keyphrases are single words or phrases that provide a summary of a text (Tucker and Whittaker, 2009) and thus might improve searching (Song et al., 2006) in a large collection of texts. As manual extraction of keyphrases is a tedious task, a wide variety of keyphrase extraction approaches has been proposed."
    Full-text · Conference Paper · Jun 2014 · ACM Computing Surveys
    • "There is a vast amount of NLP literature on keyphrase extraction (Kim et al., 2010;). The semantic data provided by key-phrase extraction can be used as metadata for refining NLP applications, such as summarization (D'Avanzo and Magnini, 2005; Lawrie et al., 2001), text ranking (Mihalcea and Tarau, 2004), indexing (Medelyan and Witten, 2006), query expansion (Song et al., 2006), or document management and topic search (). The closest work to ours is (Turney, 1999) because they highlight key-phrases in the text to facilitate its skimming."
    Full-text · Conference Paper · Jan 2014 · ACM Computing Surveys
    • "For example, it is possible to index the user query and the top-ranked snippets by relation paths induced from parse trees, and then learn the most relevant paths to the query [Sun et al. 2006]. The syntactic approach may be most useful for natural language queries; to solve more general search tasks, the linguistic analysis can be more effectively integrated with statistical [Song et al. 2006] or taxonomic information [Liu et al. 2008]."
    Abstract: The relative ineffectiveness of information retrieval systems is largely caused by the inaccuracy with which a query formed by a few keywords models the actual user information need. One well-known method to overcome this limitation is automatic query expansion (AQE), whereby the user's original query is augmented by new features with a similar meaning. AQE has a long history in the information retrieval community, but it is only in recent years that it has reached a level of scientific and experimental maturity, especially in laboratory settings such as TREC. This survey presents a unified view of a large number of recent approaches to AQE that leverage various data sources and employ very different principles and techniques. The following questions are addressed. Why is query expansion so important to improve search effectiveness? What are the main steps involved in the design and implementation of an AQE component? What approaches to AQE are available and how do they compare? Which issues must still be resolved before AQE becomes a standard component of large operational information retrieval systems (e.g., search engines)?
    Full-text · Article · Jan 2012