Conference Paper

Keyphrase extraction-based query expansion in digital libraries.

DOI: 10.1145/1141753.1141800 Conference: ACM/IEEE Joint Conference on Digital Libraries, JCDL 2006, Chapel Hill, NC, USA, June 11-15, 2006, Proceedings
Source: DBLP

ABSTRACT In pseudo-relevance feedback, the two key factors affecting the retrieval performance most are the source from which expansion terms are generated and the method of ranking those expansion terms. In this paper, we present a novel unsupervised query expansion technique that utilizes keyphrases and POS phrase categorization. The keyphrases are extracted from the retrieved documents and weighted with an algorithm based on information gain and co-occurrence of phrases. The selected keyphrases are translated into Disjunctive Normal Form (DNF) based on the POS phrase categorization technique for better query refomulation. Furthermore, we study whether ontologies such as WordNet and MeSH improve the retrieval performance in conjunction with the keyphrases. We test our techniques on TREC 5, 6, and 7 as well as a MEDLINE collection. The experimental results show that the use of keyphrases with POS phrase categorization produces the best average precision.

  • [Show abstract] [Hide abstract]
    ABSTRACT: Automatic keyphrase extraction techniques play an important role for many tasks including indexing, categorizing, summarizing, and searching. In this paper, we develop and evaluate an automatic keyphrase extraction system for scientific documents. Compared with previous work, our system concentrates on two important issues: (1) more precise location for potential keyphrases: a new candidate phrase generation method is proposed based on the core word expansion algorithm, which can reduce the size of the candidate set by about 75% without increasing the computational complexity; (2) overlap elimination for the output list: when a phrase and its sub-phrases coexist as candidates, an inverse document frequency feature is introduced for selecting the proper granularity. Additional new features are added for phrase weighting. Experiments based on real-world datasets were carried out to evaluate the proposed system. The results show the efficiency and effectiveness of the refined candidate set and demonstrate that the new features improve the accuracy of the system. The overall performance of our system compares favorably with other state-of-the-art keyphrase extraction systems.
    Knowledge and Information Systems 34(3). · 2.23 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: In this paper we present a keyphrase extraction system that can extract potential phrases from a single document in an unsupervised, domain-independent way. We extract word n-grams from input document. We incorporate linguistic knowledge (i.e., part-of-speech tags), and statistical information (i.e., frequency, position, lifespan) of each n-gram in defining candidate phrases and their respective feature sets. The proposed approach can be applied to any document, however, in order to know the effectiveness of the system for digital libraries, we have carried out the evaluation on a set of scientific documents, and compared our results with current keyphrase extraction systems.
    Digital Libraries - 6th Italian Research Conference, IRCDL 2010, Padua, Italy, January 28-29, 2010. Revised Selected Papers; 01/2010
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Current digital libraries suffer from the information overload problem which prevents an effective access to knowledge. This is particularly true for scientic digital libraries where a growing amount of scientific articles can be explored by users with different needs, backgrounds, and interests. Recommender systems can tackle this limitation by filtering resources according to specific user needs. This paper introduces a content-based recommendation approach for enhancing the access to scientific digital libraries where a keyphrase extraction module is used to produce a rich description of both content of papers and user interests.
    Digital Libraries and Archives - 7th Italian Research Conference, IRCDL 2011, Pisa, Italy, January 20-21, 2011.; 01/2011

Full-text (2 Sources)

Available from
May 15, 2014