Conference Paper

Incorporating Document Keyphrases in Search Results.

Conference: 10th Americas Conference on Information Systems, AMCIS 2004, New York, NY, USA, August 6-8, 2004
Source: DBLP


Effective and efficient searching and presentation of returned results are key to a search engine. Before downloading and examining the document text, users usually first judge the relevance of a returned hit to the query by looking at the document metadata presented with the result. However, the metadata that comes with a returned hit is usually not rich enough for users to predict the content of the document. Keyphrases provide a concise summary of a document's content, offering subject metadata that characterizes and summarizes the document. In this paper, we propose a mechanism for enriching the metadata of returned results by incorporating automatically extracted document keyphrases in each returned hit. By looking at the keyphrases in each hit, the user can predict the content of the document more easily, quickly, and accurately. The experimental results show that our solution may save users up to 32% of their time and that users would like to use our proposed search interface, in which document keyphrases are part of the metadata of a returned hit.
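
The idea in the abstract, attaching automatically extracted keyphrases to each returned hit, can be illustrated with a minimal sketch. This is not the authors' system: the `Hit` class, the stopword list, and the frequency-based `extract_keyphrases` function are hypothetical stand-ins chosen only to show where keyphrases would enter the result metadata.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import List

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "for", "by"}


@dataclass
class Hit:
    """Hypothetical search hit: the usual metadata plus keyphrases."""
    title: str
    url: str
    text: str
    keyphrases: List[str] = field(default_factory=list)


def extract_keyphrases(text: str, top_k: int = 3, max_len: int = 2) -> List[str]:
    """Toy extractor: rank stopword-free word n-grams by raw frequency."""
    words = re.findall(r"[a-z]+", text.lower())
    counts: Counter = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if any(w in STOPWORDS for w in gram):
                continue
            counts[" ".join(gram)] += 1
    # Most frequent first; ties go to longer phrases, then alphabetical order.
    ranked = sorted(counts, key=lambda p: (-counts[p], -len(p.split()), p))
    return ranked[:top_k]


def enrich(hits: List[Hit], top_k: int = 3) -> List[Hit]:
    """Attach extracted keyphrases to every hit's metadata."""
    for hit in hits:
        hit.keyphrases = extract_keyphrases(hit.text, top_k)
    return hits


def render(hit: Hit) -> str:
    """Format a hit as it might appear in a result list."""
    return f"{hit.title}\n{hit.url}\nKeyphrases: {', '.join(hit.keyphrases)}"
```

A real system would replace `extract_keyphrases` with a trained or graph-based extractor; the point of the sketch is only that the keyphrases are computed per document and displayed alongside the title and URL.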



Available from: Yi-Fang Wu
  • Source
    • "Keyword extraction has been an area of active research and applied to NLP tasks such as document categorization (Manning and Schutze, 2000), indexing (Li et al., 2004), and text mining on social networking services (Li et al., 2010; Zhao et al., 2011; Wu et al., 2010)."
    ABSTRACT: We introduce a method that extracts keywords in one language with the help of another. In our approach, we bridge and fuse word statistics across languages that are conventionally treated as unrelated. The method involves estimating preferences for keywords w.r.t. domain topics and generating cross-lingual bridges for word statistics integration. At run-time, we transform parallel articles into word graphs, build cross-lingual edges, and exploit PageRank with word keyness information for keyword extraction. We present the system, BiKEA, that applies the method to keyword analysis. Experiments show that keyword extraction benefits from PageRank, globally learned keyword preferences, and cross-lingual word statistics interaction which respects language diversity.
    Full-text · Conference Paper · Jun 2014
  • Source
    • "It can be used for various tasks such as document summarization (Litvak and Last, 2008) and indexing (Li et al., 2004). While it appears natural to use keyphrases to summarize Twitter content, compared with traditional text collections, keyphrase extraction from Twitter is more challenging in at least two aspects: 1) Tweets are much shorter than traditional articles and not all tweets contain useful information; 2) Topics tend to be more diverse in Twitter than in formal articles such as news reports."
    ABSTRACT: Summarizing and analyzing Twitter content is an important and challenging task. In this paper, we propose to extract topical keyphrases as one way to summarize Twitter. We propose a context-sensitive topical PageRank method for keyword ranking and a probabilistic scoring function that considers both relevance and interestingness of keyphrases for keyphrase ranking. We evaluate our proposed methods on a large Twitter data set. Experiments show that these methods are very effective for topical keyphrase extraction.
    Full-text · Conference Paper · Jun 2011
  • Source
    • "Keyphrases are also very useful for digital libraries and Web search engines. In digital libraries, the keyphrases of a scientific paper can help users to get a rough sense of the paper [5], whereas in Web search the keyphrases of a web page can serve as metadata for indexing and retrieving web pages for user supplied queries [6]. "
    ABSTRACT: Tag clouds, also known as word clouds, are very useful for quickly perceiving the most prominent terms embedded within a text collection and determining their relative prominence. The effectiveness of tag clouds in conceptualizing a text corpus is directly proportional to the quality of the keyphrases extracted from the corpus. Although authors of scientific publications provide a list of about five to ten keywords that are used to map the publications into their respective domains, the exponential growth in non-scientific documents on the World Wide Web calls for an automatic mechanism to identify the keyphrases embedded within them for tag cloud generation. In this paper, we propose a web content mining technique to extract keyphrases from web documents for tag cloud generation. Instead of using partial or full parsing, the proposed method applies an n-gram technique followed by various heuristics-based refinements to identify a set of lexical and semantic features from text documents. We propose a rich set of domain-independent features to model candidate keyphrases very effectively for establishing their keyphraseness using classification models. We also propose a font-determination function to determine the relative font size of keyphrases for tag cloud generation. The efficacy of the proposed method is established through experimentation. The proposed method outperforms the popular keyphrase extraction system KEA.
    Full-text · Conference Paper · Jan 2011