Conference Paper

Incorporating Document Keyphrases in Search Results

Conference: 10th Americas Conference on Information Systems, AMCIS 2004, New York, NY, USA, August 6-8, 2004
Source: DBLP

ABSTRACT The effectiveness and efficiency of searching, and of presenting the returned results, are key to a search engine. Before downloading and examining the document text, users usually first judge the relevance of a return hit to the query by looking at the document metadata presented in the returned result. However, the metadata accompanying a return hit is usually not rich enough for users to predict the content of the document. Keyphrases provide a concise summary of a document's content, offering subject metadata that characterizes and summarizes the document. In this paper, we propose a mechanism for enriching the metadata of returned results by incorporating automatically extracted document keyphrases into each return hit. By looking at the keyphrases in each return hit, the user can predict the content of the document more easily, quickly, and accurately. The experimental results show that our solution may save users up to 32% of their time, and that users would like to use our proposed search interface, which includes document keyphrases as part of the metadata of each return hit.
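The mechanism described in the abstract, attaching automatically extracted keyphrases to each return hit, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `ReturnHit` class, the `extract_keyphrases`, `enrich`, and `render` functions, and the stopword list are all hypothetical, and the frequency-based two-word extractor is a crude stand-in for a real keyphrase extractor (for example, one based on noun-phrase extraction and domain-specific weighting).

```python
import re
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical stopword list, used only for this illustration.
STOPWORDS = {"a", "an", "and", "are", "as", "be", "by", "for", "in", "is",
             "it", "of", "on", "that", "the", "this", "to", "with"}


@dataclass
class ReturnHit:
    """One search result ("return hit") and the metadata shown to the user."""
    title: str
    url: str
    snippet: str
    keyphrases: list = field(default_factory=list)


def extract_keyphrases(text, k=3):
    """Hypothetical stand-in extractor: rank stopword-free word pairs by frequency.

    This is NOT the paper's extractor; it only produces something to display.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(
        f"{w1} {w2}"
        for w1, w2 in zip(words, words[1:])
        if w1 not in STOPWORDS and w2 not in STOPWORDS
    )
    # Highest count first; ties broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [phrase for phrase, _ in ranked[:k]]


def enrich(hit, full_text, k=3):
    """Attach the top-k keyphrases of the document text to its return hit."""
    hit.keyphrases = extract_keyphrases(full_text, k)
    return hit


def render(hit):
    """Plain-text rendering: title, URL, snippet, then a keyphrase line."""
    lines = [hit.title, hit.url, hit.snippet]
    if hit.keyphrases:
        lines.append("Keyphrases: " + "; ".join(hit.keyphrases))
    return "\n".join(lines)


if __name__ == "__main__":
    doc = ("Search engine users judge search results quickly. "
           "Search results with keyphrases help search engine users.")
    hit = ReturnHit("Example page", "http://example.com/page", "A short snippet.")
    print(render(enrich(hit, doc)))
```

Running it prints the hit's title, URL, and snippet followed by the line `Keyphrases: engine users; search engine; search results`. Keeping extraction separate from rendering means a better extractor could be swapped in without changing how hits are displayed.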


Available from: Yi-Fang Wu, Jun 12, 2015
  • ABSTRACT: Automated medical concept recognition is important for medical informatics tasks such as medical document retrieval and text mining. In this paper, we present a software tool called the keyphrase identification program (KIP) for identifying topical concepts in medical documents. KIP combines two functions: noun phrase extraction and keyphrase identification. The former automatically extracts noun phrases from medical literature as keyphrase candidates. The latter assigns weights to the extracted noun phrases for a medical document based on how important they are to that document and how domain-specific they are in the medical domain. The experimental results show that our noun phrase extractor is effective in identifying noun phrases in medical documents, as is the keyphrase extractor in identifying important medical conceptual terms. Both performed better than the systems they were compared against.
    Journal of Biomedical Informatics 01/2007; 39(6):668-79. DOI:10.1016/j.jbi.2006.02.001
  • ABSTRACT: Social media has long been a popular resource for sentiment analysis and data mining. In this paper, we learn to predict reader interest after an article has been read, using social interaction content in social media. The abundant interaction content (e.g., reader feedback) is intended to replace reader profiles and browsing histories, which are typically private. Our method involves estimating interest preferences with respect to article topics and identifying high-quality social content in terms of informativeness. During interest analysis, we combine articles and their reader responses and transform them into a PageRank word graph to balance author-end and reader-end influence. Semantic features of words, such as their content sources (authors vs. readers), syntactic parts of speech, and degrees of reference (i.e., significance) among authors and readers, are used to weight the PageRank word graph. We present a prototype system, Interest Finder, that applies the method to reader interest prediction by calculating word interestingness scores. Two sets of evaluations show that traditional, local PageRank covers more of the span of reader interest, and more accurately, with the help of topical interest preferences learned globally, the semantic information of word nodes, and, most importantly, quality social interaction content such as reader feedback.
    IJCNP; 12/2013
  • ABSTRACT: We introduce a method for learning to predict reader interest. In our approach, interest analysis is based on PageRank and social interaction content (e.g., reader feedback in social media). The method involves automatically estimating topical interest preferences and automatically determining the sentiment of social content. In interest prediction, the different content sources, articles and the reader feedback representing readers' viewpoints, are weighted accordingly and transformed into a content-word-weighted word graph. PageRank then suggests reader interest with the help of word interestingness scores. We present a prototype system, InterestFinder, that applies the method to interest analysis. Experimental evaluation shows that content-source and content-word weighting, as well as interest-preference scores for words inferred across articles, are quite helpful. Moreover, in covering general readers' interest spans, our system benefits more from subjective social interaction content than from objective content or from none at all.
    IEEE EMRITE; 01/2013
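The two InterestFinder abstracts above describe scoring words by running PageRank over a word graph built from an article and its reader feedback, with word weights that reflect the content source. A minimal sketch of that general idea follows, under loudly stated assumptions: the adjacent-word co-occurrence graph, the hypothetical `author_w`/`reader_w` source weights, and the use of those weights as PageRank's teleport (personalization) vector are illustrative choices, not the authors' published weighting scheme, which also draws on parts of speech and degrees of reference.

```python
from collections import defaultdict


def build_word_graph(token_lists, window=1):
    """Undirected co-occurrence graph: link each word to the next `window` words."""
    graph = defaultdict(lambda: defaultdict(float))
    for tokens in token_lists:
        for i, w in enumerate(tokens):
            for v in tokens[i + 1:i + 1 + window]:
                if v != w:
                    graph[w][v] += 1.0
                    graph[v][w] += 1.0
    return graph


def source_weights(article, feedback, author_w=1.0, reader_w=1.5):
    """Hypothetical node weights: reader-feedback words count more than author words."""
    weights = defaultdict(float)
    for w in article:
        weights[w] += author_w
    for w in feedback:
        weights[w] += reader_w
    return dict(weights)


def weighted_pagerank(graph, node_weight, damping=0.85, iters=100):
    """Power-iteration PageRank with edge weights and a weighted teleport vector."""
    nodes = sorted(set(graph) | set(node_weight))
    total = sum(node_weight.get(n, 0.0) for n in nodes)
    # Teleport distribution proportional to node weights (uniform if all are zero).
    prior = {n: (node_weight.get(n, 0.0) / total if total else 1.0 / len(nodes))
             for n in nodes}
    out_sum = {n: sum(graph.get(n, {}).values()) for n in nodes}
    rank = dict(prior)
    for _ in range(iters):
        # Rank held by words without edges is redistributed according to the prior.
        dangling = sum(rank[n] for n in nodes if out_sum[n] == 0)
        new = {n: (1 - damping + damping * dangling) * prior[n] for n in nodes}
        for n in nodes:
            if out_sum[n] == 0:
                continue
            for m, w in graph.get(n, {}).items():
                new[m] += damping * rank[n] * w / out_sum[n]
        rank = new
    return rank


if __name__ == "__main__":
    article = ["solar", "panel", "price", "drop"]
    feedback = ["price", "drop", "great", "price"]
    scores = weighted_pagerank(build_word_graph([article, feedback]),
                               source_weights(article, feedback))
    for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{word}: {score:.3f}")
```

In this toy example, "price" receives the highest score because it is both well connected in the graph and repeated in the reader feedback. Using node weights as the teleport distribution is one common way to bias PageRank toward preferred nodes; the published systems may combine their features differently.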