Conference Paper

Spelling Correction for Search Engine Queries.

DOI: 10.1007/978-3-540-30228-5_33 Conference: Advances in Natural Language Processing, 4th International Conference, EsTAL 2004, Alicante, Spain, October 20-22, 2004, Proceedings
Source: DBLP

ABSTRACT Search engines have become the primary means of accessing information on the Web. However, recent studies show that misspelled words are very common in queries to these systems. When users misspell a query, the results are incorrect or provide inconclusive information. In this work, we discuss the integration of a spelling correction component into tumba!, our community Web search engine. We present an algorithm that attempts to select the best choice among all possible corrections for a misspelled term, and discuss its implementation based on a ternary search tree data structure.
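The abstract names a ternary search tree as the underlying data structure but does not give the selection algorithm itself. The following is a minimal sketch of how such a component could look, not the paper's implementation. It assumes that candidates are dictionary words within a bounded Levenshtein distance of the query term, and that the "best choice" is the closest candidate, with ties broken by a word frequency count. The names `TernarySearchTree`, `near`, and `correct`, the distance bound, and the sample vocabulary are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    ch: str
    lo: Optional["Node"] = None   # characters smaller than ch
    eq: Optional["Node"] = None   # next character of the same word
    hi: Optional["Node"] = None   # characters greater than ch
    freq: int = 0                 # > 0 marks the end of a stored word


class TernarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, word, freq=1):
        if word:
            self.root = self._insert(self.root, word, 0, freq)

    def _insert(self, node, word, i, freq):
        ch = word[i]
        if node is None:
            node = Node(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, word, i, freq)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, word, i, freq)
        elif i + 1 < len(word):
            node.eq = self._insert(node.eq, word, i + 1, freq)
        else:
            node.freq = freq
        return node

    def contains(self, word):
        node, i = self.root, 0
        while node is not None and word:
            ch = word[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif i + 1 == len(word):
                return node.freq > 0
            else:
                node, i = node.eq, i + 1
        return False

    def near(self, word, max_dist):
        """Return (candidate, distance, freq) for stored words within max_dist."""
        results = []
        first_row = list(range(len(word) + 1))
        self._near(self.root, "", first_row, word, max_dist, results)
        return results

    def _near(self, node, prefix, prev_row, word, max_dist, results):
        if node is None:
            return
        # Siblings share the same prefix, so they reuse the same DP row.
        self._near(node.lo, prefix, prev_row, word, max_dist, results)
        self._near(node.hi, prefix, prev_row, word, max_dist, results)
        # Extend the Levenshtein table by one row for this node's character.
        row = [prev_row[0] + 1]
        for j in range(1, len(word) + 1):
            cost = 0 if word[j - 1] == node.ch else 1
            row.append(min(row[j - 1] + 1, prev_row[j] + 1, prev_row[j - 1] + cost))
        if node.freq > 0 and row[-1] <= max_dist:
            results.append((prefix + node.ch, row[-1], node.freq))
        # Prune: descend only if some extension can still be within the bound.
        if min(row) <= max_dist:
            self._near(node.eq, prefix + node.ch, row, word, max_dist, results)


def correct(tree, term, max_dist=2):
    """Assumed ranking: smallest distance first, then highest frequency."""
    if tree.contains(term):
        return term
    candidates = tree.near(term, max_dist)
    if not candidates:
        return term
    return min(candidates, key=lambda c: (c[1], -c[2], c[0]))[0]


# Hypothetical vocabulary with made-up frequency counts.
tree = TernarySearchTree()
for w, f in [("search", 50), ("engine", 40), ("query", 30), ("queue", 5), ("spelling", 20)]:
    tree.insert(w, f)

print(correct(tree, "serch"))
print(correct(tree, "qurey"))
```

The tree stores one character per node, so words with a common prefix share nodes, and the search computes each Levenshtein row only once per shared prefix. A real system would likely replace the made-up frequency counts with statistics from its own query logs or document collection.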

  • ABSTRACT: In this paper we present a language-independent approach for conflation that does not depend on predefined rules or prior knowledge of the target language. The proposed unsupervised method is based on an enhancement of the pure n-gram model that can group related words based on various string-similarity measures, while restricting the search to specific locations of the target word by taking into account the order of n-grams. We show that the method is effective at achieving high similarity scores for all word-form variations and reduces the ambiguity, i.e., obtains a higher precision and recall, compared to pure n-gram-based approaches for English, Portuguese, and Arabic. The proposed method is especially suited for conflation approaches in Arabic, since Arabic is a highly inflectional language. Therefore, we present in addition an adaptive user interface for Arabic text retrieval called “araSearch”. araSearch serves as a metasearch interface to existing search engines. The system is able to extend a query using the proposed conflation approach such that additional results for relevant subwords can be found automatically. © 2009 Wiley Periodicals, Inc.
    Journal of the American Society for Information Science and Technology 01/2009; 60(7):1448-1465. · 2.01 Impact Factor
  • ABSTRACT: Figures are important experimental results that are typically reported in full-text bioscience articles. Bioscience researchers need to access figures to validate research facts and to formulate or to test novel research hypotheses. On the other hand, the sheer volume of bioscience literature has made it difficult to access figures. Therefore, we are developing an intelligent figure search engine. Existing research in figure search treats each figure equally, but we introduce a novel concept of "figure ranking": figures appearing in a full-text biomedical article can be ranked by their contribution to the knowledge discovery. We empirically validated the hypothesis of figure ranking with over 100 bioscience researchers, and then developed unsupervised natural language processing (NLP) approaches to automatically rank figures. Evaluating on a collection of 202 full-text articles in which authors have ranked the figures based on importance, our best system achieved a weighted error rate of 0.2, which is significantly better than several other baseline systems we explored. We further explored a user interfacing application in which we built novel user interfaces (UIs) incorporating figure ranking, allowing bioscience researchers to efficiently access important figures. Our evaluation results show that 92% of the bioscience researchers prefer as the top two choices the user interfaces in which the most important figures are enlarged. With our automatic figure ranking NLP system, bioscience researchers preferred the UIs in which the most important figures were predicted by our NLP system over the UIs in which the most important figures were randomly assigned. In addition, our results show that there was no statistical difference in bioscience researchers' preference between the UIs generated by automatic figure ranking and the UIs generated by human ranking annotation. The evaluation results conclude that automatic figure ranking and user interfacing as we reported in this study can be fully implemented in online publishing. The novel user interface integrated with the automatic figure ranking system provides a more efficient and robust way to access scientific information in the biomedical domain, which will further enhance our existing figure search engine to better facilitate accessing figures of interest for bioscientists.
    PLoS ONE 01/2010; 5(10):e12983. · 3.53 Impact Factor
  • ABSTRACT: Interaction faults caused by a flawed external system designed by a third party are a major issue faced by interconnected systems. In this paper, we define a scenario where this type of problem occurs, and describe some fault cases observed in real systems. We also discuss the most important challenges faced in this scenario, focusing on error detection. The problem is divided into several sub-problems, some of which can be addressed by traditional or simple techniques, and some of which are complex problems by themselves. The purpose of this paper is not to present ad hoc solutions to specific sub-problems, but to introduce a new scenario and give general approaches to address each sub-problem. That includes detailed insight into important concepts, such as implicit redundancies. With this, we lay down the foundations for a wide range of future work.
    Service Availability, 5th International Service Availability Symposium, ISAS 2008, Tokyo, Japan, May 19-21, 2008, Proceedings; 01/2008