Conference Paper

Spelling Correction for Search Engine Queries

DOI: 10.1007/978-3-540-30228-5_33 Conference: Advances in Natural Language Processing, 4th International Conference, EsTAL 2004, Alicante, Spain, October 20-22, 2004, Proceedings
Source: DBLP


Search engines have become the primary means of accessing information on the Web. However, recent studies show that misspelled words are very common in queries to these systems. When users misspell a query, the results are incorrect or provide inconclusive information. In this work, we discuss the integration of a spelling correction component into tumba!, our community Web search engine. We present an algorithm that attempts to select the best choice among all possible corrections for a misspelled term, and discuss its implementation based on a ternary search tree data structure.
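The abstract names a ternary search tree (TST) as the dictionary structure behind the correction component. As an illustration only (this is not the paper's implementation; all names here are hypothetical), a minimal TST supporting insertion and exact lookup of dictionary terms might look like:

```python
class TSTNode:
    """One node of a ternary search tree: a character plus three children."""
    __slots__ = ("ch", "lo", "eq", "hi", "is_word")

    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None
        self.is_word = False


class TernarySearchTree:
    """Stores a dictionary of correctly spelled terms."""

    def __init__(self):
        self.root = None

    def insert(self, word):
        self.root = self._insert(self.root, word, 0)

    def _insert(self, node, word, i):
        ch = word[i]
        if node is None:
            node = TSTNode(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, word, i)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, word, i)
        elif i < len(word) - 1:
            node.eq = self._insert(node.eq, word, i + 1)
        else:
            node.is_word = True
        return node

    def contains(self, word):
        node, i = self.root, 0
        while node is not None:
            if word[i] < node.ch:
                node = node.lo
            elif word[i] > node.ch:
                node = node.hi
            elif i == len(word) - 1:
                return node.is_word
            else:
                node = node.eq
                i += 1
        return False
```

A TST combines the space efficiency of a binary search tree with the prefix-sharing of a trie, which is why it is a common choice for dictionary lookup in spell checkers.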



Available from: Mário J. Silva
  • Source
    • "This problem was overcome by employing thesaurus-based spell checkers parsing the queries. Features such as spelling suggestion were incorporated in search systems [11]. In addition to the problem of misspellings, another major problem in keyword-based natural language querying is the expression of the information need. "
    ABSTRACT: Scientific datasets play a crucial role in data-driven research. While several search tools have been developed for searching documents, blogs, images, videos and various other information needs, important scientific artifacts like research datasets lack this prerogative. The main challenge faced in developing an effective search tool for datasets is to determine the content representation of the raw data. Dataset descriptions provided by users are often very content-specific and short. Moreover, even public datasets generally have very limited description of the various research problems/applications that used them. Given the ever expanding variety of datasets on the web and the lack of representative content for the purpose of indexing, the task of developing an effective search engine for datasets is computationally very challenging. In this work, we propose a novel 'context' based paradigm of search for datasets to overcome the problem of limited representative content for research datasets. In contrast to a general purpose search engine, which indexes the 'little' text information about the dataset sources, we hypothesized that the proposed paradigm of 'context' based search is more effective for dataset search. The hypothesis is tested by conducting a user study. The performance of the context based search (DataGopher) is compared with a popular general purpose search engine. The study was conducted in a real world setting where users are free to use the search engine as per the information need. Based on the user study, we find that the performance of DataGopher was favored for 58% of the total context based user queries, whereas the baseline was favored for only 26%.
    Full-text · Article · Feb 2015
  • Source
    • "One advantage of using context-based approaches is that the computation time is lower (although the training is costly). However, such context-based approaches depend on proper contexts, which are not always available [47]. "
    ABSTRACT: Figures are ubiquitous in biomedical full-text articles, and they represent important biomedical knowledge. However, the sheer volume of biomedical publications has made it necessary to develop computational approaches for accessing figures. Therefore, we are developing the Biomedical Figure Search engine to allow bioscientists to access figures efficiently. Since text frequently appears in figures, automatically extracting such text may assist the task of mining information from figures. Little research, however, has been conducted exploring text extraction from biomedical figures.
    Preview · Article · Jan 2011 · PLoS ONE
  • Source
    • "the formation of a suitable graph of unprocessed text information [5]. The other common method, for lexicon representation, is utilization of a tree-based data structure [2]. Much research has been done in order to model the error pattern and specify its parameters. "
    ABSTRACT: CloniZER spell checker is an adaptive, language-independent and 'built-in error pattern free' spell checker tool which is based on the 'Ternary Search Tree' data structure. It suggests the proper form of misspelled words using a nondeterministic traverse. In other words, the problem of spell checking is addressed by traversing a tree with variable weighted edges. The proposed method learns the medium's error pattern and improves its suggestions over time. Instead of using expert knowledge for error-pattern modelling, the proposed algorithm learns the error pattern through interaction with the user.
    Full-text · Article ·
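The CloniZER snippet above describes spell checking as a nondeterministic traverse of a ternary search tree, following several edges when a character mismatches. A minimal sketch of that idea, assuming substitution-only errors and omitting the paper's learned edge weights (all names here are hypothetical):

```python
class Node:
    """A ternary search tree node: character, three children, word marker."""
    def __init__(self, ch):
        self.ch, self.lo, self.eq, self.hi, self.is_word = ch, None, None, None, False


def insert(node, word, i=0):
    """Insert `word` into the TST rooted at `node`; returns the (new) root."""
    if node is None:
        node = Node(word[i])
    if word[i] < node.ch:
        node.lo = insert(node.lo, word, i)
    elif word[i] > node.ch:
        node.hi = insert(node.hi, word, i)
    elif i + 1 < len(word):
        node.eq = insert(node.eq, word, i + 1)
    else:
        node.is_word = True
    return node


def near_search(node, word, i, budget, prefix, out):
    """Collect dictionary words of the same length as `word` that differ
    in at most `budget` character substitutions, by exploring every branch
    whose accumulated mismatch cost stays within the budget."""
    if node is None or budget < 0:
        return
    # Low/high siblings are alternatives for the same position i.
    near_search(node.lo, word, i, budget, prefix, out)
    near_search(node.hi, word, i, budget, prefix, out)
    # Consuming this node's character costs 1 if it mismatches word[i].
    cost = 0 if i < len(word) and word[i] == node.ch else 1
    if node.is_word and i == len(word) - 1 and budget - cost >= 0:
        out.append(prefix + node.ch)
    near_search(node.eq, word, i + 1, budget - cost, prefix + node.ch, out)
```

For example, searching for "quero" with a budget of one substitution in a dictionary containing "query" recovers "query"; ranking the surviving candidates (e.g. by learned edge weights, as CloniZER proposes) is a separate step not shown here.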