A Technique for Computer Detection and Correction of Spelling Errors

Communications of the ACM (Impact Factor: 3.62). 03/1964; 7(3):171-176. DOI: 10.1145/363958.363994
Source: DBLP


The method described assumes that a word which cannot be found in a dictionary has at most one error, which might be a wrong, missing, or extra letter, or a single transposition. The unidentified input word is compared to the dictionary again, testing each time whether the words match under the assumption that one of these errors occurred. During a test run on garbled text, correct identifications were made for over 95 percent of these error types.
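
The single-error test described in the abstract can be illustrated with a short sketch. This is not the paper's original implementation; the function names `single_error_match` and `suggest`, and the use of a Python set as the dictionary, are assumptions made purely for illustration.

```python
def single_error_match(word: str, entry: str) -> bool:
    """True if `word` equals `entry`, or differs from it by exactly one
    wrong, missing, or extra letter, or one swap of adjacent letters."""
    if word == entry:
        return True
    lw, le = len(word), len(entry)
    if abs(lw - le) > 1:
        return False  # more than one letter missing or extra

    # Index of the first position where the two words disagree.
    i = 0
    while i < min(lw, le) and word[i] == entry[i]:
        i += 1

    if lw == le:
        # Wrong letter: everything after position i is identical.
        if word[i + 1:] == entry[i + 1:]:
            return True
        # Transposition: positions i and i+1 are swapped, the rest matches.
        return (i + 1 < lw
                and word[i] == entry[i + 1]
                and word[i + 1] == entry[i]
                and word[i + 2:] == entry[i + 2:])
    if lw + 1 == le:
        # Missing letter: skipping entry[i] makes the tails match.
        return word[i:] == entry[i + 1:]
    # Extra letter: skipping word[i] makes the tails match.
    return word[i + 1:] == entry[i:]


def suggest(word: str, dictionary: set[str]) -> list[str]:
    """Return the word itself if it is in the dictionary; otherwise every
    dictionary entry reachable under the single-error assumption."""
    if word in dictionary:
        return [word]
    return sorted(e for e in dictionary if single_error_match(word, e))


if __name__ == "__main__":
    words = {"the", "ten", "tea", "dog", "spelling"}
    print(suggest("teh", words))
    print(suggest("speling", words))
```

The sketch scans the whole dictionary for each unknown word, which mirrors the "compare to the dictionary again" step in the abstract but would be slow for large word lists.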

  • Source
    • "The language may also interfere in the order in which words are misspelled [van Berkel and Smedt 1988]. Alternatively, we can classify errors as (1) non-word errors, (which fit into one of Damerau's categories [Damerau 1964]), and (2) real-word errors [Kukich 1992], as previously explained. "
    DESCRIPTION: This technical report presents an OO Writer module for rearranging the spelling suggestion list in Brazilian Portuguese. To do so, the module relies on some statistics collected from texts typed in this language. As it turned out, a comparison between the lists generated by the newly added module and the ones originally suggested by Open Office Writer showed an improvement regarding the order in which suggestions are presented to the user.
  • Source
    • "Detecting and correcting spelling errors is one of the problems that intrigued NLP researchers from an early stage. Damerau (1964) was among the first researchers to address this issue. He developed a rule-based string-matching technique for error correction, based on four edit operations (substitution, insertion, deletion, and transposition), but his work was limited by memory and computation constraints at the time. "
    ABSTRACT: A spelling error detection and correction application is typically based on three main components: a dictionary (or reference word list), an error model and a language model. While most of the attention in the literature has been directed to the language model, we show how improvements in any of the three components can lead to significant cumulative improvements in the overall performance of the system. We develop our dictionary of 9.2 million fully-inflected Arabic words (types) from a morphological transducer and a large corpus, validated and manually revised. We improve the error model by analyzing error types and creating an edit distance re-ranker. We also improve the language model by analyzing the level of noise in different data sources and selecting an optimal subset to train the system on. Testing and evaluation experiments show that our system significantly outperforms Microsoft Word 2013, OpenOffice Ayaspell 3.4 and Google Docs.
    Natural Language Engineering 02/2015; DOI:10.1017/S1351324915000030 · 0.64 Impact Factor
  • Source
    • "For each visitor we computed a sequence of more than 400 timesteps and nominal information about the presentation topic of equal length (see Section 3). To compare nominal data values, we apply the Levenshtein distance [7] [18]. It defines the distance between two sequences as the number of insert, edit, and delete operations necessary to transform one sequence into the other. "
    ABSTRACT: The analysis of persons’ indoor movement and behavior patterns can be of great value. Such an analysis enables managers and organizers in understanding the needs of customers and visitors. Event planning for exhibitions, festivals, and conferences, but also optimization of malls and stores can benefit from recorded visitor data. To show the advantage of visual analysis of movement information, we apply a new visual approach to a large indoor dataset, recorded at the re:publica conference in 2013. We present three different interactive visualization methods to reveal patterns, to deduce behavior from participants’ movements, and to show transitions between sessions and topics. For this, we apply a spectral hierarchical clustering approach and visualize results in a pixel based scarf plot. Additionally, we introduce a prediction model and visualization which serves as a monitoring tool for visitor attraction and distribution and helps to prevent bottleneck situations. We evaluate our approach by showing its applicability in a case study and validate our model on ground truth data.
    Hawaii International Conference on System Sciences (HICSS-47), Kauai; 01/2015
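
The citing excerpts above refer to edit distance built from insert, substitute, and delete operations, with transposition as Damerau's fourth category. A minimal sketch of that distance follows. It uses the restricted variant often called optimal string alignment, which is a later formulation and not something taken from the 1964 paper; the function name `edit_distance` and the `transpositions` flag are illustrative assumptions.

```python
def edit_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Minimum number of single-character edits turning `a` into `b`.

    With transpositions=False this is plain Levenshtein distance
    (insert, delete, substitute). With transpositions=True, swapping two
    adjacent characters also counts as one edit (optimal string
    alignment, a restricted form of Damerau-Levenshtein distance).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = distance between the first i chars of a and first j of b.
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                # adjacent transposition
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


if __name__ == "__main__":
    print(edit_distance("teh", "the"))
    print(edit_distance("teh", "the", transpositions=False))
```

Under this measure, the single-error assumption in Damerau's method corresponds to accepting dictionary entries at distance 1 from the input word.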