A Technique for Computer Detection and Correction of Spelling Errors

Communications of the ACM (Impact Factor: 3.62). 03/1964; 7(3):171-176. DOI: 10.1145/363958.363994
Source: DBLP


The method described assumes that a word which cannot be found in a dictionary contains at most one error: a wrong letter (substitution), a missing letter, an extra letter, or a single transposition of adjacent letters. The unidentified input word is compared against the dictionary again, testing each time whether the words would match under the assumption that one of these errors had occurred. In a test run on garbled text, over 95 percent of errors of these types were correctly identified.
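The single-error matching test described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the paper's original implementation; the function name and structure are my own, but the four error types checked (wrong, missing, or extra letter, and adjacent transposition) are exactly those the abstract names.

```python
def matches_with_one_error(word, dict_word):
    """Return True if `word` could be `dict_word` garbled by at most one
    error: a wrong letter, a missing letter, an extra letter, or a
    transposition of two adjacent letters."""
    lw, ld = len(word), len(dict_word)
    if word == dict_word:
        return True
    if abs(lw - ld) > 1:
        return False
    # Find the first position where the two words disagree.
    i = 0
    while i < min(lw, ld) and word[i] == dict_word[i]:
        i += 1
    if lw == ld:
        # Wrong letter: everything after the mismatch must agree...
        if word[i + 1:] == dict_word[i + 1:]:
            return True
        # ...or an adjacent transposition.
        return (i + 1 < lw
                and word[i] == dict_word[i + 1]
                and word[i + 1] == dict_word[i]
                and word[i + 2:] == dict_word[i + 2:])
    if lw == ld + 1:
        # Extra letter in the input word.
        return word[:i] + word[i + 1:] == dict_word
    # lw == ld - 1: missing letter in the input word.
    return dict_word[:i] + dict_word[i + 1:] == word
```

For example, `matches_with_one_error("sepll", "spell")` is true (one transposition), while `matches_with_one_error("spll", "spells")` is false (a deletion plus a missing letter is two errors).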

    • "To identify explicit noise, we embed a misspelling correction module in our unsupervised language model-based biomedical term detection method. For temporality and other types of named entities, we set up seed patterns and run our own bootstrapping method: it detects variants of the seed patterns in the data using Damerau-Levenshtein distance [Damerau 1964]. To identify implicit noise, we use a more detailed NLP method employing syntactic analysis and filter out untrustworthy information."
    ABSTRACT: We explore methods for effectively extracting information from clinical narratives that are captured in a public health consulting phone service called HealthLink. Our research investigates the application of state-of-the-art natural language processing and machine learning to clinical narratives to extract information of interest. The currently available data consist of dialogues constructed by nurses while consulting patients by phone. Since the data are interviews transcribed by nurses during phone conversations, they include a significant volume and variety of noise. When we extract the patient-related information from the noisy data, we have to remove or correct at least two kinds of noise: explicit noise, which includes spelling errors, unfinished sentences, omission of sentence delimiters, and variants of terms, and implicit noise, which includes non-patient information and untrustworthy patient information. To filter explicit noise, we propose our own biomedical term detection/normalization method: it resolves misspellings, term variations, and arbitrary abbreviation of terms by nurses. In detecting temporal terms, temperature, and other types of named entities (which carry patients' personal information such as age and sex), we propose a bootstrapping-based pattern learning process to detect a variety of arbitrary variations of named entities. To address implicit noise, we propose a dependency path-based filtering method. The result of our denoising is the extraction of normalized patient information, and we visualize the named entities by constructing a graph that shows the relations between them. The objective of this knowledge discovery task is to identify associations between biomedical terms and to clearly expose the trends of patients' symptoms and concerns; the experimental results show that we achieve reasonable performance with our noise reduction methods.
    ACM Transactions on Intelligent Systems and Technology 07/2015; 6(4):1-23. DOI:10.1145/2651444 · 1.25 Impact Factor
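The Damerau-Levenshtein distance cited above generalizes Damerau's single-error test to count the minimum number of insertions, deletions, substitutions, and adjacent transpositions between two strings. A minimal sketch of the commonly used restricted (optimal string alignment) variant, written here for illustration rather than taken from the cited work:

```python
def damerau_levenshtein(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance:
    minimum number of insertions, deletions, substitutions, and adjacent
    transpositions needed to turn string `a` into string `b`."""
    la, lb = len(a), len(b)
    d = [[0] * (lb + 1) for _ in range(la + 1)]
    for i in range(la + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(lb + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, la + 1):
        for j in range(1, lb + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[la][lb]
```

A word with a single Damerau-type error is exactly a word at distance 1 from its dictionary form: `damerau_levenshtein("teh", "the")` is 1, because the transposition counts as one operation rather than two substitutions.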
    • "The language may also interfere in the order in which words are misspelled [van Berkel and Smedt 1988]. Alternatively, we can classify errors as (1) non-word errors, (which fit into one of Damerau's categories [Damerau 1964]), and (2) real-word errors [Kukich 1992], as previously explained. "
    DESCRIPTION: This technical report presents an OpenOffice Writer module for rearranging the spelling-suggestion list in Brazilian Portuguese. The module relies on statistics collected from texts typed in this language. A comparison between the lists generated by the newly added module and those originally suggested by OpenOffice Writer showed an improvement in the order in which suggestions are presented to the user.
    • "Detecting and correcting spelling errors is one of the problems that intrigued NLP researchers from an early stage. Damerau (1964) was among the first researchers to address this issue. He developed a rule-based string-matching technique for error correction, based on four edit operations (substitution, insertion, deletion, and transposition), but his work was limited by memory and computation constraints at the time. "
    ABSTRACT: A spelling error detection and correction application is typically based on three main components: a dictionary (or reference word list), an error model and a language model. While most of the attention in the literature has been directed to the language model, we show how improvements in any of the three components can lead to significant cumulative improvements in the overall performance of the system. We develop our dictionary of 9.2 million fully-inflected Arabic words (types) from a morphological transducer and a large corpus, validated and manually revised. We improve the error model by analyzing error types and creating an edit distance re-ranker. We also improve the language model by analyzing the level of noise in different data sources and selecting an optimal subset to train the system on. Testing and evaluation experiments show that our system significantly outperforms Microsoft Word 2013, OpenOffice Ayaspell 3.4 and Google Docs.
    Natural Language Engineering 02/2015; DOI:10.1017/S1351324915000030 · 0.64 Impact Factor
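The abstract above describes a three-component architecture: a dictionary of valid words, an error model that scores candidate corrections, and a language model. A minimal sketch of the middle component, using plain Levenshtein distance as a stand-in error model; the function names and the choice of distance are illustrative assumptions, not the cited system's actual re-ranker:

```python
import heapq

def levenshtein(a, b):
    """Plain edit distance, used here as a simple error-model score."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def rank_suggestions(misspelling, dictionary, k=3):
    """Return the k dictionary words closest to the misspelling,
    ordered by increasing edit distance."""
    return heapq.nsmallest(k, dictionary,
                           key=lambda w: levenshtein(misspelling, w))
```

A real system would refine this ordering with error statistics (which edits are likely) and a language model over the surrounding context, which is where the cited paper reports its gains.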