Article

A technique for computer detection and correction of spelling errors.

Communications of the ACM (Impact Factor: 2.86). 01/1964; 7:171-176. DOI: 10.1145/363958.363994
Source: DBLP

ABSTRACT The method described assumes that a word which cannot be found in a dictionary has at most one error, which might be a wrong, missing or extra letter or a single transposition. The unidentified input word is then compared to the dictionary again, testing each time whether the words match under the assumption that one of these errors occurred. During a test run on garbled text, correct identifications were made for over 95 percent of these error types.
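
The single-error test described in the abstract can be sketched in a few lines. The Python fragment below is a minimal illustration of that idea, not the paper's original implementation (whose dictionary organization and machine-level details are not reproduced here); the function names are illustrative.

    # A minimal sketch (in Python) of the single-error matching test described
    # in the abstract above. Not the original 1964 implementation; names are
    # illustrative.
    def matches_with_one_error(word, candidate):
        """True if word equals candidate or differs from it by exactly one
        wrong letter, one missing letter, one extra letter, or one
        transposition of adjacent letters."""
        lw, lc = len(word), len(candidate)
        if lw == lc:
            diffs = [i for i in range(lw) if word[i] != candidate[i]]
            if len(diffs) <= 1:                      # identical or one wrong letter
                return True
            return (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                    and word[diffs[0]] == candidate[diffs[1]]
                    and word[diffs[1]] == candidate[diffs[0]])   # transposition
        if abs(lw - lc) == 1:                        # missing or extra letter
            shorter, longer = (word, candidate) if lw < lc else (candidate, word)
            return any(longer[:i] + longer[i + 1:] == shorter
                       for i in range(len(longer)))
        return False

    def correct(word, dictionary):
        """Dictionary words reachable from the input word by at most one error."""
        return [entry for entry in dictionary if matches_with_one_error(word, entry)]

    print(correct("speling",   ["spelling", "correction", "technique"]))  # ['spelling']
    print(correct("tecnhique", ["spelling", "correction", "technique"]))  # ['technique']

The length checks make candidates whose lengths differ from the input by more than one fail immediately, so only plausible dictionary entries are examined letter by letter.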

Related publications

  • ABSTRACT: Distance functions are at the core of important data analysis and processing tools, e.g., PCA, classification, the vector median filter, and mathematical morphology. Despite this key role, a distance function is often used without careful consideration of its underlying assumptions and mathematical construction. With the objective of identifying a distance function suitable for hyperspectral images, so as to maintain the accuracy of hyperspectral image processing results, we compare existing distance functions and define a set of selection criteria. Bearing in mind that the choice of distance function is closely tied to how a spectrum is actually defined, we also classify the existing distance functions according to how they inherently define a spectrum. Theoretical constraints and behavior, as well as numerical tests, are proposed for the evaluation of distance functions. With regard to these criteria, the Euclidean distance of cumulative spectrum (ECS) was found to be the most suitable distance function. (A sketch of the ECS computation follows this list.)
    IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 01/2015; 1. DOI:10.1109/JSTARS.2015.2403257 · 2.83 Impact Factor
  • ABSTRACT: The manipulation of XML-based relational representations of biological systems (BioML, for Bioscience Markup Language) is a major challenge in systems biology. The needs of biologists, such as the translational study of biological systems, make this challenge even greater given the volume of material produced by next-generation sequencing. Among these BioMLs, SBML is the de facto standard file format for the storage and exchange of quantitative computational models in systems biology, supported by more than 257 software packages to date. The SBML standard is used by several biological systems modeling tools and several databases for representation and knowledge sharing. Complex biological systems are constructed by integrating several subsystems, and the problem of combining biological subsystems by merging SBML files has been addressed by several algorithms and tools. It nevertheless remains impossible to build an automatic merge system that offers reusability, flexibility, scalability and shareability. Existing algorithms rely on name-based component comparison, which does not allow integration into a Workflow Management System (WMS) to build pipelines and does not include the mapping of quantitative data needed for a sound analysis of the biological system. In this work, we present a deterministic merging algorithm that is consumable in a given WMS engine and designed using a novel biological model similarity algorithm. This model merging system integrates four submodules: SBMLChecker, SBMLAnot, SBMLCompare, and SBMLMerge, for model quality checking, annotation, comparison, and merging respectively. The tools are integrated into the BioExtract server, leveraging iPlant collaborative resources to let users process large models and design workflows. They are also embedded in a user-friendly online version, SW4SBMLm.
    05/2014, Degree: MS, Supervisor: Etienne Z. Gnimpieba
  • ABSTRACT: A spelling error detection and correction application is typically based on three main components: a dictionary (or reference word list), an error model and a language model. While most of the attention in the literature has been directed at the language model, we show how improvements in any of the three components can lead to significant cumulative improvements in the overall performance of the system. We develop our dictionary of 9.2 million fully inflected Arabic words (types) from a morphological transducer and a large corpus, validated and manually revised. We improve the error model by analyzing error types and creating an edit-distance re-ranker. We also improve the language model by analyzing the level of noise in different data sources and selecting an optimal subset to train the system on. Testing and evaluation experiments show that our system significantly outperforms Microsoft Word 2013, OpenOffice Ayaspell 3.4 and Google Docs. (A sketch of edit-distance re-ranking follows this list.)
    Natural Language Engineering 02/2015; · 0.46 Impact Factor
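
For the first item above, the name ECS suggests accumulating each spectrum over its bands and then taking the Euclidean distance between the cumulative curves. The Python sketch below follows that reading; the exact normalization used in the cited paper may differ, so treat it as an assumption rather than the paper's definition.

    import numpy as np

    def ecs_distance(s1, s2):
        """Euclidean distance of cumulative spectrum (ECS), sketched as the
        Euclidean distance between the band-wise cumulative sums of two
        spectra. Normalization details may differ from the cited paper."""
        c1 = np.cumsum(np.asarray(s1, dtype=float))
        c2 = np.cumsum(np.asarray(s2, dtype=float))
        return float(np.linalg.norm(c1 - c2))

    # Two toy 5-band spectra
    print(ecs_distance([0.10, 0.30, 0.50, 0.40, 0.20],
                       [0.10, 0.20, 0.60, 0.40, 0.20]))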
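
For the third item above, an edit-distance re-ranker can be sketched as ordering candidate corrections by their restricted Damerau-Levenshtein distance to the misspelled word, which also generalizes the single-error test of the main article to multiple edits. The function names and the plain unweighted distance are illustrative assumptions, not the cited system's actual components.

    def damerau_levenshtein(a, b):
        """Restricted Damerau-Levenshtein (optimal string alignment) distance:
        insertions, deletions, substitutions, and adjacent transpositions each
        count as one edit."""
        d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i in range(len(a) + 1):
            d[i][0] = i
        for j in range(len(b) + 1):
            d[0][j] = j
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,          # deletion
                              d[i][j - 1] + 1,          # insertion
                              d[i - 1][j - 1] + cost)   # substitution or match
                if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                        and a[i - 2] == b[j - 1]):
                    d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
        return d[len(a)][len(b)]

    def rerank(misspelled, candidates):
        """Order candidate corrections by edit distance to the input word."""
        return sorted(candidates, key=lambda c: damerau_levenshtein(misspelled, c))

    print(rerank("speling", ["sapling", "spelling", "spieling"]))
    # -> ['spelling', 'spieling', 'sapling']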
