
Enhancing Neural Confidence-Based Segmentation For Cursive Handwriting Recognition

Abstract

This paper proposes some directions for enhancing a neural network-based technique for automatically segmenting cursive handwriting. The technique fuses confidence values obtained from left and center character recognition outputs in addition to a Segmentation Point Validation output. Specifically, this paper describes the use of a recently proposed feature extraction technique (Modified Direction Feature) for representing segmentation points and characters to enhance the overall segmentation process. Promising results are presented for Segmentation Point Validation and cursive character recognition on a benchmark dataset. In addition, a new methodology for detecting segmentation paths is presented and evaluated for extracting characters from cursive handwriting.
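The idea of fusing three confidence values at a candidate segmentation point can be sketched as follows. The abstract does not state the fusion rule, so the comparison below (evidence for a valid cut against evidence for an invalid one) is purely an assumption for illustration, and the names `PointConfidences` and `keep_point` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PointConfidences:
    """Confidences in [0, 1] for one candidate segmentation point."""
    spv: float     # validation network: the point is a correct cut
    left: float    # recogniser: the region left of the point is a character
    center: float  # recogniser: the region centred on the point is a character


def keep_point(c: PointConfidences) -> bool:
    """Assumed fusion rule, not taken from the paper.

    A high center-character confidence suggests the point cuts through
    a character, so it counts as evidence against the point.
    """
    valid_evidence = c.spv + c.left
    invalid_evidence = (1.0 - c.spv) + c.center
    return valid_evidence > invalid_evidence
```

Under this assumed rule, a point with confidences (0.9, 0.8, 0.1) is kept and one with (0.2, 0.3, 0.9) is discarded.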
... Cheng et al. [35] segment cursive handwritten word images by building on earlier work: a feature-based heuristic segmentation algorithm with segmentation point validation (SPV) [5], together with validation of the left character (LC) and center character (CC) [34]. Finally, the modified direction feature (MDF) is added to enhance the entire segmentation process [36]. ...
... Cheng and Blumenstein propose an enhanced heuristic segmenter (EHS) that improves the feature-based heuristic segmentation (FHS) algorithm by adding ligature detection [35]. FHS had failed to determine segmentation points on overlapping characters. ...
... Character extraction algorithm [33]. Fig. 3: Word sample sections and segmentation path generation [35] of training words. Finally, Viterbi's algorithm is used to determine qualified ligatures. ...
Article
Pattern recognition is a classification process that attempts to assign each input value to one of a given set of classes. In the state of the art, pattern recognition has been achieved either by training artificially intelligent tools or by using heuristic rule-based approaches. The objective of this paper is to provide a comparative study of artificially trained and heuristic rule-based techniques for pattern recognition, with a focus on script pattern recognition. Script pattern recognition techniques fall mainly into two categories: the first relies on artificial intelligent learning, and the second on heuristic rules for cursive script pattern segmentation/recognition. Accordingly, a detailed critical study is performed that focuses on the size of the training/testing data and the implications of artificial learning for script pattern recognition accuracy. Moreover, the techniques employed to identify character patterns are described in detail. Finally, the performance of different techniques on a benchmark database is compared in terms of pattern recognition accuracy, error rate, and whether single or multiple classifiers are employed. Problems that still persist are also highlighted and possible directions are set.
... To overcome these limitations, researchers have employed artificial neural networks, hidden Markov models and statistical classifiers to enhance segmentation accuracy [16][17][18][19][20][21]. Consequently, complex features are employed, which raise issues of computational complexity and huge memory usage [36,37]. ...
... In the same way, Verma [35] claims 84.87% segmentation accuracy for 300 CEDAR words. Similarly, Cheng et al. [36] achieve a 95.27% segmentation rate on 317 CEDAR words. ...
Article
This paper presents a new, simple and fast approach for character segmentation of unconstrained handwritten words. The proposed approach first seeks the possible character boundaries based on an analysis of the characters' geometric features. However, due to inherent ambiguity and a lack of context, a few characters are over-segmented. To increase the efficiency of the proposed approach, an artificial neural network is trained with a significant number of valid segmentation points for cursive handwritten words. The trained neural network identifies incorrectly segmented points efficiently and at high speed. For fair comparison, the benchmark CEDAR database is used. The experimental results are promising from the complexity and accuracy points of view.
... Nevertheless, the approach was not suitable for Latin characters, since it was designed for Farsi/Arabic characters and dealt with the upper and lower contours only. Interestingly, Cheng et al. [10] use the distance from the baseline to the upper contour to enhance the heuristic segmenter of [11]. Additionally, the upper contour of the string [12], together with width-to-height ratio features, is used to solve the problem of segmenting touching characters in printed mathematical expressions [13]. ...
... Even though character segmentation for printed characters is well established, segmentation of handwritten characters is the most difficult problem in the segmentation domain. In this regard, a few researchers have integrated intelligent techniques to achieve accurate segmentation [10,14,15,16,17,18,19,20,21]. Verma [22] proposed robust ANN validation by fusing three confidence values. ...
Article
This paper presents a robust algorithm to identify the letter boundaries in images of unconstrained handwritten words. The proposed algorithm is based on vertical contour analysis: it generates a pre-segmentation by analyzing the vertical contours from right to left. Unwanted segmentation points are then reduced using neural network validation to improve segmentation accuracy. The experiments are performed on the IAM benchmark database. The results show that the proposed algorithm is capable of accurately locating the letter boundaries in unconstrained handwritten words.
... The improved segmentation algorithm is examined on the test set of the CEDAR database. Later, Cheng and Blumenstein (2005a) improve their own previous work (Cheng et al. 2004; Blumenstein 2005b) and propose the enhanced heuristic segmenter (EHS) to improve segmentation of cursive handwriting. In the first step, the enhanced heuristic segmenter makes use of two enhanced features, ligature detection and neural assistance, to locate prospective segmentation points. ...
Article
Neural networks are popular in the research community due to their generalization abilities. Additionally, they have been successfully applied in biometrics, feature selection, object tracking, document image preprocessing and classification. This paper specifically clusters, summarizes, interprets and evaluates neural networks in document image preprocessing. The importance of learning algorithms in neural network training and testing for preprocessing is also highlighted. Finally, a critical analysis of the reviewed approaches is given and future research guidelines in the field are suggested.
... The improved segmentation algorithm is examined on the test set of the CEDAR database. Later, Cheng and Blumenstein (2005a) improve their own previous work (Cheng et al. 2004; Cheng and Blumenstein 2005b) and propose the enhanced heuristic segmenter (EHS) to improve segmentation of cursive handwriting. In the first step, the enhanced heuristic segmenter makes use of two enhanced features, ligature detection and neural assistance, to locate prospective segmentation points. ...
Article
This paper presents a detailed review of the field of off-line cursive script recognition. Various methods that have been proposed to realize the core of script recognition in a word recognition system are analyzed. These methods are discussed in view of the two most important properties of such systems: the size and nature of the lexicon involved, and whether or not a segmentation stage is present. Script recognition techniques are classified into three categories: first, segmentation-free methods or holistic approaches, which compare a sequence of observations derived from the whole word image with similar references of words in a small lexicon; second, segmentation-based methods, which look for the best match between consecutive sequences of primitive segments and letters of a possible word, similar to a human-like reading technique in which secure features found all over the word are used to bootstrap a few candidates for a final evaluation phase; third, hybrid approaches. Additionally, different feature extraction techniques are elaborated in conjunction with the classification process. In this scenario, the implications of single and multiple classifiers are also observed. Finally, remaining problems are highlighted along with possible suggestions and strategies to solve them.
Keywords: Script recognition, Character segmentation, Character recognition, Feature extraction, Holistic approaches
Article
Writer identification is a widespread biometric that can serve as a legitimate means of identifying an individual. It helps experts automatically identify a person in many security-related applications, such as forensic science. As a result, the field has drawn much attention over the last few decades. Depending on the input text, writer identification can take various forms: online, offline, text-dependent or text-independent. This paper presents a systematic study of text-dependent and text-independent writer identification from handwritten text images for various Indic and non-Indic scripts. The various segmentation techniques used to segment handwritten text are also presented in detail. Datasets available to researchers for scripts such as English, Arabic, Chinese, Japanese, Dutch, Farsi, Devanagari, Bangla, and Kannada are discussed through an exhaustive analysis of various studies. We hope that our research will give a better understanding of the area and provide directions for further research.
Chapter
Memetic algorithms (MAs) are originally optimization algorithms with separate individual improvement, and they tend to fully exploit the problem area under consideration. But, just as with the human brain, recognition time tends to increase as the population size grows. This paper aims to provide a logical solution using the cultural evolution and local learning features of MAs. By introducing a best bound population (BBP) selected from the available population, it is possible to keep recognition time within acceptable limits. The best bound population can be continuously upgraded using local search. The paper also revisits some popular techniques of character recognition using the traditional approach and the genetic approach. Finally, all techniques are compared in terms of error percentage and recognition time. The relative comparison is presented with figures to justify the findings.
Conference Paper
Memetic algorithms (MAs) are optimization algorithms that fully exploit the problem under consideration. This paper describes the character recognition problem using a traditional approach, a genetic algorithm approach and a memetic algorithm approach. It also describes the basic architecture of MAs and elaborates the memetic algorithm-based approach to character recognition. The comparison with the traditional and genetic algorithm approaches shows that the MA remarkably reduces the error rate. This paper is useful for beginners who apply nature-based computing to character recognition.
Book
The objective of Document Analysis and Recognition (DAR) is to recognize the text and graphical components of a document and to extract information. This book is a collection of research papers and state-of-the-art reviews by leading researchers all over the world including pointers to challenges and opportunities for future research directions. The main goals of the book are identification of good practices for the use of learning strategies in DAR, identification of DAR tasks more appropriate for these techniques, and highlighting new learning algorithms that may be successfully applied to DAR.
Chapter
The segmentation of cursive and mixed scripts remains a difficult problem in the area of handwriting recognition. This research details advances for segmenting characters in off-line cursive script. Specifically, a heuristic algorithm and a neural network-based technique, which uses a structural feature vector representation, are proposed and combined for identifying incorrect segmentation points. Following the location of appropriate anchorage points, a character extraction technique, using segmentation paths, is employed to complete the segmentation process. Results are presented for neural-based heuristic segmentation, segmentation point validation, character recognition, segmentation path detection and overall segmentation accuracy.
Article
Introduction: Segmentation is the operation that seeks to decompose a word image into a sequence of subimages containing isolated characters. Segmentation is a critical phase of the single word recognition process, as witnessed by the higher performance for the recognition of isolated characters compared with that obtained for cursive words. There are two main strategies for segmentation [1]. Straight segmentation [2,3] tries to decompose the image into a set of subimages, each one corresponding to a character. In segmentation-recognition strategies [4-7] the image is subdivided into a set of subimages (strokes) whose combinations are used to generate character candidates. The number of subimages is greater than the number of characters, and the process is also referred to as oversegmentation. Recognition is then used to select the correct character hypothesis from the character candidates. The quality of the oversegmentation process depends on the tradeoff between the number of missed det
Article
Handprinted characters can be made more uniform in appearance than the as-written version if an appropriate linear transformation is performed on each input pattern. The transformation can be implemented electronically by programming a flying-spot raster-scanner to scan at specified angles rather than only along specified axes. Alternatively, curve-follower normalization can be achieved by transforming the coordinate waveforms in a linear combining network. Second-order moments of the pattern are convenient properties to use in specifying the transformation. By mapping the original pattern into one having a scalar moment matrix all linear pattern variations can be removed. Comparison experiments with three sets of handprinted numerals showed that error rates were reduced by integral factors if the patterns were normalized before scanning for recognition.
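The normalization described above, mapping a pattern to one whose second-order moment matrix is scalar, can be illustrated in software with a small sketch. It assumes the pattern is given as a list of (x, y) stroke points and uses a closed-form inverse square root of the 2x2 moment matrix; the function names `moment_matrix` and `normalize_pattern` are hypothetical, and this is not the scanner-based or network-based implementation the abstract refers to.

```python
import math


def moment_matrix(points):
    """Return the centroid and central second-order moments (sxx, sxy, syy)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    return (mx, my), (sxx, sxy, syy)


def normalize_pattern(points):
    """Apply the linear map M^(-1/2) so the moment matrix becomes the identity."""
    (mx, my), (a, b, c) = moment_matrix(points)
    det = a * c - b * b
    if det <= 0:
        raise ValueError("degenerate pattern: points are collinear")
    s = math.sqrt(det)
    t = math.sqrt(a + c + 2.0 * s)
    # Square root of a 2x2 symmetric positive-definite matrix: (M + s*I) / t.
    ra, rb, rc = (a + s) / t, b / t, (c + s) / t
    # Invert the square root; its determinant equals s.
    d = ra * rc - rb * rb
    ia, ib, ic = rc / d, -rb / d, ra / d
    return [
        (ia * (x - mx) + ib * (y - my), ib * (x - mx) + ic * (y - my))
        for x, y in points
    ]
```

After the transformation the pattern is centred at the origin, has unit variance along both axes and zero cross moment, so linear variations such as slant or uneven stretching no longer affect its second-order moments.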
Article
Segmentation of cursive words into letters has been one of the major problems in handwriting recognition. We introduce a new segmentation algorithm, guided in part by the global characteristics of the handwriting. We find the successive segmentation points by evaluating a cost function at each point along the baseline. The cost of segmenting at a point is a weighted sum of four feature values at that point. The weights of the features are determined using linear programming. In our tests with 750 words written by 10 writers, 97% of the letter boundaries were correctly located.
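The weighted-sum cost can be sketched as below. The abstract does not name the four features or give the weights, so the feature vectors and weights here are placeholders, and choosing the lowest-cost point as the next cut is an assumption; `segmentation_cost` and `pick_next_cut` are hypothetical names.

```python
from typing import Sequence


def segmentation_cost(features: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the feature values at one candidate point."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * f for w, f in zip(weights, features))


def pick_next_cut(candidates: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> int:
    """Index of the candidate point with the lowest cost (assumed rule)."""
    costs = [segmentation_cost(f, weights) for f in candidates]
    return min(range(len(costs)), key=costs.__getitem__)
```

In the paper the weights come from linear programming; in this sketch they would simply be passed in as a list of four numbers.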
Article
Knowledge concerning the structure of some prominent English letters, as well as the structural characteristics between background regions and character components is investigated as a novel approach to cursive script segmentation. First, connected components consisting of more than one character are split into sub-components based on their face-up or face-down background regions. Secondly, the over-segmented sub-components are merged into characters according to the knowledge of character structures and their joining characteristics. The algorithm achieved around 80% correct segmentation on a difficult database of unconstrained handwritten words.
Article
This paper is the second part of a review series on character segmentation techniques. In this paper, we present an overview of the most important techniques used in segmenting characters from handwritten words. It is well recognized that it is difficult to segment individual characters from handwritten words without support from recognition and context analysis. One common characteristic of all the existing handwritten word recognition algorithms is that the character segmentation process is closely coupled with the recognition process. This review consists of three major portions: handprinted word segmentation, handwritten numeral segmentation and cursive word segmentation. Every algorithm discussed in the paper is accompanied by a flow chart to give a clear grasp of the algorithm. One section summarizes the terms and measurements commonly used in handwritten character segmentation. The bibliography contains a comprehensive list of work in handwritten character segmentation and recognition.
Conference Paper
This paper describes a neural network-based technique for cursive character recognition applicable to segmentation-based word recognition systems. The proposed research builds on a novel feature extraction technique that extracts direction information from the structure of character contours. This principle is extended so that the direction information is integrated with a technique for detecting transitions between background and foreground pixels in the character image. The proposed technique is compared with the standard direction feature extraction technique, providing promising results using segmented characters from the CEDAR benchmark database.
Conference Paper
High accuracy character recognition techniques can provide useful information for segmentation-based handwritten word recognition systems. This research describes neural network-based techniques for segmented character recognition that may be applied to the segmentation and recognition components of an off-line handwritten word recognition system. Two neural architectures along with two different feature extraction techniques were investigated. A novel technique for character feature extraction is discussed and compared with others in the literature. Recognition results above 80% are reported using characters automatically segmented from the CEDAR benchmark database as well as standard CEDAR alphanumerics.
Conference Paper
An algorithm for segmenting unconstrained printed and cursive words is proposed. The algorithm initially oversegments handwritten word images (for training and testing) using heuristics and feature detection. An artificial neural network (ANN) is then trained with global features extracted from segmentation points found in words designated for training. Segmentation points located in “test” word images are subsequently extracted and verified using the trained ANN. Two major sets of experiments were conducted, resulting in segmentation accuracies of 75.06% and 76.52%. The handwritten words used for experimentation were taken from the CEDAR CD-ROM. The results obtained for segmentation can easily be used for comparison with other researchers using the same benchmark database.