Abstract

As a general first step in a recognition system, preprocessing plays a very important role and can directly affect recognition performance. This chapter proposes a new preprocessing technique for online handwriting. The approach first removes the hooks of the strokes using a changed-angle threshold combined with a length threshold, then filters the noise using a smoothing technique that combines cubic spline and equal-interpolation methods. Finally, the handwriting is normalised. Section 2.1 introduces the problems and related techniques of preprocessing for online handwritten data. Section 2.2 describes our preprocessing approach for online handwritten data. The experimental results are shown and discussed in Section 2.3. A summary of the chapter is given in the last section.
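
The de-hooking step can be illustrated with a minimal Python sketch. The abstract gives neither the threshold values nor the exact rule, so the function name `remove_hooks`, the default thresholds, and the decision rule (drop a short leading or trailing piece of the stroke that meets the rest at a sharp turn) are assumptions for illustration, not the chapter's algorithm.

```python
import math

def turning_angle(p, q, r):
    """Absolute change of direction in degrees at q when moving p -> q -> r."""
    a1 = math.atan2(q[1] - p[1], q[0] - p[0])
    a2 = math.atan2(r[1] - q[1], r[0] - q[0])
    d = abs(a2 - a1)
    if d > math.pi:
        d = 2 * math.pi - d
    return math.degrees(d)

def path_length(points):
    """Total polyline length of a stroke."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def _strip_leading_hook(points, angle_thresh, max_hook_len):
    """Drop the start of the stroke if a sharp turn occurs within a short prefix."""
    total = path_length(points)
    if total == 0:
        return points
    run = 0.0
    for i in range(1, len(points) - 1):
        run += math.dist(points[i - 1], points[i])
        if run > max_hook_len * total:  # length threshold (assumed: fraction of stroke length)
            break
        if turning_angle(points[i - 1], points[i], points[i + 1]) >= angle_thresh:
            return points[i:]           # angle threshold exceeded: cut the hook
    return points

def remove_hooks(points, angle_thresh=90.0, max_hook_len=0.1):
    """Remove hooks at both ends of a stroke given as a list of (x, y) tuples."""
    pts = _strip_leading_hook(list(points), angle_thresh, max_hook_len)
    pts = _strip_leading_hook(pts[::-1], angle_thresh, max_hook_len)
    return pts[::-1]

stroke = [(1, 1), (0, 0), (10, 0), (20, 0), (30, 0), (29, 1)]
cleaned = remove_hooks(stroke)
```

Handling both ends with the same helper applied to the reversed stroke keeps the sketch short; a real system would tune both thresholds on data.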

... Such devices normally sample the input at constant time intervals; thus, slow pen-motion regions are over-sampled and fast-motion regions are under-sampled. Further imperfections are caused by hand vibrations resulting from hesitant writing [37]. This lack of uniformity in the data, if not deliberately used by the classification system, should be reduced to avoid negatively influencing the classification system's performance. ...
Thesis
Despite the long-standing belief that digital computers will challenge the future of handwriting, pen and paper remain commonly used means for communication and recording of information in daily life. With the growing use of keyboard-less devices such as smartphones and tablets, which are too small to have a convenient keyboard, handwriting recognition has received increasing attention in recent decades. Correct and efficient recognition of handwritten Arabic text is a challenging problem due to the cursive and unconstrained nature of the Arabic script. While real-time performance is necessary in applications involving on-line handwriting recognition, conventional approaches usually wait until the entire curve is traced out before starting the analysis, inevitably causing delays in the recognition process. This deferment prevents on-line recognition techniques from meeting the high responsiveness demands expected of such systems, and from implementing advanced input features such as automatic word completion. This work presents a real-time approach for segmenting and recognizing handwritten on-line Arabic script. We demonstrate the feasibility of segmenting Arabic handwritten text during the course of writing. The proposed segmentation approach is a recognition-based method that operates on the stroke level and nominates candidate segmentation points based on morphological features. Using a fast Arabic character classifier, the system attaches a score to the sub-strokes induced by the candidate points, which captures the likelihood that the sub-stroke represents a letter. Candidate filtering followed by a segmentation selection process is activated when the entire stroke is available. A nearest-neighbour-based character classifier that employs a linear-time embedding of the Earth Mover's Distance metric into a norm space is presented.
The transformation of the feature space vectors into the wavelet coefficient space facilitates accurate similarity measurement and sub-linear search methods. We show that the resulting character segmentation and classification information can be used to significantly reduce the potential dictionary size and accelerate a holistic recognition process.
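
The non-uniform sampling described in the excerpt for this entry (over-sampled slow regions, under-sampled fast regions) is commonly reduced by resampling the stroke at equal arc-length intervals. The following Python sketch shows that general idea; the function name `resample_equidistant` and its parameters are assumptions for illustration, and it is not claimed to be the method used in the thesis.

```python
import math

def resample_equidistant(points, n):
    """Return n points spaced evenly along the arc length of a polyline."""
    if n < 2 or len(points) < 2:
        return list(points)
    # Cumulative arc length at each input point.
    cum = [0.0]
    for a, b in zip(points, points[1:]):
        cum.append(cum[-1] + math.dist(a, b))
    total = cum[-1]
    if total == 0:
        return [points[0]] * n
    out = []
    seg = 0
    for k in range(n):
        target = total * k / (n - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        span = cum[seg + 1] - cum[seg]
        t = 0.0 if span == 0 else (target - cum[seg]) / span
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out

# Densely sampled slow start, one long fast segment at the end.
uneven = [(0, 0), (1, 0), (2, 0), (10, 0)]
even = resample_equidistant(uneven, 6)
```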
... The interpolation method does not overcome the hooking problem. To remove hooks from the start and end of the character, an iterative method based on Huang et al. [26] is used to check for any sign change in slope between two adjacent lines within the 5% of points at the start and end of the character. ...
Article
In this paper, a hidden Markov model and a harmony search algorithm are combined for writer-independent online Kurdish character recognition. The Markov model is integrated as an intermediate group classifier instead of a main character classifier/recognizer as in most previous work. The Markov model is used to classify each group of characters, according to their forms, into smaller subgroups based on a common directional feature vector. This process reduces the processing time taken by the later recognition stage. The small number of candidate characters is then processed by the harmony search recognizer. The harmony search recognizer uses a dominant and common movement pattern as a fitness function. The objective function is used to minimize the matching score according to the fitness function criteria and according to the least score for each segmented group of characters. Then, the system displays the generated word that has the lowest score among the generated character combinations. The system was tested on a dataset of 4500 words comprising 21,234 characters in different positions or forms (isolated, start, middle and end). The system scored a 93.52% successful recognition rate with an average of 500 ms. The system showed a large improvement in recognition rate when compared to similar systems that use an HMM as the main recognizer.
... If so, one of them is removed. • Interpolating points: to add any missing points by linear interpolation [3]. • Smoothing: to eliminate hardware imperfections and trembles in writing, each point is substituted with the weighted average of its neighboring points [1]. ...
Article
The percentage of people who produce neat and clear handwriting is declining sharply. The traditional approach to teaching handwriting is to have a dedicated teacher for long hours of handwriting practice. Unfortunately, this is not feasible in many cases. In this paper we introduce an automated tool for teaching Arabic handwriting using tablet PCs and on-line handwriting recognition techniques. This tool can simulate the tasks performed by a human handwriting teacher: detecting the segments of hypothesized writing errors and producing instructive real-time feedback to help the student improve his handwriting quality. The tool consists of two main components, the guided writing component and the free writing component. In the guided writing mode the student is required to write over transparent images of the training examples to limit his hand movements. After the student acquires the basic skills of handwriting he can practice the free writing mode, where he writes in his own style, as he usually does in his daily handwriting. The first version of the tool was tested in several schools with children aged 4-11. The results are promising and show that this tool can help students analyze their own writing and understand how they can improve it.
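
The interpolation and smoothing steps quoted in the excerpt for this entry can be sketched in a few lines of Python. The gap threshold `max_gap` and the smoothing weights `(0.25, 0.5, 0.25)` are assumed values chosen for illustration; the excerpt does not state which values or window size the cited work uses.

```python
import math

def fill_gaps(points, max_gap):
    """Insert linearly interpolated points where neighbours are farther apart than max_gap."""
    if not points:
        return []
    out = [points[0]]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        d = math.dist((x0, y0), (x1, y1))
        steps = max(1, math.ceil(d / max_gap))
        for s in range(1, steps + 1):
            t = s / steps
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out

def smooth(points, weights=(0.25, 0.5, 0.25)):
    """Replace each interior point with a weighted average of itself and its two neighbours."""
    if len(points) < 3:
        return list(points)
    w0, w1, w2 = weights
    out = [points[0]]  # endpoints are kept unchanged
    for (xa, ya), (xb, yb), (xc, yc) in zip(points, points[1:], points[2:]):
        out.append((w0 * xa + w1 * xb + w2 * xc, w0 * ya + w1 * yb + w2 * yc))
    out.append(points[-1])
    return out

filled = fill_gaps([(0, 0), (4, 0)], max_gap=1)
smoothed = smooth([(0, 0), (1, 4), (2, 0)])
```

Keeping the endpoints fixed in `smooth` is a design choice that avoids shrinking the stroke; other systems may treat the ends differently.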
Chapter
This work proposes an approach for online recognition of 2D sequences using a deep bidirectional LSTM. One of the complex cases of online sequence recognition is handwritten mathematical expressions (HME). In spite of many achievements in this area, it is still a challenging task because, in addition to character segmentation and recognition, the tasks of structure, relation, and grammar analysis must be resolved. Such a combination of recognizers can increase computational complexity for large expressions, which is unacceptable for on-device recognition in mobile applications. As end-to-end neural systems have so far not achieved plausible accuracy and recognition speed for on-device calculations, we propose a deep-learning solution that employs recurrent neural networks (RNNs) for structure and character recognition in combination with re-ordering and a modified CYK algorithm for expression construction. We also explore a variety of structural and optimization enhancements to the CYK algorithm that significantly improve recognition speed while the recognition accuracy remains at the same level. The ablation study for the introduced optimization techniques demonstrates a significant improvement in recognition speed while keeping the recognition accuracy comparable with existing state-of-the-art approaches.
Article
In this paper, we suggest a deep learning strategy for decision support, based on a greedy algorithm. Decision-making support by artificial intelligence is one of the most challenging trends in modern computer science. Currently, various strategies exist and are being steadily improved in order to meet the practical needs of user-oriented platforms like Microsoft, Google, Amazon, etc.
Article
This article comprehensively surveys Arabic Online Handwriting Recognition (AOHR). We address the challenges posed by online handwriting recognition, including ligatures, dots and diacritic problems, online/offline touching of text, and geometric variations. Then we present a general model of an AOHR system that incorporates the different phases of an AOHR system. We summarize the main AOHR databases and identify their uses and limitations. Preprocessing techniques that are used in AOHR, viz. normalization, smoothing, de-hooking, baseline identification, and delayed stroke processing, are presented with illustrative examples. We discuss different techniques for Arabic online handwriting segmentation at the character and morpheme levels and identify their limitations. Feature extraction techniques that are used in AOHR are discussed and their challenges identified. We address the classification techniques of non-cursive (characters and digits) and cursive Arabic online handwriting and analyze their applications. We discuss different classification techniques, viz. structural approaches, Support Vector Machine (SVM), Fuzzy SVM, Neural Networks, Hidden Markov Model, Genetic algorithms, decision trees, and rule-based systems, and analyze their performance. Post-processing techniques are also discussed. Several tables that summarize the surveyed publications are provided for ease of reference and comparison. We summarize the current limitations and difficulties of AOHR and future directions of research.
Article
Widely used PDAs, touch screens, and tablet PCs are alternatives to keyboards, with the advantages of being friendlier, easier, and more natural. A framework for Arabic online character recognition is developed. The framework integrates the different phases of online Arabic text recognition. The data used poses several challenges, such as delayed stroke handling, connectivity problems, variability, and style change of text. We process the delayed strokes differently at the different phases to improve the overall performance. This work includes the extraction of many features, including several novel statistical features. Experiments on challenging online Arabic characters show encouraging results. 2016
Conference Paper
We present an approach for on-line recognition of handwritten math symbols using adaptations of off-line features and synthetic data generation. We compare the performance of our approach using four different classification methods: AdaBoost.M1 with C4.5 decision trees, Random Forests and Support-Vector Machines with linear and Gaussian kernels. Despite the fact that timing information can be extracted from on-line data, our feature set is based on shape description for greater tolerance to variations of the drawing process. Our main datasets come from the Competition on Recognition of Online Handwritten Mathematical Expressions (CROHME) 2012 and 2013. Class representation bias in CROHME datasets is mitigated by generating samples for underrepresented classes using an elastic distortion model. Our results show that generation of synthetic data for underrepresented classes might lead to improvements of the average per-class accuracy. We also tested our system using the Math Brush dataset achieving a top-1 accuracy of 89.87% which is comparable with the best results of other recently published approaches on the same dataset.
Article
Handwriting recognition is the ability of a computer to understand handwritten inputs from users. Generally it includes preprocessing, feature extraction, and classifier training. In this paper, we develop a handwritten digit recognition system using a Deep Boltzmann Machine (DBM) together with a Support Vector Machine (SVM). The DBM is a deep learning technique for learning high-level features from the training data, while the SVM is a method for training non-linear classifiers from the learned features. Such a framework is a promising way to build a powerful digit recognition system. Our experimental results show that our system can achieve the desired performance.
Article
Handwriting is an important modality for Human-Computer Interaction. For medical professionals, handwriting is (still) the preferred natural method of documentation. Handwriting recognition has long been a primary research area in Computer Science. With the tremendous ubiquity of smartphones, along with the renaissance of the stylus, handwriting recognition has gained new impetus. However, recognition rates are still not 100% perfect, and researchers are constantly improving handwriting algorithms. In this paper we evaluate the performance of entropy-based slant and skew correction, and compare the results to other methods. We selected 3700 words of 23 writers out of the Unipen-ICROW-03 benchmark set, which we annotated with their associated error angles by hand. Our results show that the entropy-based slant correction method outperforms a window-based approach, with an average precision of ±6.02° for the entropy-based method compared with ±7.85° for the alternative. On the other hand, the entropy-based skew correction yields a lower average precision of ±2.86°, compared with the average precision of ±2.13° for the alternative LSM-based approach.
Article
This paper highlights a novel strategy for online Arabic text recognition using a hybrid Genetic Algorithm (GA) and Harmony Search algorithm (HS). The strategy is divided into two phases: text segmentation using dominant point detection, and recognition-based segmentation using GA and HS. First, the pre-segmentation algorithm uses a modified dominant point detection algorithm to mark a minimal number of points that define the text skeleton. The text skeleton generated by this process is expressed as a directional vector, using a 6-directional model, to minimize the effect of the character body on the segmentation process. Then, the GA and HS algorithms are used in the recognition-based segmentation phase for text and character recognition respectively. For the segmentation-based recognition, a binary GA is used to explore different combinations of segmentation points for the one that gives the best score, while HS is integrated inside the GA segmentation to explore the best character score produced from matching the character with different characters stored in the database. In order to initially calibrate and test the system, a locally collected text dataset containing 4500 Arabic words was used. The algorithm scored a 93.4% successful word recognition rate. Finally, the system was tested on the benchmark ADAB dataset 2, consisting of 7851 Arabic words, and it scored a successful recognition rate in the range of 94–96%.
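
To illustrate what expressing a skeleton as a directional vector can look like, here is a minimal Python sketch that quantises each segment's heading into one of six codes. The abstract does not define its 6-directional model, so the equal 60-degree sectors, the choice to centre sector 0 on the positive x-axis, and the name `direction_codes` are all assumptions.

```python
import math

def direction_codes(points, n_dirs=6):
    """Quantise each segment's heading into one of n_dirs equal angular sectors."""
    sector = 2 * math.pi / n_dirs
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if (x0, y0) == (x1, y1):
            continue  # a zero-length segment has no direction
        angle = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
        # Shift by half a sector so that sector 0 is centred on the positive x-axis.
        codes.append(int((angle + sector / 2) // sector) % n_dirs)
    return codes

skeleton = [(0, 0), (1, 0), (2, 2), (1, 2), (2, 0)]
codes = direction_codes(skeleton)
```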
Article
Online handwriting recognition of Arabic script is a difficult problem since it is naturally both cursive and unconstrained. The analysis of Arabic script is further complicated by obligatory dots/strokes that are placed above or below most letters and are usually written out of order, after a delay. This paper introduces a Hidden Markov Model (HMM) based system to provide solutions for most of the difficulties inherent in recognizing Arabic script. A preprocessing step that makes the delayed strokes match the structure of the HMM is introduced. The HMMs used are trained with Writer Adaptive Training (WAT) to minimize the variance between writers in the training data. The models' discriminative power is also enhanced with discriminative training. The system performance is evaluated using an international test set from the ADAB competition and shows promising performance compared with state-of-the-art systems.
Article
Noise in on-line handwritten characters, due to natural shaking of the hand and to the process of digitization, is inherent and can degrade the performance of a character recognition system. In this paper, we propose a noise removal technique based on a knotless spline. We first show that its noise removal property is independent of the amount of noise, unlike traditional Gaussian smoothing, and further show that the noise removal can significantly enhance the performance of the character recognition algorithm. We specifically perform experiments with Devanagari data and show that the noise removal can enhance the performance of character recognition by as much as 10%.
Conference Paper
Arabic script presents challenging complexity and variability for handwriting recognition. The first on-line Arabic database, called ADAB, is known as a standard benchmark from the ICDAR competition of 2009. This paper describes the online Arabic handwriting recognition competition held at ICDAR 2011. 3 groups with 5 systems are participating in the competition. The systems were tested on known data (sets 1 to 4) and on two test datasets that are unknown to all participants (set 5 and set 6). The systems are compared on the most important characteristic of classification systems, the recognition rate. Additionally, the relative speed of every system was compared. A short description of the participating groups, their systems, the experimental setup, and the results is presented.
Article
Chinese handwriting identification has become a hot research topic in pattern recognition and image processing. In this paper, we present an overview of relevant papers, from earlier related studies up to recent publications, regarding Chinese handwriting identification. The strengths, weaknesses, accuracy and comparison of well-known approaches are reviewed, summarized and documented. This paper covers a broad spectrum of pattern recognition technology that assists writer identification tasks, which are at the forefront of forensic and biometric identification applications.
Article
This paper presents an auto-regressive network called the Auto-Regressive Multi-Context Recurrent Neural Network (AR-MCRN), which forecasts the daily peak load for two large power plant systems. The auto-regressive network is a combination of both recurrent and non-recurrent networks. Weather component variables are the key elements in forecasting because any change in these variables affects the demand for energy load. The AR-MCRN is therefore used to learn the relationship between past, previous, and future exogenous and endogenous variables. Experimental results show that using the change in weather components and the change that occurred in past load as inputs to the AR-MCRN, rather than the basic weather parameters and past load itself as inputs to the same network, produces higher accuracy of predicted load. Experimental results also show that using exogenous and endogenous variables as inputs is better than using only the exogenous variables as inputs to the network.
Article
This paper presents a multi-context recurrent network for time series analysis. While simple recurrent networks (SRNs) are very popular among recurrent neural networks, they still have some shortcomings in terms of learning speed and accuracy that need to be addressed. To solve these problems, we propose a multi-context recurrent network (MCRN) with three different learning algorithms. The performance of this network is evaluated on real-world applications such as handwriting recognition and energy load forecasting. We study the performance of this network and compare it to a well-established SRN. The experimental results show that the MCRN is very efficient and well suited to time series analysis and its applications.
Conference Paper
The National Archives of Singapore keeps a large number of double-sided handwritten archival documents. Over long periods of storage, ink has seeped through the pages of these documents, resulting in interfering images of handwriting coming from the back of the page. This paper addresses this problem of segmenting handwriting from both sides of a document by means of a wavelet approach. We first match both sides of a document page such that the interfering strokes are mapped to the corresponding strokes originating from the reverse side. This allows the identification of the foreground and interfering strokes. A wavelet reconstruction process then iteratively enhances the foreground strokes and smears the interfering strokes so as to strengthen the discriminating capability of an improved Canny edge detector against the interfering strokes. Experimental results confirm the validity of the wavelet approach.
Article
Handwriting has continued to persist as a means of communication and recording information in day-to-day life even with the introduction of new technologies. Given its ubiquity in human transactions, machine recognition of handwriting has practical significance, as in reading handwritten notes in a PDA, postal addresses on envelopes, amounts on bank checks, handwritten fields in forms, etc. This overview describes the nature of handwritten language, how it is transduced into electronic data, and the basic concepts behind written language recognition algorithms. Both the online case (which pertains to the availability of trajectory data during writing) and the off-line case (which pertains to scanned images) are considered. Algorithms for preprocessing, character and word recognition, and performance with practical systems are indicated. Other fields of application, such as signature verification, writer authentication, and handwriting learning tools, are also considered.
Article
We present a general wavelet-based denoising scheme for functional magnetic resonance imaging (fMRI) data and compare it to Gaussian smoothing, the traditional denoising method used in fMRI analysis. One-dimensional WaveLab thresholding routines were adapted to two-dimensional (2-D) images, and applied to 2-D wavelet coefficients. To test the effect of these methods on the signal-to-noise ratio (SNR), we compared the SNR of 2-D fMRI images before and after denoising, using both Gaussian smoothing and wavelet-based methods. We simulated a fMRI series with a time signal in an active spot, and tested the methods on noisy copies of it. The denoising methods were evaluated in two ways: by the average temporal SNR inside the original activated spot, and by the shape of the spot detected by thresholding the temporal SNR maps. Denoising methods that introduce much smoothness are better suited for low SNRs, but for images of reasonable quality they are not preferable, because they introduce heavy deformations. Wavelet-based denoising methods that introduce less smoothing preserve the sharpness of the images and retain the original shapes of active regions. We also performed statistical parametric mapping on the denoised simulated time series, as well as on a real fMRI data set. False discovery rate control was used to correct for multiple comparisons. The results show that the methods that produce smooth images introduce more false positives. The less smoothing wavelet-based methods, although generating more false negatives, produce a smaller total number of errors than Gaussian smoothing or wavelet-based methods with a large smoothing effect.
Conference Paper
In this paper, principal component analysis (PCA) is applied to the problem of online handwritten character recognition in the Tamil script. The input is a temporally ordered sequence of (x,y) pen coordinates corresponding to an isolated character obtained from a digitizer. The input is converted into a feature vector of constant dimensions following smoothing and normalization. PCA is used to find the basis vectors of each class subspace and the orthogonal distance to the subspace is used for classification. Pre-clustering of the training data and modification of distance measure are explored to overcome some common problems in the traditional subspace method. In empirical evaluation, these PCA-based classification schemes are found to compare favorably with nearest neighbour classification.
Conference Paper
We report the status of the UNIPEN project of data exchange and recognizer benchmarks, started two years ago at the initiative of the International Association of Pattern Recognition (Technical Committee 11). The purpose of the project is to propose and implement solutions to the growing need for handwriting samples for online handwriting recognizers used by pen-based computers. Researchers from several companies and universities have agreed on a data format, a platform of data exchange and a protocol for recognizer benchmarks. The online handwriting data of concern may include handprint and cursive from various alphabets (including Latin and Chinese), signatures and pen gestures. These data will be compiled and distributed by the Linguistic Data Consortium. The benchmarks will be arbitrated by the US National Institute of Standards and Technology. We give a brief introduction to the UNIPEN format. We explain the protocol of data exchange and benchmarks.
Article
In this correspondence, a digital filter that allows the computation of a smoothing cubic spline for equispaced data with a constant control parameter is proposed. Filters to compute its first and second derivatives are also presented. Derived from the classical matrix solution, these filters offer an efficient way to calculate smoothed data and its derivatives, especially when the length of data is long. Moreover, these filters have been found to possess several interesting properties. For instance, the smoothing filter is a low-pass filter with the maximum flatness property. In addition, a useful relation between the filter bandwidth and the control parameter is established, which can be used for its optimal choice in practice. The proposed filters can easily be implemented either with a recursive structure for off-line processing or with a nonrecursive implementation for real-time processing
Article
This survey describes the state of the art of online handwriting recognition during a period of renewed activity in the field. It is based on an extensive review of the literature, including journal articles, conference proceedings, and patents. Online versus offline recognition, digitizer technology, and handwriting properties and recognition problems are discussed. Shape recognition algorithms, preprocessing and postprocessing techniques, experimental systems, and commercial products are examined
Article
Functional data analysis techniques are used to analyze a sample of handwriting in Chinese. The goals are (a) to identify a differential equation that satisfactorily models the data's dynamics, and (b) to use the model to classify handwriting samples taken from different individuals. After preliminary smoothing and registration steps, a second-order linear differential equation, for which the forcing function is small, is found to provide a good reconstruction of the original script records. The equation is also able to capture a substantial amount of the variation in the scripts across replications. The cross-validated classification process is 100% effective for the samples analyzed.
Conference Paper
Many feature selection models have been proposed for online handwriting recognition. However, most of them incur expensive computational overhead, or find an improper feature set that leads to unacceptable recognition rates. This paper presents a new, efficient feature selection model for handwritten symbol recognition using an improved sequential floating search method coupled with a hybrid classifier, which is obtained by combining hidden Markov models with a multilayer forward network. The effectiveness of the proposed method is verified by comprehensive experiments based on the UNIPEN database.
Article
Preprocessing and normalization techniques for on-line handwriting analysis are crucial steps that usually compromise the success of recognition algorithms. These steps are often neglected and presented as solved problems, but this is far from the truth. An overview is presented of the principal on-line techniques for handwriting preprocessing and word normalization, covering the major difficulties encountered and the various approaches usually used to resolve these problems. Some measurable definitions for handwriting characteristics are proposed, such as baseline orientation, character slant and handwriting zones. These definitions are used to measure and quantify the performance of the normalization algorithms. An approach to enhancing and restoring handwriting text is also presented, and an objective evaluation of all the processing results.
Conference Paper
This paper presents a combined approach for online handwritten symbol recognition. The basic idea of this approach is to employ a set of left-right HMMs to generate a new feature vector as input, and then use an SNN as a classifier to finally identify unknown symbols. The new feature vector consists of global features and several pairs of maximum probabilities with their associated different model labels for an observation pattern. A recogniser based on this method inherits the practical and dynamical modeling abilities of the HMM, and the robust discriminating ability of the SNN for classification tasks. This hybrid technique also significantly reduces the dimensions of the feature vectors and the complexity, and solves the size problem that arises when using only an SNN. The experimental results show that this approach outperforms several classifiers reported in recent research, and can achieve recognition rates of 97.41%, 91.81% and 91.63% for digits and upper/lower case characters respectively on the UNIPEN database benchmarks.
Conference Paper
This paper presents a combined approach for online handwritten symbol recognition. The basic idea of this approach is to employ a set of left-right HMMs as a feature extractor to produce HMM features, combine them with global features into a new feature vector as input, and then use an SVM as a classifier to finally identify unknown symbols. The new feature vector consists of the global features and several pairs of maximum probabilities with their associated different model labels. A recogniser based on this method inherits the practical and dynamical modeling abilities of the HMM, and the robust discriminating ability of the SVM for classification tasks. This technique also significantly reduces the dimensions of the feature vectors and solves the speed and size problems that arise when using only an SVM. The experimental results show that this combined hybrid approach outperforms several classifiers reported in recent research, and can achieve recognition rates of 97.48%, 91.99% and 91.74% for digits and upper/lower case characters respectively on the UNIPEN database benchmarks.
Article
This tutorial provides an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and gives practical details on methods of implementation of the theory along with a description of selected applications of the theory to distinct problems in speech recognition. Results from a number of original sources are combined to provide a single source of acquiring the background required to pursue further this area of research. The author first reviews the theory of discrete Markov chains and shows how the concept of hidden states, where the observation is a probabilistic function of the state, can be used effectively. The theory is illustrated with two simple examples, namely coin-tossing, and the classic balls-in-urns system. Three fundamental problems of HMMs are noted and several practical techniques for solving these problems are given. The various types of HMMs that have been studied, including ergodic as well as left-right models, are described
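
The first of the three fundamental HMM problems is usually stated as evaluation: computing the probability of an observation sequence given a model. As a minimal sketch, the forward algorithm for a discrete HMM can be written in Python as below, using a two-coin version of the coin-tossing example. The function name `forward` and all probabilities in the example are made-up values for illustration, not taken from the tutorial.

```python
def forward(obs, start, trans, emit):
    """Probability of an observation sequence under a discrete HMM (forward algorithm).

    obs:   list of observation indices
    start: start[i] is the probability of beginning in state i
    trans: trans[i][j] is the probability of moving from state i to state j
    emit:  emit[i][o] is the probability of emitting symbol o in state i
    """
    if not obs:
        return 1.0
    n = len(start)
    # Initialisation: joint probability of the first symbol and each state.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    # Induction: propagate through the transitions, then weight by the emission.
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
    # Termination: sum over the final states.
    return sum(alpha)

# Hypothetical two-coin model: state 0 is a fair coin, state 1 is biased towards heads.
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.5, 0.5], [0.8, 0.2]]   # symbol 0 = heads, symbol 1 = tails
p_hht = forward([0, 0, 1], start, trans, emit)
```

For long sequences the `alpha` values underflow, so practical implementations scale them at each step or work with log probabilities.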