Conference Paper

Database Development and Recognition of Handwritten Devanagari Legal Amount Words

Dept. of IT, PICT, Pune, India
DOI: 10.1109/ICDAR.2011.69. In: Proceedings of the 2011 International Conference on Document Analysis and Recognition (ICDAR 2011), Beijing, China, September 18-21, 2011
Source: IEEE Xplore

ABSTRACT This paper presents a dataset of 26,720 handwritten legal amount words written in Hindi and Marathi (Devanagari script), together with a training-free technique for recognizing such handwritten legal amounts on Indian bank cheques. Recognizing handwritten legal amount words in Hindi and Marathi is challenging because many words in the lexicon are similar in size and shape, and many share the same prefixes or suffixes. The proposed recognition technique combines two approaches. The first is based on gradient, structural and cavity (GSC) features with a binary vector matching (BVM) technique. The second is based on the vertical projection profile (VPP) feature and dynamic time warping (DTW). In the combined approach, a set of the best-matching words from both approaches is passed to the final recognition step, where a ranking scheme selects the result. Syntactic knowledge of the languages is also used to achieve higher reliability. To the best of our knowledge, this is the first work on recognizing handwritten legal amounts written in Hindi and Marathi. Researchers interested in the dataset can contact the authors to obtain it through a shared link.
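The VPP/DTW branch and the step that merges candidates from both branches can be illustrated with a minimal Python/NumPy sketch. It is not the authors' implementation. The abstract does not specify the profile normalisation, the DTW cost function, or the ranking scheme, so the choices below are assumptions made for illustration: per-column ink counts divided by image height, an absolute-difference DTW cost, and a Borda-style rank fusion. All function names and the romanised lexicon labels are hypothetical. The GSC/BVM branch is not shown; its ranked candidate list would be supplied as the second input to the fusion step.

```python
import numpy as np


def vertical_projection_profile(word_img: np.ndarray) -> np.ndarray:
    """Ink-pixel count per column of a binary word image (1 = ink),
    divided by image height so profiles of words with different heights are comparable."""
    return word_img.sum(axis=0) / word_img.shape[0]


def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Dynamic time warping distance between two 1-D profiles.

    Local cost is the absolute difference; steps are insertion, deletion and match.
    The accumulated cost is divided by len(a) + len(b), a common normalisation
    that reduces the bias toward short sequences.
    """
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m] / (n + m))


def rank_by_dtw(query: np.ndarray, templates: dict[str, np.ndarray]) -> list[str]:
    """Hypothetical matcher: order lexicon words by the DTW distance between
    their reference profiles and the query profile (closest first)."""
    return sorted(templates, key=lambda word: dtw_distance(query, templates[word]))


def combine_rankings(rank_a: list[str], rank_b: list[str], top_k: int = 5) -> list[str]:
    """Hypothetical Borda-style fusion: a word at position p of a top-k list
    earns top_k - p points, and points from both lists are summed.
    Ties are broken alphabetically so the result is deterministic."""
    scores: dict[str, int] = {}
    for ranking in (rank_a[:top_k], rank_b[:top_k]):
        for pos, word in enumerate(ranking):
            scores[word] = scores.get(word, 0) + (top_k - pos)
    return sorted(scores, key=lambda word: (-scores[word], word))


# Toy usage: romanised Hindi number words serve as hypothetical lexicon labels.
templates = {
    "ek": np.array([0.0, 1.0, 0.0]),
    "do": np.array([1.0, 1.0, 1.0]),
}
query = np.array([0.0, 0.0, 1.0, 0.0])  # a horizontally stretched "ek"
vpp_dtw_ranking = rank_by_dtw(query, templates)
print(vpp_dtw_ranking)  # ['ek', 'do']
gsc_bvm_ranking = ["ek", "do"]  # placeholder for the GSC/BVM branch output
print(combine_rankings(vpp_dtw_ranking, gsc_bvm_ranking, top_k=2))  # ['ek', 'do']
```

DTW is a natural fit for comparing projection profiles. Two writers rarely produce the same word with the same width, and the warping path absorbs such horizontal stretching. In the toy example, the four-column query aligns to the three-column "ek" template at zero cost.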
