October 2024
·
15 Reads
September 2024
·
141 Reads
November 2022
·
616 Reads
·
3 Citations
We present a CNN-BiLSTM system for offline English handwriting recognition, with extensive evaluations on the public IAM dataset, including the effects of model size, data augmentation, and the lexicon. Our best model achieves 3.59% CER and 9.44% WER using a CNN-BiLSTM network with a CTC layer. Test-time augmentation, with rotation and shear transformations applied to the input image, is proposed to improve recognition of difficult cases and is found to reduce the word error rate by 2.5 percentage points. We also conduct an error analysis of our proposed method on the IAM dataset, show hard cases of handwriting images, and explore samples with erroneous labels. We release our source code into the public domain to foster further research and encourage scientific reproducibility.
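For readers unfamiliar with the metrics quoted in this abstract, CER and WER are commonly computed as the edit distance between the reference and the recognized text, divided by the reference length, at character and word level respectively. The sketch below illustrates that common definition; the function names are hypothetical, and the authors' exact scoring (for example, normalization or tokenization) may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="char"):
    """Total edits divided by total reference length over a corpus.

    unit="char" gives a character error rate, unit="word" a word error
    rate (assumption: words are split on whitespace).
    """
    total_edits = 0
    total_len = 0
    for ref, hyp in zip(refs, hyps):
        if unit == "word":
            ref, hyp = ref.split(), hyp.split()
        total_edits += edit_distance(ref, hyp)
        total_len += len(ref)
    return total_edits / total_len
```

Calling `error_rate(references, hypotheses, unit="word")` on parallel lists of strings yields a corpus-level rate; averaging per-line rates instead is another convention and gives different numbers.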
November 2022
·
300 Reads
With the ever-increasing speed of digitization, a large collection of Ottoman documents is now accessible to researchers and the general public. However, the majority of users interested in these documents cannot read them unless they are transcribed into the modern Turkish script, which uses an extended version of the Latin alphabet. Manual transcription of such a massive number of documents is beyond the capacity of human experts. As a solution, we propose an automatic recognition system for printed Ottoman documents that transcribes Ottoman texts directly into the modern Turkish script. We evaluated three decoding strategies, including the Word Beam Search decoder, which allows the use of a recognition lexicon and n-gram statistics during the decoding phase. The system achieves a 2.25% character error rate and a 6.42% word error rate on a test set of 1.4K samples, using the test set transcriptions as the recognition lexicon. Using a large, general-purpose lexicon of the Ottoman era (260K words and 77% test coverage), the performance is 3.68% character error rate and 16.61% word error rate.
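The "test coverage" of a lexicon mentioned above can be read as the share of test-set words that appear in the lexicon. The sketch below shows one way to compute it; this is an assumption about the definition (counting every word occurrence rather than unique words), and `lexicon_coverage` is a hypothetical name, not taken from the paper.

```python
def lexicon_coverage(test_words, lexicon):
    """Fraction of test word occurrences that are found in the lexicon.

    Assumption: coverage is counted per occurrence (token), not per
    unique word (type); the paper does not state which is used.
    """
    vocabulary = set(lexicon)  # set membership tests are O(1) on average
    if not test_words:
        return 0.0
    covered = sum(1 for word in test_words if word in vocabulary)
    return covered / len(test_words)
```

Words outside the lexicon cannot be produced by a strictly lexicon-constrained decoder, which is one plausible reason the word error rate rises when the lexicon covers only part of the test set.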
September 2022
·
41 Reads
Journal of the Ottoman and Turkish Studies Association