Fig 2 - available via license: Creative Commons Attribution 4.0 International
Multiple glyphs for three Arabic symbols in the DecoType Naskh font, all shown in red. On the left, glyphs for isolated , in the middle, glyphs for medial and on the right, glyphs for medial . Best viewed in color.
Source publication
Ottoman script is an Arabic alphabet-based script as well. It was the writing system of the Turkish language for several centuries until it was replaced in 1928 with the modern Turkish script, which is based on the Latin alphabet. With the ever-increasing digitization campaigns, millions of Ottoman documents are coming to light. But their contents are...
Context in source publication
Context 1
... font types provide multiple shapes for the same contextual forms of some characters, which increases the size of the glyph set significantly. Figure 2 shows shape variants provided by the DecoType Naskh font for three such characters. 'Harakat' are tiny diacritic marks used to show some unrepresented vowels especially in manuscripts. ...
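As a minimal sketch of what "contextual forms" means here, the snippet below lists the four standard forms (isolated, final, initial, medial) that Unicode encodes for one Arabic letter in the Arabic Presentation Forms-B block. The choice of the letter BEH and the helper name `contextual_forms` are illustrative assumptions. The extra font-specific shape variants that DecoType Naskh supplies for a single contextual form are not Unicode characters and are not modeled.

```python
import unicodedata

# Code points for the four contextual forms of ARABIC LETTER BEH (U+0628)
# in the Unicode "Arabic Presentation Forms-B" block (illustrative choice).
BEH_FORMS = [0xFE8F, 0xFE90, 0xFE91, 0xFE92]


def contextual_forms(code_points):
    """Map each form tag (e.g. 'medial') to the name of its base letter."""
    result = {}
    for cp in code_points:
        # The compatibility decomposition looks like "<initial> 0628".
        tag, base_hex = unicodedata.decomposition(chr(cp)).split()
        base_name = unicodedata.name(chr(int(base_hex, 16)))
        result[tag.strip("<>")] = base_name
    return result


forms = contextual_forms(BEH_FORMS)
for form, base in forms.items():
    print(f"{form:>8}: {base}")
```

A font that offers several alternate shapes per form multiplies the number of glyphs a recognizer has to handle beyond these four per letter, which is the growth in glyph set size described in the passage.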
Similar publications
With the ever-increasing speed of the digitization process, a large collection of Ottoman documents is accessible to researchers and the general public. But the majority of the users interested in these documents cannot read them unless they are transcribed to the modern Turkish script, which uses an extended version of the Latin alphab...
Optical Character Recognition (OCR) technology automates the extraction and recognition of text from scanned documents or images, leveraging machine learning models trained on standardized datasets. Historical Arabic manuscripts, housed in national libraries and archives around the world, hold immense cultural, religious, and historical significa...
In recent years, Optical character recognition (OCR) has experienced a resurgence of interest especially for contemporary Arabic data. In fact, OCR development for printed and handwritten Arabic script is still a challenging task. These challenges are due to the specific characteristics of the Arabic script. In this work, we attempt to address thes...
Optical character recognition (OCR) for Arabic presents unique challenges due to the script's cursive nature, contextual letter forms, multiple ligatures, the presence of diacritics, and the high variability in handwritten styles. This work introduces an enhanced Arabic handwritten word recognition architecture that integrates the attention mechani...
Deep learning-based character recognition of Tamil inscriptions plays a significant role in preserving the ancient Tamil language. The complexity of the task lies in the precise classification of the age-old Tamil letters (Vattezhuthu) into modern-day Tamil letter structures. Various methodologies and pre-processing techniques have been used for de...
Citations
This study develops a deep learning-based method for character detection and recognition in printed Ottoman documents. Character detection and recognition is treated as an object detection problem, and an Ottoman character recognition model is built on the YOLO model, one of the most successful object detection methods. In addition, a dataset of Ottoman document images is created in which every character is annotated. Data augmentation techniques are applied to improve recognition accuracy and the robustness of the method. The Ottoman character recognition network was then trained on this dataset and tested on the dataset's test images. Performance was evaluated using the average precision metric, which is widely used in the literature. Average precision was calculated for the 34 character classes in the dataset, and the results were interpreted in terms of the method's pros and cons. The results show that the proposed method can detect and recognize characters in printed Ottoman documents with high accuracy, reaching a weighted average precision of 98.71%.
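The abstract does not say how the weighted average precision is computed. One common reading, assumed here, is that each class's average precision (AP) is weighted by the number of ground-truth instances of that class. The sketch below illustrates that arithmetic only; the function name, the class names, and all numbers are hypothetical and are not taken from the study.

```python
def weighted_average_precision(ap_by_class, count_by_class):
    """Average per-class AP, weighting each class by its instance count.

    Assumption: weights are ground-truth instance counts per class.
    """
    total = sum(count_by_class[c] for c in ap_by_class)
    if total == 0:
        raise ValueError("at least one class must have instances")
    weighted_sum = sum(ap_by_class[c] * count_by_class[c] for c in ap_by_class)
    return weighted_sum / total


# Hypothetical per-class AP values and instance counts (not from the paper).
example_ap = {"elif": 0.99, "be": 0.97, "te": 0.95}
example_counts = {"elif": 500, "be": 300, "te": 200}

wap = weighted_average_precision(example_ap, example_counts)
print(f"weighted AP: {wap:.4f}")
```

Under this weighting, frequent classes dominate the score, so a high weighted value can coexist with lower AP on rare character classes. An unweighted mean over the classes would expose that difference.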