Evolution of Tamil characters depicting the historical variation in their forms and structure over the period of time

Evolution of Tamil characters depicting the historical variation in their forms and structure over the period of time

Source publication
Article
Full-text available
Deep learning-based character recognition of Tamil inscriptions plays a significant role in preserving the ancient Tamil language. The complexity of the task lies in the precise classification of the age-old Tamil letters (Vattezhuthu) into modern-day Tamil letter structures. Various methodologies and pre-processing techniques have been used for de...

Similar publications

Article
Full-text available
In recent years, Optical character recognition (OCR) has experienced a resurgence of interest especially for contemporary Arabic data. In fact, OCR development for printed and handwritten Arabic script is still a challenging task. These challenges are due to the specific characteristics of the Arabic script. In this work, we attempt to address thes...
Preprint
Full-text available
Handwritten Arabic script recognition is a challenging task due to the script's dynamic letter forms and contextual variations. This paper proposes a hybrid approach combining convolutional neural networks (CNNs) and Transformer-based architectures to address these complexities. We evaluated custom and fine-tuned models, including EfficientNet-B7 a...
Article
Full-text available
While searching on the internet, the OCR keyword will return a thousand research papers on optical character recognition. These papers are ranging from Latin language scripts, Cyrillic, Devanagari, Korean, Japanese, Chinese and Arabic scripts. Sindhi and many other languages extend the Arabic script in which base characters are same while the other...
Article
Full-text available
Optical character recognition (OCR) for Arabic presents unique challenges due to the script's cursive nature, contextual letter forms, multiple ligatures, the presence of diacritics, and the high variability in handwritten styles. This work introduces an enhanced Arabic handwritten word recognition architecture that integrates the attention mechani...

Citations

... They applied various preprocessing techniques to remove noise in Tamil scripts and to improve images. The authors reported that obtaining strong features is important in character recognition and compared the segmentation of characters, recognition rates and accuracy [26]. Batra et al. discuss the benefits of the OCR approach in the healthcare system, especially OCR applications that enable the digitisation of medical laboratory records [27]. ...
Article
Full-text available
In pattern recognition, the automatic identification of image-based input patterns by machine is an important problem. The need to convert images taken with tools such as cameras into formats usable by computers is increasing day by day. The output and efficiency of manual data entry processes are low and error rates are high. Outsourcing these processes to companies dealing with professional information entry is not preferred due to reasons such as security and lack of continuous service quality. For these and similar reasons, a system that enables automatic recognition of optical characters is aimed. With the help of this system, it is aimed to provide the opportunity to serve more people in a shorter time. In line with the stated objectives, a unique dataset has been created by performing ID card segmentation. This dataset was combined with the standard OCR dataset and classification was performed with ANN and proposed CNN methods on the extended dataset. The proposed CNN method, inspired by the Inception V3 model, is a deep learning model consisting of 15 layers. The ANN model uses features obtained from different wavelet types to increase discrimination. Competitive results were obtained from both ANN and proposed CNN models. In the proposed CNN model, in the extended version of the dataset, 99.49%, 98.87%, 99.48% and 99.50% values were obtained in the same order for training. Similarly, for validation, 99.13%, 98.43%, 99.21% and 99.27% values were obtained in the same order.
... Jo et al. [15] developed a method for handwritten character segmentation using CNN. The Optical Character Recognition (OCR) based Convolutional Neural Network (CNN) is used for character segmentation in Krithiga et al. [16]. ...
Article
Full-text available
Achieving precise character segmentation is vital for accurate character recognition. This paper introduces an innovative character segmentation algorithm specifically designed to address the unique properties of Malayalam characters found in Palm-Leaf Manuscripts (PLMs). The initial phase involves conducting an in-depth survey of existing character segmentation algorithms. This study thoroughly examines the advantages and challenges associated with various methods. The research specifically focuses on the significant hurdles encountered in segmenting Malayalam characters from palm scripts, including issues like the presence of 'Chandrakkala,' the characteristics and morphology of Malayalam alphabets, and the existence of broken characters. The novel algorithm, after evaluating results and considering relevant parameters, demonstrates superior performance in effectively segmenting characters written in Malayalam palm scripts.
Article
Full-text available
Vatteluttu (Vattezhuthu) script, prevalent between the 3rd and 8th century CE, played a crucial role in documenting early Chola and Pandya history. However, despite its historical significance, thousands of inscriptions remain untranslated, hindering a comprehensive understanding of early South Indian heritage. This research addresses these challenges by developing a deep learning-based approach for digitizing and recognizing Vatteluttu characters. Using a dataset of 1,800 segmented images representing 28 characters, the study integrates advanced preprocessing techniques and a Siamese CNN-RNN architecture to classify ancient scripts with an overall accuracy of 98%. The findings demonstrate the feasibility of automated transcription for ancient scripts, offering a robust framework for preserving and enhancing research on South Indian heritage.
Article
Full-text available
Tamil is the oldest language spoken in Tamil Nadu, India, with inscriptions dating back to the third century BCE found in caves, temples, and archaeological sites. The style and content of these inscriptions have evolved over time, reflecting changes in society, governance, and language usage. They provide valuable insights into rulers, dynasties, administrative systems, religious practices, and societal norms of their era. However, the diverse fonts and styles of these inscriptions necessitate an efficient method for alphabet and word recognition. Existing algorithms primarily recognize Tamil words and characters from the nineteenth century and do not address the language and styles used in the third century. This study proposes a novel DR-LIFT framework specifically designed for recognizing Tamil inscriptions from this earlier period, overcoming the limitations of current methods. The dataset used consists of third-century Tamil inscriptions. The algorithms within the DR-LIFT method specifically designed to detect text with intricate features such as curves, loops, and lines, significantly enhancing detection accuracy. The proposed framework achieves impressive outcomes, with a recognition accuracy of 99% and a recognition rate of 98.8%.