Javanese script is part of Indonesia's noble cultural heritage, especially in Java. However, the number of Javanese people who are able to read the script has decreased, so conservation efforts are needed in the form of a system that is able to recognize the characters. One solution to this problem lies in Optical Character Recognition (OCR), where one of the most demanding steps is feature extraction, which must distinguish each character. Shape Energy is a feature extraction method based on the idea that a character can be distinguished simply through its skeleton. Building on this idea, the feature extraction is developed from its components to produce an angular histogram with bins at various angle multiples. The performance of this method and of its base method is then tested on a Javanese character dataset of 240 samples with 19 labels, obtained from various images, using K-Nearest Neighbors as the classification method. The accuracy obtained through cross-validation reached 80.83% for the angular histogram with an angle of 20 degrees, 23% better than Shape Energy. In addition, other test results show that this method is able to recognize rotated characters, with the lowest performance of 86% at 180-degree rotation and the highest of 96.97% at 90-degree rotation. It can be concluded that this method improves on the performance of Shape Energy in recognizing Javanese characters and is robust to rotation.
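The evaluation protocol named in this abstract (K-Nearest Neighbors scored by cross-validation) can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the function names, the Euclidean distance, the interleaved fold assignment, and the default values of k and folds are all assumptions, since the abstract states neither k nor the number of folds.

```python
from collections import Counter
import math


def knn_predict(train_x, train_y, query, k=1):
    """Label `query` by majority vote among its k nearest training vectors."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(xs, ys, folds=5, k=1):
    """Mean accuracy over `folds` folds; sample i is assigned to fold i % folds."""
    scores = []
    for f in range(folds):
        test_idx = [i for i in range(len(xs)) if i % folds == f]
        train_idx = [i for i in range(len(xs)) if i % folds != f]
        tx = [xs[i] for i in train_idx]
        ty = [ys[i] for i in train_idx]
        hits = sum(knn_predict(tx, ty, xs[i], k) == ys[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / folds
```

In this sketch each feature vector would be the angular histogram of one character image, and the dataset must contain at least as many samples as folds.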
... Javanese script, better known as Hanacaraka, was often used to write literature and everyday texts in Javanese from the mid-15th century to the mid-20th century, and over time it was also used to write in various regional languages. Javanese script is written in the form of letters [1] [2][3] [4]. With the passing of time, Javanese script seems to have been forgotten and is rarely recognized by the public, especially the younger generation today. ...
... This image form is known as a digital image. Image processing [2]- [5] is divided into two types of methods, namely analog image processing and digital image processing. Analog image processing deals with images that are continuous, for example those from digital cameras, paintings, photographic images, the human eye, and so on. ...
Javanese script (Hanacaraka) is one of the cultural assets of Indonesia. Javanese script is found in temples, inscriptions, cultural and prehistoric sites, ancient Javanese manuscripts, Gulden series banknotes, street signage, and palace documents. Javanese script has a form with an article, and the use of reading marks above the script is a factor that affects the character detection process. Punctuation marks, clothing (sandhangan), Swara script, vowels, and consonants are parts of the script that are often found in Javanese scripts. Preserving Javanese script in the digital era must rely on technology that supports the digitization of Javanese script through a script detection process. The script images considered here are images of Javanese script in ancient manuscripts. Character detection using certain techniques can be carried out to extract characters so that they can be read. The detection of Javanese characters is evaluated on test images. Here, we used 10 word images consisting of 3 to 5 syllables with the vowels a, i, and u. The dataset is processed with Linear Binary Pattern (LBP) feature extraction, which is used to characterize images and describe image textures locally. LBP is applied with r=4, and preprocessing is done by thresholding with d=0.3. Recognition is then performed using the K-Nearest Neighbor algorithm. On the 10 Javanese script word images, an average accuracy of 90.5% was obtained. The highest accuracy was 100% and the lowest was 50%.
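The LBP texture description mentioned in this abstract can be illustrated with a short sketch. It assumes a grayscale image stored as a list of rows, eight neighbours sampled on a square ring at offset r without interpolation, and a 256-bin code histogram as the feature vector; the function name and these sampling choices are illustrative assumptions, not details given in the abstract.

```python
def lbp_histogram(img, r=1):
    """256-bin histogram of 8-neighbour LBP codes sampled at offset r."""
    h, w = len(img), len(img[0])
    # Neighbour offsets (dy, dx), clockwise from the top-left corner.
    offsets = [(-r, -r), (-r, 0), (-r, r), (0, r),
               (r, r), (r, 0), (r, -r), (0, -r)]
    hist = [0] * 256
    for y in range(r, h - r):
        for x in range(r, w - r):
            center = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                # A neighbour at least as bright as the centre sets its bit.
                if img[y + dy][x + dx] >= center:
                    code |= 1 << bit
            hist[code] += 1
    return hist
```

Calling `lbp_histogram(img, r=4)` would correspond to the r=4 setting quoted above, under the sampling assumptions of this sketch; the resulting histogram could then be passed to a nearest-neighbour classifier.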
... Some Javanese script letters have the same shape, and each letter also has a different level of complexity [6] [7]. Therefore, most people have difficulty writing Javanese script, and quite a few people cannot recognize or read Javanese script letters [8]. ...
... The black pixel '0' represents the background and the white pixel '1' represents the character of the Javanese script. In this process, Otsu's thresholding is used to produce a black and white image because this technique can determine the threshold value automatically [8]. 3) A median filter is used for image enhancement, smoothing the image and removing noise at the same time [27]. ...
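The two preprocessing steps named here, Otsu's thresholding and median filtering, can be sketched in plain Python. The sketch assumes an 8-bit grayscale image stored as a list of rows, a 3x3 median window, and characters brighter than the background (so that pixels above the threshold become '1'); the function names are illustrative and none of these details are stated in the excerpt.

```python
def otsu_threshold(img):
    """Return the threshold t (0..255) that maximizes between-class variance."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of those pixel values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(img, t):
    """Map pixels above t to 1 (character) and the rest to 0 (background)."""
    return [[1 if p > t else 0 for p in row] for row in img]


def median_filter3(img):
    """3x3 median filter; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(img[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = window[4]
    return out
```

For dark ink on a light page, the comparison in `binarize` would need to be inverted to keep the convention that '1' marks the character.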
Purpose: The Javanese script has a basic script commonly referred to as the "carakan" script. The script consists of 20 letters with different levels of difficulty. Some letters are similar, so research is needed to make it easier to detect Javanese characters in images. Methods: This study proposes recognizing handwritten Javanese characters using the K-Nearest Neighbor (K-NN) method. In the preprocessing stage, segmentation is carried out using the thresholding method, followed by Histogram of Gradient (HOG) feature extraction and noise removal using median filtering. HOG is a feature descriptor used in computer vision and image processing to detect objects. There are 1000 samples divided into 20 classes. Each class represents one letter of the basic Javanese script. Result: Based on data collected from the writing of 50 respondents, where each respondent wrote the 20 basic Javanese characters, the highest accuracy, 98.5%, was obtained at K = 1. Novelty: Using several preprocessing steps such as cropping, median filtering, Otsu thresholding, and HOG feature extraction before classification, this experiment yields good accuracy.
... Another study, conducted by Wibowo et al. [16], concerns Javanese character feature extraction based on Shape Energy. This research uses the Shape Energy method for feature extraction, a method based on the idea that a character can be distinguished simply through its skeleton. ...
Indonesia is a country rich in a variety of regional cultures. Regional cultures need to be preserved so that they do not become extinct. One of them is the local culture of Central Java Province, namely the Javanese script. In this modern era, globalization is growing in every country, and its impact is increasingly widespread in society. One effect of globalization is that local people prefer acquiring foreign language skills to learning local languages. This study applies character recognition using a new combined workflow that contains Local Binary Pattern (LBP) and Information Gain, and then compares Support Vector Machine (SVM), k-Nearest Neighbor, and Naïve Bayes. The LBP method is used to obtain an image's texture or shape characteristics. Information Gain is used as the feature selection algorithm, whereas SVM, k-Nearest Neighbor, and Naïve Bayes are used as the classification methods. In previous research, the Information Gain method succeeded in increasing the accuracy by 2%. This research compares SVM classification with the other classification methods, and the results show that our proposed method can improve classification performance. The best accuracy, obtained using SVM classification, is 87.86%, at ten folds and a cell size of 64x64.
... Research conducted by Lady (2018, p. 416) says that Javanese is one of the most widely spoken languages in Indonesia, because Javanese is still growing rapidly in everyday conversation. In Javanese learning, students are required to master five competency standards, namely: (1) listening, (2) speaking, (3) reading, (4) writing, (5) literary and non-literary appreciation within the framework of Javanese culture. Translating is included in the standard of writing competence, because it changes the form of writing from Latin writing to Javanese script. ...
... Indonesia is a country that has a variety of cultures; one of the cultural heritages that must be preserved is the Javanese letters [1], [2]. The Javanese letters are an ancient Javanese script, used since the 17th century [3]. ...
One of the essential tasks in Computer Vision research is image classification, for which previous studies have used models to classify images. Javanese letters, in this case, are the basis of sentences written in the Javanese language. The problem is that Javanese sentences are often found in Yogyakarta, especially in the names of tourist attractions, which makes it difficult for tourists to translate them. Therefore, in this study, we create a Javanese character classification model in the hope that it can later serve as a basis for the next stage of research. One of the most popular recent approaches to image classification is Deep Learning, namely the Convolutional Neural Network (CNN) method, here implemented with the Keras framework. The simplicity of the model and of the dataset used in this work reduces computational load and time. Based on the results, the model reaches an accuracy of 86.68% using 1000 data samples and 50 training epochs. The average inference time with the same specification mentioned above is 0.57 seconds; again, the fast inference time is due to the simplicity of the model and dataset. The fast and light computation of this model makes it possible to use it on devices with limited computational resources, such as mobile devices, familiar web server interfaces, and internet-of-things devices.
... The working principle of the high-boost filter is to subtract the result of a low-pass filter from the image. The low-pass filter can be either a median filter [12] or a Gaussian filter [13]. The high-boost filter can be expressed as ( , ) ...
Every person has veins in different locations; some veins are easily detected because they are visible through thin tissue, while others are invisible. These differing vein locations lengthen intravenous access procedures and the process of intravenous therapy. Multi-distance vein projection aims to simplify the measurement process so that the device and the object do not have to be at a fixed distance. Existing research, especially on real-time vein projection, has not examined the characteristics of projection at different distances. In this paper, we propose a method for performing multi-distance real-time back-projection by using the intersection between camera and projector. The method is equipped with an ultrasonic distance sensor to identify the projection characteristics at any distance. In its implementation, this method is able to project at a distance of 20-40 cm with a maximum projection error of 0.6 mm. The measurement angle tolerance between the object and the device is ±5 degrees with a maximum error of 0.7 mm.
... Many literary works, inscriptions, histories, and serat (manuscripts) are written in Aksara Jawa and contain a great deal of history, science, and various unrevealed secrets about events of the past. It is therefore important to preserve Aksara Jawa for the younger generation, among other reasons to gain successors who can uncover the secrets behind literary or historical works written in Aksara Jawa [10], [11]. ...
... For the MNIST dataset [6], a CNN can raise the recognition rate to 99.28% [7]. The success of CNNs in handwritten digit recognition (MNIST) has motivated many researchers to classify other scripts such as Javanese [8], Chinese [9], Bangla [10], and Khmer [2]. They proposed deep CNN architectures that combine multiple convolution layers, pooling, dropout, and fully connected layers. ...
Handwriting recognition is still a real challenge in classification tasks. Unlike in modern documents, the isolated glyph images in ancient documents have various kinds of random noise, non-uniform background color, and smudges. The convolutional neural network (CNN) is one of the successful methods in pattern recognition and machine learning for classifying objects. This paper presents an evaluation of several CNN architectures with different numbers of convolution layers for classifying isolated glyph images. The experimental study is conducted on 60 glyph classes from the ancient Sundanese dataset published at ICDAR 2017. In addition, batch normalization is investigated to measure its effect on the learning process. The results show that the recognition rate is affected by multiple convolutional layers, multiple fully connected layers, and batch normalization. Based on the experimental study, model 8C2F achieves a recognition rate of 86.15%.
... In addition, OCR can help individuals and large-scale efforts to recognize digitized ancient literature and ancient letters. It can also benefit education as a learning medium, especially for the local-content subject of the Javanese language [7]. The method used in this OCR system is Template Matching. ...
In this modern age, the impact of globalization is increasingly spreading into most societies. One impact of globalization is that people prefer to learn and use a foreign language rather than the local language, especially the Javanese language. This strongly affects the community's knowledge of the existence of Javanese letters, especially in the field of education. In this research, an application is built to recognize Javanese letters based on Optical Character Recognition (OCR). Template matching by correlation can be used as a pattern recognition method. The Template Matching algorithm works by matching the template image with the test image after the pre-processing and segmentation steps. Using 10 character templates and 20 test samples, the research obtained an accuracy of 93.44% and an error rate of 6.56%. The Template Matching algorithm can therefore recognize Javanese letter patterns well.
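The correlation-based matching described in this abstract can be sketched as follows. The sketch assumes that each segmented glyph has already been resized to the same dimensions as the templates and uses the normalized cross-correlation (Pearson correlation) as the similarity score; the function names and the dictionary of labelled templates are illustrative assumptions rather than the authors' design.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-sized 2D images, in [-1, 1]."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    ma, mb = sum(fa) / len(fa), sum(fb) / len(fb)
    num = sum((x - ma) * (y - mb) for x, y in zip(fa, fb))
    den = math.sqrt(sum((x - ma) ** 2 for x in fa)
                    * sum((y - mb) ** 2 for y in fb))
    # A constant image has zero variance; treat it as uncorrelated.
    return num / den if den else 0.0


def best_template(glyph, templates):
    """Return the label of the template most correlated with `glyph`."""
    return max(templates, key=lambda label: ncc(glyph, templates[label]))
```

A segmented character is then assigned the label of its best-scoring template, for example `best_template(glyph, {"ha": ha_img, "na": na_img})`, where the two template images are hypothetical.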
One of Indonesia's cultural legacies from the land of Java is the Javanese script (Aksara Jawa). The Javanese script has been used by the community since ancient times for writing literature and for everyday writing. Because of its complicated shapes, the Javanese script has become rarely recognized. Therefore, this research serves as one effort to learn and at the same time preserve the culture, in particular the Javanese script, by performing transliteration of the Javanese script. The method used in this research is Optical Character Recognition (OCR) based on Template Matching and Linear Binary Pattern (LBP) feature extraction. OCR helps in converting images that contain Javanese script writing, which are then transliterated. Template Matching works by matching each part of the image against predetermined template images. LBP helps in recognizing script letters that consist of separate objects. Based on testing, an average accuracy of 89.4% was obtained. The highest accuracy was 100% and the lowest was 50%. The success rate of the transliteration process depends on the clarity of the Javanese letter objects in the test image.
The term "Songket" comes from the Malay word "Sungkit", which means "to hook" or "to gouge". The names and variations of its motifs are derived from plants and animals, which serve as the source of inspiration for many songket patterns. Each songket pattern has a philosophy in the form of a rhyme that refers to the nature of the pattern's source, and that philosophy reflects the beliefs and values of Malay culture. In this research, we propose a system to facilitate an understanding of songket and its philosophy as a way to conserve songket culture. The system collects information from images of songket motif variations using feature extraction methods. For each songket motif image, we extract the philosophy of the rhyme into impressions, extract color features using a 3D-Color Vector Quantization (3D-CVQ) histogram, and extract shape features using Hu moment invariants. We then build an image search based on impressions and an impression search based on images. We use search techniques based on color, shape, and aggregation (a combination of color and shape). The experiment using impressions as queries gave: 1) based on color, an average true value of 7.3 and a total score of 41.9; 2) based on shape, an average true value of 3 and a total score of 16.4; 3) based on aggregation, an average true value of 3 and a total score of 17.4. Using images as queries gave: 1) based on color, an average precision of 95%; 2) based on shape, an average precision of 43.3%; 3) based on aggregation, an average precision of 73.3%. From our experiments, it can be concluded that the best search system, for both impression queries and image queries, is the one based on color.
Keywords: image search, philosophy, impression, songket, cultural computing, feature extraction, analytical aggregation.
Javanese characters are traditional characters that are used to write the Javanese language. The Javanese language is a language used by many people on the island of Java, Indonesia. The use of Javanese characters is diminishing more and more because of the difficulty of studying the Javanese characters themselves. The Javanese character set consists of basic characters, numbers, complementary characters, and so on. In this research we have developed a system to recognize Javanese characters. Input for the system is a digital image containing several handwritten Javanese characters. Preprocessing and segmentation are performed on the input image to get each character. For each character, feature extraction is done using the ICZ-ZCZ method. The output from feature extraction will become input for an artificial neural network. We used several artificial neural networks, namely a bidirectional associative memory network, a counterpropagation network, an evolutionary network, a backpropagation network, and a backpropagation network combined with chi2. From the experimental results it can be seen that the combination of chi2 and backpropagation achieved better recognition accuracy than the other methods.
Feature extraction is one of the fundamental problems of character
recognition. The performance of a character recognition system depends on
proper feature extraction and correct classifier selection. In this article, a
rapid feature extraction method named Celled Projection (CP) is proposed
that computes the projection of each section formed by partitioning an
image. The recognition performance of the proposed method is compared with
other widely used feature extraction methods that are intensively studied for
many different scripts in the literature. The experiments have been conducted
using Bangla handwritten numerals along with three different well-known
classifiers, which demonstrate comparable results including 94.12% recognition
accuracy using celled projection.
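The idea of projecting each section of a partitioned image can be illustrated with a small sketch. The abstract does not specify how the image is partitioned or in which direction each cell is projected, so the choices here (horizontal bands of roughly equal height, column sums within each band, a binary image stored as a list of rows) and the function name are assumptions made purely for illustration.

```python
def celled_projection(img, cells=2):
    """Concatenate the column sums of `cells` horizontal bands of a binary image."""
    h, w = len(img), len(img[0])
    features = []
    for c in range(cells):
        # Integer band boundaries so that all rows are covered exactly once.
        top = c * h // cells
        bottom = (c + 1) * h // cells
        for x in range(w):
            features.append(sum(img[y][x] for y in range(top, bottom)))
    return features
```

The feature vector has `cells * width` entries, so the number of cells trades spatial detail against vector length.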
A modified version of the fast parallel thinning algorithm proposed by Zhang and Suen is presented in this paper. It preserves the original merits such as the contour noise immunity and good effect in thinning crossing lines; and overcomes the original demerits such as the serious shrinking and line connectivity problems.
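For context, the original Zhang-Suen scheme that this paper modifies can be sketched as below. The sketch implements the classic two-subiteration algorithm, not the modified version proposed in the paper; it assumes a binary image stored as a list of rows with 1 as foreground and a one-pixel background border, and the function name is illustrative.

```python
def zhang_suen_thin(img):
    """Thin a binary image (1 = foreground) with the original Zhang-Suen scheme."""
    img = [row[:] for row in img]
    h, w = len(img), len(img[0])

    def neighbours(y, x):
        # P2..P9, clockwise starting from the pixel directly above.
        return [img[y - 1][x], img[y - 1][x + 1], img[y][x + 1],
                img[y + 1][x + 1], img[y + 1][x], img[y + 1][x - 1],
                img[y][x - 1], img[y - 1][x - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for y in range(1, h - 1):
                for x in range(1, w - 1):
                    if img[y][x] != 1:
                        continue
                    p = neighbours(y, x)
                    b = sum(p)  # number of foreground neighbours
                    # Number of 0 -> 1 transitions around the ring P2..P9, P2.
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1
                            for i in range(8))
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_clear.append((y, x))
            # Pixels are removed only after the whole subiteration is scanned.
            for y, x in to_clear:
                img[y][x] = 0
            changed = changed or bool(to_clear)
    return img
```

Deleting pixels only after each full scan is what makes the scheme parallel: every decision in a subiteration depends on the image as it was at the start of that subiteration.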
One way to preserve documents or records is to convert them into digital images. The images then need further processing so that the text or sentences in them can be operated on, for example for searching, analysis, or manipulation of the text content. This process is known as optical character recognition (OCR) and continues to develop. OCR is generally divided into three main stages, namely preprocessing, feature extraction, and classification. Feature extraction is one of the essential processes in character recognition. Its purpose is to obtain the characteristics of each character, and its results can affect the quality of character recognition. Generally, feature extraction on characters involves complex calculations, so the required computing time is considerable, especially in real-time recognition. In this paper, a simple feature extraction method called Shape Energy is proposed as an alternative. This method follows the simple way in which humans are able to distinguish between characters or numbers. It yields three elements: elasticity, curvature, and texture. The elasticity is the first derivative and the curvature is the second derivative at each pixel of the character's skeleton, which is obtained from thinning, while the texture value is a 4-direction chain code. The method was tested on several types of characters, using a back-propagation neural network in the classification stage. The tests yielded an average recognition accuracy of 90.3%.
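The three Shape Energy elements can be illustrated with a finite-difference sketch. The abstract does not give the exact formulas, so everything here is one plausible reading and an assumption: the skeleton is taken as an ordered list of 8-connected (y, x) pixels, elasticity and curvature are summed squared first and second differences, and the 4-direction chain code is read as a histogram of step orientations (horizontal, diagonal, vertical, anti-diagonal). The function name is also illustrative.

```python
def shape_energy(points):
    """Return (elasticity, curvature, texture) for an ordered skeleton path.

    `points` is a list of (y, x) pixels in which consecutive pixels are
    8-connected neighbours; this ordering is an assumption of the sketch.
    """
    # First difference: squared distance between consecutive pixels.
    elasticity = sum((y1 - y0) ** 2 + (x1 - x0) ** 2
                     for (y0, x0), (y1, x1) in zip(points, points[1:]))
    # Second difference: squared norm of v[i-1] - 2*v[i] + v[i+1].
    curvature = sum((y0 - 2 * y1 + y2) ** 2 + (x0 - 2 * x1 + x2) ** 2
                    for (y0, x0), (y1, x1), (y2, x2)
                    in zip(points, points[1:], points[2:]))
    # Orientation classes: 0 horizontal, 1 diagonal, 2 vertical, 3 anti-diagonal.
    bins = {(0, 1): 0, (0, -1): 0, (-1, 1): 1, (1, -1): 1,
            (1, 0): 2, (-1, 0): 2, (1, 1): 3, (-1, -1): 3}
    texture = [0, 0, 0, 0]
    for (y0, x0), (y1, x1) in zip(points, points[1:]):
        texture[bins[(y1 - y0, x1 - x0)]] += 1
    return elasticity, curvature, texture
```

A straight stroke contributes zero curvature, while each corner adds to it, which matches the intuition that characters can be told apart by the shape of their skeleton.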
The Javanese language is the language used by the people on the island of Java, and it has its own form of letters called Java characters. Recognition of Java characters is quite difficult because they consist of basic characters, numbers, complementary characters, and so on. In this research we developed a system to recognize Java characters and compared two neural network methods, namely an evolutionary neural network and a combination of Chi2 and a backpropagation neural network. Input for the system is a digital image of Java characters. Before entering the neural network, the digital image is processed by noise reduction, segmentation, thinning, and feature extraction. In the experimental results, the evolutionary neural network has 60% average recognition accuracy, while the combination of Chi2 and the backpropagation network has 70% average recognition accuracy.
Pattern recognition has become more and more popular and important since the 1960s, and it attracts attention from a widening range of fields. This paper introduces pattern recognition, including its concepts, methods, applications, and integration. Ten definitions and more than ten methods of pattern recognition are summarized. Finally, the structure and classification of PR and its related fields and application areas are introduced in detail.
The growing interest in data mining is motivated by a common problem across disciplines: how does one store, access, model, and ultimately describe and understand very large data sets? Historically, different aspects of data mining have been addressed independently by different disciplines. This is the first truly interdisciplinary text on data mining, blending the contributions of information science, computer science, and statistics. The book consists of three sections. The first, foundations, provides a tutorial overview of the principles underlying data mining algorithms and their application. The presentation emphasizes intuition rather than rigor. The second section, data mining algorithms, shows how algorithms are constructed to solve specific problems in a principled manner. The algorithms covered include trees and rules for classification and regression, association rules, belief networks, classical statistical models, nonlinear models such as neural networks, and local "memory-based" models. The third section shows how all of the preceding analysis fits together when applied to real-world data mining problems. Topics include the role of metadata, how to handle missing data, and data preprocessing.
The machine replication of human reading has been the subject of intensive research for more than three decades. A large number of research papers and reports have already been published on this topic. Many commercial establishments have manufactured recognizers of varying capabilities. Handheld, desk-top, medium-size and large systems costing as high as half a million dollars are available, and are in use for various applications. However, the ultimate goal of developing a reading machine having the same reading capabilities of humans still remains unachieved. So, there still is a great gap between human reading and machine reading capabilities, and a great amount of further effort is required to narrow-down this gap, if not bridge it. This review is organized into six major sections covering a general overview (an introduction), applications of character recognition techniques, methodologies in character recognition, research work in character recognition, some practical OCRs and the conclusions.