Conference Paper

Persis: A Persian Font Recognition Pipeline Using Convolutional Neural Networks


Abstract

What happens if we see a suitable font for our design work but do not know its name? Visual Font Recognition (VFR) systems are used to identify the font typeface in an image. These systems can assist graphic designers in identifying fonts used in images. A VFR system also helps improve the speed and accuracy of Optical Character Recognition (OCR) systems. In this paper, we propose the first publicly available datasets in the field of Persian font recognition and employ Convolutional Neural Networks (CNNs) to address the Persian font recognition problem. The results show that the proposed pipeline obtains 78.0% top-1 accuracy on our new datasets, 89.1% on the IDPL-PFOD dataset, and 94.5% on the KAFD dataset. Furthermore, the average time spent in the entire pipeline for one sample of our proposed datasets is 0.54 seconds on CPU and 0.017 seconds on GPU. We conclude that CNN methods can recognize Persian fonts without the need for additional pre-processing steps such as feature extraction, binarization, or normalization.
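The abstract gives no implementation details, but the CNN building block it relies on (convolution, nonlinearity, pooling, with no hand-crafted feature extraction) can be illustrated with a dependency-free toy sketch; the image, kernel, and sizes below are illustrative assumptions, not the paper's actual architecture:

```python
# Toy sketch of the conv -> ReLU -> max-pool building block of a CNN,
# applied to a 5x5 binary "glyph" image (illustrative values only).

def conv2d(img, k):
    """Valid-mode 2-D cross-correlation of a list-of-lists image with kernel k."""
    kh, kw = len(k), len(k[0])
    return [[sum(img[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
             for j in range(len(img[0]) - kw + 1)]
            for i in range(len(img) - kh + 1)]

def relu(m):
    """Element-wise rectified linear unit."""
    return [[max(0, v) for v in row] for row in m]

def maxpool2(m):
    """2x2 max pooling with stride 2."""
    return [[max(m[i][j], m[i][j + 1], m[i + 1][j], m[i + 1][j + 1])
             for j in range(0, len(m[0]) - 1, 2)]
            for i in range(0, len(m) - 1, 2)]

img = [[0, 0, 1, 1, 1] for _ in range(5)]  # vertical ink edge
k = [[-1, 1], [-1, 1]]                     # responds to left-to-right transitions
features = maxpool2(relu(conv2d(img, k)))  # -> [[2, 0], [2, 0]]
```

Stacking such conv → ReLU → pool stages, followed by a classifier layer, is the generic pattern a font-recognition CNN follows; the paper's actual network and hyper-parameters are not given in this excerpt.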


Article
Full-text available
In this paper, we designed a methodology to classify facial nerve function after head and neck surgery. It is important to be able to observe the rehabilitation process objectively after specific brain surgery, when patients are often affected by facial palsy. The dataset used for the classification problem in this study contains only 236 measurements from 127 patients, graded with the most commonly used House–Brackmann (HB) scale, which is based on the subjective opinion of the physician. Although there are several traditional evaluation methods for measuring facial paralysis, they still suffer from ignoring facial movement information, which plays an important role in the analysis of facial paralysis and limits the selection of useful facial features for its evaluation. In this paper, we present a triple-path convolutional neural network (TPCNN) to evaluate the problem of mimetic muscle rehabilitation, observed by a Kinect stereovision camera. A system consisting of three modules for facial landmark measure computation and facial paralysis classification, based on a parallel convolutional neural network structure, is used to quantitatively assess facial nerve paralysis by considering region-based facial features and the temporal variation of facial landmark sequences. The proposed deep network analyzes both the global and local facial movement features of a patient's face. These extracted high-level representations are then fused for the final evaluation of facial paralysis. The experimental results have verified the better performance of TPCNN compared to state-of-the-art deep learning networks.
Article
Full-text available
In the past years, traditional pattern recognition methods have made great progress. However, these methods rely heavily on manual feature extraction, which may hinder the generalization model performance. With the increasing popularity and success of deep learning methods, using these techniques to recognize human actions in mobile and wearable computing scenarios has attracted widespread attention. In this paper, a deep neural network that combines convolutional layers with long short-term memory (LSTM) was proposed. This model could extract activity features automatically and classify them with a few model parameters. LSTM is a variant of the recurrent neural network (RNN), which is more suitable for processing temporal sequences. In the proposed architecture, the raw data collected by mobile sensors was fed into a two-layer LSTM followed by convolutional layers. In addition, a global average pooling layer (GAP) was applied to replace the fully connected layer after convolution for reducing model parameters. Moreover, a batch normalization layer (BN) was added after the GAP layer to speed up the convergence, and obvious results were achieved. The model performance was evaluated on three public datasets (UCI, WISDM, and OPPORTUNITY). Finally, the overall accuracy of the model in the UCI-HAR dataset is 95.78%, in the WISDM dataset is 95.85%, and in the OPPORTUNITY dataset is 92.63%. The results show that the proposed model has higher robustness and better activity detection capability than some of the reported results. It can not only adaptively extract activity features, but also has fewer parameters and higher accuracy.
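One concrete trick this abstract highlights is replacing the fully connected layer with global average pooling (GAP) to cut model parameters. A minimal pure-Python sketch of GAP (the shapes and values are illustrative):

```python
# Global average pooling (GAP): collapse each channel's H x W map to its mean,
# so a C x H x W feature tensor becomes just C numbers before the classifier.

def global_avg_pool(feature_maps):
    """feature_maps: list of C channel maps, each a list of rows."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in feature_maps]

pooled = global_avg_pool([[[1, 3], [5, 7]],    # channel 0 -> mean 4.0
                          [[0, 0], [0, 8]]])   # channel 1 -> mean 2.0
```

A fully connected layer mapping a C×H×W feature map to K classes needs C·H·W·K weights, whereas GAP first reduces the map to C values, so the final layer needs only C·K.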
Conference Paper
Full-text available
As font is one of the core design concepts, automatic font identification and similar font suggestion from an image or photo has been on the wish list of many designers. We study the Visual Font Recognition (VFR) problem [4], and advance the state-of-the-art remarkably by developing the DeepFont system. First of all, we build up the first available large-scale VFR dataset, named AdobeVFR, consisting of both labeled synthetic data and partially labeled real-world data. Next, to combat the domain mismatch between available training and testing data, we introduce a Convolutional Neural Network (CNN) decomposition approach, using a domain adaptation technique based on a Stacked Convolutional Auto-Encoder (SCAE) that exploits a large corpus of unlabeled real-world text images combined with synthetic data preprocessed in a specific way. Moreover, we study a novel learning-based model compression approach, in order to reduce the DeepFont model size without sacrificing its performance. The DeepFont system achieves an accuracy of higher than 80% (top-5) on our collected dataset, and also produces a good font similarity measure for font selection and suggestion. We also achieve around 6 times compression of the model without any visible loss of recognition accuracy.
Article
Full-text available
Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.
Article
Full-text available
There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net .
Conference Paper
Full-text available
In this paper, a new method for Farsi font recognition based on a combination of features is proposed. The features are extracted and combined from textures of size 128×128 using SRF and the Wavelet transform. Wavelet and SRF are naturally different methods of feature extraction, so their errors have low correlation. Under this condition, the combination of these features, which are both applicable to texture recognition, was expected to reduce the total error, and the experimental results confirmed this hypothesis. The proposed algorithm is tested on 21000 samples provided from 10 common Farsi fonts. In the presented method, the font characteristics are extracted well, and this is clear in the results. We achieved a recognition rate of 95.56% using an MLP classifier, which is 2.37% and 11.79% higher than SRF and the Wavelet transform, respectively.
Conference Paper
Full-text available
In this paper a directional filter is proposed to describe the curvedness of textures. The proposed filter is inspired by the basic Gabor filter and has an elliptic form. Thus, they are called directional elliptic Gabor (DEG) filters. Characters and subwords in Farsi machine-printed texts are constructed from both straight and curved segments. Moreover, the amounts of curvedness of various Farsi fonts are different. Therefore, the features based on the proposed filter can be useful in Farsi font recognition. Better describing straightness and curvedness of text components increases the separability among various fonts. Experiments demonstrate that using both Gabor filters and the proposed DEG filters for texture features extraction improves the Farsi font recognition accuracy.
Article
Full-text available
Most published font recognition methods operate at the block or line level. In this paper, a new approach is presented that recognizes the font of a Farsi document image at the letter level. In this approach, the font is recognized using the Euclidean distance between spatial descriptors and the gradient value at each boundary point of certain Farsi letters in a document image. To implement and evaluate this approach, we constructed a dataset consisting of templates in 25 widely used Farsi fonts and another dataset of 500 Farsi document images. The obtained recognition rate was 98.7%.
Article
Full-text available
The cursive nature of the Persian alphabet, and the complex and convoluted rules regarding this script, cause major challenges to segmentation as well as recognition of Persian words. We propose a new segmentation algorithm for the main stroke of online Persian handwritten words. Using this segmentation, we present a perturbation method which is used to generate artificial samples from handwritten words. Our recognition system is composed of three modules. The first module deals with the preprocessing of the data. We propose a wavelet-based smoothing technique which enhances the recognition performance compared to the conventional widely used technique. The second module is word segmentation into convex portions of the global shape which we call Convex Curve Sectors (CCSs). The third module is to analyze those CCSs and use the information for recognition performed by the Dynamic Time Warping (DTW) technique. Using CCSs provides the DTW-based classifier with a compact word representation which makes comparison much faster.
Article
Full-text available
A new approach for the recognition of Farsi fonts is proposed. The font type of individual lines with any font size is recognized based on a new feature. Previous methods proposed for font recognition are mostly based on Gabor filters and recognize the font type of a block of text rather than a line or a phrase. Usually, not all text lines of the same block or paragraph have the same font; e.g., titles usually have different fonts. On the other hand, although the Gabor filter does this task fairly well, it is very time consuming: feature extraction from a texture of size 128×128 takes about 178 ms on a 2.4 GHz PC. In this paper we perform font recognition at the line level using a new feature based on Sobel and Roberts gradients in 16 directions, called SRF. We break each line of text into several small parts and construct a texture. Then SRF is extracted as the texture feature for recognition. This feature requires much less computation and can therefore be extracted much faster than common textural features such as the Gabor filter, wavelet transform, or moment features. Our experiments show that it is about 50 times faster than an 8-channel Gabor filter. At the same time, SRF represents the font characteristics very well: we achieved a recognition rate of 94.16% on a dataset of 10 popular Farsi fonts, which is about 14% better than what an 8-channel Gabor filter can achieve. If errors between very similar fonts are ignored, a recognition rate of about 96.5% is achieved.
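The excerpt names the ingredients of SRF (Sobel and Roberts gradients quantized into 16 directions) but not its exact definition. A hedged sketch of the quantization step, using only the Sobel pair for brevity (the 3×3 kernels and the 16-sector scheme are standard; how SRF aggregates the bins into the final texture vector is an assumption here):

```python
import math

# Quantize the local gradient direction at a pixel into 16 equal sectors, in
# the spirit of the SRF feature (Sobel kernels shown; the Roberts pair would
# be handled analogously).

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def grad_at(img, i, j, kernel):
    """3x3 cross-correlation centred on pixel (i, j)."""
    return sum(img[i + a - 1][j + b - 1] * kernel[a][b]
               for a in range(3) for b in range(3))

def direction_bin(img, i, j, nbins=16):
    """Index of the gradient direction at (i, j) among nbins equal sectors."""
    gx = grad_at(img, i, j, SOBEL_X)
    gy = grad_at(img, i, j, SOBEL_Y)
    ang = math.atan2(gy, gx) % (2 * math.pi)
    return int(ang / (2 * math.pi) * nbins) % nbins
```

A histogram of these bin indices over the edge pixels of a line texture yields a 16-dimensional direction feature, which is far cheaper to compute than an 8-channel Gabor filter bank.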
Article
Full-text available
Font recognition is one of the fundamental tasks in document recognition, because it is an important factor in optical character recognition. Classical supervised methods need a lot of labeled data to train a classifier. Since it is very costly and time consuming to label large amounts of data, it is useful to use datasets without labels, and many different semi-supervised learning methods have been studied recently. Among the semi-supervised methods, self-training is one of the important learning algorithms: it classifies the unlabeled samples using a small amount of labeled ones and adds the most confident samples to the training set. In this paper, we apply a majority-vote approach to classify the unlabeled data into reliable and unreliable classes. Then, we add the reliable data to the training set and classify the remaining data, including the unreliable data, in an iterative process. We test this method on the extracted features of ten common Persian fonts. Experimental results indicate that the proposed method improves the classification performance and is effective.
Article
Full-text available
Most optical font recognition (OFR) methods have been designed to recognize the font in non-cursive documents. However, the recognition of cursive font scripts like Farsi/Arabic texts has its own challenges. Thus, most of the currently proposed algorithms fail to exhibit an appropriate recognition rate when facing cursive documents. In this paper, a new method for Farsi/Arabic automatic font recognition is proposed which is based on scale invariant feature transform (SIFT) method. As SIFT features are scale-invariant, the final system is robust against variation of size, scale and rotation. The system does not need a pre-processing stage but in the case of low quality images some noise removal processes can be used. Using a database of 1400 text images, an excellent recognition rate of nearly 100% is obtained.
Article
Image segmentation is a key task in computer vision and image processing with important applications such as scene understanding, medical image analysis, robotic perception, video surveillance, augmented reality, and image compression, among others, and numerous segmentation algorithms are found in the literature. Against this backdrop, the broad success of Deep Learning (DL) has prompted the development of new image segmentation approaches leveraging DL models. We provide a comprehensive review of this recent literature, covering the spectrum of pioneering efforts in semantic and instance segmentation, including convolutional pixel-labeling networks, encoder-decoder architectures, multiscale and pyramid-based approaches, recurrent networks, visual attention models, and generative models in adversarial settings. We investigate the relationships, strengths, and challenges of these DL-based segmentation models, examine the widely used datasets, compare performances, and discuss promising research directions.
Article
The field of machine learning is witnessing its golden era as deep learning slowly becomes the leader in this domain. Deep learning uses multiple layers to represent the abstractions of data to build computational models. Some key enabler deep learning algorithms such as generative adversarial networks, convolutional neural networks, and model transfers have completely changed our perception of information processing. However, there exists an aperture of understanding behind this tremendously fast-paced domain, because it was never previously represented from a multiscope perspective. The lack of core understanding renders these powerful methods as black-box machines that inhibit development at a fundamental level. Moreover, deep learning has repeatedly been perceived as a silver bullet to all stumbling blocks in machine learning, which is far from the truth. This article presents a comprehensive review of historical and recent state-of-the-art approaches in visual, audio, and text processing; social network analysis; and natural language processing, followed by the in-depth analysis on pivoting and groundbreaking advances in deep learning applications. It was also undertaken to review the issues faced in deep learning such as unsupervised learning, black-box models, and online learning and to illustrate how these challenges can be transformed into prolific future research avenues.
Conference Paper
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called dropout that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
Article
Chinese font recognition (CFR) has gained significant attention in recent years. However, due to the sparsity of labeled font samples and the structural complexity of Chinese characters, CFR is still a challenging task. In this paper, a DropRegion method is proposed to generate a large number of stochastic variant font samples whose local regions are selectively disrupted and an inception font network (IFN) with two additional convolutional neural network (CNN) structure elements, i.e., a cascaded cross-channel parametric pooling (CCCP) and global average pooling, is designed. Because the distribution of strokes in a font image is non-stationary, an elastic meshing technique that adaptively constructs a set of local regions with equalized information is developed. Thus, DropRegion is seamlessly embedded in the IFN, which enables end-to-end training; the proposed DropRegion-IFN can be used for high performance CFR. Experimental results have confirmed the effectiveness of our new approach for CFR.
Conference Paper
This paper addresses the large-scale visual font recognition (VFR) problem, which aims at automatic identification of the typeface, weight, and slope of the text in an image or photo without any knowledge of content. Although visual font recognition has many practical applications, it has largely been neglected by the vision community. To address the VFR problem, we construct a large-scale dataset containing 2,420 font classes, which easily exceeds the scale of most image categorization datasets in computer vision. As font recognition is inherently dynamic and open-ended, i.e., new classes and data for existing categories are constantly added to the database over time, we propose a scalable solution based on the nearest class mean classifier (NCM). The core algorithm is built on local feature embedding, local feature metric learning and max-margin template selection, which is naturally amenable to NCM and thus to such open-ended classification problems. The new algorithm can generalize to new classes and new data at little added cost. Extensive experiments demonstrate that our approach is very effective on our synthetic test images, and achieves promising results on real world test images.
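The nearest class mean (NCM) classifier at the core of that approach is simple enough to sketch exactly: each class is represented by its mean feature vector, and a sample is assigned to the closest mean. Adding a new font class only requires computing one more mean, which is why NCM suits open-ended recognition. A plain-Python sketch (the class names and 2-D features are made up for illustration; the paper additionally learns an embedding and a metric, omitted here):

```python
# Nearest class mean (NCM) classifier: one mean vector per class; a sample is
# assigned to the class whose mean is closest in Euclidean distance.

def ncm_fit(samples, labels):
    """Return {class: mean feature vector} from (samples, labels) pairs."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}

def ncm_predict(means, x):
    """Label of the nearest class mean (squared Euclidean distance)."""
    dist2 = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(means, key=lambda c: dist2(means[c], x))

means = ncm_fit([[0, 0], [1, 1], [10, 10], [11, 11]],
                ["serif", "serif", "kufi", "kufi"])   # hypothetical classes
label = ncm_predict(means, [2, 2])                    # -> "serif"
```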
Chapter
In spite of the important role of font recognition in document image analysis, only a few researchers have addressed the issue. This work presents a new approach for font recognition of Farsi document images. In this approach, the font and font size of Farsi document images are recognized using two types of features. The first feature is related to the holes of letters in the text of the document image. The second feature is related to the horizontal projection profile of the text lines of the document image. This approach has been applied to 7 widely used Farsi fonts and 7 font sizes. A dataset of 10×49 images and another dataset of 110 images were used for testing, and a recognition rate of more than 93.7% was obtained. The images were generated using paint software and are noiseless and without skew. This approach is fast and is applicable to other languages that are similar to Farsi, such as Arabic.
Article
Font Recognition (FR) is useful in improving optical text recognition accuracy and time. In addition, it can be used to restore the original document text fonts, styles and sizes. In this paper, we survey the literature of Arabic and Farsi FR research and the databases used. The main phases of FR systems are surveyed (viz. preprocessing, classification techniques and the features used). All published work on Arabic and Farsi FR that the authors are aware of is surveyed. To our knowledge, this is the first survey of Arabic/Farsi FR and its databases. In addition, the paper addresses the strengths and limitations of the presented techniques and identifies areas of research that have not, so far, been addressed in Arabic/Farsi FR, as well as areas of possible improvement.
Article
We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions. The method is straightforward to implement and is based on adaptive estimates of lower-order moments of the gradients. The method is computationally efficient, has low memory requirements and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The method exhibits invariance to diagonal rescaling of the gradients by adapting to the geometry of the objective function. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. We demonstrate that Adam works well in practice when experimentally compared to other stochastic optimization methods.
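The Adam update described above can be written down compactly. A minimal single-parameter sketch with the standard default hyper-parameters (the toy objective f(x) = x², with gradient 2x, is an illustration, not from the paper):

```python
# Single-parameter Adam sketch with the standard defaults (beta1 = 0.9,
# beta2 = 0.999): exponential moving averages of the gradient and its square,
# bias-corrected, drive the parameter update.

def adam_minimize(grad, x, steps=200, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g          # biased first-moment estimate
        v = b2 * v + (1 - b2) * g * g      # biased second-moment estimate
        m_hat = m / (1 - b1 ** t)          # bias corrections
        v_hat = v / (1 - b2 ** t)
        x -= lr * m_hat / (v_hat ** 0.5 + eps)
    return x

x_star = adam_minimize(lambda x: 2 * x, 5.0)  # converges toward the minimum at 0
```

Because the step is m_hat / sqrt(v_hat), its magnitude stays near the learning rate regardless of the gradient's scale, which is the "invariance to diagonal rescaling" the abstract mentions.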
Conference Paper
In this paper, a new method based on fractal geometry is proposed for Farsi/Arabic font recognition. The feature extraction does not depend on the document contents, which treats the font recognition problem as a texture identification task. The main features are obtained by combining the BCD, DCD, and DLA techniques. The dataset includes 2000 samples of 10 typefaces, each containing four sizes. The average recognition rates obtained for these 10 fonts and 4 sizes (40 classes) using RBF and KNN classifiers are 96% and 91%, respectively. The dimension of the feature vectors extracted by the proposed fractal approach is very low. This property obviates the need for numerous training samples. Experimental results show that this algorithm is robust against skew. Simultaneously identifying the type and size of the font is the most important innovation of this paper.
Article
In this paper we examine the use of global texture analysis based approaches for the purpose of Persian font recognition in machine-printed document images. Most existing methods for font recognition make use of local typographical features and connected component analysis. However, derivation of such features is not an easy task. Gabor filters are appropriate tools for texture analysis and are motivated by the human visual system. Here we consider document images as textures and use Gabor filter responses for identifying the fonts. The method is content independent and involves no local feature analysis. Two different classifiers, Weighted Euclidean Distance (WED) and SVM, are used for classification. Experiments on seven different typefaces and four font styles show an average accuracy of 85% with the WED and 82% with the SVM classifier across typefaces.
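A Gabor filter of the kind used here is a Gaussian envelope multiplied by a sinusoidal carrier at some orientation and frequency; texture features are typically statistics of the image filtered by a bank of such kernels. A dependency-free sketch of the (real-part) kernel (the size, frequency, and σ below are illustrative choices, not values from the paper):

```python
import math

# Real part of a 2-D Gabor kernel: an oriented Gaussian envelope multiplied
# by a cosine carrier at spatial frequency `freq`.

def gabor_kernel(size, theta, freq, sigma):
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)   # rotate coordinates
            yr = -x * math.sin(theta) + y * math.cos(theta)  # by theta
            envelope = math.exp(-(xr * xr + yr * yr) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * freq * xr))
        kernel.append(row)
    return kernel

k = gabor_kernel(9, 0.0, 0.1, 2.0)  # horizontal-frequency filter; k[4][4] == 1.0
```

A bank of such kernels at several orientations (e.g. 0°, 45°, 90°, 135°) is convolved with the text image, and the mean and standard deviation of each response channel form the texture feature vector fed to the classifier.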
Conference Paper
In this paper we introduce a model-based omnifont Persian OCR system. The system uses a set of 8 primitive elements as structural features for recognition. First, the scanned document is preprocessed. After normalizing the preprocessed image, text rows and sub-words are separated and then thinned. After recognition of dots in sub-words, strokes are extracted and primitive elements of each sub-word are recognized using the strokes. Finally, the primitives are compared with predefined set of character identification vectors in order to identify sub-word characters. The separation and recognition steps of the system are concurrent, eliminating unavoidable errors of independent separation of letters. The system has been tested on documents with 14 standard Persian fonts in 6 sizes. The achieved precision is 97.06%.
Article
Hitherto communication theory was based on two alternative methods of signal analysis. One is the description of the signal as a function of time; the other is Fourier analysis. Both are idealizations, as the first method operates with sharply defined instants of time, the second with infinite wave-trains of rigorously defined frequencies. But our everyday experiences, especially our auditory sensations, insist on a description in terms of both time and frequency. In the present paper this point of view is developed in quantitative language. Signals are represented in two dimensions, with time and frequency as co-ordinates. Such two-dimensional representations can be called "information diagrams," as areas in them are proportional to the number of independent data which they can convey. This is a consequence of the fact that the frequency of a signal which is not of infinite duration can be defined only with a certain inaccuracy, which is inversely proportional to the duration, and vice versa. This "uncertainty relation" suggests a new method of description, intermediate between the two extremes of time analysis and spectral analysis. There are certain "elementary signals" which occupy the smallest possible area in the information diagram. They are harmonic oscillations modulated by a "probability pulse." Each elementary signal can be considered as conveying exactly one datum, or one "quantum of information." Any signal can be expanded in terms of these by a process which includes time analysis and Fourier analysis as extreme cases. These new methods of analysis, which involve some of the mathematical apparatus of quantum theory, are illustrated by application to some problems of transmission theory, such as direct generation of single sidebands, signals transmitted in minimum time through limited frequency channels, frequency modulation and time-division multiplex telephony.
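The "uncertainty relation" referred to here is usually stated, for rms widths of the signal in time and in frequency, as:

```latex
\Delta t \,\Delta f \;\ge\; \frac{1}{4\pi}
```

with equality attained exactly by Gabor's elementary signals, i.e. Gaussian-modulated harmonic oscillations. (The constant depends on the width convention; \(1/4\pi\) is the usual choice for rms widths in ordinary frequency.)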
Conference Paper
Font recognition is one of the challenging tasks in Optical Character Recognition. Most of the existing methods for font recognition make use of local typographical features and connected component analysis. In this paper, English font recognition is done based on global texture analysis. The main objective of this proposal is to employ support vector machines (SVM) in identifying various fonts. The feature vectors are extracted by making use of Gabor filters, and the proposed SVM is trained using these features. The method is found to give superior performance over neural networks by avoiding local minima points. The SVM model is formulated, tested, and the results are presented in this paper. It is observed that this method is content independent and the SVM classifier shows an average accuracy of 93.54%.
Conference Paper
In this paper we present a multi-font OCR system to be employed for document processing, which performs, at the same time, both the character recognition and the font-style detection of the digits belonging to a subset of the existing fonts. The detection of the font-style of the document words can guide a rough automatic classification of documents, and can also be used to improve the character recognition. The system uses the tangent distance as a classification function in a nearest neighbour approach. We have to discriminate among different digits and, for the same character, we have to discriminate among different font-styles. The nearest neighbour approach is always able to recognize the digit, but the performance in font detection is not optimal. To improve the performance of the system, we have used a discriminant model, the TD-Neuron, which is employed to discriminate between two similar classes. Some experimental results and prospective use in document processing applications are presented.
Article
We describe a novel texture analysis-based approach toward font recognition. Existing methods are typically based on local typographical features that often require connected components analysis. In our method, we take the document as an image containing some specific textures and regard font recognition as texture identification. The method is content-independent and involves no detailed local feature analysis. Experiments are carried out by using 14000 samples of 24 frequently used Chinese fonts (six typefaces combined with four styles), as well as 32 frequently used English fonts (eight typefaces combined with four styles). An average recognition rate of 99.1 percent is achieved. Experimental results are also included on the robustness of the method against image degradation (e.g., salt-and-pepper noise) and on the comparison with existing methods.
J. C. Ye and W. K. Sung, "Understanding geometry of encoder-decoder CNNs," 2019. [Online]. Available: https://arxiv.org/abs/1901.07647
Unsplash. The official Unsplash API. Accessed Mar. 15, 2022. [Online]. Available: https://unsplash.com/developers
Dataheart. Persian text of the Shahnameh book. Accessed Mar. 12, 2022. [Online]. Available: http://dataheart.ir
F. S. Hosseini, S. Kashef, E. Shabaninia, and H. Nezamabadi-pour, "IDPL-PFOD: An image dataset of printed Farsi text for OCR research," in Proceedings of the Second International Workshop on NLP Solutions for Under Resourced Languages (NSURL 2021), co-located with ICNLSP 2021, Trento, Italy: Association for Computational Linguistics, 12-13 Nov. 2021, pp. 22-31.