Novel Approach for Baseline Detection and Text Line Segmentation

Authors: Mahdi Keshavarz Bahaghighat, Javad Mohammadi

International Journal of Computer Applications (0975-8887)
Volume 51, No. 2, August 2012
ABSTRACT
Baseline detection and line segmentation are essential preprocessing steps in any OCR system. In this paper we propose a robust and fast method for baseline detection based on analysis of the projected pattern of the Radon Transform. The algorithm has been tested on more than 350 samples, including printed and handwritten Persian/Arabic, English and multilingual documents. The results indicate that, in spite of narrow interline spaces and noisy components, the method extracts baselines precisely. In addition, it is shown that the method still detects baselines accurately when the projected pattern contains multiple frequencies.
General Terms
Image Processing, Document Analysis.
Keywords
Optical Character Recognition, Document Analysis, Multilingual Documents, Radon Transform, Neural Networks
1. INTRODUCTION
Analysis of document images for information extraction has become very prominent in recent years. A wide variety of information that has conventionally been stored on paper is now being converted into electronic form for better storage and intelligent processing [7, 6, 4]. In any OCR system, preprocessing steps such as scanning, image enhancement, skew estimation and correction, and baseline extraction play an important role in overall performance [2, 3, 7]. The baseline is the virtual line on which semi-cursive or cursive text is aligned or joined. It is generally kept in mind during both writing and reading, so baseline detection matters not only for automatic character recognition but also for human reading. Without a baseline, text is difficult to read even for a human, and the error rate increases up to 10% [14] when context-sensitive interpretation is involved. Inaccurately segmented text lines, in turn, cause errors in the recognition stage.
The rest of the paper is organized as follows. Section 2 reviews related work, and Section 3 describes the properties of Persian/Arabic scripts. Section 4 briefly describes the Radon Transform and its features. Section 5 develops the proposed algorithm, which relies on these features to estimate baselines in Persian/Arabic documents and to segment text lines in English documents. Finally, our conclusions are presented in Section 6.
2. RELATED WORKS
Text line segmentation approaches can generally be grouped into strategies such as projection-based, smearing, grouping, Hough-based, graph-based and Cut Text Minimization (CTM) approaches.
In the projection-based approach, the vertical projection profile is obtained by summing pixel values along the horizontal axis for each y value. The vertical gaps between text lines can then be determined from this profile. A profile curve can also be obtained by projecting black/white transitions or the number of connected components. The profile curve is then analyzed to find its maxima and minima [15, 16, 17].
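The projection-based idea can be illustrated with a minimal sketch. It assumes a binary image stored as a list of rows in which 1 marks an ink pixel; the function names, the toy page and the zero threshold are illustrative choices and are not taken from [15, 16, 17].

```python
def row_profile(image):
    """Sum the ink pixels (1 = ink) along each row, i.e. for each y value."""
    return [sum(row) for row in image]


def text_line_bands(profile, threshold=0):
    """Return (first_row, last_row) ranges whose profile exceeds the threshold."""
    bands, start = [], None
    for y, value in enumerate(profile):
        if value > threshold and start is None:
            start = y                      # a text band begins
        elif value <= threshold and start is not None:
            bands.append((start, y - 1))   # a gap closes the band
            start = None
    if start is not None:
        bands.append((start, len(profile) - 1))
    return bands


# Toy page: two text lines separated by two empty rows.
page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
profile = row_profile(page)
bands = text_line_bands(profile)
```

On real scans the gaps are rarely exactly empty, so a non-zero threshold or a smoothed profile would normally be needed.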
In the smearing technique, consecutive black pixels along the horizontal direction are smeared: if the white space between two black pixels is shorter than a predefined threshold, it is filled with black pixels. The bounding boxes of the connected components in the smeared image are considered as text lines [18].
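A minimal sketch of horizontal smearing on a single image row follows. It assumes 1 marks a black pixel and treats the threshold as a maximum gap length in pixels; the name smear_row and the example values are illustrative, and the sketch is not the algorithm of [18].

```python
def smear_row(row, max_gap):
    """Fill white runs (0) of length <= max_gap that lie between two black pixels (1)."""
    out = list(row)
    last_ink = None
    for x, pixel in enumerate(row):
        if pixel == 1:
            if last_ink is not None:
                gap = x - last_ink - 1
                if 0 < gap <= max_gap:
                    for k in range(last_ink + 1, x):
                        out[k] = 1         # smear the short gap
            last_ink = x
    return out


row = [1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0]
smeared = smear_row(row, max_gap=2)
```

The two-pixel gap is filled while the four-pixel gap is kept, so the threshold decides which components merge into one text line.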
By comparison, the grouping approach builds alignments by aggregating units bottom-up. Units such as pixels, connected components or blocks are joined together to form alignments.
Likforman-Sulem and Faure [19] proposed an approach based on perceptual grouping of connected components of black pixels. Text lines are constructed iteratively by grouping neighboring connected components according to perceptual criteria such as similarity, continuity and proximity. Local constraints on neighboring components are thereby combined with global quality measures. To handle conflicts, the technique adds a refinement procedure that combines a global and a local analysis. According to the authors, the proposed technique cannot be used on degraded or poorly structured documents, such as modern authorial manuscripts.
The Hough transform is used for locating straight lines in images. In [21], an iterative hypothesis-validation strategy based on the Hough transform was proposed. According to the authors, this technique is able to detect text lines in handwritten documents that may contain lines oriented in different directions, erasures and annotations between main lines.
Mahdi Keshavarz Bahaghighat
Electrical and Computer Department, Raja University, Qazvin, Iran.
Javad Mohammadi
Electrical and Computer Department, Islamic Azad University of Takestan, Takestan, Iran.
j_m59i@yahoo.com
In the graph-based approach, a method based on a shortest spanning tree search was presented in [20]. The method builds a graph of the main strokes of the document image and searches for the shortest spanning tree of this graph. It assumes that the distance between words in a text line is less than the distance between two adjacent text lines.
In [22], a two-stage method for estimating and correcting the baseline of handwritten sub-words in Farsi and Arabic text lines was introduced. Candidate baseline pixels are detected with a template matching algorithm. The writing path and the baseline of the sub-words are estimated in the first and second stages of the algorithm, respectively. After the estimation in each stage, the baseline is adjusted in a correction phase. Experimental results show the effectiveness of this approach in adjusting the baseline close to the correct position.
Finally, a piece-wise painting scheme was proposed in [8, 9]. It prepares patches of black and white blocks along the text line, identifies candidate points, and regresses a curve through these points to trace the baseline. The baseline is then stretched straight horizontally, and the characters are de-tilted to align the text line properly with the imaginary horizontal baseline.
3. PROPERTIES OF PERSIAN/ARABIC
In general, Persian/Arabic text, whether machine printed or handwritten, is written cursively and from right to left. The letters are normally connected on the baseline.
According to Table1, an Arabic letter may have up to four different shapes depending on its position in the word, which increases the number of classes from 28 to 100. Furthermore, the Persian language has four additional letters, which are shown in Table2. In fact, Persian writing uses 32 basic letters, ten numerals, punctuation marks, spaces and special symbols.
Table3 gives a quick comparison between Persian/Arabic and English documents.
The problem of recognizing off-line Persian handwritten words is important in office automation, as well as in many other applications. An analytical approach to extracting the features of Persian characters seems most appropriate given the nature of Persian handwriting.
A handwritten Persian character has no fixed pattern, but it does have fixed geometrical features. The shapes of handwritten Persian characters differ between writers, but the geometrical features remain the same.
4. RADON TRANSFORM
The Radon function computes projections of an image matrix f along specified directions.
A projection of f(x,y) is a set of line integrals. The Radon function computes the line integrals from multiple sources along parallel paths, or beams, in a given direction. The beams are spaced 1 pixel unit apart. To represent an image, the Radon function takes multiple parallel-beam projections of the image from different angles by rotating the source around the center of the image [1, 5, 10]. Figure1 shows a single projection at a specified rotation angle.
Table1. Arabic alphabet in all its forms
Columns: Name | Isolated | Initial | Medial | Final
Letters (letter glyphs not reproduced): Alif, Ba, Ta, Tha, Jeem, Ha, Kha, Dal, Zal, Ra, Zay, Sin, Shin, Sad, Dhad, Tta, Az, Ain, Ghain, Fa, Qaf, Kaf, Lam, Mim, Nun, Ha, Waow, Ya
Table2. Additional Persian letters
Columns: Name | Isolated | Initial | Medial | Final
Letters (letter glyphs not reproduced): Pe, Che, Gaf, Je
Table3. Differences between Latin and Persian Writing
Aspect | English | Persian
Direction | From left to right | From right to left
Connection | In general each character is connected to the next character with diagonal strokes | Persian letters are normally connected to the baseline with horizontal strokes
Character versions | English characters have few shape variations | A Persian letter might have up to four different shapes, depending on its relative position in the word
Features | English writing has specific geometrical features | Persian writing has a unique feature for each character, especially curves and dots
Segmentation | Any analytical segmentation approach can segment the handwriting into different letters or sub-letters | The letters or segmented sub-letters are different from segments in English
Fig1. Parallel-Beam Projection at Rotation Angle Theta
Projections can be computed along any angle. In general, the Radon transform of f(x,y) is the line integral of f parallel to the y'-axis [11, 12, 13]:
$$R_\theta(x') = \int_{-\infty}^{\infty} f(\xi_1, \xi_2)\, dy' \qquad (1)$$
$$\xi_1 = x'\cos\theta - y'\sin\theta \qquad (2)$$
$$\xi_2 = x'\sin\theta + y'\cos\theta \qquad (3)$$
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \qquad (4)$$
Figure2 shows the geometry of the Radon Transform.
Fig2: Geometry of Radon Transform
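A discrete approximation of a single parallel-beam projection can be sketched as follows. The sketch assumes the rotated coordinate x' = x cos(theta) + y sin(theta) measured from the image center and simply accumulates every pixel into the nearest beam one pixel wide; the function name and this nearest-bin scheme are illustrative simplifications, not the implementation used in the experiments.

```python
import math


def radon_projection(image, theta_deg):
    """Approximate one projection by binning each pixel at x' = x cos(theta) + y sin(theta)."""
    rows, cols = len(image), len(image[0])
    cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0
    theta = math.radians(theta_deg)
    c, s = math.cos(theta), math.sin(theta)
    # Enough bins to hold the image diagonal, centered on x' = 0.
    half = int(math.ceil(math.hypot(rows, cols) / 2.0))
    bins = [0.0] * (2 * half + 1)
    for r in range(rows):
        for q in range(cols):
            x = q - cx
            y = cy - r                      # image rows grow downward; make y point up
            x_prime = x * c + y * s
            bins[int(math.floor(x_prime + 0.5)) + half] += image[r][q]
    return bins


# A vertical stroke in the left column of a 3x3 image.
stroke = [[1, 0, 0],
          [1, 0, 0],
          [1, 0, 0]]
p0 = radon_projection(stroke, 0.0)     # beams parallel to the stroke
p90 = radon_projection(stroke, 90.0)   # beams perpendicular to the stroke
center = (len(p0) - 1) // 2
```

At 0 degrees the whole stroke falls into a single beam, while at 90 degrees it is spread over three beams, which is the behavior the projected pattern relies on.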
Usually a local region of a document image has a consistent orientation and frequency. It can therefore be modeled as a surface wave that is characterized completely by its dominant orientation and frequency. This approximation is sufficient for our purpose of evaluating the features of the Radon Transform. A local region of the image can be modeled as a surface wave [25] according to Eq.5:
$$I(x, y) = A\cos\big(2\pi f\,(x\cos\theta + y\sin\theta)\big) \qquad (5)$$
An example image and its projection by the Radon Transform are shown in Figure3.
Fig3: (a) A well-defined 65×65 synthetic image with θ = 60°. (b) Radon Transform of the image, R(x).
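A synthetic image of this kind can be generated directly from the surface wave model. The sketch below assumes an amplitude A = 1 and a frequency f = 0.1 cycles per pixel, neither of which is given in the text; only the 65×65 size and the 60 degree orientation come from the example above, and the function names are illustrative.

```python
import math


def surface_wave(x, y, amplitude, freq, theta_deg):
    """I(x, y) = A cos(2 pi f (x cos(theta) + y sin(theta)))."""
    theta = math.radians(theta_deg)
    u = x * math.cos(theta) + y * math.sin(theta)
    return amplitude * math.cos(2.0 * math.pi * freq * u)


def synthetic_image(size, amplitude, freq, theta_deg):
    """Sample the surface wave on a size x size grid centered on the origin."""
    half = size // 2
    return [[surface_wave(q - half, half - r, amplitude, freq, theta_deg)
             for q in range(size)]
            for r in range(size)]


# Assumed parameters: A = 1.0 and f = 0.1 are illustrative values.
image = synthetic_image(65, 1.0, 0.1, 60.0)
```

The intensity depends only on u = x cos(theta) + y sin(theta), so every line perpendicular to the wave direction has constant intensity. Integrating along those lines therefore returns a sinusoid in u, which is why the projection at the matching angle looks like a plane wave.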
As the projection in Figure3(b) shows, provided that the image is projected along the correct orientation, parallel to the local orientation of the input image, the projection can be treated approximately as a sinusoidal plane wave.
Experiments were then conducted with Gaussian noise added to the images. Although the noise level is high (SNR = -5 dB), Figures 4 and 5 indicate that the noise does not considerably affect the projected pattern. The projection therefore tolerates added noise well.
Fig4: (a) Noisy image, Gaussian noise with SNR = -5 dB. (b) Radon Transform.
Fig5: (a) Radon Transform map of the noise-free image. (b) Radon Transform map of the noisy image with SNR = -5 dB.
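The noise level used in Figures 4 and 5 can be related to the noise standard deviation with a short sketch. It assumes SNR is defined as 10 log10 of the ratio of mean signal power to noise variance and that the noise is zero-mean white Gaussian; the helper names, the test signal and the seed are illustrative.

```python
import math
import random


def mean_power(values):
    """Mean squared value of a sequence."""
    return sum(v * v for v in values) / len(values)


def noise_sigma(signal_power, snr_db):
    """Standard deviation of zero-mean noise that yields the requested SNR in dB."""
    return math.sqrt(signal_power / (10.0 ** (snr_db / 10.0)))


def add_gaussian_noise(values, snr_db, seed=0):
    """Return a noisy copy of values at the requested SNR."""
    rng = random.Random(seed)
    sigma = noise_sigma(mean_power(values), snr_db)
    return [v + rng.gauss(0.0, sigma) for v in values]


def measured_snr_db(clean, noisy):
    """Estimate the SNR from a clean signal and its noisy version."""
    noise = [n - c for c, n in zip(clean, noisy)]
    return 10.0 * math.log10(mean_power(clean) / mean_power(noise))


clean = [math.cos(2.0 * math.pi * 0.1 * t) for t in range(5000)]
noisy = add_gaussian_noise(clean, snr_db=-5.0)
```

At -5 dB the noise power is about 3.2 times the signal power, so the pattern is visually buried in individual pixels. Because each projection value sums many pixels along a beam, the zero-mean noise largely averages out while the pattern adds coherently.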
5. PROPOSED ALGORITHM
Having introduced the theory of the Radon Transform, we now present the main steps of our algorithm for baseline detection and text line segmentation:
1) Binarisation: The grey-level value of each pixel in the enhanced image is examined; if the value is greater than the global threshold, the pixel is set to binary one, otherwise it is set to zero. The outcome is a binary image containing two levels of information, the foreground and the background [3, 7].
2) Orientation estimation: The skew is estimated from the binary image by applying the method described in [2].
3) Rotation compensation: The image is rotated anticlockwise to compensate for the estimated skew, using the nearest-neighbor method [2] (see Figure6(b)).
4) Baseline extraction: Detecting the baseline is one of the main priorities in the preprocessing stage of an OCR system. We propose a novel approach for baseline detection based on extracting local minima in the projected pattern of the Radon Transform. An example of a projected waveform is shown in Figure6(c). The projection forms an almost sinusoidal wave whose local minima correspond to the baselines in the document image (note the encircled point in Figure6(c)). The baselines are therefore computed easily from the locations of the minima in the projected waveform, and the line spacing from the number of pixels between consecutive minima.
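Steps 1 and 4 can be sketched on a toy image. The sketch assumes the skew has already been corrected, so a plain row sum stands in for the Radon projection taken along the text direction; the helper names, the threshold of 128 and the toy page are illustrative. Because bright background becomes 1 and ink becomes 0, the rows carrying the most ink give the local minima.

```python
def binarise(gray, threshold):
    """Step 1: pixels brighter than the global threshold become 1, all others 0."""
    return [[1 if value > threshold else 0 for value in row] for row in gray]


def row_projection(binary):
    """Sum each row; a stand-in for the projection along the text direction."""
    return [sum(row) for row in binary]


def local_minima(profile):
    """Indices that are lower than the previous sample and not higher than the next."""
    return [i for i in range(1, len(profile) - 1)
            if profile[i] < profile[i - 1] and profile[i] <= profile[i + 1]]


def make_row(ink_pixels, width=6):
    """Toy grey-level row: ink is 0, paper is 255."""
    return [0] * ink_pixels + [255] * (width - ink_pixels)


# Two toy text lines whose densest rows (the baselines) are rows 3 and 7.
ink_per_row = [0, 2, 3, 5, 0, 2, 3, 5, 0]
gray = [make_row(n) for n in ink_per_row]
binary = binarise(gray, threshold=128)
profile = row_projection(binary)
baselines = local_minima(profile)
spacing = [b - a for a, b in zip(baselines, baselines[1:])]
```

All baselines are read from one projection in a single pass, which reflects the simultaneous extraction described in the text.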
Fig6: Baseline extraction based on finding local minima in the projected pattern of the Radon Transform
(a) Multilingual handwritten Persian-English image (b) Skew-corrected image (c) Radon Transform of the skew-corrected image
Further examples are shown in Figure7. The results indicate that, compared with other well-known methods, the proposed method extracts text lines easily and precisely without requiring a large number of processing modules. It also extracts the positions of all lines in a given document simultaneously, which strongly reduces the total processing time.
Fig7: Some examples of baseline detection and line segmentation for printed documents
(a), (b) Persian document and its Radon Transform
(c), (d) Low-quality English document image and its Radon Transform
Now consider the situation in which noise is added to the input image (see Figure8(a)).
As shown in Figure8(b), noisy elements in the projection produce false local minima, which may shift the location of the true minimum points. These false minima can cause an inaccurate baseline estimate. However, as illustrated in Figure8(c), if the projection is smoothed with a Savitzky-Golay digital smoothing filter [24] before the baselines are estimated, the noisy elements are suppressed, leaving only the true local minima.
This additional step has proved useful in reducing noise effects in the projection, and consequently improves the accuracy of the process.
Fig8: The effect of smoothing the projection prior to inter-line detection.
(a) Noisy image (b) Radon Transform before smoothing (c) Radon Transform after smoothing
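The smoothing step can be sketched with the simplest common Savitzky-Golay kernel. The window length and polynomial order used in the experiments are not stated, so the 5-point quadratic kernel (-3, 12, 17, 12, -3)/35 is an assumption made for illustration, as is leaving the two samples at each end unchanged.

```python
# 5-point quadratic Savitzky-Golay smoothing coefficients (assumed window and order).
SG_COEFFS = (-3, 12, 17, 12, -3)
SG_NORM = 35.0


def savgol5(values):
    """Smooth interior samples with the 5-point kernel; edge samples are copied."""
    out = list(values)
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        out[i] = sum(c * v for c, v in zip(SG_COEFFS, window)) / SG_NORM
    return out


# An isolated spike is spread out and reduced in height.
spike = [0, 0, 0, 35, 0, 0, 0]
smoothed_spike = savgol5(spike)

# A quadratic trend passes through unchanged, so true extrema keep their shape.
parabola = [float(i * i) for i in range(10)]
smoothed_parabola = savgol5(parabola)
```

Unlike a plain moving average, the filter fits a local polynomial, so it attenuates isolated spikes while preserving the position and depth of smooth minima.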
Furthermore, document images sometimes contain several frequencies in their patterns. Figure9(a) shows an example, a low-quality Persian handwritten input image. Figure9(b) shows that its Radon Transform projection has two dominant frequencies, corresponding to the upper and lower parts of the input image in Figure9(a). In the following stage we distinguish between these parts by using a Distributed Time-Delay Neural Network (TDNN) [23].
Fig9: Baseline detection in a low-quality, multi-frequency Persian handwritten document
(a) Low-quality Persian handwritten input image (b) Radon Transform
The focused time-delay neural network (FTDNN) has tapped delay line memory only at the input to the first layer of a static feedforward network. In a distributed TDNN the tapped delay lines can instead be distributed throughout the network. The distributed TDNN was first introduced in [23] for phoneme recognition, and the original architecture was very specialized for that particular problem. Figure10 shows a general two-layer distributed TDNN.
Fig10: Network structure for Distributed Time-Delay Neural Network
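The idea of distributing tapped delay lines across layers can be sketched with a forward pass. This is a deliberately reduced sketch: it assumes a single tanh neuron per layer, zero padding before the first sample, and hand-picked weights, none of which are taken from the paper or from [23].

```python
import math


def tapped_delay(signal, taps):
    """For each time t return [x[t], x[t-1], ..., x[t-taps+1]], padding with zeros."""
    return [[signal[t - d] if t - d >= 0 else 0.0 for d in range(taps)]
            for t in range(len(signal))]


def tdnn_layer(signal, weights, bias):
    """One neuron with a tapped delay line at its input and a tanh activation."""
    return [math.tanh(sum(w * v for w, v in zip(weights, window)) + bias)
            for window in tapped_delay(signal, len(weights))]


def distributed_tdnn(signal, w1, b1, w2, b2):
    """Two layers, each with its own delay line (the 'distributed' part)."""
    hidden = tdnn_layer(signal, w1, b1)
    return tdnn_layer(hidden, w2, b2)


# Hypothetical weights chosen only to exercise the forward pass.
signal = [math.sin(0.3 * t) for t in range(50)]
output = distributed_tdnn(signal, w1=[0.5, -0.5, 0.25], b1=0.0, w2=[1.0, 0.5], b2=0.0)
```

In a focused TDNN only the first call would see delayed samples; here the second layer also sees a short history of the hidden signal, which lets it respond to how the hidden activity changes over time.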
This network can be used for a simplified problem that is similar to phoneme recognition: the network attempts to recognize the frequency content of an input signal. Consider Figure9(b), or equivalently Figure11(a), as the input signal to the network. Only one of two frequencies is present in any given range of the signal, corresponding to the multi-frequency pattern of the document image in Figure9(a).
The target network output is defined to be 1 when the input is at the low frequency and -1 when the input is at the high frequency (corresponding to the two parts of the input document image in Figure9(a)).
After training the network for 120 epochs and simulating it with the obtained weights, the network output is plotted in Figure11. The result indicates that the network distinguishes accurately between two or more frequency components, the counterpart of phonemes in [23].
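The training data for this task can be pictured with a small generator. The sketch assumes a signal made of a low-frequency segment followed by a high-frequency segment, with targets of 1 and -1 as defined above; the segment lengths and the two frequencies are illustrative values, not those of the document in Figure9.

```python
import math


def two_frequency_signal(n_low, n_high, f_low, f_high):
    """Build a low-frequency segment followed by a high-frequency one, with +1/-1 targets."""
    signal, targets = [], []
    for t in range(n_low):
        signal.append(math.sin(2.0 * math.pi * f_low * t))
        targets.append(1.0)      # low frequency -> target 1
    for t in range(n_high):
        signal.append(math.sin(2.0 * math.pi * f_high * t))
        targets.append(-1.0)     # high frequency -> target -1
    return signal, targets


# Assumed values: 40 samples per segment, 0.05 and 0.2 cycles per sample.
signal, targets = two_frequency_signal(40, 40, 0.05, 0.2)
```

A network trained on such pairs outputs a value near 1 or -1 for each sample, and the point where the output changes sign marks the boundary between the two parts of the page.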
Fig11: Monitoring the trained network output.
(a) Projected pattern of Radon Transform (b) Output of Neural Network
6. CONCLUSION
In this paper, a baseline detection and line segmentation method has been discussed in detail. The proposed method builds on the strengths of the Radon Transform in terms of robustness and time efficiency, and depends only on analyzing the projected pattern of the Radon Transform to find its local minima.
The algorithm has been tested on more than 350 samples, including printed and handwritten Persian/Arabic, English and multilingual documents. The main advantage of our approach over other methods is that the baselines of all lines are estimated simultaneously, which noticeably reduces processing time. In addition, it has been shown that the method still detects baselines accurately when the projected pattern contains multiple frequencies.
7. ACKNOWLEDGMENTS
The authors would like to thank Mehri Noormohammadi for her support, help and consideration.
REFERENCES
[1] Z.A. Khan, W. Sohn, "Real Time Human Activity Recognition System based on Radon Transform", IJCA Special Issue on "Artificial Intelligence Techniques - Novel Approaches & Practical Applications", AIT, 2011.
[2] H.K. Chethan, G. Hemantha Kumar, "Graphics separation and skew correction for mobile captured documents and comparative analysis with existing methods", International Journal of Computer Applications (IJCA) (0975-8887), Volume 7, No. 3, 2010.
[3] N. Priyanka, S. Pal, R. Mandal, "Line and Word Segmentation Approach for Printed Documents", IJCA Special Issue on "Recent Trends in Image Processing and Pattern Recognition", RTIPPR, 2010.
[4] S. Akram, M.U. Din Dar, A. Quyoum, "Document Image Processing - A Review", International Journal of Computer Applications (IJCA) (0975-8887), Volume 10, No. 5, November 2010.
[5] A. Bouchemha, A. Nait-Ali, N. Doghmane, "A Robust Technique to Characterize the Palmprint using Radon Transform and Delaunay Triangulation", International Journal of Computer Applications (0975-8887), Volume 10, No. 10, November 2010.
[6] S. Prasad, V.K. Singh, A. Sapre, "Handwriting Analysis based on Segmentation Method for Prediction of Human Personality using Support Vector Machine", International Journal of Computer Applications (0975-8887), Volume 8, No. 12, October 2010.
[7] M. Hangarge, B.V. Dhandra, "Offline Handwritten Script Identification in Document Images", International Journal of Computer Applications (0975-8887), Volume 4, No. 6, July 2010.
[8] P. Nagabhushan, A. Alaei, "Tracing and Straightening the Baseline in Handwritten Persian/Arabic Text-line: A New Approach Based on Painting-technique", International Journal on Computer Science and Engineering (IJCSE), Vol. 02, No. 04, 2010, pp. 907-916.
[9] A. Alaei, P. Nagabhushan, U. Pal, "A New Text-line Alignment Approach Based on Piece-wise Painting Algorithm for Handwritten Documents", 2011 International Conference on Document Analysis and Recognition, IEEE, DOI 10.1109/ICDAR.2011.73.
[10] Vahid Kiani, Reza Pourreza, Hamid Reza Pourreza, "Offline Signature Verification Using Local Radon Transform and Support Vector Machines", International Journal of Image Processing (IJIP), Volume 3, Issue 5.
[11] J. Mohammadi, M. Keshavarz Bahaghighat, R. Akbari, "Vehicle Speed Estimation Based On The Image Motion Blur Using RADON Transform", 2010 2nd International Conference on Signal Processing Systems (ICSPS).
[12] F. Hjouj, D.W. Kammler, "Identification of Reflected, Scaled, Translated, and Rotated Objects From Their Radon Projections", IEEE Transactions on Image Processing, 17(3):301-310, 2008.
[13] M. R. Hejazi, G. Shevlyakov, Y-S Ho, "Modified Discrete Radon Transforms and Their Application to Rotation-Invariant Image Analysis", IEEE 8th Workshop on Multimedia Signal Processing, 2006.
[14] M.I. Razzak, M. Sher, S.A. Hussain, "Locally baseline detection for online Arabic script based languages character recognition", International Journal of the Physical Sciences, Vol. 5(7), pp. 955-959, July 2010.
[15] A. Zahour, B. Taconet, P. Mercy, S. Ramdane, "Arabic Hand-written Text-line Extraction", in Proceedings of the Sixth International Conference on Document Analysis and Recognition, ICDAR 2001, Seattle, USA, September 10-13, 2001, pp. 281-285.
[16] M. Arivazhagan, H. Srinivasan, S. N. Srihari, "A Statistical Approach to Handwritten Line Segmentation", in Proceedings of SPIE Document Recognition and Retrieval XIV, San Jose, CA, February 2007.
[17] G. Tímár, K. Karacs, Cs. Rekeczky, "Analogic Preprocessing and Segmentation Algorithms For Offline Handwriting Recognition", Proceedings of IEEE CNNA'02, World Scientific, 2002, pp. 407-414.
[18] Y. Li, Y. Zheng, D. Doermann, S. Jaeger, "A new algorithm for detecting text line in handwritten documents", in International Workshop on Frontiers in Handwriting Recognition, 2006, pp. 35-40.
[19] L. Likforman-Sulem, C. Faure, "Extracting text lines in handwritten documents by perceptual grouping", Advances in handwriting and drawing: a multidisciplinary approach, C. Faure, P. Keuss, G. Lorette and A. Winter Eds, Europia, Paris, 1994, pp. 117-135.
[20] I.S.I. Abuhaiba, S. Datta, M.J.J. Holt, "Line Extraction and Stroke Ordering of Text Pages", Proceedings of the Third International Conference on Document Analysis and Recognition, Montreal, Canada, 1995, pp. 390-393.
[21] L. Likforman-Sulem, A. Hanimyan, C. Faure, "A Hough based algorithm for extracting text lines in handwritten documents", Third International Conference on Document Analysis and Recognition, Vol. 2, August 1995, pp. 774-777.
[22] M. Ziaratban, K. Faez, "A Novel Two-Stage Algorithm for Baseline Estimation and Correction in Farsi and Arabic Handwritten Text line", Proc. of International Conference on Pattern Recognition (ICPR'08), 2008, pp. 1-5.
[23] A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, K. J. Lang, "Phoneme recognition using time-delay neural networks", IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 37, 1989, pp. 328-339.
[24] J. Luo, K. Ying, P. He, J. Bai, "Properties of Savitzky-Golay digital differentiators", Digital Signal Processing 15 (2005), pp. 122-136, Elsevier Inc.
[25] Sharat S. Chikkerur, Alexander N. Cartwright, Venu Govindaraju, "Fingerprint Image Enhancement Using STFT Analysis", PhD thesis, Center for Unified Biometrics and Sensors, University at Buffalo, NY, USA (2003).
... On the other hand, there are some approaches to predict BTC price based on Natural Language Processing (NLP) and sentiment analysis. Today, NLP as an AI (artificial intelligence) technology and Deep learning [9,2] are used together in advanced text mining/analytic tools [23,4,26,8,7]. These approaches get social media text data from Twitter, Facebook, and etc., as the input and try to draw a link between the content of daily messages and the BTC price. ...
Article
Full-text available
Cryptocurrencies are digital assets that can be stored and transferred electronically. Bitcoin (BTC) is one of the most popular cryptocurrencies that has attracted many attentions. The BTC price is considered as a high volatility time series with non-stationary and non-linear behavior. Therefore, the BTC price forecasting is a new, challenging, and open problem. In this research, we aim the predicting price using machine learning and statistical techniques. We deploy several robust approaches such as the Box-Jenkins, Autoregression (AR), Moving Average (MA), ARIMA, Autocorrelation Function (ACF), Partial Autocorrelation Function (PACF), and Grid Search algorithms to predict BTC price. To evaluate the performance of the proposed model, Forecast Error (FE), Mean Forecast Error (MFE), Mean Absolute Error (MAE), Mean Squared Error (MSE), as well as Root Mean Squared Error (RMSE), are considered in our study.
... On the other hand, there are some approaches to predict BTC price based on Natural Language Processing (NLP) and sentiment analysis. Today, NLP as an AI (artificial intelligence) technology and Deep learning [9,2] are used together in advanced text mining/analytic tools [23,4,26,8,7]. These approaches get social media text data from Twitter, Facebook, and etc., as the input and try to draw a link between the content of daily messages and the BTC price. ...
Article
Full-text available
Cryptocurrencies are digital assets that can be stored and transferred electronically. Bitcoin (BTC) is one of the most popular cryptocurrencies that has attracted many attentions. The BTC price is considered as a high volatility time series with non-stationary and non-linear behavior. Therefore, the BTC price forecasting is a new, challenging, and open problem. In this research, we aim the predicting price using machine learning and statistical techniques. We deploy several robust approaches such as the Box-Jenkins, Autoregression (AR), Moving Average (MA), ARIMA, Autocorrelation Function (ACF), Partial Autocorrelation Function (PACF), and Grid Search algorithms to predict BTC price. To evaluate the performance of the proposed model, Forecast Error (FE), Mean Forecast Error (MFE), Mean Absolute Error (MAE), Mean Squared Error (MSE), as well as Root Mean Squared Error (RMSE), are considered in our study.
... There are various problems in computer vision and image processing [36][37][38][39][40]. Object detection, as one of the most applicable problems in computer vision, is the process of categorizing especial objects such as cars, pedestrian, bicycles, traffic signs, faces, etc. [36,37]. ...
Article
Full-text available
Today, energy issues are more important than ever. Because of the importance of environmental concerns, clean and renewable energies such as wind power have been most welcomed globally, especially in developing countries. Worldwide development of these technologies leads to the use of intelligent systems for monitoring and maintenance purposes. Besides, deep learning as a new area of machine learning is sharply developing. Its strong performance in computer vision problems has conducted us to provide a high accuracy intelligent machine vision system based on deep learning to estimate the wind turbine angular velocity, remotely. This velocity along with other information such as pitch angle and yaw angle can be used to estimate wind farm energy production. For this purpose, we have used SSD (Single Shot Multi-Box Detector) object detection algorithm and some specific classification methods based on DenseNet, SqueezeNet, ResNet50, and InceptionV3 models. The results indicate that the proposed system can estimate the rotational speed with about 99.05% accuracy.
... Bahaghighat et al. [7] use Radon transform to integrate pixel values along parallel beams in a speciĄc direction. Calculating radon projections from multiple directions can address the problem of skewed lines. ...
Thesis
Even in the era of digital contracts and online applications, certain processes are not 100% paperless. Many data-driven applications require information extracted from modern documents and forms, invoices, and contracts. This thesis work proposes a robust text line detection solution for day-to-day document analysis needs. The proposed solution exploits the state-of-the-art document analysis techniques to provide a better performing text line detector than the ones currently being used at omni:us. Two backbone semantic segmentation networks dhSegment and ARU-Net are selected for implementation and performance comparison. A 'universal' post-processing solution is also implemented for text line detection. The post-processing provides additional functionality of line joining that further improves the primary line detection hypothesis generated by the neural network. This work also puts forward two out-of-the-box height estimation methods for detected lines. Both these methods calculate initial line-height estimates using other detected lines. The lines for which primary estimation fails are shortlisted. The height estimates for these lines could be re-calculated using either morphological analysis or neural network trained on x-height annotation. A novel evaluation script is also implemented to compare the height estimation hypotheses with the ground-truth. This work also aims to test the theory that a powerful neural network trained on challenging data (like historical documents), should perform well on different but less challenging data. Various experiments conducted for the implemented models, indeed assert this theory. The implemented models show promising performance on both handwritten and typewritten text and work well for different test sets. Implemented models also give consistently better performance when compared to Abbyy FineReader Engine and Tesseract OCR.
... There are a lot of image processing approaches to enhance quality of images and videos [32,[86][87][88][89][90] but at the first, its metric should be defined. Now, well-known metrics for assessing the quality of images and videos are VQM (Video quality measurement), PVQM (Perceptual video quality measure), MPQM (Moving picture quality metrics), SSIM (Structural Similarity Index), MSSIM (Mean SSIM), FSIM (feature-similarity), PSNR (Maximum signal to noise ratio), and HVS (human visual system) [85,[91][92][93]. ...
Article
Full-text available
Today, Smart Grids (SGs), as the goal of the next-generation power grid system, span extremely wide aspects from power generation to end-user utilities. In smart grids, Energy and Information flows are mutually dependent and performance degradation of one side may have a high impact on the other side. In this work, we introduce our architecture for monitoring of Wind Turbine (WT) farms in smart grids. In our proposed system an industrial camera is embedded on a Wireless Cognitive Radio node for each WT to capture appropriate images and stream videos to the cognitive coordinator. Any packet loss in transmission between an embedded cognitive node and the coordinator can degrade peak signal-to-noise ratio (PSNR) of the received images. The image streaming is a delay sensitive transmission which should be done in harsh environments in SGs. To tackle these challenging issues, we introduce our efficient model, called JOPSS, for joint optimization of both packet size and Number of Spectrum Sensing Iterations (NSSI) during image transmission in time-restricted conditions. We define our proposed objective function as the quotient of the Overhead Time and the Effective Transmission Time (ETT). In addition, we introduce our methods based on the Minimum of Overhead Time Channel Selection (MOTS) for the efficient channel selection along with Dynamic Parameter Updating Procedure (DPUP) to benefit different strategies in Mandatory and Proactive Handoffs (MHO/PHO). The obtained results show that noticeable improvements in both PSNR and feature-similarity (FSIM) can be achieved on our models JOPSS and JOPSS-SAFE, respectively.
... Diacritics segmentation using projection method is explained by Modi et al. [9]. Bahaghighat et al. [10] have presented a method for baseline detection and line segmentation. For baseline detection, projected pattern analysis of Radon Transform is implemented. ...
... When the normal Radon transform R_θ(x) is applied to the image described by the function f(x, y), the result is a set of line integrals of f(x, y), obtained by rotating the X-axis (Fig. 5 [7]). In other words, to represent an image through the normal Radon transform, it is necessary to take several parallel-beam image mappings, calculated by rotating the original line (X-axis) around the center of the image at different angles [3]. ...
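The parallel-beam projection described in this excerpt can be sketched, under simplifying assumptions, as a discrete accumulation: every pixel is assigned to the detector bin nearest to its signed distance from the image center along the rotated axis. The function name `radon_projection`, the nearest-bin rounding, and the list-of-rows image layout are choices made for this example, not the cited authors' implementation.

```python
import math


def radon_projection(image, theta_deg):
    """Approximate one parallel-beam projection of `image` at angle theta_deg.

    Each pixel value is added to the bin nearest to its coordinate along the
    axis rotated by theta about the image center (nearest-bin approximation).
    """
    h, w = len(image), len(image[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    theta = math.radians(theta_deg)
    c, s = math.cos(theta), math.sin(theta)

    # Enough bins to cover the image diagonal on both sides of the center.
    half = int(math.ceil(math.hypot(w, h) / 2.0))
    bins = [0.0] * (2 * half + 1)

    for y in range(h):
        for x in range(w):
            t = (x - cx) * c + (y - cy) * s
            # Round half up; the small epsilon guards against floating-point
            # noise such as cos(90 degrees) not being exactly zero.
            idx = int(math.floor(t + 0.5 + 1e-9)) + half
            bins[idx] += image[y][x]
    return bins


def radon_transform(image, angles_deg):
    """Stack projections for several angles (one row per angle)."""
    return [radon_projection(image, a) for a in angles_deg]
```

With this convention, the projection at 0 degrees reduces to column sums and the projection at 90 degrees to row sums; for horizontally written text, the latter is the profile in which text lines appear as peaks.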
Article
Protecting printed documents against leakage remains a relevant problem. Existing security tools can protect electronic text documents but are ineffective at protecting their printed versions. This research presents a marking approach for electronic text documents that is invariant to the print-and-scan transformation. During marker embedding, the line spacing values of the source text are changed to specified values within the limits of perceptual invisibility. The watermark is extracted from images containing text, based on the normal Radon transform and a Gaussian mixture model. This marking approach is robust to various image transformations and distortions. The accuracy of embedded information extraction was more than 0.98 for 200 DPI images and a line spacing change of about 490 micrometers.
Article
The successful use of deep learning solutions for document image segmentation typically relies on a large number of manually labeled groundtruth examples, which is expensive to obtain for historical document images that have significant noise effects and variation. At the same time, successful applications of deep learning solutions for document image segmentation have rich potential to facilitate greater levels of description in archival collections (e.g., at and below the item-level). These greater levels of description are critical to increasing access and use of archival collections across an array of research domains. In response, this paper investigates whether an augmentation-based approach to generating pseudo-groundtruth can be effective with a limited number of labeled images in a document segmentation application. The rationale is that if we can decrease the cost of generating groundtruth through augmentation-based approaches, we can use these approaches as part of description and access pipelines for historical library and archival collections. In this initial exploration, we first generate synthetic images and corresponding pseudo-groundtruth using a set of existing degradation-based augmentation models from a small number of labeled actual images. When generating synthetic images, we control the visual quality distortion based on OCR word-level confidence to avoid generating images unlikely to be present in the dataset. Then, we perform several investigations to examine the impact of incorporating pseudo-groundtruth data in the training of the deep learning network dhSegment and further evaluate the use of multiple combinations of degradation models. We also assess the generalizability of the approach by applying the trained network on a larger dataset. Our investigations primarily use real-world datasets known to have significant noise effects. 
Results show that augmentation-based pseudo-groundtruth generation is capable of improving segmentation performance with the use of the full original dataset and requires only 30% of the original dataset. Results also show that using more than three degradation models is likely to cause overfitting during training. Furthermore, we show that a segmentation network trained on pseudo-groundtruth data has generalization capability.
Article
Today, power generation from clean and renewable resources such as wind and solar is of great importance. Smart grid technology efficiently responds to the increasing demand for electric power. Intelligent monitoring, control, and maintenance of wind energy facilities are indispensable for increasing the performance and efficiency of smart grids (SGs). Integrating state-of-the-art machine learning algorithms with vision sensor network approaches paves the way toward enhancing the performance of wind farms. The power generated in a wind turbine farm is the most critical parameter and should be measured accurately. Produced power is highly related to weather patterns, and a new farm in a nearby area is likely to have similar energy generation. Therefore, accurate and continuous prediction models of existing wind farms can help develop new stations at lower cost. The paper aims to estimate the angular velocity of turbine blades using vision sensors and signal processing. High wind in the wind farm can cause the camera to vibrate across successive frames, and noise in the input images can aggravate the problem. Using a set of established computer vision algorithms, including FAST (Features from Accelerated Segment Test), SIFT (Scale-Invariant Feature Transform), SURF (Speeded Up Robust Features), BF (Brute-Force), FLANN (Fast Library for Approximate Nearest Neighbors), AE (Autoencoder), and SVM (support vector machines), this paper accurately localizes the hub and tracks the presence of the blade in consecutive frames of a video stream. The simulation results show that determining the hub location and the blade presence in sequential frames yields an estimate of wind turbine angular velocity with 95.36% accuracy.
Article
Baseline detection is one of the most important steps in character recognition and directly influences the recognition result. Owing to the complexity of Urdu-script-based languages, handwritten character recognition is considerably more difficult than for other languages. Baseline detection is a main issue and a basic step in most preprocessing operations, namely normalization, skew correction, and secondary stroke segmentation, as well as in feature extraction. This paper presents a novel method of baseline detection for cursive handwritten Urdu script. The proposed approach is divided into three steps: diacritical mark segmentation, primary baseline estimation, and local baseline estimation. The local baseline is estimated using features extracted from the ending shapes of the words. Because of the structural differences between the Nasta'liq and Naskh styles, different rules are formed for baseline estimation.
Article
Curvilinear text line detection and segmentation in handwritten documents is a significant challenge for handwriting recognition. Given no prior knowledge of script, we model text line detection as an image segmentation problem by enhancing text line structure using a Gaussian window, and adopting the level set method to evolve text line boundaries. Experiments show that the proposed method achieves high accuracy for detecting text lines in both handwritten and machine printed documents with many scripts.
Conference Paper
This paper presents two novel transforms based on the discrete Radon transform. The proposed transforms smartly solve two inherent problems of the Radon transform in rotation estimation in digital images, i.e., direction-dependency and nonhomogeneity, that come from the different numbers of pixels projected on a line for different directions and/or coordinates of a direction. While the first transform considers the sample mean operator on the same sets of pixels for a direction instead of summation in the discrete Radon transform, the second transform uses the mean operator on sets of pixels with the equal number of elements. In order to show the efficiency of the proposed transforms, we apply them on image collections from the Brodatz album for estimating the directional information. Experimental results show a significant increase in correct estimation as well as in the processing time compared to the conventional Radon transform
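The idea behind the first transform, replacing the summation of the discrete Radon transform with a sample mean over the pixels projected onto each line, can be illustrated with the rough sketch below. It is only an interpretation of the abstract: the function name `mean_projection`, the nearest-bin pixel assignment, and the use of `None` for empty bins are assumptions of this example, and the paper's actual pixel sets may be defined differently.

```python
import math


def mean_projection(image, theta_deg):
    """Projection at angle theta_deg using the mean of the pixels in each bin.

    Dividing by the number of contributing pixels removes the dependence on
    how many pixels happen to fall on a line for a given direction.
    Bins that receive no pixel are reported as None.
    """
    h, w = len(image), len(image[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    theta = math.radians(theta_deg)
    c, s = math.cos(theta), math.sin(theta)

    half = int(math.ceil(math.hypot(w, h) / 2.0))
    sums = [0.0] * (2 * half + 1)
    counts = [0] * (2 * half + 1)

    for y in range(h):
        for x in range(w):
            t = (x - cx) * c + (y - cy) * s
            idx = int(math.floor(t + 0.5 + 1e-9)) + half  # nearest bin
            sums[idx] += image[y][x]
            counts[idx] += 1

    return [total / n if n else None for total, n in zip(sums, counts)]
```

For a constant image, every non-empty bin returns the same value at every angle, whereas a plain summation would vary with the number of pixels per line; this is the direction dependency the abstract refers to.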
Conference Paper
This paper presents a new two-stage method for estimating and correcting the baseline of handwritten subwords in Farsi and Arabic text lines. Based on the template matching algorithm, the candidate baseline pixels are detected. The writing path and the baseline of the subwords are estimated in the first and second stages of the proposed algorithm, respectively. After the estimation in each stage, the baseline is adjusted in the correction phase. Experimental results show the effectiveness of this approach in adjusting the baseline close to the correct position.
Conference Paper
Because writing styles differ between individuals, some text-lines may be curved in shape. To recognize such text-lines, they must first be properly aligned. In this paper, we propose a text-line alignment technique based on a painting algorithm. First, the Piece-wise Painting Algorithm (PPA) is used to obtain a number of black and white rectangular patches along the text-line. After the degree of oscillation of the input text-line is identified, candidate pixels are obtained from the horizontal projection and the center points of the black patches. Using the degree of oscillation of the input text image and the candidate pixels, a curve or straight line is fit to trace the baseline. Subsequently, all components of the text-line are deskewed, based on the characteristics of the fitted curve or line, to align them with respect to an imaginary horizontal baseline. The proposed algorithm was evaluated with 128 Persian handwritten text-lines containing 4317 sub words. Experimental analysis showed that 92.31% of the sub words were accurately aligned. Further, the proposed algorithm was tested with another Persian handwritten text-lines dataset [6] and remarkable results were achieved.
Conference Paper
In this paper we propose a novel approach for vehicle speed estimation based on the motion blur occurring in an image taken by a still camera. When the camera shutter remains open for an extended period of time, relative motion between the camera and the object causes motion blur. A blurred image has two parameters: the direction of motion and the length of motion. To extract the blur parameters, the Fourier transform of the image is computed first, and the Radon transform is then used to extract the direction of motion. To extract the length of motion, we use our proposed method. There is a relation between the motion parameters and the vehicle speed. Finally, speed estimation is performed using the imaging geometry, the extracted blur parameters, and the camera parameters. The proposed method improves the accuracy of motion blur parameter measurement, and hence of speed estimation, compared with other proposed methods.
Conference Paper
Although detecting text lines in machine printed documents is typically considered a solved problem, it is still a challenge to segment handwritten text lines in the general sense given no prior knowledge of script. This paper models text line detection as an image segmentation problem by enhancing text line structures using a Gaussian window and adopting the level set method to evolve text line boundaries. Experiments show that the method, which is script independent, achieves high accuracy for detecting text lines in heterogeneous handwritten documents.
Conference Paper
Contrary to popular belief, despite decades of research in fingerprints, reliable fingerprint recognition is an open problem. Extracting features out of poor quality prints is the most challenging problem faced in this area. This paper introduces a new approach for fingerprint enhancement based on Short Time Fourier Transform (STFT) analysis. STFT is a well known technique to analyze non-stationary signals. We extend its application to 2D images. The algorithm simultaneously estimates all the intrinsic properties of the fingerprints such as the foreground region mask, local ridge orientation and local frequency orientation. Furthermore, we propose a probabilistic approach for robustly estimating these parameters. We compare the proposed approach to other filtering approaches and show that our technique performs favorably. We also objectively measure the improvement in recognition rate due to our enhancement. We obtain a 17% improvement in the recognition rate on a set of 800 images from the FVC2002 database.
Article
A new technique to segment a handwritten document into distinct lines of text is presented. Line segmentation is the first and the most critical pre-processing step for a document recognition/analysis task. The proposed algorithm starts by obtaining an initial set of candidate lines from the piece-wise projection profile of the document. The lines traverse around any obstructing handwritten connected component by associating it with the line above or below. The decision to associate such a component is made by (i) modeling the lines as bivariate Gaussian densities and evaluating the probability of the component under each Gaussian or (ii) the probability obtained from a distance metric. The proposed method is robust enough to handle skewed documents and those with lines running into each other. Experimental results show that on 720 documents (which include English, Arabic and children's handwriting) containing a total of 11,581 lines, 97.31% of the lines were segmented correctly. In an experiment over 200 handwritten images with 78,902 connected components, 98.81% of them were associated with the correct lines.
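The first step mentioned above, obtaining candidate lines from a projection profile, can be sketched in its simplest global form as follows. This is not the piece-wise variant of the abstract, which would apply the same idea to vertical strips of the page; the function names, the binary list-of-rows image layout (1 = ink), and the `min_ink` threshold are assumptions made for illustration.

```python
def horizontal_profile(binary):
    """Number of ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in binary]


def candidate_lines(binary, min_ink=1):
    """Return (top_row, bottom_row) pairs for runs of rows containing ink.

    A row belongs to a text line when its ink count reaches min_ink
    (an assumed threshold); consecutive such rows form one candidate line.
    """
    profile = horizontal_profile(binary)
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # a text line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y - 1))   # the line ended on the previous row
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(profile) - 1))
    return lines
```

A global profile of this kind fails on skewed pages or touching lines, which is the situation the piece-wise profile and the Gaussian association step of the abstract are meant to address.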