OPTICAL CHARACTER RECOGNITION (OCR) SYSTEM FOR MULTI-
FONT ENGLISH TEXTS USING DCT & WAVELET TRANSFORM
Dr. MUSTAFA DHIAA AL-HASSANI
Computer Science/Mustansiriyah University, Baghdad, Iraq
ABSTRACT
Optical Character Recognition (OCR) is a type of computer software designed to translate
images of handwritten or typewritten text (usually captured by a scanner or a camera) into machine-
editable text by recognizing characters at high speed, one at a time. OCR began as a field of research
in pattern recognition, artificial intelligence and machine vision. It is becoming more and more
important in the modern world owing to economic reasons and business requirements. It helps
humans ease their jobs and solve more complex problems by eliminating the time-consuming work
spent by human operators to re-type documents and by reducing error-prone processes.
The presence of any type of noise, or a combination of noise types, can severely degrade the
performance of an OCR system. Therefore, a number of preprocessing techniques are considered in the
present work in order to improve the accuracy of the recognized text. An OCR system for
3185 training samples and 13650 testing samples is presented for multi-font English texts.
Experiments have shown that wavelet features produce better recognition rates (96%) than DCT
features (92%). An improvement in overall recognition rates (about 3%) is obtained after classifying
characters according to the proportion of the Height to Width feature, yielding 99% for wavelet and
95% for DCT.
Keywords: DCT, Feature Extraction, Optical Character Recognition (OCR), Pattern Recognition,
Segmentation, Wavelet Transform.
1. INTRODUCTION
Since the start of the computing era, information has been represented digitally so that it can
be processed by computers. Approximately, more than 200 million paper books are published
yearly. Paper books and documents were abundant and widely published, and hence there was a
need to convert them into digital format. OCR was invented to translate traditional paper-based
books into digital e-books (i.e., electronic files). It was estimated that over 2
million e-books are available for download on the Internet. E-books require less storage space than
paper books; they can also be replicated many times, shared online, and digitally processed with
ease, in particular searched, translated, edited, and annotated. OCR systems are not perfect, however:
they are erroneous and exhibit spelling errors in the recognized output text, especially when the
images being scanned are of poor printing quality [1, 2].
OCR is one of the most fascinating and challenging areas of pattern recognition, with various
practical applications: automated postal address reading, ZIP code reading, checks, payment slips,
income tax forms, business forms, and automatic car plate-number recognition; it can also be used as
an aid for visually handicapped people when combined with a speech synthesizer [1, 3, 4].
Automatic character recognition is a subfield of pattern recognition and can be either on-
line or off-line. On-line recognition refers to those systems where the data to be recognized are input
through a tablet digitizer, which acquires in real-time the position of the pen tip as the user writes. In
contrast, off-line systems input the data from a document through an acquisition device, such as a
scanner or a camera. Off-line character recognition is further divided into two categories:
machine printed and handwritten [5].
Printed texts include all printed materials, such as books, newspapers, magazines, and
documents that are the output of typewriters, printers or plotters. OCR systems for machine-
printed documents can be classified into [5, 6]:
• Mono-font OCR systems, which deal with documents written in one specific font,
• Multi-font OCR systems, which handle a subset of the existing fonts (recognition of more than
one font),
• Omni-font OCR systems, which allow the recognition of characters in any font.
Today, many types of OCR software are available on the market, such as Desktop OCR,
Server OCR, and Web OCR. The accuracy rate of any OCR tool varies from 71% to 98% [7].
2. PROBLEM DEFINITION
In modern society, we rely heavily on computers to process huge volumes of data. Related to
this and for economic reasons or business requirements, there is a great demand for quickly
converting the printed information in a document into an edited text in the computer. Often these
data exist on paper and they have to be typed into the computer by human operators. Such time-
consuming and error-prone processes have been lightened by the invention of OCR systems [1].
Unfortunately, OCR systems are still erroneous and inaccurate, especially when the source document
is of low printing quality [2]. The accuracy of these systems therefore depends on the text
preprocessing and segmentation algorithms. Sometimes it is difficult to retrieve text from an image
because of differences in size, style and orientation, a complex image background, etc., which
produce misspellings in the recognized text [2, 7].
OCR technology allows a machine to recognize text automatically, in the same way as the
combination of the human eye and mind. In developing a computerized OCR system, a few
problems can occur [7]:
• There is very little visible difference between some letters and digits for computers to
understand. For example, it might be difficult for the computer to differentiate between the digit
'0' and the letters 'o' / 'O',
• It might be very difficult to extract text that is embedded in a very dark background or
printed on other words or graphics.
3. AIM OF THE WORK
This paper aims to build a multi-font OCR system which converts printed English texts
in a paper document (optical patterns in a digital image) into edited text (its corresponding
alphanumeric form or other symbols) in the computer, in order to:
• eliminate the time-consuming work spent by human operators to re-type documents (huge
volumes of data),
• reduce the possible errors that occur in the typing process,
• save money (by eliminating the need for human typists),
• preserve the space needed for paper books.
4. THE PROPOSED OCR SYSTEM
The following block diagram, shown in Figure (1), illustrates the proposed OCR system model.
Figure (1): Block-Diagram of the Proposed OCR System Model
The input scanned text image is passed through a sequence of preprocessing steps (noise
removal, foreground/background separation, normalization, and binarization) prior to the character
segmentation phase. Then, feature extraction methods (Discrete Cosine Transform (DCT) followed
by a zigzag process, or Wavelet Transform (WT)) are applied to the segmented characters. The
obtained feature-set is either stored as templates or references during the training phase, when
building the database (DB) of character feature-sets, or compared directly during the testing
phase to those DB references in a pattern matching stage. Finally, a decision rule is applied to
produce the recognition results along with the best-matched characters.
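To make the data flow concrete, here is a minimal Python sketch of the pipeline of Figure (1). All helper names (gaussian_smooth, separate_background, normalize, binarize_local, segment, dct_features, wavelet_features, classify) are this sketch's own illustrative functions, defined in the subsections below; they are not the paper's implementation:

import numpy as np

def recognize_page(gray: np.ndarray, db: dict, method: str = 'wavelet') -> str:
    """End-to-end sketch of the proposed OCR pipeline (Figure (1))."""
    smoothed = gaussian_smooth(gray)                     # 4.2(a) noise removal
    fg_only = np.where(separate_background(smoothed),    # 4.2(b) FG/BG separation:
                       smoothed, 255).astype(np.uint8)   #        background forced to white
    stretched = normalize(fg_only)                       # 4.2(c) normalization
    binary = binarize_local(stretched)                   # 4.2(d) binarization
    labels = []
    for line in segment(binary):                         # 4.3 line/character segmentation
        for char in line:
            feats = (wavelet_features(char) if method == 'wavelet'
                     else dct_features(char))            # 4.5 feature extraction
            labels.append(classify(feats, db))           # 4.6 pattern matching
    return ''.join(labels)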
4.1 The Input Scanned Image
The input documents used for training and testing are scanned and digitized at 300 dpi
resolution by a page scanner connected to a computer system, and saved in BMP format with 256
gray-levels. A number of the scanned images used as inputs to our OCR system model are shown in
Figure (2) below:
Figure (2): 3–Samples of Scanned Images
4.2 Document Image Preprocessing
Digital images are generally corrupted by noise during acquisition and transmission. This
noise degrades the quality of the digital image, producing several scattered tiny dots due to uneven
gray-scale intensities, which causes poor recognition rates. Consequently, the performance of any
system manipulating these images is also decreased. Therefore, removing the noise from document
images corrupted by Gaussian and impulse noise before OCR is important to guarantee better
accuracy of character recognition [1, 8].
Thus, the following image enhancement techniques are adopted in this system model, in
sequence, prior to segmentation in order to simplify the process of character segmentation [9, 10]:
a) Noise Removal: is applied to the scanned document images for two primary purposes: to
eliminate the noise and to give the image a softer effect. The spatial convolution mask of the
Gaussian filter used for low-pass filtering is shown in Figure (3).
Figure (3): Gauss core – weighted on distance (Gaussian filter)
The Gaussian filter smoothens the image by matching nearby pixels so that no point in the image
differs from its surroundings to a greater extent. Image smoothing is accomplished in the spatial
domain to remove errors and incorrect data, and to simplify the acquisition of patterns
[8, 10, 11].
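As an illustration, a minimal Python sketch of this low-pass filtering step; since Figure (3) is not reproduced in this text, the 3×3 distance-weighted kernel below is an assumption (a typical Gauss core), not necessarily the paper's exact mask:

import numpy as np
from scipy.ndimage import convolve

# Assumed 3x3 distance-weighted Gaussian kernel (Figure (3) is not
# reproduced here); the weights sum to 1 so brightness is preserved.
GAUSS_KERNEL = np.array([[1, 2, 1],
                         [2, 4, 2],
                         [1, 2, 1]], dtype=np.float64) / 16.0

def gaussian_smooth(gray: np.ndarray) -> np.ndarray:
    """Low-pass filter a 256-level grayscale image in the spatial domain."""
    smoothed = convolve(gray.astype(np.float64), GAUSS_KERNEL, mode='nearest')
    return np.clip(smoothed, 0, 255).astype(np.uint8)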
b) Foreground/Background Separation: is the process of separating the foreground regions (the
area of interest containing the printed text) in the image from the background regions (the
useless area outside the borders of the printed text). The background regions generally exhibit a very
low gray-scale variance, whereas the foreground regions have a very high variance. Hence,
a method based on variance thresholding can be used to perform the separation. First, the
image is divided into blocks and the gray-scale variance is calculated for each block in the image.
If the variance is less than the global threshold, the block is assigned to the background;
otherwise, it is assigned to the foreground. The gray-level variance for a block
of size W × W is defined as [9, 11]:

V(k) = \frac{1}{W^2} \sum_{i=0}^{W-1} \sum_{j=0}^{W-1} \left( I(i,j) - M(k) \right)^2        …… (1)

where V(k) is the variance of the k-th block, I(i, j) is the gray-level value at pixel (i, j), and M(k) is the
mean gray-level value of the k-th block.
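A minimal Python sketch of this block-variance separation (Eq. (1)); the block size W and the global threshold value are illustrative assumptions, as the paper does not state them:

import numpy as np

def separate_background(gray: np.ndarray, W: int = 16, thresh: float = 150.0) -> np.ndarray:
    """Return a boolean mask that is True for foreground (printed-text) blocks.

    Each W x W block whose gray-level variance V(k), Eq. (1), falls below
    the global threshold is labelled background.  W and thresh are
    illustrative values, not the paper's.
    """
    h, w = gray.shape
    mask = np.zeros((h, w), dtype=bool)
    for top in range(0, h, W):
        for left in range(0, w, W):
            block = gray[top:top + W, left:left + W].astype(np.float64)
            v = np.mean((block - block.mean()) ** 2)   # V(k) from Eq. (1)
            mask[top:top + W, left:left + W] = v >= thresh
    return mask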
c) Normalization: is utilized to reduce the effect of non-uniform intensities and to improve image
quality by stretching the histogram. To be able to normalize an image, the range over which to
normalize must be known. Thus it is necessary to find the highest and the lowest pixel values of
the current image. Every pixel is then evenly spread out along the scale by the following
equation [10, 11]:

N(i,j) = \frac{I(i,j) - I_{min}}{I_{max} - I_{min}} \times M        …… (2)

where I(i, j) is the gray-level value at pixel (i, j), I_min is the smallest gray-level value found in the
image, I_max is the largest gray-level value found in the image, M represents the new maximum
gray-level value of the scale (i.e., M = 255), and N(i, j) represents the normalized gray-level value
at pixel (i, j).
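A short Python sketch of Eq. (2)'s histogram stretching:

import numpy as np

def normalize(gray: np.ndarray, M: int = 255) -> np.ndarray:
    """Stretch the histogram so gray levels span the full 0..M scale (Eq. (2))."""
    i_min, i_max = int(gray.min()), int(gray.max())
    if i_max == i_min:                  # flat image: nothing to stretch
        return gray.copy()
    stretched = (gray.astype(np.float64) - i_min) / (i_max - i_min) * M
    return stretched.astype(np.uint8)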
d) Binarization: is the process of turning a gray-scale image into a binary image (only two levels of
interest, 0 and 1) in order to improve the contrast and, consequently, facilitate the feature
extraction process. It is impossible to find a single global threshold value that works efficiently
on every image because of the variations among the scanned images. Therefore, algorithms that
find the optimal value based on localized thresholds must be applied separately to each image to
get a functional binarization. The image is partitioned into smaller blocks and threshold values
are then calculated for each of these blocks. This enables adaptations that are not possible with
global calculations. Localized thresholds demand many more calculations but mostly compensate
with better results [9, 11]. The local mean threshold for the k-th block of size W × W is computed
as:

LocalMean(k) = \frac{1}{W^2} \sum_{i=0}^{W-1} \sum_{j=0}^{W-1} Block(i,j)        …… (3)

where Block(i, j) is the gray-level value at pixel (i, j). If the pixel value is lower than the threshold,
the pixel is assigned to the printed text; otherwise, it is assigned to the background.
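A minimal Python sketch of this localized thresholding (Eq. (3)); the block size W is an illustrative assumption:

import numpy as np

def binarize_local(gray: np.ndarray, W: int = 16) -> np.ndarray:
    """Adaptive binarization: pixels below their block's local mean (Eq. (3))
    become printed text (0); the rest become background (1)."""
    h, w = gray.shape
    binary = np.ones((h, w), dtype=np.uint8)
    for top in range(0, h, W):
        for left in range(0, w, W):
            block = gray[top:top + W, left:left + W]
            local_mean = block.mean()                  # Eq. (3)
            binary[top:top + W, left:left + W] = (block >= local_mean).astype(np.uint8)
    return binary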
Figure (4) illustrates the effect of applying the document image preprocessing techniques to a
scanned document sample.
Figure (4): The Effects of the proposed preprocessing steps on a scanned document image
sample
From Figure (4.c), it is noticeable that the background regions are marked in black; this is
only to illustrate the effect of the process, despite the fact that in the practical application the
background is represented by white.
4.3 Character Segmentation
Segmentation is an important phase, and the accuracy of any OCR system heavily depends on it:
incorrect segmentation leads to reduced recognition accuracy. Segmentation is the process
that divides the whole document into smaller components, which include [6, 12]:
• Line,
• Word, and Character segmentation.
The procedure adopted in this work for analyzing images to detect characters, as shown in
Figure (5), is listed in the following sequence:
Step1: Perform "Row Scan" (Line segmentation) to find the number of lines and boundaries of
each line in any input document image within which the detection can proceed.
Algorithm 1
a) Get the image data from the processed image.
b) Initialize the line boundaries (T: Top, B: Bottom) to -1.
c) Perform a row scan (from the 1st to the last row) for pixels of value 0 (i.e., black).
d) Set the number of lines to 0.
e) If a black pixel is detected, register T as the top of the current line and move the pointer to
the next row, 1st column. Otherwise, continue to the next pixel (from left to right).
f) If a black pixel is found and T <> -1, update B as the bottom of the current line and move
the pointer to the next row, 1st column.
g) If no black pixel is found in the row and T <> -1, increment the number of lines by 1.
h) Start below the bottom of the last line found and repeat steps e) - g) to detect subsequent lines
(stop when the bottom of the image is reached).
i) Handle the dot "." spacing problem of the characters 'i' and 'j' by merging lines separated by
less than a space threshold that depends on the font type and size, and decrement the number of
lines by 1 (if such a case occurs).
j) Print out the number of lines detected and draw the line boundaries on the processed image for
each detected line (at T-1 and B+1).
Step2: Perform a "Column Scan" (orthogonally, from the 1st to the last column) only within the
detected lines. Thus, detecting characters in an image does not necessarily involve scanning the
whole image all over again.
Algorithm 2
a) Initialize the character boundaries (L: Left, R: Right) to -1.
b) Perform a column scan (from the 1st to the last column) for pixels of value 0 (i.e., black).
c) Set the number of characters to 0.
d) If a black pixel is detected, register L as the left of the current character and move the
pointer to the next column, 1st row. Otherwise, continue to the next pixel (from top to
bottom).
e) If a black pixel is found and L <> -1, update R as the right of the current character and
move the pointer to the next column, 1st row.
f) If no black pixel is found in the column and L <> -1, increment the number of characters
by 1.
g) Scan up to the right of the character found and repeat steps d) - f) to detect subsequent
characters (stop when the right end of the last line is reached).
h) Print out the number of characters detected and draw boundaries on the image for each
detected character (at L-1 and R+1).
Step3: Perform a "Row Scan" once more on the results obtained from the previous step in order to
detect the actual character (top and bottom) boundaries.
Step4: Bitmap images are created on the hard disk for each segmented character relative to its
boundaries; each header is generated from the original scanned image header, with the dimensions
updated. A compact sketch of this row/column scanning procedure follows below.
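A Python sketch of Steps 1-3, assuming a binary page where printed text is 0; it uses projection-style run detection, which is equivalent to the row and column scans of Algorithms 1 and 2 (the dot-merging rule of Algorithm 1, step i), is omitted for brevity):

import numpy as np

def find_runs(has_ink: np.ndarray) -> list[tuple[int, int]]:
    """Return (start, end) index pairs of consecutive True entries.

    Applied to row projections this yields line (Top, Bottom) bounds
    (Algorithm 1); applied to the column projections of one line it
    yields character (Left, Right) bounds (Algorithm 2)."""
    runs, start = [], None
    for i, ink in enumerate(has_ink):
        if ink and start is None:
            start = i
        elif not ink and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(has_ink) - 1))
    return runs

def segment(binary: np.ndarray) -> list[list[np.ndarray]]:
    """Split a binary page (text = 0) into lines, then characters."""
    lines = []
    for top, bottom in find_runs((binary == 0).any(axis=1)):    # Step 1: row scan
        line = binary[top:bottom + 1, :]
        chars = []
        for left, right in find_runs((line == 0).any(axis=0)):  # Step 2: column scan
            col = line[:, left:right + 1]
            rows = find_runs((col == 0).any(axis=1))            # Step 3: tighten top/bottom
            chars.append(col[rows[0][0]:rows[-1][1] + 1, :])
        lines.append(chars)
    return lines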
Figure (5): Lines and Characters boundary detection for a scanned image sample
From the above figure, it is obvious that the detected line bounds (top and bottom), drawn in
red, might not be the actual bounds of the characters in that line, because the characters have
different heights. Hence, a confirmation of the top and bottom boundaries of each character is
needed, drawn in green. The blue lines illustrate the detected character bounds (left and right) for
different character widths.
4.4 Database Construction
In general, any OCR system depends on training samples as input data. In this work, a database
of 91 samples was collected from the data sets shown in Table (1) below, for different font types
("Arial", "Calibri", "Courier New", "Lucida Sans" and "Times New Roman") and different font
sizes (8, 10, 12, 14, 16, 18, 20), to produce 3185 samples.
Table (1): Data-Set Samples

No.  Data Set                     No. of Samples   Samples
1    Digits                       10               0 1 2 … 9
2    Capital English Letters      26               'A' 'B' 'C' … 'Z'
3    Small English Letters        26               'a' 'b' 'c' … 'z'
4    Some common ASCII Symbols    29               . , ; ' " : [ ] - + * / = ( ) { } < > ! @ # $ % ^ & ? \ _
4.5 Feature Extraction
Feature extraction is part of the data reduction process by forming a new “smaller” set of
features from the original feature set of the patterns. This can be done by extracting some numerical
measurements from raw input patterns. Image features are of major importance in the isolation of
regions of common property within an image [9, 10, 13]. In this work, two sets of features were
extracted from the segmented characters, using either the Discrete Cosine Transform (DCT) or
the spectral properties of the Wavelet Transform.
a) Discrete Cosine Transform (DCT)
The DCT has become a standard method for many image processing and video compression
algorithms. The two-dimensional DCT can be computed using the one-dimensional DCT
horizontally (row-wise) and then vertically (column-wise) across the image, because the DCT is a
separable function that decomposes the image into frequency components of large variance. The
two-dimensional Forward DCT (2D FDCT) coefficients F(u, v) of an M × N block of image pixels
f(x, y) are formulated as [11, 13, 14, 15]:

F(u,v) = C(u)\,C(v) \sum_{x=0}^{N-1} \sum_{y=0}^{M-1} f(x,y) \cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2M}\right]        …… (4)

C(k) = \begin{cases} \sqrt{1/N}, & k = 0 \\ \sqrt{2/N}, & k \neq 0 \end{cases}        …… (5)

where C(k) is the normalization constant, u = 0, 1, ..., N-1 and v = 0, 1, ..., M-1.
The DCT coefficients (D_{i,j}, with i, j = 0, 1, ..., 7) of the corresponding image block of size 8×8,
as an example, are then ordered in a particular irregular sequence, as shown in Figure (6). This
irregular ordering of the coefficients is called zig-zag ordering.
Figure (6): Zig-zag ordering of DCT Coefficients
The above sequence is broken into runs of nonzero coefficients (the early coefficients, which
contain the important "low-frequency" image information) and zero coefficients (the later
coefficients in a block, which contain the less important "high-frequency" image information)
[14, 15]. Therefore, the final DCT feature-set considered in this work is generated only from the
number of significant (nonzero) coefficients, denoted by N in Table (2), starting from D_{0,0}.
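A minimal Python sketch of this DCT feature extraction, assuming each segmented character has already been scaled to an 8×8 block (the paper does not state the block size used):

import numpy as np
from scipy.fftpack import dct

def zigzag_indices(n: int = 8) -> list[tuple[int, int]]:
    """(row, col) pairs of an n x n block in zig-zag order, D(0,0) first."""
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def dct_features(char_img: np.ndarray, N: int = 10) -> np.ndarray:
    """First N zig-zag-ordered 2D DCT coefficients of a character image.

    Assumes the image was resized to a square (e.g. 8 x 8) block beforehand.
    N = 10 matched the best DCT results reported in Table (2)."""
    block = char_img.astype(np.float64)
    # Separable 2D DCT: 1D DCT down the columns, then along the rows (Eqs. (4)-(5)).
    coeffs = dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')
    order = zigzag_indices(block.shape[0])
    return np.array([coeffs[i, j] for i, j in order[:N]])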
b) Wavelet Transform
The wavelet transform has been found very useful for time-scale representation and has
been widely used in signal processing and computer vision. The wavelet transform is a multi-
resolution technique that cuts data up into different frequency components and then analyzes each
component with a resolution matched to its scale. The forward and inverse continuous wavelet
transforms (CWT) of x(t), the signal to be analyzed, with respect to the basis function or wavelet
\psi_{j,k}(t) at scale j (j > 0) and time delay k are written as follows [16, 17, 18]:

Forward CWT:   W(j,k) = \int x(t)\, \psi_{j,k}(t)\, dt        …… (6)

Inverse CWT:   x(t) = \int_{k} \int_{j} W(j,k)\, \psi_{j,k}(t)\, dj\, dk        …… (7)

where

\psi_{j,k}(t) = \frac{1}{\sqrt{j}}\, \psi\left(\frac{t-k}{j}\right)        …… (8)

and \psi(t) is the mother wavelet.
This multiresolution can also be obtained using filter banks, resulting in the Discrete Wavelet
Transform (DWT), which is well suited to the digital computer because there are no derivatives or
integrals, just multiplication and addition operations, which correspond to the mathematical
convolution operation. The procedure starts by passing the signal (sequence) through half-band
digital low-pass and high-pass filters. The DWT is computed by successive low-pass and high-pass
filtering of the discrete time-domain signal X[n], as shown in Figure (7), and each result is down-
sampled by two (↓2), where the low-pass filter is denoted by G0 and the high-pass filter by H0.
At each level, the high-pass filter produces the detail information d[n], while the low-pass filter,
associated with the scaling function, produces the coarse approximations a[n]. The DWT of the
original signal is then obtained by concatenating all the coefficients, a[n] and d[n], starting from
the last level of decomposition [10, 17, 18].
Figure (7): Three Levels DWT Decomposition Tree
After converting the input image from its lowest-level pixel data in the spatial domain, I(x, y),
into a higher-level representation of wavelet coefficients, W(x, y), a set of wavelet features (the
energy of each band, as stated in Eq. (9)) can be extracted by recursively decomposing the
sub-images in the low-frequency channels, as shown in Figure (8). The number of wavelet features
for the 1st level is 4, and each additional wavelet level increases the feature length by 3.

E_{band} = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} W(x,y)^2        …… (9)

where M × N is the size of the corresponding sub-band.
Figure (8): Three-Level Wavelet Multiresolution Decomposition Image
The wavelet transform breaks an image down into four sub-sampled images. The results consist
of one image that has been high-pass filtered in both the horizontal and vertical directions (HH), one
that has been high-pass filtered in the vertical and low-pass filtered in the horizontal direction (LH),
one that has been low-pass filtered in the vertical and high-pass filtered in the horizontal direction
(HL), and one that has been low-pass filtered in both directions (LL).
Numerous filters can be used to implement the wavelet transform. The Daubechies (D4) wavelet
is one of the most commonly used due to its efficiency. The Daubechies basis vectors (the low-pass
G_0 and high-pass H_0 filter coefficients) are [10, 16, 18]:

G_0 = \left[ \frac{1+\sqrt{3}}{4\sqrt{2}},\ \frac{3+\sqrt{3}}{4\sqrt{2}},\ \frac{3-\sqrt{3}}{4\sqrt{2}},\ \frac{1-\sqrt{3}}{4\sqrt{2}} \right]        …… (10)

H_0 = \left[ \frac{1-\sqrt{3}}{4\sqrt{2}},\ -\frac{3-\sqrt{3}}{4\sqrt{2}},\ \frac{3+\sqrt{3}}{4\sqrt{2}},\ -\frac{1+\sqrt{3}}{4\sqrt{2}} \right]        …… (11)
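A minimal sketch of this wavelet feature extraction using PyWavelets, whose 'db2' filter is the 4-tap Daubechies (D4) of Eqs. (10)-(11); taking the band energy as the mean squared coefficient is this sketch's reading of Eq. (9):

import numpy as np
import pywt

def wavelet_features(char_img: np.ndarray, levels: int = 2) -> np.ndarray:
    """Energy of each sub-band (Eq. (9)) from a 'levels'-deep 2D DWT.

    Feature length is 1 + 3*levels: 4 features at level 1, plus 3 per
    extra level; levels = 2 gave the best rates in Table (2)."""
    coeffs = pywt.wavedec2(char_img.astype(np.float64), 'db2', level=levels)
    # coeffs = [LL_n, (LH_n, HL_n, HH_n), ..., (LH_1, HL_1, HH_1)]
    bands = [coeffs[0]] + [band for detail in coeffs[1:] for band in detail]
    return np.array([np.mean(np.square(b)) for b in bands])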
4.6 Pattern Matching
The resulting test template, which is an N-dimensional feature vector, is compared against the
stored reference templates to find the closest match, i.e., to find which predefined class the
unknown pattern belongs to. For the OCR task, the unknown character is compared to all
references in the database. This comparison can be done through the Euclidean distance (E.D.)
measure, shown below [9, 10]:

E.D. = \sqrt{\sum_{i=1}^{N} (a_i - b_i)^2}        …… (12)

where A and B are two vectors, such that A = [a_1 a_2 ... a_N] and B = [b_1 b_2 ... b_N].
In our approach, the minimum-distance classifier is used to measure the difference between
two patterns (feature vectors). This classifier assigns the unknown pattern to the nearest
predefined pattern: the bigger the distance between the two vectors, the greater the difference [9, 10].
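A minimal Python sketch of this minimum-distance classification over Eq. (12):

import numpy as np

def classify(test_vec: np.ndarray, db: dict) -> str:
    """Minimum-distance classifier: return the label of the stored reference
    template closest to the test template under Eq. (12).  'db' maps a
    character label to its reference feature vector."""
    def euclidean(a: np.ndarray, b: np.ndarray) -> float:
        return float(np.sqrt(np.sum((a - b) ** 2)))    # Eq. (12)
    return min(db, key=lambda label: euclidean(test_vec, db[label]))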
5. EXPERIMENTAL RESULTS
OCR accuracy is defined as the ratio of correctly recognized characters to the total number of
characters (samples) tested, as shown by Eq. (13):

OCR\ Accuracy = \frac{\text{No. of Correctly Recognized Characters}}{\text{Total No. of Characters Tested}} \times 100\%        …… (13)
A number of experiments and test conditions were carried out on 13650 samples to
measure the performance of the proposed OCR system on various types of document images of
different dimensions, font types and font sizes. The database size of training samples is
computed as follows:

\text{No. of Training Samples} = \text{No. of Samples} \times \text{No. of Fonts} \times \text{No. of Font Sizes} = 91 \times 5 \times 7 = 3185        …… (14)
A more appropriate comparison can be made if both the DCT and wavelet methods are measured
under identical conditions. Based on the results shown in Table (2), one can deduce that the wavelet
features produce better recognition rates, 96% overall (ranging from 92% to 98% per test image),
than the DCT features, 92% overall (from 87% to 95%). Different numbers of DCT coefficients (N)
and wavelet decomposition levels were examined with respect to the recognition rates. The results
clearly indicate that two decomposition levels are most appropriate for wavelet feature-vector
construction, whereas 10 DCT coefficients (N = 10) is the most suitable number of DCT features.
Table (2): OCR Accuracy (%) for different testing images using DCT & Wavelet features

                                                 --------- DCT ---------   --- Wavelet Transform ---
Test Image File   Font Type         No. of Chars   N=5     N=10    N=15    Level1   Level2   Level3
Test1.bmp         Arial             525            77.52   91.62   85.90   71.43    94.29    80.95
Test2.bmp         Arial             581            79.00   93.46   87.44   74.87    96.73    87.78
Test3.bmp         Arial             791            81.29   90.90   85.59   80.28    94.82    85.59
Test4.bmp         Arial             371            75.74   94.88   83.83   64.15    97.30    80.32
Test5.bmp         Arial             462            71.65   87.88   78.35   69.05    93.51    83.77
Test6.bmp         Calibri           480            76.46   92.29   82.29   75.63    95.00    83.75
Test7.bmp         Calibri           633            79.78   93.84   85.47   77.09    95.10    80.73
Test8.bmp         Calibri           415            78.31   95.66   84.10   68.43    96.63    83.37
Test9.bmp         Calibri           692            79.05   90.75   83.38   75.29    96.68    85.55
Test10.bmp        Calibri           510            77.06   92.75   84.31   79.02    97.25    84.90
Test11.bmp        Courier New       395            75.70   92.41   82.28   67.34    96.46    81.27
Test12.bmp        Courier New       704            81.82   95.60   86.36   73.86    97.30    84.52
Test13.bmp        Courier New       620            79.35   92.58   83.23   75.97    96.61    83.71
Test14.bmp        Courier New       557            79.35   93.36   84.02   74.33    98.20    89.23
Test15.bmp        Courier New       454            75.11   91.63   81.28   68.06    96.92    82.60
Test16.bmp        Lucida Sans       664            80.72   93.67   86.30   75.75    95.63    84.19
Test17.bmp        Lucida Sans       325            72.92   94.46   82.15   68.92    92.62    79.38
Test18.bmp        Lucida Sans       653            78.56   92.04   82.54   76.11    94.49    82.24
Test19.bmp        Lucida Sans       341            68.62   89.15   77.71   69.21    95.60    78.01
Test20.bmp        Lucida Sans       747            79.65   91.30   82.33   78.71    94.38    83.53
Test21.bmp        Times New Roman   720            76.11   90.28   79.44   77.64    96.81    85.00
Test22.bmp        Times New Roman   373            71.05   87.13   80.43   72.12    98.93    86.33
Test23.bmp        Times New Roman   475            73.68   89.05   79.79   68.63    95.37    80.63
Test24.bmp        Times New Roman   525            79.05   93.14   82.86   74.10    96.95    85.52
Test25.bmp        Times New Roman   637            78.02   90.74   81.48   78.96    97.02    86.81
Total                               13650          77.64   92.05   83.16   74.25    96.01    83.89
A good feature set has major importance in the isolation of regions of common property
within an image, and it should represent characteristics of a class that help distinguish it from other
classes, as shown in Figure (9). The proportion of (Height/Width) was taken into consideration for
all text components (characters, digits, and symbols) for two reasons:
• it is invariant to different font sizes of the same font type,
• it speeds up the recognition process, because the matching is limited only to the text
components in the same class (a bucketing sketch follows below).
Consequently, an improvement in overall recognition rates (about 3%) is obtained after classifying
characters according to this new discriminating feature (the proportion of Height to
Width), producing 99% for wavelet and 95% for DCT.
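A minimal sketch of how such a pre-classification might look; the number of ratio classes and the ratio cap are illustrative assumptions, since the paper does not specify how the Height/Width classes are quantized:

def hw_class(height: int, width: int, n_classes: int = 8, max_ratio: float = 4.0) -> int:
    """Assign a segmented component to a Height/Width class so that template
    matching only searches references of the same class.  The class count
    and ratio cap are illustrative assumptions, not the paper's values."""
    ratio = min(height / max(width, 1), max_ratio)   # H/W is font-size invariant
    return int(ratio / max_ratio * (n_classes - 1))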
Figure (9): The Proportion of (Height/Width) Feature for the character sample 'E' in different
Font types & sizes
6. CONCLUSIONS
A multi-font English text OCR system, trained on 3185 samples and tested on 13650 samples,
is presented that relies on DCT and wavelet features. Image enhancement techniques (noise removal,
foreground/background separation, normalization and binarization) are adopted in this work prior to
segmentation in order to improve the recognition rates and simplify the process of character
segmentation.
It is found that the wavelet method is the more appropriate one for feature-vector construction,
as its recognition rates (96%) outperform those of the DCT-based recognition method (92%). To
enhance the recognition rates further and speed up the recognition process, text components
(characters, digits, and symbols) are classified according to the proportion of (Height/Width); this
feature produces 99% accuracy for the wavelet-based method, because it helps distinguish one class
from the others and is invariant to different font sizes of the same font type.
REFERENCES
[1] Mohamed Cheriet, Nawwaf Kharma, Cheng-Lin Liu, Ching Y. Suen, "Character Recognition Systems: A Guide for Students and Practitioners", John Wiley & Sons, Inc., Canada, 2007.
[2] Youssef Bassil, Mohammad Alwani, "OCR Context-Sensitive Error Correction Based on Google Web 1T 5-Gram Data Set", American Journal of Scientific Research, ISSN 1450-223X, Issue 50, February 2012.
[3] R. Jagadeesh Kannan, R. Prabhakar, "An Improved Handwritten Tamil Character Recognition System using Octal Graph", Journal of Computer Science 4 (7): 509-516, ISSN 1549-3636, 2008.
[4] R. C. Tripathi, Vijay Kumar, "Character Recognition: A Neural Network Approach", National Conference on Advancement Technologies - Information Systems & Computer Networks (ISCON), published in IJCA, 2012.
[5] Nafiz Arica, "An Off-Line Character Recognition System for Free Style Handwriting", M.Sc. thesis, Department of Computer Engineering, Graduate School of Natural and Applied Sciences, Middle East Technical University, 1998.
[6] Ibrahim Abuhaiba, "Arabic Font Recognition Based on Templates", The International Arab Journal of Information Technology, Vol. 1, No. 0, July 2003.
[7] Chirag Patel, Atul Patel, Dharmendra Patel, "Optical Character Recognition by Open Source OCR Tool Tesseract: A Case Study", International Journal of Computer Applications (0975-8887), Vol. 55, No. 10, October 2012.
[8] C. Patvardhan, A. K. Verma, C. V. Lakshmi, "Denoising of Document Images using Discrete Curvelet Transform for OCR Applications", International Journal of Computer Applications (0975-8887), Vol. 55, No. 10, October 2012.
[9] Rafael C. Gonzalez, Richard E. Woods, "Digital Image Processing", Second Edition, Prentice-Hall, Inc., New Jersey, U.S.A., 2007.
[10] S. E. Umbaugh, "Computer Vision and Image Processing", Prentice-Hall, Inc., U.S.A., 1998.
[11] Tinku Acharya, Ajoy K. Ray, "Image Processing: Principles and Applications", John Wiley & Sons, Inc., New Jersey, U.S.A., 2005.
[12] Ashu Kumar, Simpel Rani Jindal, "Segmentation of Handwritten Gurmukhi Text into Lines", International Conference on Recent Advances and Future Trends in Information Technology (iRAFIT), published in IJCA, 2012.
[13] William K. Pratt, "Digital Image Processing", Fourth Edition, John Wiley & Sons, Inc., New Jersey, U.S.A., 2007.
[14] Milan Sonka, Vaclav Hlavac, Roger Boyle, "Image Processing, Analysis and Machine Vision", Third International Student Edition, Thomson Corporation, U.S.A., 2008.
[15] David Salomon, "Data Compression: The Complete Reference", Fourth Edition, Springer-Verlag London Limited, 2007.
[16] C. S. Burrus, R. A. Gopinath, H. Guo, "Introduction to Wavelets and Wavelet Transforms", Prentice-Hall, Inc., U.S.A., 1998.
[17] M. Kociołek, A. Materka, M. Strzelecki, P. Szczypiński, "Discrete Wavelet Transform Derived Features for Digital Image Texture Analysis", Proc. of International Conference on Signals and Electronic Systems, pp. 163-168, Poland, 2001.
[18] V. Jeengar, S. N. Omkar, A. Singh, "A Review Comparison of Wavelet and Cosine Image Transforms", I.J. Image, Graphics and Signal Processing, 2012.
[19] M. M. Kodabagi, S. A. Angadi, Chetana R. Shivanagi, "Character Recognition of Kannada Text in Scene Images using Neural Network", International Journal of Graphics and Multimedia (IJGM), Volume 4, Issue 1, 2013, pp. 9-19, ISSN Print: 0976-6448, ISSN Online: 0976-6456.
[20] Mustafa Dhiaa Al-Hassani, Abdulkareem A. Kadhim, "Design a Text-Prompt Speaker Recognition System using LPC-Derived Features", International Journal of Information Technology and Management Information Systems (IJITMIS), Volume 4, Issue 3, 2013, pp. 68-84, ISSN Print: 0976-6405, ISSN Online: 0976-6413.
[21] Mustafa Dhiaa Al-Hassani, Abdulkareem A. Kadhim, Venus W. Samawi, "Fingerprint Identification Technique Based on Wavelet-Bands Selection Features (WBSF)", International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 3, 2013, pp. 308-323, ISSN Print: 0976-6367, ISSN Online: 0976-6375.