Review on image enhancement methods of old manuscript with the damaged background
ABSTRACT Old documents often suffer from background damage. These documents have degraded over time because of their age, storage conditions, and the quality of the written parchment. Examples of background damage are varying contrast, smudges, dirt, ink seeping through from the other side of the document, and an uneven background. Image processing offers a selection of approaches to make such documents readable. The aim of this paper is to provide a comprehensive review of methods for enhancing old document images with damaged backgrounds. Three kinds of enhancement methods are identified: (a) image enhancement using a binarization or thresholding method, (b) image enhancement using a binarization or thresholding method combined with other methods, and (c) image enhancement using other methods only. In conclusion, the second kind of method is becoming more popular and has great potential for improvement in the future.
International Journal on Electrical Engineering and Informatics - Volume 2, Number 1, 2010
Review on Image Enhancement Methods of Old Manuscript
with Damaged Background
Sitti Rachmawati Yahya1, S. N. H. Sheikh Abdullah2, K. Omar3, M. S. Zakaria4,
and C. -Y. Liong5
1,2,3,4Center for Artificial Intelligence Technology, Faculty of Information Science and Technology,
5School of Mathematical Sciences, Faculty of Science and Technology,
Universiti Kebangsaan Malaysia (UKM)
43600 UKM Bangi, Selangor D.E., Malaysia
email@example.com, firstname.lastname@example.org, email@example.com,
Abstract: Old documents are quite often subject to background damage. Examples of
background damage are varying contrast, smudges, dirty background, ink seeping through the
page, aged paper and uneven background. The old Malay manuscripts, which are a few hundred
years old, for example, are not legible even after the preservation process carried out by the
library. Image processing offers a selection of approaches to counter these quality degradations
and make the manuscripts readable. This paper provides a comprehensive review of the methods
for enhancing old document images with damaged background. Three types of enhancement
methods have been identified, which are (a) image enhancement using binarization/thresholding
method, (b) image enhancement using a hybrid of binarization/thresholding and other methods,
and (c) image enhancement using non-threshold based methods. Finally, we found that the
second method is becoming more popular and has a great potential for improvement in future.
Keywords: Image processing, Image enhancement, Damaged background, Binarization
method, Thresholding method.
1. Introduction
Ancient manuscripts that are hundreds to thousands of years old often have a poor or damaged
background. Contributing factors include age, environmental influence, and worn or
poor-quality ink. An example is the old Malay manuscripts, which are normally written in the
Jawi script. These documents, written between the 16th and 19th centuries, have survived until
today, but their quality has degraded. Besides environmental factors, human negligence also
contributes to the destruction of the Malay manuscripts. The manuscripts must therefore be
securely preserved and kept in soft or hard copies, so that the lifespan and quality of the
ancient Jawi manuscripts are safeguarded and they can be shared by interested parties and
future generations. As a proactive measure, the National Library of Malaysia (PNM) has
undertaken to preserve the old manuscripts. The preservation process involves several steps,
among them cleaning, testing the acid content, deacidification, drying, traditional repair,
repair using the leaf casting machine, and binding. However, not all manuscripts that have gone
through the preservation process are clear and readable. Some documents have contrast
problems, such as a foreground with damaged ink on a background of varying color. Hence, to
improve the legibility of the manuscripts, interested parties must deploy digital image
processing approaches.
Based on the above problems, many researchers have proposed different methods to facilitate
the reading and reproduction of manuscripts whose images are damaged. In general, some
researchers initially repair the manuscript image to obtain a clean background or noise-free
characters by using a binarization or thresholding process, and then apply thresholding and
stretching processes to the grayscale image to obtain a noise-free background. They developed
a procedure to separate the letters from the background in the manuscripts.
Received: November 3, 2009. Accepted: January 5, 2010
Another study attempted to eliminate streaks on both sides of the document by applying an
edge detection method with double threshold values to the foreground image. Other research
separated the background and foreground by applying a normalization algorithm and a histogram
transformation, together with segmentation techniques such as the K-means algorithm and the
maximum likelihood algorithm.
In this paper, we classify the methods to enhance old manuscripts into three image
enhancement methods, which are (a) Image enhancement method I – binarization/ thresholding
method, (b) Image enhancement method II – a hybrid of binarization/thresholding and other
methods, and (c) Image enhancement method III – non-threshold based methods. Figure 1
shows a flowchart of image enhancement steps for old manuscripts.
This paper is divided into five sections. Following this introduction, the image enhancement
methods are reviewed in Section 2. Difficulties in image enhancement for Malay manuscripts are
discussed in Section 3, and the benefits and limitations of image enhancement are given in
Section 4. Section 5 gives the conclusions and suggestions for future work.
Figure 1. The flowchart of image enhancement methods for old manuscripts which
have damaged background
2. Image Enhancement Methods
A. Image Enhancement Method I
A. 1. Binarization/Thresholding Method
1) Entropy-Based Method
There are a few useful methods for separating background and foreground regions. A
prominent method for segmenting the original image into a binary image is the entropy-based
method.
Generally, this method starts from threshold values given by two variables, namely T1
(background) and T2 (foreground). Several researchers have proposed different arrangements of
these parameter values for producing binary images, some of them designed specifically for
document enhancement. Pixels whose gray level is below T2, or lies between T2 and T1, are
preserved, whereas pixels whose gray level is greater than T1 are eliminated directly.
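The preservation rule described above can be sketched as follows. This is only an illustration of how the two thresholds partition the gray levels; it does not show how T1 and T2 are derived from the entropy of the image. The function names, the label strings and the choice of painting eliminated pixels white (255) are assumptions made for this example.

```python
WHITE = 255  # assumed gray level used to "eliminate" a background pixel


def label_pixel(g, t1, t2):
    """Classify gray level g using the background threshold t1 and foreground threshold t2."""
    if g < t2:
        return "foreground"      # clearly dark: preserved
    if g <= t1:
        return "intermediate"    # between T2 and T1: also preserved
    return "background"          # brighter than T1: eliminated


def apply_two_thresholds(image, t1, t2):
    """Preserve every pixel at or below t1 and replace the rest with WHITE."""
    if t2 > t1:
        raise ValueError("expected t2 <= t1")
    return [[g if label_pixel(g, t1, t2) != "background" else WHITE for g in row]
            for row in image]


# Tiny hypothetical gray scale image, one list per row.
page = [[10, 120, 240],
        [200, 60, 180]]
cleaned = apply_two_thresholds(page, t1=190, t2=80)
```

Only the two pixels brighter than T1 = 190 are changed; everything darker keeps its original gray level.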
On the other hand, other research groups have developed different approaches for handling
color documents. They either transformed the color documents into gray scale before the
binarization process, or binarized the color documents directly.
Besides that, another approach called the Parameter Estimation Algorithm can be used to
automatically detect the best value for the parameter setting (PS). This method works on the
results of a binary transformed image and can estimate different PS values based on any
particular image analysis technique.
2) Locally Adaptive Thresholding
Using this method, an adaptive threshold value based on the local image characteristics is
applied to every pixel within a locality. Firstly, the gray level value g(x,y) of every pixel
(x,y) of a document image is determined. The gray level is an intensity value within the range
[0, 255]. Then, using the local adaptive threshold t(x,y) computed for every pixel (x,y), the
binarized output f(x,y) is formulated as Eq. 1:
$$ f(x,y) = \begin{cases} 0, & \text{if } g(x,y) \le t(x,y) \\ 255, & \text{otherwise} \end{cases} $$
In a local adaptive thresholding method such as Sauvola's binarization method, the threshold
t(x,y) is computed using the mean m(x,y) and the standard deviation s(x,y) of the pixel
intensities in a window centered on the pixel (x,y). The formula is as follows (Eq. 2):
$$ t(x,y) = m(x,y)\left[1 + k\left(\frac{s(x,y)}{R} - 1\right)\right] $$
where R is the maximum value of the standard deviation (R = 128 for a gray scale document),
and k is a parameter that takes positive values between 0.2 and 0.5. The local mean m(x,y) and
standard deviation s(x,y) adapt the threshold value to the contrast in the local neighborhood
of the pixel. If there is high contrast in some region of the image, then s(x,y) ≈ R, which
gives t(x,y) ≈ m(x,y).
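A minimal sketch of this kind of local thresholding is given below, using the commonly cited form of Sauvola's threshold, t = m [1 + k (s/R - 1)]. The window half-width `half`, the clipping of the window at the image border, and the mapping of dark pixels to 0 and all others to 255 are assumptions made for the example.

```python
import math


def sauvola_threshold(image, x, y, half=1, k=0.5, R=128.0):
    """Threshold t(x, y) from the mean and standard deviation of a local window."""
    h, w = len(image), len(image[0])
    # Collect the window centered on (x, y), clipped at the image border.
    values = [image[j][i]
              for j in range(max(0, y - half), min(h, y + half + 1))
              for i in range(max(0, x - half), min(w, x + half + 1))]
    m = sum(values) / len(values)
    s = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return m * (1.0 + k * (s / R - 1.0))


def sauvola_binarize(image, half=1, k=0.5, R=128.0):
    """Map pixels at or below their local threshold to 0 (ink), others to 255."""
    return [[0 if image[y][x] <= sauvola_threshold(image, x, y, half, k, R) else 255
             for x in range(len(image[0]))]
            for y in range(len(image))]


# Hypothetical 3x3 patch: one dark stroke pixel on a bright background.
patch = [[200, 200, 200],
         [200, 20, 200],
         [200, 200, 200]]
binary = sauvola_binarize(patch)
```

In a flat region s(x,y) = 0, so the threshold drops to m(1 - k) and a uniform background is never marked as ink; in the patch above only the dark center pixel falls below its local threshold.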
B. Image Enhancement Method II
B. 1. Hybrid of Binarization/Thresholding Method and Other Methods
Perantonis et al. proposed a scheme for image binarization and enhancement that consists of
several steps. Firstly, a low-pass Wiener filter is used as the pre-processing procedure.
Secondly, Niblack's approach is used for a rough estimation of the foreground regions.
Thirdly, a background surface is calculated by interpolating neighboring background
intensities, and thresholding is then performed by combining the calculated background surface
with the original image. Finally, a post-processing step is performed to improve the quality of
the text regions and preserve stroke connectivity.
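The third step, background surface estimation followed by thresholding, can be illustrated roughly as below. This is a simplified sketch and not the authors' implementation: the rough foreground mask is taken as given, the interpolation is replaced by a plain average of nearby background pixels, and the fixed distance `d` between background and pixel value is a hypothetical parameter.

```python
def background_surface(image, fg_mask, half=1):
    """Estimate the background: background pixels keep their value, while
    foreground pixels receive the mean of the background pixels around them."""
    h, w = len(image), len(image[0])
    surface = [[float(v) for v in row] for row in image]
    for y in range(h):
        for x in range(w):
            if not fg_mask[y][x]:
                continue
            neighbors = [image[j][i]
                         for j in range(max(0, y - half), min(h, y + half + 1))
                         for i in range(max(0, x - half), min(w, x + half + 1))
                         if not fg_mask[j][i]]
            if neighbors:  # leave the pixel unchanged if no background is nearby
                surface[y][x] = sum(neighbors) / len(neighbors)
    return surface


def threshold_against_background(image, surface, d=40):
    """Mark a pixel as ink (0) when it is more than d darker than the background."""
    return [[0 if surface[y][x] - image[y][x] > d else 255
             for x in range(len(image[0]))]
            for y in range(len(image))]


# Hypothetical patch with a rough foreground estimate marking the center pixel.
patch = [[200, 200, 200],
         [200, 50, 200],
         [200, 200, 200]]
rough_fg = [[False, False, False],
            [False, True, False],
            [False, False, False]]
surface = background_surface(patch, rough_fg)
result = threshold_against_background(patch, surface)
```

Comparing each pixel with the estimated background rather than with a single global value is what lets this kind of scheme tolerate an uneven background: a faint smudge that the rough estimate wrongly flags as foreground stays close to the surface and is not marked as ink.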
A novel segmentation-free, fast and efficient technique has also been proposed that detects
the majority of characters and their ligatures based on the existence of closed cavity regions.
It assists the recognition procedure by tracing and recognizing the most frequently appearing
characters or character ligatures. The authors also used Sauvola's approach for the adaptive
thresholding process. After the pre-processing task, they estimated the foreground by using the
rough estimation approach, which processes the image in order to extract a binary image whose
pixels mark the roughly estimated foreground regions.
Document analyses, such as identifying dates, locations, and writers with different writing
styles, have also been conducted using a multi-stage algorithm. In the first step, a variable
is initially set as a character. After the image has been transformed into gray scale, the
threshold process is executed. In the next step, an evolution process is applied for connected
component labeling on the image, which can separate the characters from the damaged background.
Let SDi be the seed image of CCi, using mi as a local threshold; we have:
Apart from that, an extension of Otsu's automatic threshold method has been introduced to
separate characters from a damaged manuscript background. The authors developed this method to
overcome several problems of conventional thresholding techniques, such as the handling of
overlapping characters. They had to scale down the image and apply a recursive labeling method
for word
In another approach, the background of a gray scale image is first approximated using one of
two models, a piece-wise linear or a nonlinear model. These background approximations are used
to model the background and thereby overcome the unevenness of the document. Then, a
background normalization algorithm is applied to the component channel images of a color palm
leaf image. The authors also proposed two local adaptive normalization algorithms for
extracting enhanced gray scale images from color palm leaf images. They partitioned an image
into m by n smaller regions, in each of which the background is approximated by a flat surface
using the piece-wise linear model. In each such region, a linear function in the form of Eq. 4
is applied:
Ax + By – z + D = 0
In the image normalization phase, for any pixel at location (x,y) with pixel value zorig, the
normalized pixel value is computed by a linear translation as below (Eq. 5),
znew = zorig – zback + C
where zback = Ax + By + D for the linear approximation case, and zback = Value(x,y) for the
nonlinear approximation, with Value(x,y) being the value taken from the nonlinear approximation
at (x,y). C is a constant fixed to some number close to the white color value of 255, so that a
pixel equal to the estimated background is mapped to C; for a global shifting, the suggested
value is 220. After that, an adjustment is made by stretching. For any pixel at (x, y), the
enhanced pixel value is derived by Eq. 6.
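The plane model of Eq. 4 and the shift of Eq. 5 can be sketched as follows for a single region. The sketch assumes that the plane is fitted by least squares to background samples supplied by the caller, and that the shift subtracts the background estimate so that a background pixel lands on C; the helper names and the clipping to [0, 255] are choices made for this example, and the stretching of Eq. 6 is not shown.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(samples):
    """Least-squares fit of z = A*x + B*y + D to (x, y, z) background samples."""
    n = len(samples)
    sx = sum(x for x, _, _ in samples)
    sy = sum(y for _, y, _ in samples)
    sz = sum(z for _, _, z in samples)
    sxx = sum(x * x for x, _, _ in samples)
    syy = sum(y * y for _, y, _ in samples)
    sxy = sum(x * y for x, y, _ in samples)
    sxz = sum(x * z for x, _, z in samples)
    syz = sum(y * z for _, y, z in samples)
    # Normal equations for the unknowns (A, B, D), solved with Cramer's rule.
    m = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    v = [sxz, syz, sz]
    d = det3(m)
    if d == 0:
        raise ValueError("samples do not determine a unique plane")
    coeffs = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = v[r]
        coeffs.append(det3(mc) / d)
    return tuple(coeffs)


def normalize_pixel(z_orig, x, y, plane, c=220):
    """Shift a pixel so that the fitted background maps to the constant c."""
    a, b, d = plane
    z_back = a * x + b * y + d
    z_new = z_orig - z_back + c
    return max(0, min(255, round(z_new)))


# Hypothetical 4x4 region whose background follows the plane z = 2x + 3y + 100.
samples = [(x, y, 2 * x + 3 * y + 100) for y in range(4) for x in range(4)]
plane = fit_plane(samples)
background_value = normalize_pixel(108, 1, 2, plane)  # pixel lying on the plane
ink_value = normalize_pixel(48, 1, 2, plane)          # pixel 60 levels darker
```

After the shift, every background pixel of the region sits at C regardless of the original gradient, while the ink pixel keeps its distance of 60 gray levels below the background.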
To ensure a white background, C is usually set to 255, as the gray level can never exceed 255.
A segmentation-free approach has also been proposed for the extraction of pre-specified
letters, based on the well-known erosion operator. The extraction process is composed of
several stages: structuring element generation, character extraction, character validation
and structuring element adaptation.
Wang et al. suggested that applying a directional wavelet transform in double-sided mapping
distinguishes the reverse-side strokes from the foreground much better than the conventional
wavelet transform. The restored images are then binarized using Niblack's approach to improve
the final appearance.
On the other hand, other research has identified scripts at the word level in a bilingual
document containing Roman and Tamil scripts. After pre-processing the binary image, the
authors applied skew detection and correction processes. They achieved an accuracy rate of
86-98%. Moreover, they also implemented a segmentation process at the line and sentence
levels. A Gabor filter, well known as a Gaussian-modulated sinusoid, is used in the feature
extraction process. With orientation θ and centered at frequency F, a complex 2-D Gabor
function is given as below (Eq. 7):
In the x and y directions, the spatial spreads of the Gaussian are given by (Eqs. 8 and 9):
$$ \sigma_x = \frac{\sqrt{\ln 2}\,(2^{\Omega_F} + 1)}{\sqrt{2}\,\pi F\,(2^{\Omega_F} - 1)} $$
$$ \sigma_y = \frac{\sqrt{\ln 2}}{\sqrt{2}\,\pi F \tan(\Omega_\theta / 2)} $$
where Ω_F and Ω_θ are the frequency and the angular bandwidth, respectively. By changing the
frequency and scaling of the Gabor function, the parameters needed to model the human visual
system are obtained.
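For illustration, a Gaussian-modulated sinusoid of this kind can be evaluated as in the sketch below. It uses one widely used form of the complex 2-D Gabor function, a normalized Gaussian in rotated coordinates multiplied by a complex exponential along the orientation θ, which is an assumption and not necessarily the exact expression of Eq. 7. The spread formulas are the usual bandwidth relations, with the frequency bandwidth in octaves and the angular bandwidth in radians; the parameter values in the usage lines are hypothetical.

```python
import cmath
import math


def gabor_sigmas(F, omega_f, omega_theta):
    """Spatial spreads from the frequency bandwidth (octaves) and the
    angular bandwidth (radians), using the usual bandwidth relations."""
    root = math.sqrt(math.log(2))
    sigma_x = root * (2 ** omega_f + 1) / (math.sqrt(2) * math.pi * F * (2 ** omega_f - 1))
    sigma_y = root / (math.sqrt(2) * math.pi * F * math.tan(omega_theta / 2))
    return sigma_x, sigma_y


def gabor(x, y, F, theta, sigma_x, sigma_y):
    """Complex Gabor value at (x, y): a Gaussian envelope times a complex sinusoid."""
    # Rotate the coordinates so that xr runs along the orientation theta.
    xr = x * math.cos(theta) + y * math.sin(theta)
    yr = -x * math.sin(theta) + y * math.cos(theta)
    envelope = math.exp(-0.5 * ((xr / sigma_x) ** 2 + (yr / sigma_y) ** 2))
    envelope /= 2 * math.pi * sigma_x * sigma_y
    return envelope * cmath.exp(2j * math.pi * F * xr)


# Hypothetical settings: 0.25 cycles/pixel, one octave, 45 degree angular bandwidth.
sx, sy = gabor_sigmas(F=0.25, omega_f=1.0, omega_theta=math.pi / 4)
center = gabor(0, 0, F=0.25, theta=0.0, sigma_x=sx, sigma_y=sy)
off_center = gabor(3, 0, F=0.25, theta=0.0, sigma_x=sx, sigma_y=sy)
```

A bank of such functions at several values of F and θ, convolved with the word image, gives the kind of features used for script identification; the magnitude of the response falls off with distance from the center because of the Gaussian envelope.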
An alternative method to separate the foreground from the background in colored images of
Arabic historical manuscripts is to use a lighting intensity normalization algorithm followed
by classification using K-means. The normalization of the lighting intensity solves the problem
of an irregular, low-contrast image background. This technique can transform a colored image
into a binary image based on the histogram point average.
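The classification step can be illustrated with a two-cluster K-means on gray levels, as sketched below. This is a simplification for the example only: it works on gray values that are assumed to be already normalized rather than on color vectors, it initializes the two centers at the darkest and brightest value, and it treats the darker cluster as foreground.

```python
def kmeans_two_clusters(values, iterations=20):
    """Return the (dark, light) cluster centers of a list of gray levels."""
    lo, hi = float(min(values)), float(max(values))
    for _ in range(iterations):
        # Assign each value to the nearer center (ties go to the dark cluster).
        dark = [v for v in values if abs(v - lo) <= abs(v - hi)]
        light = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo = sum(dark) / len(dark)
        new_hi = sum(light) / len(light) if light else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # assignments are stable
        lo, hi = new_lo, new_hi
    return lo, hi


def binarize_with_kmeans(image):
    """Map pixels nearer the dark center to 0 (foreground), the rest to 255."""
    lo, hi = kmeans_two_clusters([v for row in image for v in row])
    return [[0 if abs(v - lo) <= abs(v - hi) else 255 for v in row]
            for row in image]


centers = kmeans_two_clusters([20, 30, 25, 200, 210, 190])
binary = binarize_with_kmeans([[20, 200],
                               [210, 30]])
```

Because the centers are learned from the image itself, no threshold has to be chosen by hand, which is the appeal of clustering once the lighting has been made uniform.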
Subsequently, a recursive estimation is calculated with regard to the gray scale pixels, which
also alters the window size for the image. In order to estimate the pixel intensity value in
the foreground, two rules are used as below (Eq. 10 and 11),