
Automated Image Segmentation Methods for Digitally-Assisted Palaeography of Medieval Manuscripts

Department of Informatics & Department of Digital Humanities
King’s College London
Brian Maher, Kathleen Steinhöfel & Peter Stokes
April 2013
Abstract
We explore methods of automating the digital palaeographic process, using a divide and
conquer approach. Firstly, image noise is reduced using a combination of colour removal, and
varied blurring and thresholding techniques. Initial values for these processes are calculated by
the system based on the average greyscale colour of the image upon initial importation. By
combining these algorithms, the system is able to achieve high levels of noise reduction.
The process of segmenting the script into letters is also divided. First, blocks of text are
detected in the noise-reduced image, by measuring the proportion of black pixels within pre-
defined sized blocks of pixels, comparing these values to the average colour values of not only
the entire image, but the surrounding blocks (minimising false positive rates). These blocks of
text are split into individual lines through detection of whitespace, and then further segmented
into individual letters, through a similar technique.
In order to verify the integrity of the letters, the sizing of each segment is compared to
the letter average (since most letters within manuscripts are of a similar width). Any letters
that differ excessively from this average are then re-checked by re-running the segmentation
algorithms in these specific locations with thresholding set to both lighter and darker levels.
The results of these segmentations are then merged, with each box finally being expanded to
fit the letter more precisely.
Contents
1 Introduction & Motivation
1.1 Digital Image Segmentation
1.2 Palaeography
1.3 Problems with Digital Palaeography
1.4 Motivation
1.5 Current Methods
1.5.1 Image Pre-Processing & Noise Reduction
1.5.2 Digital Image Segmentation Methods
2 Methodology
2.1 Algorithms & Segmentation Process
2.1.1 Parallelisation
2.1.2 Colour Removal
2.1.3 Thresholding
2.1.4 Blurring
2.1.5 Threshold Level Prediction
2.1.6 Area of Interest (Text Block) Detection
2.1.7 Line Segmentation
2.1.8 Letter Segmentation
2.2 Data Structures
2.2.1 Manuscript Library Data Structure
2.2.2 Manuscript Data Structure
2.2.3 Letter Data Structure
3 Experimental Results
3.1 Segmentation Progress
3.1.1 Segmentation Process & Results
3.1.2 Effect of Parametric Setting Adjustments
3.2 Technical Performance
3.2.1 Resource Usage
3.2.2 Algorithm Performance
3.2.3 Effectiveness of Parallelisation on Running Time
4 Conclusions & Future Research
4.1 Conclusions
4.1.1 Performance
4.1.2 Noise Removal
4.1.3 User Interaction
4.2 Future Research & Extensions
4.2.1 K-Means Clustering Implementation
4.2.2 Artificial Intelligence
4.2.3 Interaction with Existing and Future Tools
5 References
List of Algorithms
1 K-Means Heuristic Algorithm on Image of Size (w*h)
2 Basic Greyscale Conversion on Image of Size (w*h)
3 Thresholding Algorithm on Greyscale Image of Size (w*h)
4 Colour Blurring an Image of Size (w*h) and threshold t
5 Line Segmentation within Area of Interest size w*h
6 Letter segmentation within Area of Interest size w*h and array of lines L
List of Figures
1.1 Greyscale Contrast Demonstration
1.2 Edge Detection Example Image
2.1 Converting RGBA Integers to RGB
2.2 Predicting Thresholding Levels
2.3 Area of Interest Detection Example
2.4 The Effect of Re-Segmenting using Differing Threshold Levels
2.5 Getting a Manuscript from the Library
2.6 Finding the Letter at any given Point
2.7 Checking if a Point (x,y) is within the Boundary Box of a Letter
2.8 Comparison of Letters using Labels and Notes
3.1 Segmentation Process Sample Output
3.2 Area of Interest Detection Sample Output
3.3 Effect of Threshold Level Adjustment
3.4 Effect of Blur Level Adjustment
3.5 Average Multithreaded Algorithm Performance (Relative)
3.6 Algorithm Performance Graph
Chapter 1
Introduction & Motivation
This project aims to analyse and contrast existing image segmentation methods, and develop
algorithms capable of performing automatic image segmentation and detection of letters in
medieval manuscripts.
1.1 Digital Image Segmentation
Image segmentation is the process of splitting an image into several partitions, based on a set
of reference points or points of interest. Image segmentation can be performed in many ways
- for example, an image could be split into equal sections (the simplest method), a partition could
be the contents of (or pixels within) a bounding box with a certain padding around a particular
point of interest, or a freeform partition map could be drawn anywhere within an image (the
hardest method). Image segmentation is used in several fields, from medical diagnosis (e.g.
segmenting C.T. scan results[18] to identify active regions) to determining the location of
a particular object within a map. Beyond these different ways of partitioning an image, several
methods have been devised which attempt to solve the image segmentation problem. However,
these methods are largely heuristic, and the problem itself is yet to be solved: there is no
known algorithm which can segment any image to any set of requirements.
Image segmentation algorithms are highly susceptible to noise[4] within an image, which
can result in large amounts of pre-processing being needed to get the image to a workable
state. This can include adjusting contrast levels, digital blurring and/or sharpening, cropping
and straightening, and highlighting points of interest or edges within the image. Since image
segmentation algorithms generally have to analyse every pixel of an image several times over,
their running time grows quickly with image size, meaning they are very slow on large images.
1.2 Palaeography
Palaeography is the study of ancient written text - including the time consuming process
of not only trying to understand and interpret the written words, but retrieving these words
from often degraded scripts. In recent years, it has been aided by advances in computation
through the use of digital image storage and manipulation, and organisations such as DigiPal[1]
have been set up by academic institutions in an attempt to digitise this taxing process. The
primary goal of palaeography is to identify key features of a manuscript - its author, the location
at which it was written, the content itself, and to improve estimations of the date the manuscript
was initially created.
These teams of Digital Palaeographers have used computational methods to make storing,
retrieving, and inputting manuscript data easier, however, the systems currently in use are still
largely non-automated - meaning that the most time consuming part of the process, drawing
out and labelling letters, has to be done manually, albeit digitally.
1.3 Problems with Digital Palaeography
As stated, many of the current systems for Digital Palaeography are based around the manual
inputting of data, and do not automate the detection process, for one main reason. If we
combine the features described in Sections 1.1 and 1.2, we highlight a vital clash: ancient
manuscripts, by nature, are often incredibly degraded and poor in quality, and the scanned
images are often incredibly large, whereas the two main flaws of image segmentation algorithms
are their inability to process noisy images correctly and their slow running time on very large
images.
1.4 Motivation
The aim of this project is to automate the process of detecting and "boxing" letters in ancient
manuscripts - not by simply running a segmentation algorithm on the raw scanned image of
a manuscript, but by automating the process of pre-processing the images, and utilising data
about known letters in previously analysed manuscripts to help detect shapes and points of
interest within the images, and automatically label any images found with the correct details.
This learning system should be self-improving: the more instances of a letter it finds, the
better the "fingerprint" of the letter it will hold, which should improve detection rates in any
further analysed manuscripts.
1.5 Current Methods
1.5.1 Image Pre-Processing & Noise Reduction
As discussed in Section 1.5.2, pre-processing of the image is vital to ensure optimal performance
of any segmentation algorithms, since these algorithms are normally highly susceptible
to noise.
Noise is a critical problem in digital palaeography, since it can be introduced in many ways
- from stains and visible defects (which are highly likely) on ancient manuscripts, to defects
introduced in the digitisation process, such as compression artefacts and undesired shadows on
scanned images (an example is given in Appendix I). Digitally introduced noise can vary greatly
in style, from Gaussian noise (where the probability density function of the (minor) variation
applied to each pixel is approximately a normal distribution), to pixel corruption, where
a single pixel (or group of pixels) is corrupted and given an erroneous colour (most image
algorithms will repair these minor corruptions with either black or white pixels, resulting in
bright or dark spots within an image), or even whole image corruption, where the majority
of image data is lost.
Gaussian Noise Reduction
The simplest Gaussian noise reduction algorithms work[5] by slightly blurring the image, reducing
the effect of noise against any one pixel relative to all other pixels (if the noise was
distributed normally, blurring the image would have the effect of flattening the normal
distribution). The image is, however, only blurred slightly, in order to avoid distorting any features
of the image, which could hinder any edge or feature detection algorithms.
Spot Noise Reduction
Spot noise removal is more difficult than Gaussian noise removal[6], since spot noise only affects
specific pixels (or a group thereof), meaning that the noise has to be adequately
detected before it can be removed. Detection usually involves a local variant of the thresholding
algorithm (Section 1.5.2) with a very high threshold, thus marking any pixel with a high variance
in either colour or luminance relative to its neighbours as a possible outlier. Once target pixels
have been identified, the
noise can be removed by modifying the colour of the pixel to be closer to the average colour of
its neighbouring pixels, or, in the case of a group of pixels, the neighbouring pixels to the group
as a whole. Reducing spot noise on groups of pixels must, however, be performed with caution,
since a small group of pixels could be a feature instead of noise (for example, a full stop written
on a manuscript).
A safer method of removing these spots would be to apply a minor contrast adjustment
(Section 1.5.1) on the pixels surrounding an anomaly, in order to reduce the variance of the
pixel without affecting its colour properties dramatically.
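The detect-then-repair idea for single-pixel spots can be sketched roughly as follows. This is an illustration only, assuming a greyscale image held as an int[][] of values 0 to 255; the class name, method names and the maxDeviation parameter are hypothetical and do not come from the report.

```java
final class SpotNoiseSketch {
    /** Mean grey level of the (up to 8) neighbours of (x, y); out-of-range positions are skipped. */
    static int neighbourMean(int[][] grey, int x, int y) {
        int sum = 0, count = 0;
        for (int dy = -1; dy <= 1; dy++) {
            for (int dx = -1; dx <= 1; dx++) {
                if (dx == 0 && dy == 0) continue; // skip the pixel itself
                int ny = y + dy, nx = x + dx;
                if (ny >= 0 && ny < grey.length && nx >= 0 && nx < grey[ny].length) {
                    sum += grey[ny][nx];
                    count++;
                }
            }
        }
        return count == 0 ? grey[y][x] : sum / count;
    }

    /**
     * Returns a copy of the image in which every pixel that differs from its
     * neighbour mean by more than maxDeviation (hypothetical parameter) is
     * replaced by that mean. The input image is not modified.
     */
    static int[][] removeSpots(int[][] grey, int maxDeviation) {
        int[][] out = new int[grey.length][];
        for (int y = 0; y < grey.length; y++) {
            out[y] = grey[y].clone();
            for (int x = 0; x < grey[y].length; x++) {
                int mean = neighbourMean(grey, x, y);
                if (Math.abs(grey[y][x] - mean) > maxDeviation) {
                    out[y][x] = mean;
                }
            }
        }
        return out;
    }
}
```

A high maxDeviation keeps the detector conservative, which matters here because small dark marks such as full stops are features rather than noise.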
Image Corruption
Image corruption can occur in several forms (or intensities) - usually taking the form of minor
pixel artefacts occurring throughout the image (which can sometimes be reduced as described in
Section 1.5.1), sections of missing data within the image (visible as blocks of pixels in one solid
colour - normally a greyscale colour from 00 to FF), or a complete loss of all image data.
These corruptions are, however, usually unrecoverable since no common image formats contain
redundant data from which the lost image data can be rebuilt.
Image corruption can, however, manifest itself in less destructive ways which can be recovered[7],
albeit with a loss of image fidelity. An example of this could be loss or corruption of the
embedded colour profile, which can cause incorrect colours to be displayed (or even a complete
loss of colour information). Although this might seem destructive, the image data is still there,
so algorithms can still be run on the data (in this case a simple thresholding algorithm (Section
1.5.2) could be used to restore a black and white outline of the image).
Contrast Enhancement
Contrast is defined as a measure of the variance in the brightness and colour of neighbouring
areas (or pixels) within an image. Enhancing this contrast can help segmentation algorithms
detect edges and features of interest, since edges usually feature high levels of contrast with their
neighbouring pixels. It does, however, need to be used with caution: increasing the contrast of
an image excessively could fool edge detection algorithms, especially once an image has been
converted to greyscale (as per Section 1.5.1).
Increasing the contrast of an image can be performed, at a basic level, by looping through
each pixel within an image, comparing its luminosity and colour to that of its neighbouring
pixels, reducing the brightness and colour strength of the pixels with the lower brightness and
colour strength, and vice versa - increasing the variance between pixels.
Figure 1.1: Demonstration of different levels of contrast in a greyscale image. [3]
Colour Saturation Adjustment & Greyscale
Colour saturation is a measure (on an arbitrary and/or relative scale) of the amount of colour
a pixel contains, disregarding black and white - for example, using 8-bit Red, Green and
Blue (RGB) channels, the colour (128, 0, 255) could be said to be highly saturated with red,
whilst (0, 0, 50) could be described as moderately saturated with blue. Adjusting the levels
of saturation in an image could be performed by checking which colour(s) each pixel has high
levels of saturation in already, and slightly increasing the amount of that particular colour in
that pixel.
The polar opposite of increasing colour saturation would be converting the image to greyscale
- where all image detail is retained, except colour information - the brightness of the grey (a
colour in which red, green and blue channels have the same value) used for a pixel relates to
the brightness (or amount of colour) of that pixel (not the saturation).
1.5.2 Digital Image Segmentation Methods
Thresholding
Thresholding[8] is possibly the simplest method of image segmentation. Its goal is to segment
the image into two partitions based on a threshold of colour. In order to partition an image
based on colour, the target threshold must first be set. For each pixel in the image,
the colour is checked against the threshold value - if it meets the threshold it is coloured black,
otherwise it is coloured white (or vice versa). This outputs a simple black and white image
based on the threshold level set at the start.
Basic thresholding algorithms are highly simplistic, and can therefore run in O(hw), where
h and w represent the height and width of the image. More complicated thresholding techniques
have been devised[20], which use fuzzy sets to process threshold levels.
Although the operation performed by threshold segmentation is basic (it is, in essence, simply
a sharp contrast adjustment of the image), it can be very useful, with the correct threshold
values, in identifying regions of interest (for example, letters against paper on a manuscript),
since it gives a binary output (only two colours appear in a thresholded image).
This would therefore make a good preprocessor for identifying areas of interest used in other
segmentation algorithms.
K-Means Clustering
Clustering algorithms attempt to cluster (group) pixels of an image together by linking them
to neighbouring pixels[9]. K-Means clustering groups pixels by adding them to the partition
which has the nearest mean (the set of k initial means is given at the start of the algorithm,
either being randomly generated or based on the areas of interest detected in previous
pre-processing). In the first instance, each pixel is clustered with the nearest initial mean. The
means of each cluster are then recalculated, and again, each pixel is reallocated to the cluster
whose mean it is nearest. This step is repeated until no pixels have changed group (at which
point the clusters have converged). A basic pseudo-code version of this algorithm is given in
Algorithm 1.
Algorithm 1 K-Means Heuristic Algorithm on Image of Size (w*h)
m[0 ... k - 1] ← set of means
changed ← TRUE
while changed = TRUE do
    calculateMeans()
    changed ← FALSE
    i ← 0
    j ← 0
    for i from 0 to w - 1 do
        for j from 0 to h - 1 do
            for all mean in m do
                if distance(pixel(i, j), mean) < distance(pixel(i, j), currentMean(pixel(i, j))) then
                    currentMean(pixel(i, j)) ← mean
                    changed ← TRUE
                end if
            end for
        end for
    end for
end while
The method is an example of a minimisation problem, where the goal is to minimise the
within-cluster sum of squares, and is computationally NP-hard. Several heuristic algorithms[10][16]
have been developed in an attempt to solve the problem; however, most of them
rely on convergence of the clusters - a property which is not guaranteed for every set of objects
(pixels) and initial means. The clusters given in this instance may not be optimal.
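The assign-then-recalculate loop described above can be sketched in Java as follows. This is a simplified illustration, not the report's implementation: it assumes pixels are reduced to grey levels in a flat int[] and that distance is the absolute difference in grey level. The class name, the method signature and the maxIterations cap (added so the loop always terminates) are hypothetical.

```java
import java.util.Arrays;

final class KMeansSketch {
    /**
     * Clusters grey levels around the given initial means.
     * Returns the cluster index assigned to each pixel; the means array is
     * updated in place to the final cluster means.
     */
    static int[] cluster(int[] grey, double[] means, int maxIterations) {
        int[] assignment = new int[grey.length];
        Arrays.fill(assignment, -1); // no pixel is assigned yet
        boolean changed = true;
        for (int iter = 0; changed && iter < maxIterations; iter++) {
            changed = false;
            // Assignment step: move each pixel to the cluster with the nearest mean.
            for (int p = 0; p < grey.length; p++) {
                int best = 0;
                for (int c = 1; c < means.length; c++) {
                    if (Math.abs(grey[p] - means[c]) < Math.abs(grey[p] - means[best])) {
                        best = c;
                    }
                }
                if (assignment[p] != best) {
                    assignment[p] = best;
                    changed = true;
                }
            }
            // Update step: recalculate the mean of each non-empty cluster.
            double[] sum = new double[means.length];
            int[] count = new int[means.length];
            for (int p = 0; p < grey.length; p++) {
                sum[assignment[p]] += grey[p];
                count[assignment[p]]++;
            }
            for (int c = 0; c < means.length; c++) {
                if (count[c] > 0) {
                    means[c] = sum[c] / count[c];
                }
            }
        }
        return assignment;
    }
}
```

With two initial means (for example 0 and 255), this separates dark ink-like pixels from light paper-like pixels, which is one way the technique could be applied to a manuscript image.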
Edge, Ridge & Boundary Detection
Edge & Ridge detection is a method of partitioning an image by detecting the edges of areas
of interest or lines within it. This method is generally, at a very abstract level, an extension
of the thresholding method (detailed in Section 1.5.2) where the threshold is generated locally
to the analysed pixel, based on the colour and luminance properties of its neighbouring pixels,
and those in (generally) an approximate line with it.
Edge detection can be performed in a variety of ways[11][12], from filtration using Gaussian
derivatives, to measuring the strength and direction of gradients in a greyscale image (Section
1.5.1). Boundary detection[13] takes a different angle, however. Contrary to edge detection
methods, boundary detection attempts to split the image into sections based on their features,
as opposed to their colour or luminance based edges - for example, in an image consisting of
mountains and sky, edge detection would (ideally) detect the outlines of the mountains and
clouds, whilst boundary detection would focus on the split between land and sky.
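As an illustration of the gradient-strength approach, the sketch below computes a gradient magnitude for each interior pixel of a greyscale image. The Sobel operator is used here purely as a familiar example; the report does not state which operator, if any, it has in mind. The class and method names are hypothetical, and the image is assumed to be a non-empty rectangular int[][] of grey levels.

```java
final class EdgeSketch {
    /**
     * Sobel gradient magnitude for every interior pixel of a greyscale image.
     * Border pixels are left at 0 because their 3x3 neighbourhood is incomplete.
     */
    static int[][] gradientMagnitude(int[][] grey) {
        int h = grey.length, w = grey[0].length;
        int[][] out = new int[h][w];
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                // Horizontal gradient: right column minus left column, centre row weighted by 2.
                int gx = -grey[y - 1][x - 1] + grey[y - 1][x + 1]
                       - 2 * grey[y][x - 1] + 2 * grey[y][x + 1]
                       - grey[y + 1][x - 1] + grey[y + 1][x + 1];
                // Vertical gradient: bottom row minus top row, centre column weighted by 2.
                int gy = -grey[y - 1][x - 1] - 2 * grey[y - 1][x] - grey[y - 1][x + 1]
                       + grey[y + 1][x - 1] + 2 * grey[y + 1][x] + grey[y + 1][x + 1];
                out[y][x] = (int) Math.round(Math.sqrt((double) gx * gx + (double) gy * gy));
            }
        }
        return out;
    }
}
```

Pixels with a large magnitude lie on strong edges; thresholding the result would give a binary edge map.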
Figure 1.2: Example[2] of an edge detection algorithm. (a) Original Image (b) Edges detected.
Manuscript Images
As stated, scanned manuscript images usually contain large amounts of noise. Appendix I
shows a sample manuscript which is in good condition. Some of the visible artefacts include
creases in the paper, leakage of letters from the opposite side of the paper, dark brown spots and
edges, as well as visible staining to the paper.
Since the goal of this project is to automate the palaeography process, Appendix IV shows a
mock-up of a script which has been through the process (showing a sample design of the output of
this system). As you can see, the first line of letters has been marked with individual bounding
boxes, with as little overlap as possible. This mock-up simulates the user hovering the mouse
over a detected letter, which highlights all similar detected letters and displays certain
meta-data about the letter (including the name, any notes, and how many times it has been found -
this will eventually be a user-customisable feature).
Chapter 2
Methodology
2.1 Algorithms & Segmentation Process
Before segmentation can be performed, because of the highly noisy nature of the images in
use, we must first attempt to remove as much noise as possible from the image. This will be done by
running 3 separate algorithms on the image, with the intention of leaving only black ((0,0,0) in
RGB colour space) pixels which are part of letters within the image. To improve performance,
wherever possible, we will process separate segments of the image simultaneously.
2.1.1 Parallelisation
In order to provide multithreading support to algorithms, each algorithmic class extends a
Plugin superclass, which provides several helper methods. Firstly, the Plugin class allows individual
classes to access the number of cores available, so that they can fork the correct number
of threads (since running too many simultaneous threads would result in a harsh performance
reduction). It also provides a single non thread-related method, rbgaToRGBArray(),
which converts a 32 bit integer RGBA value into an array of R, G, and B values (this is done
by bit-shifting, as shown in Figure 2.1).
return new int[]{(rgba >> 16) & 0xFF, (rgba >> 8) & 0xFF, rgba & 0xFF };
Figure 2.1: A short algorithm converting a 32 bit integer RGBA value into an array of RGB
values, where 0 => R, 1 => G and 2 => B
Plugin holds 3 variables, other than the CPU count: one containing the number of threads
that a class intends to use, one containing the number of started threads, and one containing the
number of ended threads. To create a multithreaded environment, subclasses should begin by
calling the resetThreads(int) method with the number of threads they intend to use (setting the
value of the numThreads variable). The subclass can pass getThreadCount() to use the maximum
available threads. The subclass should then create the correct number of threads, calling
registerThread() when each thread is started, which not only increments the value of threadStartCount,
but provides the thread with a unique integer ID in the range [0...numThreads).
When each thread has finished, it should call the unregisterThread() method to tell the Plugin
class that it has completed its task (this saves having to monitor several threads simultaneously,
which would waste CPU cycles).
Once the initial threads have been created, the algorithmic class itself should then wait
for the isAlive() method to return false. Once it returns false, all threads have completed
computation, and the class has a full data set to work with. This isAlive() check is performed
by simply checking whether fewer threads have ended than were created.
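The thread bookkeeping described above might look roughly like the sketch below. The method names resetThreads, getThreadCount, registerThread, unregisterThread and isAlive, and the variables numThreads and threadStartCount, come from the text; everything else (the class name PluginSketch, the threadEndCount field, the use of AtomicInteger, and the parallelSum demonstration) is an assumption made for illustration. In particular, this isAlive() compares the ended count against the intended thread count.

```java
import java.util.concurrent.atomic.AtomicInteger;

class PluginSketch {
    private volatile int numThreads;
    private final AtomicInteger threadStartCount = new AtomicInteger();
    private final AtomicInteger threadEndCount = new AtomicInteger(); // hypothetical name

    /** Number of cores available, used as the maximum sensible thread count. */
    int getThreadCount() {
        return Runtime.getRuntime().availableProcessors();
    }

    /** Declares how many threads will be used and clears the counters. */
    void resetThreads(int n) {
        numThreads = n;
        threadStartCount.set(0);
        threadEndCount.set(0);
    }

    /** Called by each thread on start; returns a unique ID in [0, numThreads). */
    int registerThread() {
        return threadStartCount.getAndIncrement();
    }

    /** Called by each thread when its work is complete. */
    void unregisterThread() {
        threadEndCount.incrementAndGet();
    }

    /** True while fewer threads have ended than were requested. */
    boolean isAlive() {
        return threadEndCount.get() < numThreads;
    }

    /** Demonstration only: sums an array by giving each thread one slice. */
    static long parallelSum(int[] data, int threads) {
        PluginSketch plugin = new PluginSketch();
        plugin.resetThreads(threads);
        long[] partial = new long[threads];
        for (int t = 0; t < threads; t++) {
            new Thread(() -> {
                int id = plugin.registerThread();
                try {
                    int from = (int) ((long) data.length * id / threads);
                    int to = (int) ((long) data.length * (id + 1) / threads);
                    for (int i = from; i < to; i++) {
                        partial[id] += data[i];
                    }
                } finally {
                    plugin.unregisterThread(); // always signal completion
                }
            }).start();
        }
        while (plugin.isAlive()) {
            Thread.yield(); // wait until every thread has unregistered
        }
        long total = 0;
        for (long p : partial) {
            total += p;
        }
        return total;
    }
}
```

Because each pixel-level algorithm in this chapter works on independent slices of the image, the same ID-to-slice mapping could be reused for colour removal, thresholding and blurring.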
2.1.2 Colour Removal
Since we will be using contrast levels to segment the images, all colour information beyond the
greyscale level ((x, x, x) in RGB colour space, for x from 0 to 255) is redundant. Non-greyscale colour
information can also cause issues in later algorithms, particularly blurring. Due to the red hue
that most manuscript images possess (which can be related to both the colour of the paper
comprising the vast majority of the image, and the colour temperature of the scanning equipment
used), when combining colour information for neighbouring pixels, the value of the red
channel increases, which can affect the contrast levels, since the difference in the levels of red
between pixels decreases rapidly, thus introducing further noise.
The algorithm which removes colour from the image iterates through every pixel of the
image, as per Algorithm 2. This algorithm runs in O(n), where n is the number of pixels
within the image. Since every pixel must be changed, an algorithm with a lower asymptotic
running time than this is not possible; however, it is possible to reduce the real-life running
time by using multithreading. Since the changing of any pixel is not reliant on the information
contained within any other pixels, the task can be split between any number of threads, with
each thread manipulating a specific fraction of the image.
Algorithm 2 Basic Greyscale Conversion on Image of Size (w*h)
for i from 0 to w - 1 do
    for j from 0 to h - 1 do
        avg ← (red(pixel(i, j)) + green(pixel(i, j)) + blue(pixel(i, j))) / 3
        red(pixel(i, j)) ← avg
        green(pixel(i, j)) ← avg
        blue(pixel(i, j)) ← avg
    end for
end for
As shown, we iterate over all pixels in the image, taking the average of the red, green and
blue colour levels of the pixel, and setting all 3 channels to this value.
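A minimal Java sketch of this conversion is given below. It assumes pixels are packed as 32-bit integers with red, green and blue in bits 16-23, 8-15 and 0-7, matching the bit-shifts in Figure 2.1, and that the top 8 bits should be left untouched. The class and method names are hypothetical, and Arrays.parallelSetAll stands in for the report's own thread management.

```java
import java.util.Arrays;

final class GreyscaleSketch {
    /** Replaces the R, G and B channels of one packed pixel with their average. */
    static int toGrey(int pixel) {
        int r = (pixel >> 16) & 0xFF;
        int g = (pixel >> 8) & 0xFF;
        int b = pixel & 0xFF;
        int avg = (r + g + b) / 3;
        // Keep the top 8 bits as they were; write avg into all three colour channels.
        return (pixel & 0xFF000000) | (avg << 16) | (avg << 8) | avg;
    }

    /**
     * Converts every pixel in place. Each element depends only on its own old
     * value, so the work can safely be spread across threads.
     */
    static void removeColour(int[] pixels) {
        Arrays.parallelSetAll(pixels, i -> toGrey(pixels[i]));
    }
}
```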
2.1.3 Thresholding
As shown in Algorithm 3, thresholding is performed in a very similar way to colour
removal, in that it is split over multiple threads, with each thread simultaneously processing a
proportion of the image. The algorithm follows the exact original design, in which the average
of the red, green and blue channels is taken (so that the algorithm will work even if the colour
remover has not been run), and if that is above the given threshold, the pixel is changed to
white. If it is below or equal to the given threshold, it is set to black.
Algorithm 3 Thresholding Algorithm on Greyscale Image of Size (w*h)
threshold ← C3C3C3 (colour hex)
i ← 0
j ← 0
for i from 0 to w - 1 do
    for j from 0 to h - 1 do
        if colour(pixel(i, j)) > threshold then
            pixel(i, j) ← WHITE
        else
            pixel(i, j) ← BLACK
        end if
    end for
end for
An alternative to this would be to check contrasting threshold levels between pixels, as opposed
to testing against a global threshold. For example, consider a user-set threshold of
t = 50%. For every pixel in the image, its greyscale level would be tested against its surrounding
pixels and, if the contrast between any two pixels is > t, the pixel would be given
(255,255,255), otherwise it would be set to (0,0,0). This method, however, has not been chosen
for two separate reasons. Firstly, checking the relative threshold levels of each surrounding
pixel of every pixel would increase the running time of the algorithm by a factor of 8, which would
effectively negate the effect of any parallelisation on most computers. Secondly, this method
would only erode the outermost section of any smaller areas of noise, instead of removing them
completely (consider an area of noise 3x3 pixels in size: the central pixel would "pass" the
threshold test against its outer pixels, thus noise removal would fail).
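The chosen global-threshold rule can be sketched in Java as follows, assuming the same packed pixel layout as Figure 2.1 and a single-channel threshold level in the range 0 to 255 (so the hex colour C3C3C3 corresponds to level 0xC3). The class, constant and method names are hypothetical.

```java
final class ThresholdSketch {
    static final int BLACK = 0xFF000000;
    static final int WHITE = 0xFFFFFFFF;

    /**
     * Averages the three colour channels, so the rule also works on images
     * that have not been through colour removal. Above the level gives white;
     * at or below the level gives black.
     */
    static int threshold(int pixel, int level) {
        int avg = (((pixel >> 16) & 0xFF) + ((pixel >> 8) & 0xFF) + (pixel & 0xFF)) / 3;
        return avg > level ? WHITE : BLACK;
    }

    /** Returns a thresholded copy of the image; the input is not modified. */
    static int[] apply(int[] pixels, int level) {
        int[] out = new int[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            out[i] = threshold(pixels[i], level);
        }
        return out;
    }
}
```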
2.1.4 Blurring
Blurring is, again, performed in a similar fashion to the colour removal and thresholding
algorithms, in that it iterates through every pixel of the image, with each thread processing a
section of the image. Since blurring relies on the values of surrounding pixels, it does not edit
the original buffered image, instead calculating the value of each pixel based on the original
image, and setting this colour in a second, blurred image.
We use a threshold variant of blurring, where the colour of any pixel is averaged with that of
all pixels of distance <= t away (where t is the threshold set by the user). This is shown in
Algorithm 4. This algorithm is the most time consuming of all algorithms performed, and runs
in O(nt^2), where n is the number of pixels in the image, and t is the threshold size.
For each pixel (x, y) in the image, the algorithm iterates over [-threshold, threshold] in both
directions, taking the average colour of all pixels in the range [(x - threshold, y - threshold),
(x + threshold, y + threshold)]. If a pixel is out of range (for example, consider (x, y) = (0,0)
and threshold = 1 - the pixel (-1, -1) does not exist), an exception is caught, and that pixel is
excluded from computation. This process is shown in Algorithm 4.
This blurring can be performed at 2 stages in the process, both before, and after thresholding.
The blurring can be performed before thresholding, where minor anomalous areas of pixels are
removed before thresholding. Alternatively, it could be run after thresholding, to remove any
13
Algorithm 4 Colour Blurring an Image of Size (w*h) and threshold t
1: i0
2: j0
3: for ifrom 0 to w1do
4: for jfrom 0 to h1do
5: colour 0
6: for mfrom tto tdo
7: for nfrom tto tdo
8: colour colour +colour(pixel(i+m, j +n)) // We assume that the colour is
greyscale here
9: colour colour/(2t+ 1)
10: colour(pixel(i, j )) colour
11: end for
12: end for
13: end for
14: end for
single black pixels; however, this would require re-thresholding after the blurring, since blurring
would introduce non-black pixels, which would avoid detection in the segmentation part of the
process. To keep the computation time of the process as low as possible, we blur before
thresholding, and perform only one initial thresholding.
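The blur described above can be sketched in Java as follows. This is a minimal illustration rather than the system's actual Blurrer: it assumes the greyscale image is held as an int[][] indexed [y][x] with values 0-255 (the report uses a BufferedImage), it replaces the caught exception with an explicit bounds check, and the class name BoxBlurSketch is hypothetical.

```java
/** Threshold (box) blur over a greyscale image stored as rows of 0-255 values. */
final class BoxBlurSketch {

    /**
     * Returns a blurred copy of grey (indexed [y][x]). The input is never modified,
     * so every average is computed from the original pixel values.
     */
    static int[][] blur(int[][] grey, int t) {
        int h = grey.length;
        int w = grey[0].length;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int sum = 0;
                int count = 0;
                for (int dy = -t; dy <= t; dy++) {
                    for (int dx = -t; dx <= t; dx++) {
                        int ny = y + dy;
                        int nx = x + dx;
                        // Neighbours outside the image are excluded from the average.
                        if (ny < 0 || ny >= h || nx < 0 || nx >= w) {
                            continue;
                        }
                        sum += grey[ny][nx];
                        count++;
                    }
                }
                // count is at least 1 because the pixel itself is always in range.
                out[y][x] = sum / count;
            }
        }
        return out;
    }
}
```

Dividing by the number of pixels actually summed, rather than by a fixed window size, keeps the averages at the image border from being darkened by the missing neighbours.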
2.1.5 Threshold Level Prediction
Threshold level prediction is run upon importing an image, and is used to suggest a thresholding
level to the user. It is computed by simply calculating the average colour level of every pixel in
the image. Since, as shown in Chapter 3, darker images require stronger threshold levels, the
predictor returns 0.9 × average if the average is above 140, and 0.8 × average otherwise. This is
shown in Figure 2.2.
if(avg > 140) {
    return (int)(avg*0.9);
} else {
    return (int)(avg*0.8);
}
Figure 2.2: Predicting Thresholding Levels - darker images require stronger thresholding.
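For context, the prediction rule of Figure 2.2 can be combined with the averaging step in one small sketch. The int[][] greyscale representation and the names ThresholdPredictorSketch, averageGrey and predict are assumptions made for illustration; only the 140 cut-off and the 0.9 and 0.8 factors come from the text.

```java
/** Suggests a threshold level from the average grey level of an image. */
final class ThresholdPredictorSketch {

    /** Mean grey level (0-255) over every pixel of the image. */
    static double averageGrey(int[][] grey) {
        long sum = 0;
        long count = 0;
        for (int[] row : grey) {
            for (int value : row) {
                sum += value;
                count++;
            }
        }
        return (double) sum / count;
    }

    /** Applies the rule of Figure 2.2 to an average grey level. */
    static int predict(double avg) {
        return avg > 140 ? (int) (avg * 0.9) : (int) (avg * 0.8);
    }
}
```

For example, an image averaging 200 would be given a suggested level of 180, whilst one averaging 100 would be given 80.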
2.1.6 Area of Interest (Text Block) Detection
The first stage in the process is to detect the approximate location of the majority of the text
within the image (the Area-of-Interest). This will be done based largely on the density of black
pixels within sections of the image.
Before the Area-Of-Interest can be found, a small amount of pre-processing is required.
Since we are comparing the density of black pixels in each section relative to that of the image
as a whole, we need to calculate the average percentage of black pixels over the whole image. This
can be done very quickly in O(n) by iterating over each pixel, incrementing a counter upon
detection of a black pixel, and then finally dividing the counter by the number of pixels in the
image. This gives the overall proportion of black pixels within the image as a whole.
Once we have the average black-pixel density of the image, we can test each section (with
a preliminary suggested size of 200*200 pixels), to see if it has a greater than average density
of black pixels. This scanning will start at all four corners of the image simultaneously, and
work towards the centre. Once a section of pixels with a greater than average density has been
located, we check that the 3 adjacent sections, connected relatively inward, also have a greater
than average density of black pixels. If this is the case, the initial section marks a corner of
our area-of-interest. If not, the algorithm moves on to the next section.
Area of interest detection has been implemented as per the initial design, following Figure
2.3. We first of all split the image into n x m blocks of size s, and launch 4 threads, with
starting points (0, 0), (0, n), (m, 0) and (m, n). These threads then converge centrally, comparing
the density of black pixels in each section. The algorithm then calculates the dimensions of the
bounding box required to hold all 4 detected sections, and returns this as a Java Rectangle for
use in the line segmentation algorithm.
Figure 2.3: Area of Interest Detection algorithm detects the primary block of text in a
manuscript by searching for groups of pixels with lower than average levels of whitespace in
an inward-relative direction.
A graphical representation of this process is shown in Figure 2.3, where the arrows show the
starting points and directions of travel, the green boxes show the first detected sections, and
the grey boxes show the 3, relatively central, sections, which must also meet the greater than
average black pixel density criteria. The area of interest would be the bounding box which
encompasses all 4 green sections.
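The corner search can be sketched as below. This is a simplified, single-threaded illustration under several assumptions that are not taken from the report: the thresholded image is a boolean[][] indexed [y][x] (true meaning black), each corner is scanned row by row rather than by a converging thread, partial blocks at the right and bottom edges are ignored, and the names AreaOfInterestSketch, density, scan and detect are hypothetical.

```java
import java.awt.Rectangle;

/** Sequential sketch of the four-corner area-of-interest search. */
final class AreaOfInterestSketch {

    /** Proportion of black pixels in the region [x0, x1) x [y0, y1). */
    static double density(boolean[][] black, int x0, int y0, int x1, int y1) {
        int count = 0;
        for (int y = y0; y < y1; y++) {
            for (int x = x0; x < x1; x++) {
                if (black[y][x]) count++;
            }
        }
        return (double) count / ((x1 - x0) * (y1 - y0));
    }

    /** Scans from one corner; returns {row, col} of the first qualifying block, or null. */
    static int[] scan(boolean[][] dense, int r0, int c0, int dr, int dc) {
        int rows = dense.length, cols = dense[0].length;
        for (int r = r0; r >= 0 && r < rows; r += dr) {
            for (int c = c0; c >= 0 && c < cols; c += dc) {
                int r2 = r + dr, c2 = c + dc; // indices of the inward neighbours
                if (r2 < 0 || r2 >= rows || c2 < 0 || c2 >= cols) continue;
                // The block and its 3 inward neighbours must all be denser than average.
                if (dense[r][c] && dense[r2][c] && dense[r][c2] && dense[r2][c2]) {
                    return new int[] {r, c};
                }
            }
        }
        return null;
    }

    /** Returns the bounding box of the detected corner blocks, or null if none qualifies. */
    static Rectangle detect(boolean[][] black, int s) {
        int h = black.length, w = black[0].length;
        int rows = h / s, cols = w / s;
        if (rows == 0 || cols == 0) return null;
        double avg = density(black, 0, 0, w, h);
        boolean[][] dense = new boolean[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                dense[r][c] = density(black, c * s, r * s, (c + 1) * s, (r + 1) * s) > avg;
            }
        }
        // {start row, start col, row step, col step} for each of the four corners.
        int[][] corners = {
            {0, 0, 1, 1}, {0, cols - 1, 1, -1}, {rows - 1, 0, -1, 1}, {rows - 1, cols - 1, -1, -1}
        };
        int minR = rows, maxR = -1, minC = cols, maxC = -1;
        for (int[] k : corners) {
            int[] hit = scan(dense, k[0], k[1], k[2], k[3]);
            if (hit == null) continue;
            minR = Math.min(minR, hit[0]);
            maxR = Math.max(maxR, hit[0]);
            minC = Math.min(minC, hit[1]);
            maxC = Math.max(maxC, hit[1]);
        }
        if (maxR < 0) return null;
        return new Rectangle(minC * s, minR * s, (maxC - minC + 1) * s, (maxR - minR + 1) * s);
    }
}
```

Requiring the three inward neighbours to be dense as well is what lets an isolated dark block near the edge (scanner noise, for instance) be skipped rather than mistaken for a corner of the text.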
2.1.7 Line Segmentation
Due to its need to detect alternating areas of black and white pixels in one continuous area,
LineSegmenter is the only algorithm which cannot easily be multithreaded. This does not have
a large performance implication though, since LineSegmenter has far fewer operations than
most other algorithms, and most of these operations are purely mathematical.
LineSegmenter operates in 3 primary stages. The first is to calculate the density of black
pixels on each row of pixels within the image, and then take the average of these row densities.
Next, the algorithm searches for lines of text. This is performed by searching for a row of pixels
which has a greater than average density of black pixels. Once this is found at y, y - 1 is marked
as the opening point of a line. The search then changes to look for the first row of pixels which
has a less than average density of black pixels, and again, once this is found at y, y - 1 is
marked as the closing point of a line. This process repeats until the end of the area of interest
is reached, as shown in Algorithm 5.
Algorithm 5 Line Segmentation within Area of Interest size w*h
1: def lines ← new array of (y1, y2)
2: for y from 0 to h-1 do
3:     for x from 0 to w-1 do
4:         if color(x, y) == BLACK then
5:             y1 ← y
6:             for i from x to h-1 do
7:                 for j from 0 to w-1 do
8:                     if color(i, j) ≠ BLACK AND i == w-1 then
9:                         y2 ← j-1
10:                        lines ← lines + (y1, y2)
11:                        x ← i+1
12:                        go to 16
13:                    end if
14:                end for
15:            end for
16:        end if
17:    end for
18: end for
The final stage of the algorithm is to verify the integrity of the detected lines. This is done
by firstly matching opening values of y to their corresponding closing values. Each opening y is
then compared to the previous opening y, and if the distance between them is less than half of
the average distance between lines, the line is deleted. The remaining lines are then converted
into rectangles with the same width as the area of interest. This list of rectangles is then
returned for use in the next stage in the process.
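The three stages described in this section can be sketched as follows. This follows the prose description (row densities compared against their average) rather than reproducing the system's LineSegmenter, and makes some assumptions: the area of interest is a boolean[][] indexed [y][x], lines are returned as {open, close} row pairs instead of rectangles, the average distance between lines is taken between consecutive opening rows, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Row-density line segmentation followed by a simple integrity check. */
final class LineSegmenterSketch {

    /** Returns {open, close} row pairs for each detected line of text. */
    static List<int[]> segment(boolean[][] black) {
        int h = black.length, w = black[0].length;
        double[] density = new double[h];
        double avg = 0;
        for (int y = 0; y < h; y++) {
            int count = 0;
            for (int x = 0; x < w; x++) {
                if (black[y][x]) count++;
            }
            density[y] = (double) count / w;
            avg += density[y] / h;
        }
        List<int[]> lines = new ArrayList<>();
        int open = -1; // -1 means we are currently between lines
        for (int y = 0; y < h; y++) {
            if (open < 0 && density[y] > avg) {
                open = Math.max(0, y - 1);            // opening point: y - 1
            } else if (open >= 0 && density[y] < avg) {
                lines.add(new int[] {open, y - 1});   // closing point: y - 1
                open = -1;
            }
        }
        if (open >= 0) lines.add(new int[] {open, h - 1}); // line runs to the bottom edge
        return lines;
    }

    /** Drops any line whose opening is closer than half the average spacing to the previous opening. */
    static List<int[]> prune(List<int[]> lines) {
        if (lines.size() < 2) return lines;
        int first = lines.get(0)[0];
        int last = lines.get(lines.size() - 1)[0];
        double avgGap = (double) (last - first) / (lines.size() - 1);
        List<int[]> kept = new ArrayList<>();
        kept.add(lines.get(0));
        for (int i = 1; i < lines.size(); i++) {
            int gap = lines.get(i)[0] - lines.get(i - 1)[0];
            if (gap >= avgGap / 2) kept.add(lines.get(i));
        }
        return kept;
    }
}
```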
2.1.8 Letter Segmentation
Letter segmentation is the most complicated of the stages. Letters are segmented by following
the same method as Line Segmentation (Section 2.1.7), scanning horizontally as opposed to
vertically, along each individual detected line. The overall letter segmentation is performed in
4 stages:
Initial Segmentation
Letters are segmented, as above, with the image thresholded at the level given by the user in
the process panel. This process is outlined in Algorithm 6.
Small Letter Joining
Once the letters have been segmented, the segments are analysed. Any sequential segments
which are both less than half the average width in size, and close to each other, are merged.
These merged segments are then re-segmented, using an image which was thresholded at a
Algorithm 6 Letter segmentation within Area of Interest size w*h and array of lines L
1: def letters ← new array of (x1, x2)
2: for i from 0 to L.size-1 do
3:     lineopen ← L[i].y1
4:     lineclose ← L[i].y2
5:     for x from 0 to w-1 do
6:         for y from lineopen to lineclose do
7:             if color(x, y) == BLACK then
8:                 x1 ← x
9:                 for i from x to w-1 do
10:                    for j from lineopen to lineclose do
11:                        if color(i, j) ≠ BLACK AND j == lineclose then
12:                            x2 ← i-1
13:                            letters ← letters + (x1, x2)
14:                            x ← i+1
15:                            go to 20
16:                        end if
17:                    end for
18:                end for
19:            end if
20:        end for
21:    end for
22: end for
higher level (resulting in thicker lines/more noise), in an attempt to join any letters which were
split by the segmentation process. This is shown graphically in Figure 2.4.
Figure 2.4: The Effect of Re-Segmenting using Differing Threshold Levels
Large Letter Splitting
Once any small letters have been joined as necessary, a similar process is performed for any
letters that are much larger than average (having a width > 2 × average). Any letters which meet
this criterion are re-segmented using an image thresholded at a lower level, which highlights any
gaps between letters, allowing the segmenter to detect the gap if one exists. Again, this is shown
in Figure 2.4.
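The width checks behind the joining and splitting stages can be sketched as a single classification pass. This is only an illustration: segments are assumed to be {x1, x2} column pairs with inclusive bounds, maxGap (the distance below which two segments count as close to each other) is a hypothetical parameter that the text does not specify, and the class and enum names are invented for the example.

```java
import java.util.Arrays;

/** Flags segments for re-segmentation based on their width relative to the average. */
final class WidthCheckSketch {

    enum Action { KEEP, TRY_JOIN, TRY_SPLIT }

    static Action[] classify(int[][] segments, int maxGap) {
        double avg = 0;
        for (int[] s : segments) {
            avg += (s[1] - s[0] + 1) / (double) segments.length;
        }
        Action[] actions = new Action[segments.length];
        Arrays.fill(actions, Action.KEEP);
        for (int i = 0; i < segments.length; i++) {
            int width = segments[i][1] - segments[i][0] + 1;
            if (width > 2 * avg) {
                // Wider than twice the average: probably several letters joined together.
                actions[i] = Action.TRY_SPLIT;
            } else if (width < avg / 2 && i + 1 < segments.length) {
                int nextWidth = segments[i + 1][1] - segments[i + 1][0] + 1;
                int gap = segments[i + 1][0] - segments[i][1] - 1;
                // Two narrow neighbours close together: probably one letter split in two.
                if (nextWidth < avg / 2 && gap <= maxGap) {
                    actions[i] = Action.TRY_JOIN;
                    actions[i + 1] = Action.TRY_JOIN;
                }
            }
        }
        return actions;
    }
}
```

Segments marked TRY_JOIN would then be re-segmented at the higher threshold level and those marked TRY_SPLIT at the lower one.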
Fine Adjustment
The final stage of the process is to fine-tune the height of each segment to the height of the
letter. To do this, each individual box is expanded vertically, in both directions, until either
whitespace is met, or the expansion reaches the 150 pixel limit. This 150 pixel limit has been
put in place to ensure that the segmenter doesn’t detect the line below as part of the letter,
which would result in a continuously expanding box (in rare circumstances this could occur -
such as the tail of a y meeting the head of a d).
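A minimal sketch of this vertical expansion is given below. It assumes a boolean[][] image indexed [y][x], treats whitespace as a row with no black pixel between the box's left and right columns, and applies the limit separately in each direction; the text does not say whether the 150 pixels are counted per direction or in total, so that choice, like the names used, is an assumption.

```java
/** Expands a letter's bounding box vertically until whitespace or a pixel limit is reached. */
final class FineAdjustSketch {

    /** True if row y contains a black pixel between columns x1 and x2 (inclusive). */
    static boolean rowHasBlack(boolean[][] black, int y, int x1, int x2) {
        for (int x = x1; x <= x2; x++) {
            if (black[y][x]) return true;
        }
        return false;
    }

    /** Returns the adjusted {top, bottom} rows (inclusive) of the box. */
    static int[] expand(boolean[][] black, int x1, int x2, int top, int bottom, int limit) {
        int newTop = top;
        for (int step = 0; step < limit && newTop > 0
                && rowHasBlack(black, newTop - 1, x1, x2); step++) {
            newTop--;
        }
        int newBottom = bottom;
        for (int step = 0; step < limit && newBottom < black.length - 1
                && rowHasBlack(black, newBottom + 1, x1, x2); step++) {
            newBottom++;
        }
        return new int[] {newTop, newBottom};
    }
}
```

Calling expand with a limit of 150 corresponds to the setting described above; a smaller limit simply stops the growth earlier.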
2.2 Data Structures
As shown by the Class Diagrams given in Appendix X, the system uses 3 primary data
structures - the Manuscript Library, which holds a collection of Manuscripts, each of which, in
turn, contains a collection of Letters.
2.2.1 Manuscript Library Data Structure
The Manuscript Library data structure is the simplest of the 3, containing one single variable,
an enumerated LinkedList, initialised to hold Manuscript objects. Methods are provided to add
and remove manuscripts to and from the library, and it is possible to retrieve manuscripts based
on their filename and label. This uses a very basic search method - iterating over all manuscripts
in the list, returning the first that matches the given string (as shown in Figure 2.5).
public Manuscript getManuscriptByFilename(String filename) {
    for(Manuscript m : getManuscripts()) {
        if(m.getFilename().equals(filename)) {
            return m;
        }
    }
    return null;
}
Figure 2.5: Getting a Manuscript from the Library
In order to reduce the memory footprint of the system, the LinkedList variable is not initialised
when the library is constructed, and instead uses lazy instantiation. The variable is initialised
upon the first call to the getManuscripts() method.
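The lazy instantiation described here might look like the following sketch. The nested Manuscript class is a minimal stand-in for the report's own class, and isInitialised() is a hypothetical helper added only so that the behaviour can be observed; neither is taken from the system's source.

```java
import java.util.LinkedList;
import java.util.List;

/** Library whose backing list is only created when it is first requested. */
final class ManuscriptLibrarySketch {

    /** Minimal stand-in for the report's Manuscript class. */
    static final class Manuscript {
        private final String filename;

        Manuscript(String filename) {
            this.filename = filename;
        }

        String getFilename() {
            return filename;
        }
    }

    private LinkedList<Manuscript> manuscripts; // stays null until first use

    List<Manuscript> getManuscripts() {
        if (manuscripts == null) {
            manuscripts = new LinkedList<>();
        }
        return manuscripts;
    }

    /** Hypothetical helper: reports whether the list has been created yet. */
    boolean isInitialised() {
        return manuscripts != null;
    }

    Manuscript getManuscriptByFilename(String filename) {
        for (Manuscript m : getManuscripts()) {
            if (m.getFilename().equals(filename)) {
                return m;
            }
        }
        return null;
    }
}
```

Note that this form of lazy initialisation is not thread-safe; if several threads could call getManuscripts() at once, the method would need synchronisation.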
2.2.2 Manuscript Data Structure
The Manuscript structure contains information about a particular manuscript (or image),
and an enumerated LinkedList of Letters that the manuscript contains. The Manuscript data
structure contains the filename, label, notes, and import and process dates of the image itself.
It also stores the algorithmic parameters used for the blur, padding and threshold stages of the
process the last time the image was processed.
The class contains several methods for finding and manipulating letters, including
getLetterAtPoint(x, y), which returns the letter visible at any given coordinate (Figure 2.6), and
similar methods to the Manuscript Library class, for getting letters by their label. The class also
has a method which allows the user to check if a Manuscript already contains a given letter. Whilst
not all of these methods have been used in the devised algorithms, the data structures have been
designed to be as structurally sound as possible, providing an adequate grounding for future
development.
public Letter getLetterAtPoint(int x, int y) {
    for(Letter l : getLetters()) {
        if(l.contains(x, y)) {
            return l;
        }
    }
    return null;
}
Figure 2.6: Finding the Letter at any given Point
This is done by iterating through the list of Letters, and checking if the bounding box of each
letter contains the given point (this method is later referred back to the native Java Rectangle
class).
2.2.3 Letter Data Structure
The Letter structure stores information about individual letters, including their label, any
notes about them, their coordinates, and whether or not they have been flagged for review. To
allow easy reference without having to perform reverse searching, it also stores a reference to
the Manuscript in which the Letter is contained.
Coordinates are stored as two Java Point objects, with one storing the top left (start) point of
the bounding box of the letter, and the other storing the bottom right (end). This approach was
chosen since it is more flexible than using a Java Rectangle when it comes to manipulation. As an
example, extending the bounding box upwards by 1 pixel using points could be completed using
the function call startCoordinate.setY(startCoordinate.getY() + 1). The same transformation
using a Java Rectangle would require 4 effective operations over 2 function calls - one to set the
position (both x and y), and one to increase the size (again, both width and height). The Letter
class also contains several other manipulation and searching methods, including a direct method
which will resize the letter in any direction, by moving the corresponding Point as previously
described.
public boolean contains(int x, int y) {
    if(x > getStartCoordinate().getX() && x < getEndCoordinate().getX()) {
        if(y > getStartCoordinate().getY() && y < getEndCoordinate().getY()) {
            return true;
        }
    }
    return false;
}
Figure 2.7: Checking if a Point (x,y) is within the Boundary Box of a Letter
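To illustrate the two-point representation, the sketch below stores the box as two java.awt.Point objects and resizes it by moving one of them. It is an assumption-laden illustration, not the report's Letter class: java.awt.Point exposes public x and y fields and a translate method rather than the setY call quoted above, the sketch assumes image coordinates in which y increases downwards (so extending upwards decreases the start y), and the names LetterBoxSketch, extendUp and extendDown are hypothetical.

```java
import java.awt.Point;

/** Bounding box held as a top-left and a bottom-right point. */
final class LetterBoxSketch {

    private final Point start; // top left corner
    private final Point end;   // bottom right corner

    LetterBoxSketch(int x1, int y1, int x2, int y2) {
        start = new Point(x1, y1);
        end = new Point(x2, y2);
    }

    /** Moves only the start point; the end point is untouched. */
    void extendUp(int pixels) {
        start.translate(0, -pixels);
    }

    /** Moves only the end point; the start point is untouched. */
    void extendDown(int pixels) {
        end.translate(0, pixels);
    }

    int height() {
        return end.y - start.y;
    }

    /** Same strict comparison as Figure 2.7: points on the boundary are not contained. */
    boolean contains(int x, int y) {
        return x > start.x && x < end.x && y > start.y && y < end.y;
    }
}
```

Each resize touches a single coordinate of a single point, which is the flexibility argued for above.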
The contains(x, y) method is also available, which checks if the point (x, y) is contained
within the bounding box dictated by the two Point objects. This method is shown in Figure
2.7. The class also allows direct comparison with other letters, based on the label and notes of
the two letters, as shown in Figure 2.8.
public boolean isSimilarTo(Letter otherLetter) {
    if(!getLabel().equals("Unknown Letter") &&
        getLabel().toLowerCase().equals(otherLetter.getLabel().toLowerCase())) {
        return true;
    } else if(!getNotes().equals("No Notes") &&
        getNotes().toLowerCase().equals(otherLetter.getNotes().toLowerCase())) {
        return true;
    } else {
        return false;
    }
}
Figure 2.8: Comparison of Letters using Labels and Notes
Chapter 3
Experimental Results
3.1 Segmentation Progress
3.1.1 Segmentation Process & Results
The segmentation process has achieved a letter recognition accuracy of approximately 90% (as
shown in Appendix VI) within images which have not suffered any damage. On noisier images,
the result is slightly lower, but remains above 80% in most cases. Figures 3.1 and 3.2 (AOI
Detection) show the stages of the process in succession. Firstly, all colour is removed from the
image, followed by blurring of the image, to soften the colour harshness of any small areas of
noise. Once the image is blurred, it is then thresholded, at the level set in the UI (Figure 3.1
shows thresholding at 105).
Figure 3.1: Segmentation Process: shows the image at various points of the segmentation
process. From top to bottom: colour removed, blurred, thresholded, lines segmented, and the final
letter-segmented image. The area-of-interest (run between thresholding and line segmentation)
is shown separately in Figure 3.2.
Once the image is thresholded, the area of interest is detected (Figure 3.2), which highlights
the section of the image most likely to contain letters. This eradicates noise around the edge of
the image resulting from the scanning process. This area of interest is then scanned for rows
of text (Figure 3.1), and then, in turn, each row of text is then scanned for individual letters.
Once the letters have been found, each individual letter is adjusted in an attempt to find its
best fit. The resulting output can be seen in the final row of Figure 3.1.
Figure 3.2: Segmentation Process: Area of Interest Sample Output
This process is able to detect the vast majority of letters within manuscripts, even if high
levels of noise are present. Since the algorithm is very much heuristic, adjustments can be
made in the user interface, through the use of keyboard shortcuts. In order to aid this finishing
process, any letters which the algorithm determines to be anomalous post-segmentation are
automatically flagged for review, meaning that they will be coloured red. This is determined
by validating the width of the letters, since prior knowledge of the manuscripts in question
shows that the vast majority of letters are of an approximately equal width. An example of a
segmented manuscript is shown in Appendix VII.
3.1.2 Effect of Parametric Setting Adjustments
Both the Thresholder and Blurrer algorithms allow the user to adjust their parameters. For
thresholding, this is the colour level at which the colour becomes white. In blurring, it is the
radius around each pixel which is taken into account when blurring the image. The values of
these parameters are crucial in the success of the segmentation process.
Figure 3.3: Effect of Threshold Level Adjustment. From top to bottom: Original Image,
Thresholding at 64, Thresholding at 96, Thresholding at 128, Thresholding at 160, Thresholding
at 192.
The threshold level is adjustable from 0 (where only black pixels remain black) to 255 (at
which point all pixels will become black). Figure 3.3 shows the effect of various threshold
levels on a section of one of the manuscripts used for evaluation purposes. The first row of
text is the original image, with the lines below showing the result of thresholding from 64 to
192, at intervals of 32. At 192, the vast majority of the image is black, and thus no text is
distinguishable, whilst at 160, text becomes readable, but there is still a large amount of noise.
The levels of noise become acceptable at 128, with the only remaining noise being black bars
to the left and right, which would be removed by the area of interest detector. The differences
between 64 and 128, whilst seemingly minor, are actually the variances which make the most
difference to the letter segmentation process. For example, looking at the final two letters
of the second word, "ad", at 128 they are clearly joined. Upon reducing the threshold level
to 64, however, the join between them isn't dark enough to withstand the thresholding, and
they become two distinct letters. This shows the reasoning behind thresholding 3 times in the
process - we threshold at an average, middle ground, level, suggested by the user, and then use
a lower level to detect letters which should be split, and a higher level to join letters which
have been mistakenly split when thresholded at the original level. This process ensures that
the majority of letters are detected correctly, and, compared to using a single threshold level,
results in a roughly 30% increase in the proportion of accurately detected letters.
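The effect of the threshold level on a faint join can be reproduced with a few lines of Java. The sketch assumes a greyscale int[][] image and takes a pixel to be black when its grey level is at or below the threshold, which matches the description of levels 0 and 255 above; the sample values and the names ThresholderSketch, threshold and countRuns are invented for illustration.

```java
/** Thresholding plus a count of black runs along one row of pixels. */
final class ThresholderSketch {

    /** A pixel becomes black (true) when its grey level is at or below the threshold level. */
    static boolean[][] threshold(int[][] grey, int level) {
        boolean[][] black = new boolean[grey.length][];
        for (int y = 0; y < grey.length; y++) {
            black[y] = new boolean[grey[y].length];
            for (int x = 0; x < grey[y].length; x++) {
                black[y][x] = grey[y][x] <= level;
            }
        }
        return black;
    }

    /** Counts runs of consecutive black pixels in a row, i.e. candidate letter segments. */
    static int countRuns(boolean[] row) {
        int runs = 0;
        boolean inRun = false;
        for (boolean b : row) {
            if (b && !inRun) runs++;
            inRun = b;
        }
        return runs;
    }
}
```

For a row such as {30, 30, 100, 30, 30, 250}, two dark strokes linked by a fainter join, thresholding at 128 yields a single run whilst thresholding at 64 yields two, mirroring the "ad" example.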
The second adjustable parameter, the blur radius m, is adjustable in the range [0, 10].
The decision to limit this to such a small range is due to the running time of the algorithm,
O(nm²). Blurring removes smaller objects of noise by reducing their colour levels, resulting in
them being coloured white by the thresholding algorithm. An example of this is shown at the
far end of the line of text in Figure 3.4. To the right of the final "L" are two very small dots
of noise. As the blur radius increases, these dots get lighter until, when blurring at a radius of
10px, they are no longer visible.
Figure 3.4: Effect of Blur Level Adjustment. From top to bottom: Original Image, Blurring
at 1px, Blurring at 5px, Blurring at 10px, Blurring at 25px.
Figure 3.4 shows why the ColorRemover algorithm must be run first. Due to the colour of
the paper on which most manuscripts are presented, blurring the image increases the effect of
this greater than average red level, and the entire image takes a red tint. Removing the colour
from the image, however, does not affect the Blurrer's ability to remove noise from the image,
and also stops these high red levels from affecting the Thresholder.
3.2 Technical Performance
3.2.1 Resource Usage
The system will fully utilise as much CPU processing power as the computer has available,
and is set by default to run within a 2GB Java memory heap. During testing, on a Mac OS X
system, after processing 20 images of resolution 4000x6000 pixels in quick succession, memory
usage remained below 1GB. Memory usage below this would be difficult to achieve, due to how
Java keeps BufferedImages in memory whilst they are being iterated over.
3.2.2 Algorithm Performance
As described in Section 3.2.3, all algorithms are multithreaded to increase performance levels,
with the exception of LineSegmenter. All designed algorithms run in linear time, with the
exception of the Blurrer algorithm, which runs in O(nm²), where n is the pixel count, and m
is the blur radius. This performance is reasonable with small values of m, thus m is limited to
values ≤ 10 in the GUI. With the default blur radius value of 1, the entire process occurs in
linear time, with respect to the number of pixels and letters present in the image. It would be
valid to say that the segmentation process runs in O(lm²n), where l is the number of letters
present in the image, m is the blur radius selected and n is the number of pixels.
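As a rough worked example of the blur cost, assume that every pixel reads a full (2m+1) by (2m+1) window (edge clipping is ignored, so this is an upper bound) and take the 4000x6000 pixel test images mentioned in Section 3.2.1:

```latex
\begin{align*}
\text{reads}(n, m) &= n\,(2m+1)^2 \\
n &= 4000 \times 6000 = 2.4 \times 10^{7} \\
\text{reads}(n, 1) &= 9n = 2.16 \times 10^{8} \\
\text{reads}(n, 10) &= 441n \approx 1.06 \times 10^{10}
\end{align*}
```

Under these assumptions, raising the radius from 1 to 10 multiplies the blurring work by 441/9 = 49, which illustrates why the radius is capped in the GUI.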
It is important to note that the algorithms developed are heuristic, in that they in no way
guarantee an optimal result. The algorithmic parameters are key to the system's success -
whilst the system will predict the best thresholding level to use, it is important for the user to
use their judgement, since certain aspects can skew this prediction. If the image, for example,
has large dark areas to the top or bottom (or anywhere away from the main block of text), this
may lower the predicted threshold value, which may skew the overall result.
With optimal threshold settings, on average, the system detects approximately 90% of letters
within a manuscript, excluding any which have been the subject of paper damage.
3.2.3 Effectiveness of Parallelisation on Running Time
All algorithms use a dynamic number of threads, matching the number of available CPU
cores on the system, to process several parts of the image simultaneously. The exceptions to
these are LineSegmenter, which, due to the nature of the algorithm, can only use one thread,
and AOIDetector, which always uses 4 threads.
Algorithm          No MT   1C      2C      2C (HT)   4C      4C (HT)
Blurrer            1       1.037   0.553   0.593     0.28    0.297
Color Remover      1       1.007   0.563   0.633     0.283   0.303
Thresholder        1       0.997   0.507   0.54      0.277   0.317
AOI Detector       1       1.38    1.187   0.993     0.413   0.533
Line Segmenter     1       1.03    1.017   0.983     0.967   0.967
Letter Segmenter   1       0.993   0.52    0.533     0.267   0.263
Figure 3.5: Average Multithreaded Algorithm Performance (Relative)
Figure 3.5 shows the average running times of the algorithms, relative to running with no
multithreading, over various CPU configurations. This is shown in greater detail (including
tests with multiple images) in Appendix V. These results show a maximum speed-up of
approximately 3.8 times (LetterSegmenter running on 4 cores with hyper threading enabled, with
a relative running time of 0.263), and also show how some of the algorithms can have a minor
decrease in performance under certain circumstances. Take, for example, AOIDetector. This
algorithm takes 38% and 18.7% longer to run on systems with 1 and 2 cores respectively, compared
to running with no multithreading. Once the core count reaches 4 virtual cores (2 physical cores
with hyper threading), we start to see a marginal increase in performance, with the performance
increasing dramatically upon reaching 4 physical cores.
This minor decrease in performance for the two non-multithreading-optimised algorithms,
however, is negated by the much larger increase on the slower algorithms (AOIDetector and
LineSegmenter are the two fastest algorithms by nature). On a two core machine, for example,
the 18.7% performance decrease for AOIDetector is outweighed by the increases for the much
slower Blurrer and Thresholder, the latter of which is run 3 times during the segmentation
process.
Figure 3.6: Algorithm Performance Graph. HT = Hyper Threading.
The performance of each algorithm over each configuration is graphed in Figure 3.6, with a
polynomial trend line. This graph shows that whilst LineSegmenter remains fairly stable in its
running time over all configurations, the running time of each of the other algorithms is roughly
inversely proportional to the number of cores available (all show a downward trend). For
LineSegmenter, which is not multithreading aware, the slight performance gain is most likely
attributable to enhanced operating system performance with multiple cores (since it will have a
full core available, instead of one core minus operating system resources), and the ability of
Java to offload its own threads (such as the GUI objects) over multiple cores, thus freeing up
resources.
The graph shows, however, that there is sometimes a slight performance decrease when
hyper-threading is used. This can be attributed to Java not being hyper-threading aware, and
reporting the number of virtual CPU cores, as opposed to physical. This results in the same
work being split into twice as many parts over the same number of physical cores, meaning that
each core has to juggle two simultaneous jobs. As of the current Java version, there is no way to
retrieve the number of physical cores, and even with this negligible performance decrease, the
performance increase given between configurations is worthwhile.
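The thread-per-core approach discussed in this section can be sketched as follows. The example splits an image into horizontal bands and counts black pixels in each band on its own thread; the band-splitting scheme, the choice of pixel counting as the workload and the names ParallelBandsSketch, bands and countBlack are assumptions made for illustration, not the system's implementation. Runtime.getRuntime().availableProcessors() is the standard call that reports logical (virtual) processors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Splits an image into row bands and processes one band per thread. */
final class ParallelBandsSketch {

    /** Splits rows [0, height) into contiguous bands {startRow, endRowExclusive}. */
    static List<int[]> bands(int height, int threads) {
        List<int[]> out = new ArrayList<>();
        int n = Math.max(1, Math.min(threads, height));
        for (int i = 0; i < n; i++) {
            int start = (int) ((long) height * i / n);
            int end = (int) ((long) height * (i + 1) / n);
            out.add(new int[] {start, end});
        }
        return out;
    }

    /** Counts black pixels using one task per band. */
    static long countBlack(boolean[][] black, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, threads));
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int[] band : bands(black.length, threads)) {
                Callable<Long> task = () -> {
                    long count = 0;
                    for (int y = band[0]; y < band[1]; y++) {
                        for (boolean b : black[y]) {
                            if (b) count++;
                        }
                    }
                    return count;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // waits for the band to finish
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

A caller would typically pass Runtime.getRuntime().availableProcessors() as the thread count; on a hyper-threaded machine this is the number of virtual cores, which is the behaviour described above.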
Chapter 4
Conclusions & Future Research
4.1 Conclusions
4.1.1 Performance
The system achieves high levels of performance, and reasonably high rates of success at
detecting letters within manuscripts. By dividing the problem, the system is able to achieve
linear time performance (dependent on the blur radius setting given by the user), and thus,
depending on the host computer, is able to detect upwards of 1000 letters in a manuscript in
15-30 seconds. Comparing this to manual annotation, if a palaeographer was able to annotate
one letter every 3 seconds, and worked continuously, the same task would take approximately
50 minutes - up to 200 times longer.
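The comparison above follows directly from the figures quoted in the paragraph (1000 letters, 3 seconds per letter, 15-30 seconds of processing):

```latex
\begin{align*}
t_{\text{manual}} &= 1000 \times 3\,\text{s} = 3000\,\text{s} = 50\,\text{min} \\
\frac{t_{\text{manual}}}{t_{\text{auto}}} &= \frac{3000\,\text{s}}{15\,\text{s}} = 200
\quad\text{(fastest run)}, \qquad
\frac{3000\,\text{s}}{30\,\text{s}} = 100 \quad\text{(slowest run)}
\end{align*}
```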
4.1.2 Noise Removal
Noise removal takes place in several stages: thresholding removes the majority of noise, whilst
blurring the image removes minor groups of pixels which could affect the segmentation
algorithm. Any exterior noise not removed by these methods is then filtered out when the area
of interest is detected. Instead of wasting time attempting to remove high amounts of noise
around the edges of the image, the algorithm adapts to ignore this area, and focus on areas
with a greater than average density of black pixels, in a pattern which suggests text is present.
4.1.3 User Interaction
Whilst the system is able to automatically suggest thresholding levels based on average colour
levels within the image, it allows the user to easily customise the process. It also allows the user
to change how letters are displayed, by setting padding levels, to improve the detection of letters
by adjusting the thresholding level (since the user can take into account visual artefacts), and
to set the amount of blurring that takes place. This also provides a way for the user to adjust
how long the process takes - since higher levels of blurring can make noise removal more reliable,
but also substantially increase the running time, as the blurring algorithm runs in quadratic time
with respect to the blur radius.
High levels of usability have been maintained in the GUI - ensuring all use cases are easy
to perform. No operation takes more than 4 mouse clicks, and the most commonly required
features have been prominently placed as buttons on the main screen. The image library is
fully searchable, which allows the user to maintain a large library of images without decreasing
usability. Several smaller functions have also been implemented, such as printing support, and
the ability to handle deleted images without having to resort to manually deleting files from
within the operating system.
4.2 Future Research & Extensions
4.2.1 K-Means Clustering Implementation
Although the current letter adjustment algorithm is fast, and achieves relatively high levels of
accuracy, it could be improved through the use of K-Means Clustering. K-Means Clustering was
not implemented in this project for two reasons - firstly, it would increase the complexity of the
programming required dramatically, thus delaying the project. Secondly, and more importantly,
it would increase the asymptotic running time of the application to extremely high levels.
With further research, however, this could be implemented in reasonable time. One example
of how this would be possible is to follow the approach that this entire project has taken -
divide and conquer. Instead of performing one large clustering run on the entire image, with
each known letter as a mean, one could perform clustering around each individual letter, or
group thereof, only including pixels within a certain radius of each letter.
4.2.2 Artificial Intelligence
Another way in which the system could be improved with further work is through the use of
artificial intelligence. The system, in its current state, is dumb, in that it doesn’t learn from
its mistakes or user actions. The system could, for example, learn about thresholding levels on
different styles of images based on the post-processing adjustments that the user makes. An
example of this in practice would be that if the user adjusts a large proportion of detected
letters to be taller, the letter segmentation algorithm needs to be adjusted when run on similar
images.
This could also be a great benefit to the threshold prediction function, which currently
performs a linear operation on the average colour within the image. The system could learn which
threshold levels achieve best performance on images with differing characteristics, thus using
this information to improve its prediction.
4.2.3 Interaction with Existing and Future Tools
One of the key areas of extension, however, is the system's ability to interact with other tools.
Since the system both reads from, and saves to, a library file in standards-compliant JSON
format, it is perfectly possible for other systems to read data from this file. Take, for example,
an OCR system which is capable of very high identification rates of individual letters. Such a
system could read each letter's coordinates from the JSON file, attempt to detect the letter, and
feed this information back into the JSON file. This would allow this system to automatically
identify the detected letters without further programming.
A further extension of this concept would be to expand the file formats which the system is
capable of using - since JSON is still a fairly rare format. The system could, in theory, use a
library formatted in XML, CSV, or SQL, allowing an even wider range of systems to interact.
27
Chapter 5
References
[1] DigiPal, 2012. Digital Resource for Palaeography. [online] Available at: <http://www.digipal.eu/>
[Accessed April 2012]
[2] Tony Jebara Columbia University, 2000. The Sobel Operator. [online image] Available at:
<http://www.cs.columbia.edu/˜jebara/htmlpapers/UTHESIS/node15.html>[Accessed April 2012]
[3] Northern Kentucky University, 2003. Light Contrast Illusions. [online image] Available at:
<http://www.nku.edu/ issues/illusions/LightContrasts.htm>[Accessed April 2012]
[4] Pal, N. R. and Pal, S. K., 1993. A Review on Image Segmentation Techniques. Pattern
Recognition, 26(9), pp.1277-1294.
[5] Gonzalez, R. C. and Woods, R. E., Digital Image Processing. 3rd ed. Prentice Hall
[6] Lee, J., 1983. Digital image smoothing and the sigma filter. Computer Vision, Graphics and
Image Processing, 24(2), pp. 255-269.
[7] Gull, S. F. and Daniell, G. J., 1978. Image reconstruction from incomplete and noisy data.
Nature, 272, pp. 686-690.
[8] Cheriet, M., Said, J.N. and Suen, C.Y., 1998. A recursive thresholding technique for image
segmentation. Image Processing, IEEE Transactions on, 7(6), pp. 918-921.
[9] Wagstaff, K., Cardie, C., Rogers, S. and Schroedl, S., 2001. Constrained K-means Cluster-
ing with Background Knowledge. Proceedings of the Eighteenth International Conference on
Machine Learning, pp. 557-584.
[10] Bradley, P. S. and Fayyad, U. M., 1998. Refining Initial Points for K-Means Clustering.
Proceedings of the 15th International Conference on Machine Learning, pp. 91-99.
[11] Canny, J., 1986. A Computational Approach to Edge Detection. Pattern Analysis and
Machine Intelligence, IEEE Transactions on, 8(6), pp. 679-698.
[12] Marr, D. and Hildreth, E., 1980. Theory of Edge Detection Proceedings of the Royal Society
of London. Series B, Biological Sciences 207(1167) pp. 187-217.
[13] Mumford, D. and Shah, J., 1985. Boundary Detection By Minimizing Functionals. In
IEEE Conference on Computer Vision and Pattern Recognition, pp. 22-26.
[14] Boyle, L. E., 1984. Medieval Latin palaeography: a bibliographical introduction. University
of Toronto Press.
[15] Stokes, P. A., 2007. Palaeography and Image-Processing: Some Solutions and Problems.
Digital Medievalist. [online journal] Available at <http://www.digitalmedievalist.org/journal/3/stokes/>
[Accessed April 2012]
[16] Inaba, M., Katoh, N. and Imai, H., 1994. Applications of weighted Voronoi diagrams and
randomization to variance-based k-clustering (extended abstract). Proceedings of the tenth an-
nual symposium on Computational geometry, pp. 332-339.
[17] Oracle, 2004. Bug ID: 5082531 JScrollPane and FlowLayout do not interact properly.
[online] Available at: <http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5082531>
[Accessed June 2012]
[18] Soler, L., Delingette, H., Malandain, G., et al., 2001. Fully automatic anatomical,
pathological, and functional segmentation from CT scans for hepatic surgery. Computer Aided
Surgery, 6(3), pp. 131-142.
[19] Garain, U., Parui, S. K., Paquet, T. and Heutte, L., 2007. Machine Dating of Handwritten
Manuscripts. Ninth International Conference on Document Analysis and Recognition, 2, pp.
759-763.
[20] Tizhoosh, H. R., 2005. Image thresholding using type II fuzzy sets. Pattern Recognition,
38(12), pp. 2363-2372.
Appendix
Appendix I: Sample Raw Manuscript Image
CCCC 162, pp. 1 - 138, 161 - 564: 109. The Master and Fellows of Corpus Christi College,
Cambridge.
Appendix II: Algorithm Data Flow Design
Appendix III: JSON Format Example
{
  "manuscript": [
    {
      "filename": "!CCC111_007v2.jpg",
      "label": "!CCC111_007v2.jpg",
      "notes": "No notes",
      "importdate": "19/07/2012",
      "blur": "0",
      "padding": "0",
      "threshold": "53",
      "processdate": "19/07/2012",
      "letter": [
        {
          "filename": "!CCC111_007v2.jpg",
          "label": "Unknown Letter",
          "notes": "No Notes",
          "flagged": "false",
          "geometry": {
            "type": "Polygon",
            "coordinates": "[[1288, 923], [1299, 923], [1288, 930], [1299, 930]]"
          },
          "crs": {
            "type": "name",
            "properties": {
              "name": "EPSG:3785"
            }
          },
          "type": "Feature",
          "properties": {
            "saved": 1
          }
        }
      ]
    }
  ]
}
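As a rough illustration of how a record in this format might be consumed, the following Python sketch parses a cut-down copy of the example and derives a bounding box for each letter. It assumes the file is stored as strict JSON with double-quoted strings, and that the coordinates value is a string holding a list of [x, y] pairs, as in the example. The function name letter_boxes is hypothetical and is not part of the system described in this report.

```python
import json

# Cut-down copy of the example record, written as strict JSON (an assumption).
SAMPLE = """
{
  "manuscript": [
    {
      "filename": "!CCC111_007v2.jpg",
      "threshold": "53",
      "letter": [
        {
          "label": "Unknown Letter",
          "flagged": "false",
          "geometry": {
            "type": "Polygon",
            "coordinates": "[[1288, 923], [1299, 923], [1288, 930], [1299, 930]]"
          }
        }
      ]
    }
  ]
}
"""


def letter_boxes(document):
    """Yield (filename, label, (left, top, right, bottom)) for every letter."""
    for page in document["manuscript"]:
        for letter in page["letter"]:
            # The coordinates are stored as a string, so decode them separately.
            points = json.loads(letter["geometry"]["coordinates"])
            xs = [x for x, _ in points]
            ys = [y for _, y in points]
            yield page["filename"], letter["label"], (min(xs), min(ys), max(xs), max(ys))


boxes = list(letter_boxes(json.loads(SAMPLE)))
print(boxes)
```

Taking the minimum and maximum of each axis avoids relying on the order in which the four corner points are listed.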
Appendix IV: Process Panel GUI Screenshot
Appendix V: Parallelisation Performance Testing
This table shows the relative performance of all algorithms running on various CPU
configurations, normalised to the same algorithm running without multithreading (No MT = 1).
HT indicates that hyper-threading was enabled, so the test used double the number of virtual
cores. Testing was conducted in a virtualised Mac OS X 10.7 installation running on an Apple
iMac with an Intel Core i7 2600 CPU (3.4GHz), 32GB RAM (8GB available to the virtual machine)
and a solid state drive. All processor core limits were applied in the virtualisation software,
to ensure that the results were not affected by the use of processors of different specifications.
Algorithm          Image     No MT   1C      2C      2C (HT)   4C      4C (HT)
Blurrer            1         1       0.98    0.61    0.72      0.28    0.3
                   2         1       1.03    0.57    0.6       0.26    0.28
                   3         1       1.01    0.51    0.58      0.31    0.33
                   Average   1       1.007   0.563   0.633     0.283   0.303
Color Remover      1         1       1.11    0.58    0.59      0.26    0.28
                   2         1       1.03    0.53    0.58      0.27    0.3
                   3         1       0.97    0.55    0.61      0.31    0.31
                   Average   1       1.037   0.553   0.593     0.28    0.297
Thresholder        1         1       1.01    0.53    0.55      0.28    0.29
                   2         1       1.01    0.48    0.56      0.25    0.31
                   3         1       0.97    0.51    0.51      0.3     0.35
                   Average   1       0.997   0.507   0.54      0.277   0.317
AOI Detector       1         1       1.58    1.31    1.12      0.5     0.61
                   2         1       1.36    1.2     0.95      0.41    0.5
                   3         1       1.2     1.05    0.91      0.33    0.49
                   Average   1       1.38    1.187   0.993     0.413   0.533
Line Segmenter     1         1       0.99    1.02    0.97      0.95    0.97
                   2         1       1       0.98    0.99      0.96    0.95
                   3         1       1.1     1.05    0.99      0.99    0.98
                   Average   1       1.03    1.017   0.983     0.967   0.967
Letter Segmenter   1         1       0.95    0.5     0.52      0.25    0.26
                   2         1       0.98    0.51    0.55      0.24    0.28
                   3         1       1.05    0.55    0.53      0.31    0.25
                   Average   1       0.993   0.52    0.533     0.267   0.263
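To make the averages easier to compare, the sketch below converts them into speed-up and per-core efficiency. It assumes that each table value is a running time relative to the No MT run, so that speed-up is the reciprocal of the value. This reading is an assumption, as are the names AVERAGE_RELATIVE_TIME and speedup_table. Only the 2C and 4C columns without hyper-threading are used.

```python
# Average relative values copied from the table above: {algorithm: {cores: value}}.
# Treating them as relative running times is an assumption of this sketch.
AVERAGE_RELATIVE_TIME = {
    "Blurrer": {2: 0.563, 4: 0.283},
    "Color Remover": {2: 0.553, 4: 0.28},
    "Thresholder": {2: 0.507, 4: 0.277},
    "AOI Detector": {2: 1.187, 4: 0.413},
    "Line Segmenter": {2: 1.017, 4: 0.967},
    "Letter Segmenter": {2: 0.52, 4: 0.267},
}


def speedup_table(relative_times):
    """Return {algorithm: {cores: (speedup, efficiency)}}."""
    result = {}
    for name, by_cores in relative_times.items():
        result[name] = {}
        for cores, rel in by_cores.items():
            speedup = 1.0 / rel  # baseline time divided by parallel time
            result[name][cores] = (speedup, speedup / cores)
    return result


for name, by_cores in speedup_table(AVERAGE_RELATIVE_TIME).items():
    for cores, (speedup, efficiency) in by_cores.items():
        print(f"{name:<17} {cores}C  speed-up {speedup:5.2f}  efficiency {efficiency:4.2f}")
```

Under this reading, the Line Segmenter gains almost nothing from additional cores, whereas the Letter Segmenter comes close to linear scaling on four cores.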
Appendix VI: Segmentation Accuracy Testing Results
The following table shows the results of testing the accuracy of the segmentation process.
Images were segmented using their optimal values, and two quantities were measured: the
proportion of letters within the image that were accurately detected, and the proportion of
false detections. These two measures are independent: Letters Detected is the percentage of
letters detected successfully, whilst the False Positive Rate is the proportion of boxes
which detect noise as a letter.
Image      Letters Detected   False Positive Rate
Image 1    87%                5%
Image 2    91%                6%
Image 3    95%                4%
Image 4    94%                2%
Image 5    85%                5%
Image 6    84%                4%
Image 7    92%                5%
Image 8    87%                3%
Image 9    89%                4%
Image 10   93%                6%
Average    89.8%              4.4%
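The two measures can be written down as a small Python sketch. The counts passed to the function are hypothetical, since the report gives only percentages, and the function name segmentation_metrics is illustrative rather than taken from the system. The second part averages the rounded per-image figures from the table.

```python
from statistics import mean


def segmentation_metrics(letters_in_image, letters_found, boxes_drawn, boxes_on_noise):
    """Return (letters detected %, false positive rate %) for one image.

    letters detected    = letters found / letters present in the image
    false positive rate = boxes placed on noise / all boxes drawn
    """
    detected = 100.0 * letters_found / letters_in_image
    false_positive = 100.0 * boxes_on_noise / boxes_drawn
    return detected, false_positive


# Hypothetical counts for a single image (not taken from the report).
print(segmentation_metrics(letters_in_image=200, letters_found=174,
                           boxes_drawn=183, boxes_on_noise=9))

# Rounded per-image percentages copied from the table.
LETTERS_DETECTED = [87, 91, 95, 94, 85, 84, 92, 87, 89, 93]
FALSE_POSITIVE_RATE = [5, 6, 4, 2, 5, 4, 5, 3, 4, 6]

avg_detected = mean(LETTERS_DETECTED)
avg_false_positive = mean(FALSE_POSITIVE_RATE)
print(avg_detected, avg_false_positive)
```

The rounded figures average to 89.7% and 4.4%. The reported 89.8% for Letters Detected was presumably computed from the unrounded per-image values.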
Appendix VII: Sample Segmented Image (Post-Processing)
CCCC 162, pp. 1 - 138, 161 - 564: 109. The Master and Fellows of Corpus Christi College,
Cambridge.