Bi-level Document Image Compression
using Layout Information
Stuart J. Inglis and Ian H. Witten
Computer Science, University of Waikato, Hamilton, New Zealand
email: {singlis, ihw}@cs.waikato.ac.nz
1 Introduction
Most bi-level images stored on computers today comprise scanned text, and their number is
escalating because of the drive to archive large volumes of paper-based material electronically.
These documents are stored using generic bi-level image technology, based either on classical run-
length coding, such as the CCITT Group 4 method, or on modern schemes such as JBIG that
predict pixels from their local image context. However, image compression methods that are
tailored specifically for images known to contain printed text can provide noticeably superior
performance because they effectively enlarge the context to the character level, at least for those
predictions for which such a context is relevant. To deal effectively with general documents that
contain text and pictures, it is necessary to detect layout and structural information from the image,
and employ different compression techniques for different parts of the image. Such techniques are
called document image compression methods.
A scheme that extracts characters from document images and uses them as the basis for
compression was first proposed over twenty years ago by Ascher and Nagy (1974), and many
other systems have followed (Pratt et al., 1980; Johnsen et al., 1983; Holt, 1988; Witten et al.,
1992, 1994). The best of these methods break the compression process into a series of stages and
use different techniques for each. Whereas the images of the actual characters must necessarily be
compressed using an image compression model, their sequence can be better compressed using a
standard text compression scheme such as PPMC (Cleary and Witten, 1984; Bell et al., 1990).
The present project extends these systems to include an automatic discrimination between text and
non-text regions in an image. If a particular area of an image is known to be a photograph, for
example, a more appropriate compression model can be used. In order to do so, the document
must first be divided into apparently homogeneous regions or zones; in order to do that, it is
necessary to make estimates of the inter-line and inter-character spacing. The analysis of document
layout has been investigated previously by several researchers (e.g. Casey et al., 1992; Nagy et
al., 1992; Pavlidis and Zhou, 1992; O’Gorman, 1993); but the methods often depend on particular
numeric values or rules that relate to the resolution, print size and typographical conventions of the
documents being analyzed, and on hand-tailored heuristics to differentiate blocks of text from
drawings and halftones. Our emphasis is on completely automatic processing that requires no manual
intervention or tuning, apart from training on representative examples of documents that have been
segmented and classified manually.
This paper describes the document image compression process, and gives an account of how the
zones in a document are located and how they are classified into text and non-text. We provide an
extensive comparison of the compression achieved by this new system with JBIG (CCITT, 1993),
the standard for bi-level image compression, on one thousand document images. (The present version
of the paper contains preliminary results for a subset of 200 images.)
We begin by
mentioning a large publicly available corpus of scanned images that was used for the work, in
order that it can be replicated. The resulting document image compression system achieves lossless
compression ratios 25% greater than JBIG, and very slightly lossy ratios that are nearly three times
as great as JBIG.
2 Image Corpus
Previous researchers on document image compression have obtained their results using the eight
standard CCITT facsimile test images (Hunter and Robinson, 1980), or with unavailable private
image collections. We have selected a large publicly-available database of 979 bi-level images,
published by the University of Washington, as the image corpus for the experiments (University
of Washington, 1993).
The images in the UW ENGLISH DOCUMENT IMAGE DATABASE I are scanned at 300 dpi from 160
different journals. Many have small non-zero skew angles, and in some cases text is oriented
vertically instead of horizontally. Each one has been manually segmented into zones; there are a
total of 13,831 zones in the database, or an average of 14 per image. For instance, a particular image
may be divided into six text zones, one halftone zone, and one table zone. The corpus contains a
total of twelve zone classes. Text zones, which are by far the most frequent (88% of all zones),
represent paragraphs, bibliography sections, page numbers and document titles; other zone types
include line drawings, halftones, math, and rules.
3 Document Image Compression
Document image compression works by building a dictionary of repeated patterns, usually
representing characters, from an image and replacing them by pointers into the pattern dictionary.
Because most English characters contain only one connected area, we define a component as an
eight-connected set of black pixels and use the component in all stages of the system.
The first stage is to extract the components from the image. The dictionary is built from these using
a template-matching procedure. As the components are processed, the patterns belonging to each
dictionary entry—that is, all instances of that character—are averaged. When all components have
been processed, the dictionary contains an average, supposedly representative, example of each
different component, and the components in the image can be represented as indexes into the
dictionary.
This kind of compression, where a dictionary is created and indexes into it are coded, can be
viewed as a type of (LZ78-style) macro compression. However, during the matching process that
determines whether or not two components are similar, matches are accepted even if the
components are not completely identical. Thus when the image is reconstructed from the
dictionary, there is noise around the edges of components. The difference between the
reconstructed image and the original is known as the halo image. Of course, if exact matching were
enforced, the dictionary would become almost as large as the number of components in the image,
because the noise makes each component unique.
A simple model of the process consists of five steps:
extract all components from the image;
build a dictionary based on the components;
compress the dictionary;
compress the components using the dictionary;
compress the halo to form a lossless image.
This model is described by Witten et al. (1992). In this paper we extend it to cater for zones of
different types. Components are clustered into zones, and the zones classified into the types text,
non-text and noise. Only the text components are used in the dictionary building stage. The halo of
errors around the reconstructed characters is compressed as before. Now, however, there is a
further image, called the remainder, which comprises the non-text regions (including noise).
The new model contains eight steps:
extract all components from the image;
group the components into zones;
classify each zone into text, non-text or noise;
build a dictionary based on the text zones;
compress the dictionary;
compress the text components using the dictionary;
compress the remainder—the non-text and noise regions;
compress the halo to form a lossless image.
We briefly describe each step. Figure 2 gives an example image and shows how it is broken down
into parts.
COMPONENT EXTRACTION
Components are located by processing the image in raster-scan order until the first black pixel is
found. Then a boundary tracing procedure is initiated to determine the width and height of the
component containing that pixel. A standard flood-fill routine is called to extract the component
from the image. As each component is extracted, several features, including the centroid and area,
are determined and used later for comparing and classifying components.
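To make the extraction step concrete, the following sketch (in Python with NumPy; it is illustrative rather than the authors' implementation, which combines boundary tracing with a flood fill) extracts eight-connected components by flood-filling from each unvisited black pixel and records the bounding box, centroid and area used by the later stages.

```python
# Illustrative sketch: extract 8-connected components from a bi-level image
# stored as a NumPy array of 0s (white) and 1s (black).
from collections import deque
import numpy as np

def extract_components(image):
    """Scan the image in raster order and flood-fill each 8-connected blob
    of black pixels, recording simple features for later stages."""
    img = image.copy()
    height, width = img.shape
    components = []
    for y in range(height):
        for x in range(width):
            if img[y, x] != 1:
                continue
            # Flood fill from the seed pixel, collecting member pixels.
            pixels = []
            queue = deque([(y, x)])
            img[y, x] = 0
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < height and 0 <= nx < width and img[ny, nx] == 1:
                            img[ny, nx] = 0
                            queue.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            components.append({
                "bbox": (min(xs), min(ys), max(xs), max(ys)),  # left, top, right, bottom
                "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),
                "area": len(pixels),
                "pixels": pixels,
            })
    return components
```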
DOCUMENT ZONE CLUSTERING
Components are grouped into zones by clustering them using a bottom-up technique known as the
docstrum (O’Gorman, 1993). This involves finding the k nearest neighbors of each component,
using the Euclidean distance between the components’ centroids. Each component is linked with
its neighbors to form a graph. In our experiments k=5 is used, because the five closest neighbors
usually include two or three components in the same word, or the same line, and two components
in adjacent lines.
Once the nearest neighbors are found, links in the graph between components that exceed a
threshold are broken. This is a dynamic threshold, calculated directly from the docstrum, and is
defined as the smaller of three times the most common component-within-line distance, and 2
times the average between-line distance. After the links whose lengths exceed this threshold
have been broken, the transitive closure of the graph is formed. The resulting sub-graphs represent
document zones.
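A minimal sketch of this clustering step follows, using the k=5 nearest neighbours and the threshold rule described above. How the within-line and between-line distance populations are separated is not spelled out in the paper; the angle-based split used here follows the usual docstrum convention and should be read as an assumption.

```python
# Illustrative sketch: group components into zones with a docstrum-style
# nearest-neighbour graph.
import math
from collections import Counter

def cluster_zones(components, k=5):
    centroids = [c["centroid"] for c in components]
    n = len(centroids)

    def dist(i, j):
        (x1, y1), (x2, y2) = centroids[i], centroids[j]
        return math.hypot(x1 - x2, y1 - y2)

    # k nearest neighbours of every component (O(n^2); fine for a sketch).
    links = []
    for i in range(n):
        neighbours = sorted((dist(i, j), j) for j in range(n) if j != i)[:k]
        for d, j in neighbours:
            links.append((d, i, j))

    # Split link distances into within-line (near-horizontal) and
    # between-line (near-vertical) populations by link angle (assumption).
    within, between = [], []
    for d, i, j in links:
        (x1, y1), (x2, y2) = centroids[i], centroids[j]
        angle = abs(math.degrees(math.atan2(y2 - y1, x2 - x1))) % 180
        (within if angle < 45 or angle > 135 else between).append(d)

    mode_within = Counter(round(d) for d in within).most_common(1)[0][0] if within else 0
    mean_between = sum(between) / len(between) if between else 0
    threshold = min(3 * mode_within, 2 * mean_between)

    # Keep only links shorter than the threshold and take connected
    # components of the resulting graph (the transitive closure).
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for d, i, j in links:
        if d <= threshold:
            parent[find(i)] = find(j)

    zones = {}
    for i in range(n):
        zones.setdefault(find(i), []).append(components[i])
    return list(zones.values())
```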
ZONE CLASSIFICATION
To use the appropriate compression model, each zone’s classification must be determined. Instead
of developing an ad hoc heuristic to distinguish the zones, we use a machine learning technique to
generate rules automatically for each possible type of zone. Machine learning methods learn how to
classify instances into known categories. The process involves a training stage, which operates on
pre-classified instances, followed by a testing stage, which uses the rules obtained during training
on instances whose class is unknown. We use the well-known and robust C4.5 method (Quinlan,
1992), having determined in preliminary tests that it achieves consistently good results on this
problem compared with other machine learning methods (Inglis and Witten, 1995).
Six features are calculated for each zone (a computation sketch follows the list):
zone density—number of components per unit area;
component density—mean density of black pixels in the components’ bounding boxes;
aspect ratio—mean ratio of height to width of the components’ bounding boxes;
circularity—the square of the perimeter divided by the area, averaged over all components;
separation—mode distance between a component and its nearest neighbor;
run length—mean horizontal run length of the components’ black pixels.
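The sketch below, again illustrative rather than the authors' code, computes these six features from the component records produced by the extraction sketch above. The perimeter is approximated by counting boundary pixels and the zone area by the zone's bounding box; the exact definitions used in the paper may differ in such details.

```python
# Illustrative sketch: compute the six zone features from extracted components.
import math
from collections import Counter

def zone_features(zone):
    lefts  = [c["bbox"][0] for c in zone]; tops    = [c["bbox"][1] for c in zone]
    rights = [c["bbox"][2] for c in zone]; bottoms = [c["bbox"][3] for c in zone]
    # Zone area is approximated by the zone's bounding box (an assumption).
    zone_area = (max(rights) - min(lefts) + 1) * (max(bottoms) - min(tops) + 1)

    densities, aspects, circularities, run_lengths = [], [], [], []
    for c in zone:
        l, t, r, b = c["bbox"]
        w, h = r - l + 1, b - t + 1
        pixel_set = set(c["pixels"])
        densities.append(c["area"] / (w * h))
        aspects.append(h / w)
        # Boundary pixels stand in for the perimeter.
        perimeter = sum(1 for (y, x) in pixel_set
                        if any((y + dy, x + dx) not in pixel_set
                               for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1))))
        circularities.append(perimeter ** 2 / c["area"])
        # Mean horizontal run length of black pixels.
        runs = []
        for y in {p[0] for p in pixel_set}:
            xs = sorted(x for (yy, x) in pixel_set if yy == y)
            run = 1
            for a, nxt in zip(xs, xs[1:]):
                if nxt == a + 1:
                    run += 1
                else:
                    runs.append(run); run = 1
            runs.append(run)
        run_lengths.append(sum(runs) / len(runs))

    # Mode of nearest-neighbour distances between component centroids.
    seps = []
    for i, ci in enumerate(zone):
        if len(zone) > 1:
            seps.append(min(math.dist(ci["centroid"], cj["centroid"])
                            for j, cj in enumerate(zone) if j != i))
    separation = Counter(round(d) for d in seps).most_common(1)[0][0] if seps else 0

    return {
        "zone density": len(zone) / zone_area,
        "component density": sum(densities) / len(densities),
        "aspect ratio": sum(aspects) / len(aspects),
        "circularity": sum(circularities) / len(circularities),
        "separation": separation,
        "run length": sum(run_lengths) / len(run_lengths),
    }
```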
The 13,831 zones in the database images have been classified manually and comprise a total of
12,216 text and 1,615 non-text zones. They were divided randomly in half to form training and test
sets. The features for the training set, along with the manually-assigned class, text or non-text,
were given to C4.5, which produced a decision tree. When this tree was used to classify the zones
in the test set, 96.7% accuracy was achieved in replicating the manually-assigned classifications.
(This result is averaged over 25 runs, using different random splits each time.) Figure 1 shows a
simple 10-node decision tree generated for the simpler task of discriminating text regions from
those containing halftones. Trees for the text vs. non-text discrimination are three or four times
as large.

[Figure 1: Decision tree discriminating text from halftones]
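The evaluation protocol can be sketched as follows. C4.5 itself is not reproduced here; scikit-learn's DecisionTreeClassifier, a CART-style learner, merely stands in to illustrate the procedure of repeated random 50/50 splits with the mean test accuracy reported.

```python
# Illustrative stand-in for the C4.5 experiment: train a decision tree on a
# random half of the zones and test on the other half, 25 times.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def evaluate_zone_classifier(features, labels, runs=25):
    """features: (n_zones, 6) array; labels: 'text' or 'non-text' per zone."""
    accuracies = []
    for seed in range(runs):
        X_train, X_test, y_train, y_test = train_test_split(
            features, labels, test_size=0.5, random_state=seed, stratify=labels)
        tree = DecisionTreeClassifier(random_state=seed)
        tree.fit(X_train, y_train)
        accuracies.append(tree.score(X_test, y_test))
    return float(np.mean(accuracies))
```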
For example, Figure 2(a) shows an image in its original form, and Figure 2(b) shows the text
zones that are identified by segmenting it into zones as described above and classifying them
automatically. The training data for the classifier comprises image zones that were determined
manually: these are not the same as the zones generated automatically. In general, the automated
procedure finds more zones, although this does depend on the distance threshold.
DICTIONARY GENERATION
The dictionary is built using only the zones that are classified as text. The dictionary-building
operation relies on the ability to compare two different components to see whether they represent
the same character. This comparison operation is known as template matching. The template
matcher aligns the components by their centroids and processes them pixel by pixel. The method
that we use rejects a match if the images differ at any point by a cluster of pixels of greater than a
certain size; in order to detect mismatches due to the presence of a thin stroke or gap in one image
but not the other, another heuristic is used (Holt, 1988).
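The cluster-based rejection test can be sketched as below: the two bitmaps are aligned by their centroids, the differing pixels are found, and the match is rejected if any eight-connected cluster of differences exceeds a size limit. The four-pixel limit is a placeholder consistent with the error guarantee quoted in Section 4, and Holt's (1988) additional stroke/gap heuristic is omitted.

```python
# Illustrative sketch of centroid-aligned template matching with rejection of
# large error clusters.
from collections import deque
import numpy as np

def centroid(bitmap):
    ys, xs = np.nonzero(bitmap)
    return ys.mean(), xs.mean()

def aligned_xor(a, b):
    """Place both bitmaps on a common canvas with coincident centroids and
    return the exclusive-or of the two."""
    ay, ax = centroid(a)
    by, bx = centroid(b)
    H = a.shape[0] + b.shape[0]
    W = a.shape[1] + b.shape[1]
    canvas_a = np.zeros((2 * H, 2 * W), dtype=np.uint8)
    canvas_b = np.zeros((2 * H, 2 * W), dtype=np.uint8)
    oy_a, ox_a = H - int(round(ay)), W - int(round(ax))
    oy_b, ox_b = H - int(round(by)), W - int(round(bx))
    canvas_a[oy_a:oy_a + a.shape[0], ox_a:ox_a + a.shape[1]] = a
    canvas_b[oy_b:oy_b + b.shape[0], ox_b:ox_b + b.shape[1]] = b
    return np.logical_xor(canvas_a, canvas_b)

def components_match(a, b, max_cluster=4):
    """Reject the match if any 8-connected cluster of differing pixels is
    larger than max_cluster pixels (max_cluster is a placeholder value)."""
    diff = aligned_xor(a, b)
    visited = np.zeros_like(diff, dtype=bool)
    h, w = diff.shape
    for y, x in zip(*np.nonzero(diff)):
        if visited[y, x]:
            continue
        # Measure the size of this error cluster with a flood fill.
        size, queue = 0, deque([(y, x)])
        visited[y, x] = True
        while queue:
            cy, cx = queue.popleft()
            size += 1
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = cy + dy, cx + dx
                    if 0 <= ny < h and 0 <= nx < w and diff[ny, nx] and not visited[ny, nx]:
                        visited[ny, nx] = True
                        queue.append((ny, nx))
        if size > max_cluster:
            return False
    return True
```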
The easiest way to build the dictionary is to store the first example of each different component as a
reference example for subsequent comparisons. However, compression is adversely affected if the
first component is a poor representative of its class. Consequently a running average is made of all
examples of each different component during the dictionary-building process. As components are
extracted from the image, they are checked against the dictionary to determine whether a
sufficiently similar component has been seen before. If not, the new component is added to the
dictionary. Otherwise, if the component did match a dictionary element, that element is updated by
incorporating the new component into the running average.
Because of this averaging process, it is possible that once the dictionary has been finalized a
component might not match any of the elements in the dictionary. Therefore after the building
process, each component is re-checked against the dictionary and added if necessary. This happens
rarely: it accounts for less than 2% of dictionary entries.
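A sketch of the dictionary-building loop, including the final re-check pass, is given below. It assumes that each component bitmap fits within a fixed canvas, so that centroid-aligned instances can be summed into a running average, and it takes the template matcher as a parameter; both are simplifications for illustration rather than details taken from the paper.

```python
# Illustrative sketch: build the pattern dictionary with a running average.
import numpy as np

CANVAS = 128  # assumed bound: each component must fit in CANVAS // 2 pixels per side

def centred(bitmap):
    """Place the bitmap on a fixed canvas with its centroid at the centre."""
    ys, xs = np.nonzero(bitmap)
    canvas = np.zeros((CANVAS, CANVAS), dtype=np.float32)
    oy = CANVAS // 2 - int(round(ys.mean()))
    ox = CANVAS // 2 - int(round(xs.mean()))
    canvas[oy:oy + bitmap.shape[0], ox:ox + bitmap.shape[1]] = bitmap
    return canvas

def build_dictionary(bitmaps, match):
    """bitmaps: 0/1 arrays for the text components; match(a, b) -> bool."""
    entries = []   # each entry keeps a running sum of its members and a count
    indexes = []   # dictionary index assigned to each component, in input order

    def template(entry):
        # Thresholding the running average at 0.5 gives the averaged template.
        return ((entry["sum"] / entry["count"]) >= 0.5).astype(np.uint8)

    for bitmap in bitmaps:
        canvas = centred(bitmap)
        for idx, entry in enumerate(entries):
            if match(template(entry), canvas >= 0.5):
                entry["sum"] += canvas
                entry["count"] += 1
                indexes.append(idx)
                break
        else:
            entries.append({"sum": canvas.copy(), "count": 1})
            indexes.append(len(entries) - 1)

    # Averaging can shift a template away from some of its members, so each
    # component is re-checked and added as a new entry if it no longer matches
    # (this affects fewer than 2% of dictionary entries in practice).
    for i, bitmap in enumerate(bitmaps):
        canvas = centred(bitmap)
        if not match(template(entries[indexes[i]]), canvas >= 0.5):
            entries.append({"sum": canvas.copy(), "count": 1})
            indexes[i] = len(entries) - 1

    return [template(e) for e in entries], indexes
```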
DICTIONARY COMPRESSION
The dictionary elements are compressed using Moffat’s (1991) two-level context method. This
scheme gives the highest compression of all pixel-context based methods. The compression model
is initialized at the start of compression, and the statistics are carried over from one dictionary
element to the next.
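Moffat's two-level method itself is not reproduced here. The sketch below shows the general idea of pixel-context coding in a simplified, single-level form: each pixel is predicted from a template of previously coded neighbours, adaptive counts supply the probabilities, and the ideal code length stands in for the arithmetic coder. The counts persist across calls, mirroring the way statistics are carried over from one dictionary element to the next.

```python
# Illustrative single-level context model (not Moffat's two-level coder).
import math
from collections import defaultdict

# Template of previously coded neighbours (dy, dx) relative to the current pixel.
TEMPLATE = [(-2, -1), (-2, 0), (-2, 1),
            (-1, -2), (-1, -1), (-1, 0), (-1, 1), (-1, 2),
            (0, -2), (0, -1)]

counts = defaultdict(lambda: [1, 1])   # context -> [white count, black count]

def code_length(bitmap):
    """Ideal compressed size of a 0/1 bitmap, in bits, under the context model."""
    h, w = bitmap.shape
    bits = 0.0
    for y in range(h):
        for x in range(w):
            context = tuple(
                int(bitmap[y + dy, x + dx]) if 0 <= y + dy < h and 0 <= x + dx < w else 0
                for dy, dx in TEMPLATE)
            c0, c1 = counts[context]
            pixel = int(bitmap[y, x])
            bits -= math.log2((c1 if pixel else c0) / (c0 + c1))
            counts[context][pixel] += 1
    return bits
```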
TEXT COMPONENT COMPRESSION
The components in each text zone are processed sequentially. First, their indexes into the
dictionary are compressed using PPMC (Bell et al., 1990). Then the relative position of each
component with respect to its predecessor in the zone is calculated. These offsets are compressed
using a three-tier model, using the first model if possible, the second one if the statistics required
by the first are not available, and the third as a last resort. The first model contains counts of offset
values with respect to a dictionary element; that is, the offsets are character-specific. The second
accumulates offset information across all dictionary elements, giving a generic model for all characters.
Finally, if neither of these first two models can be used, the offset is gamma encoded (Elias,
1975).
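As a concrete example of the last-resort coder, the Elias (1975) gamma code writes a positive integer as a unary length prefix followed by the integer in binary. The sign-folding step below, which maps signed offsets onto positive integers, is an assumption for illustration; the paper does not say how negative offsets are handled.

```python
# Illustrative sketch of Elias gamma coding for component offsets.
def gamma_encode(n):
    """Encode a positive integer n as a bit string: unary length, then binary."""
    assert n >= 1
    binary = bin(n)[2:]                  # e.g. 9 -> '1001'
    return "0" * (len(binary) - 1) + binary

def gamma_decode(bits, pos=0):
    """Decode one gamma-coded integer starting at position pos."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    value = int(bits[pos + zeros:pos + 2 * zeros + 1], 2)
    return value, pos + 2 * zeros + 1

def fold_sign(offset):
    """Map ..., -2, -1, 0, 1, 2, ... onto 1, 2, 3, 4, 5, ... (an assumption)."""
    return 2 * offset + 1 if offset >= 0 else -2 * offset
```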
At this stage the dictionary has been compressed, the component order is known, and the
component positions in the image are known. Thus the recipient can form a reconstructed image
using the average components from the dictionary.
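A minimal sketch of this reconstruction step, assuming the relative offsets have already been decoded and resolved to absolute top-left positions within the page:

```python
# Illustrative sketch: paste averaged dictionary templates at decoded positions.
import numpy as np

def reconstruct(shape, templates, placements):
    """placements: (dictionary_index, top, left) triples, assumed in-bounds."""
    image = np.zeros(shape, dtype=np.uint8)
    for idx, top, left in placements:
        t = templates[idx]
        h, w = t.shape
        # OR the template in, so overlapping components stay black.
        image[top:top + h, left:left + w] |= t
    return image
```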
REMAINDER COMPRESSION
The remainder image consists of the non-text and noise components. An example is shown in
Figure 2(d). As the remainder is independent of the previous stages, the other images are of no
help in coding it. Therefore it is compressed using the two-level context method. If the system is
processing an image in lossy mode, this is the end of the compressed stream.
HALO COMPRESSION
As there is only one example of each component in the dictionary, pixel errors occur around the
borders of characters when the reconstructed image is compared with the original. These errors
constitute the halo image, and Figure 2(c) shows an example—which is the XOR difference
between Figure 2(a) and 2(b). It is possible from the halo image to make out the outlines of
characters, and in some places actually to read them. Because the reconstructed image has already
been coded, it makes sense to encode the halo with respect to it. In fact, since the entire
reconstructed image is available, a context that is centered on the pixel to be coded can be used.
This has been coined “clairvoyant” compression (Witten et al., 1992). In practice, it turns out that
coding the original image using a clairvoyant context in the reconstructed image leads to higher
compression than if the halo is coded using the same context, because the halo is complicated by
its exclusive-or nature. The compressed stream is now lossless.
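The sketch below illustrates clairvoyant coding in the same ideal-code-length style as before: the original image is coded pixel by pixel with a context taken from the reconstructed image, centred on (and including) the pixel being coded, together with two causal pixels of the original. The particular context shape is an assumption; the paper specifies only that the context is centred on the pixel to be coded.

```python
# Illustrative sketch of clairvoyant context coding of the original image.
import math
from collections import defaultdict

CLAIRVOYANT = [(-1, 0), (0, -1), (0, 0), (0, 1), (1, 0)]   # from the reconstruction
CAUSAL      = [(0, -1), (-1, 0)]                            # from the original

def clairvoyant_code_length(original, reconstructed):
    """original and reconstructed are 0/1 arrays of the same shape."""
    counts = defaultdict(lambda: [1, 1])
    h, w = original.shape
    bits = 0.0
    def pix(img, y, x):
        return int(img[y, x]) if 0 <= y < h and 0 <= x < w else 0
    for y in range(h):
        for x in range(w):
            context = (tuple(pix(reconstructed, y + dy, x + dx) for dy, dx in CLAIRVOYANT)
                       + tuple(pix(original, y + dy, x + dx) for dy, dx in CAUSAL))
            c0, c1 = counts[context]
            pixel = int(original[y, x])
            bits -= math.log2((c1 if pixel else c0) / (c0 + c1))
            counts[context][pixel] += 1
    return bits
```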
4 Experiments
In this section, the document image compression scheme described above is compared with two
pixel-context models: JBIG and Moffat's (1991) two-level compression scheme. For JBIG we use the
implementation by Markus Kuhn (1995), the more complete of the two publicly available implementations.
The JBIG implementation gave an overall compression ratio of 17.5:1 over the corpus of images,
while the two-level scheme achieved 19.1:1. The document image compression scheme was run in
two different modes, with and without inclusion of the halo. The lossless mode, with the halo,
achieved 22.1:1 and the lossy mode 49.4:1. These results are summarized in Table 1. It should be
noted that the lossy mode guarantees a very good approximation to the original image: it is
impossible for a block of more than four incorrect pixels to occur. Arguably, the lossy image is
actually cleaner than the original because all examples of each letter are exactly the same.
The times taken to compress and decompress, measured on a SUN SparcCenter 1000, are shown
in Table 2. (The times quoted for the document compression scheme are somewhat pessimistic because
no serious attempt has been made to optimize the code.)
The document compression scheme is asymmetrical in that dictionary building and zone classification
are not required when decompressing. This contrasts with JBIG and the two-level scheme, for which
compression and decompression take about the same time. Lossless compression in the document-based
scheme is fifteen times slower than JBIG, and decompression is six times slower. In lossy mode,
compression is ten times slower and decompression is slightly faster. Table 2 shows the full results
for all schemes.
Scheme       Total size    Compression ratio
JBIG         11 841 564    17.5
Two-level    10 854 655    19.1
Lossless      9 394 914    22.1
Lossy         4 202 827    49.4

Table 1: Compression figures for the different schemes

Scheme       Compression time    Decompression time
             (min:sec)           (min:sec)
JBIG         0:22                0:23
Two-level    1:00                1:04
Lossless     5:00                2:00
Lossy        3:50                0:20

Table 2: Compression time per image for each scheme

Header   Dictionary   Symbols   Offsets   Halo   Remainder   Footer
0.1%     5.8%         5%        4%        60%    25%         0.1%

Table 3: Relative component sizes for document image compression
Due to the asymmetric nature of the document image compression system, it is most applicable to
situations where lossy decompression is time-critical, such as on-line searches, or when
compression time is unimportant, such as in CD-ROM mastering.
Table 3 shows the contribution of the different kinds of information in lossless document image
compression to the total compressed size, for a typical example image. The halo is over half the file
size, which explains why the lossy compression ratio is more than twice as great as the lossless one.
5 Conclusion
We have extended a document image compression system to include layout and zone classification
information, and made it robust and suitable for general-purpose use by removing the necessity for
manual intervention or tuning to particular typographical conventions. The resulting compression
scheme yields a 25% improvement over JBIG. Using the system in lossy mode where the halo
around the characters is not encoded, the compression ratio becomes nearly three times that for
JBIG. The new scheme is asymmetric in that decompression is much faster than compression, but
both are considerably slower than JBIG—except for lossy decompression, which is slightly faster.
References
Ascher, R.N. and Nagy, G. (1974) “A means for achieving a high degree of compaction on scan-digitized printed
text.” IEEE Transactions on Computers, vol. 23, no. 11, pp. 1174–1179, November.
Bell, T.C., Cleary, J.G. and Witten, I.H. (1990) Text Compression. Englewood Cliffs, NJ, USA: Prentice Hall.
Casey, R., Ferguson, D., Mohiuddin, K. and Walach, E. (1992) “Intelligent forms processing system.” Machine
Vision and Applications, vol. 5, pp. 143–155.
CCITT (1993) “Progressive bi-level image compression." Recommendation T.82; also appears as ISO/IEC standard
11544.
Cleary, J.G. and Witten, I.H. (1984) “Data compression using adaptive coding and partial string matching,” IEEE
Trans. on Communications, COM-32, pp. 396–402.
Elias, P. (1975) “Universal codeword sets and representations of the integers.” IEEE Trans Information Theory, vol.
IT-21, pp. 194–203.
Holt, M.J. (1988) “A fast binary template matching algorithm for document image data compression." in Pattern
Recognition, J. Kittler, Ed. Berlin, Germany: Springer-Verlag.
Hunter, R. and Robinson, A.H. (1980) “International digital facsimile coding standard.” Proc IEEE, vol. 68, no. 7,
pp. 854–867.
Inglis, S. and Witten, I.H. (1995) “Document zone classification using machine learning.” Proc. Digital Image
Computing: Techniques and Applications, Brisbane, Australia, December.
Johnsen, O., Segen, J. and Cash, G.L. (1983) “Coding of two-level pictures by pattern matching and substitution."
Bell System Tech. Jour., vol. 62, no. 8, pp. 2513–2545, May.
Kuhn, M. (1995) URL: http://wwwcip.informatik.uni-erlangen.de/user/mskuhn
Moffat, A. (1991) “Two level context based compression of binary images." in Proc. IEEE Data Compression
Conference, Los Alamitos, CA, USA: IEEE Computer Society Press, pp. 328–391, April.
Nagy, G., Seth, S. and Viswanathan, M. (1992) “A prototype document image analysis system for technical
journals.” IEEE Computer, pp. 10–22; July.
O’Gorman, L. (1993) “The document spectrum for page layout analysis." IEEE Trans Pattern Analysis and Machine
Intelligence, vol. PAMI-15, no. 11, pp. 1162–1173, November.
Pavlidis, T. and Zhou, J. (1992) “Page segmentation and classification.” CVGIP: Graphical Models and Image
Processing, vol. 54, no. 6, pp. 484–496.
Pratt, W.K., Capitant, P.J., Chen, W.H., Hamilton, E.R. and Wallis, R.H. (1980) “Combined symbol matching
facsimile data compression system." Proc. IEEE, vol. 68, no. 7, pp. 786–796, July.
Quinlan, J.R. (1992) C4.5: Programs for Machine Learning, Morgan Kaufmann.
University of Washington (1993) UW English Document Image Database I. Seattle, WA, USA.
Witten, I.H., Bell, T.C., Harrison, M.H., James, M.L. and Moffat, A. (1992) “Textual image compression.” Proc.
DCC, edited by J.A. Storer and M. Cohn, pp. 42–51. IEEE Computer Society Press, Los Alamitos, CA.
Witten, I.H., Bell, T.C., Emberson, H., Inglis, S. and Moffat, A. (1994) “Textual image compression: Two-stage
lossy/lossless encoding of textual images." Proc. IEEE, vol. 82, no. 6, pp. 878–888, June.
[Figure 2(a): Original image. Figure 2(b): Text components. Figure 2(c): Halo image. Figure 2(d): Remainder image.]