Vectorized Image Partitioning
Charles Davi
June 16, 2019
Abstract
In this article, I’m going to present a low-degree polynomial runtime image partition algorithm that can quickly and reliably partition an image into objectively distinct regions, using only the original image as input, without any training dataset or other exogenous information. All of the code necessary to run the algorithm is available on my researchgate homepage.¹
1 Introduction
In a previous paper [1], A New Model of Artificial Intelligence, I presented a series of low-degree polynomial runtime algorithms rooted in information theory that can quickly and reliably solve a wide variety of problems in artificial intelligence. The practical goal of the model is to allow extremely high-dimensional problems in AI to be solved on small, cheap, low-energy devices. The academic goal of the model is to show that these problems can be restated in the language of information theory. The result is a powerful set of core algorithms that are so efficient, they can probably be embedded on a chip, and can certainly run quickly on even the most low-end, modern consumer devices. In addition to the possibly unprecedented efficiency of these algorithms, they are also very accurate, consistently generating predictions and classifications with an accuracy of over 90%.²
¹ I retain all rights, copyright and otherwise, to all of the algorithms, and other information presented in this paper. In particular, the information contained in this paper may not be used for any commercial purpose whatsoever without my prior written consent. All research notes, algorithms, and other materials referenced in this paper are available on my researchgate homepage, at https://www.researchgate.net/profile/Charles Davi, under the project heading, Information Theory.
² For a demonstration of the algorithms as applied to the UCI and MNIST datasets, see, A New Model of Artificial Intelligence: Application to Data I and A New Model of Artificial Intelligence: Application to Data II, respectively.
All of the algorithms presented in [1] operate using assumptions rooted in
information theory. Specifically, I showed in [1] that information theory allows
us to make useful assumptions, and form reasonable expectations, about images
generally, and datasets generally, even in the absence of exogenous information
about the particular image, or dataset, being analyzed. Though my model
of AI is of course capable of making use of training data, and learning, the
core algorithms are all designed using these assumptions to operate without
any prior data. Specifically, the image partition algorithm I presented in [1] can
autonomously subdivide an image into visually distinct regions, often identifying
entire objects, and distinguish between background and foreground, all without
any training data or other prior information (i.e., using only the original image
as input). Similarly, the categorization algorithm I presented in [1] can take
an arbitrary dataset of vectors, and autonomously produce categorizations that
are consistent with hidden classification data.
The algorithm I’ll present in this note is similar in purpose and design to the image partition algorithm I presented in [1]. However, the algorithm I’ll present in this note generates partitions using set intersections, rather than Euclidean distance. That is, the algorithm presented in [1] calculates a Euclidean measure of difference between two regions in an image, whereas the algorithm I’ll present in this paper calculates the intersection of the sets of colors contained in two regions. Additionally, the algorithm I’ll present in this note makes use of a new measure of complexity that can be thought of as a geometric entropy that varies as a function of both the position and distribution of the colors in an image.³

³ Note that, ordinarily, the entropy of an image depends upon only the distribution of the colors in the image, and is therefore invariant with respect to the positions of the colors in the image.
2 Entropy, Complexity, and Color
The entropy of an image is typically measured by counting the frequency of each
color in the image, and then measuring the Shannon entropy of the resultant
distribution, given by the following celebrated equation due to Claude Shannon:
H = \sum_{i=1}^{n} p_i \log \left( \frac{1}{p_i} \right), \qquad (1)

where p_i is the frequency of color i, and n is the total number of colors in the image.⁴ As a result, the entropy of an image is determined entirely by the distribution of colors in the image, without regard for the positions of the colors.

⁴ C. E. Shannon, A Mathematical Theory of Communication.
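To make this calculation concrete, here is a minimal Python sketch of the entropy of an image’s color distribution (the code referenced in this paper is written in Octave; the function name here is illustrative only):

```python
import numpy as np

def image_entropy(image):
    """Shannon entropy, in bits, of an image's color distribution.

    `image` is an (H, W, 3) array of RGB values. Only the frequency of
    each color matters, not where the colors appear in the image.
    """
    pixels = image.reshape(-1, image.shape[-1])
    # Count the frequency of each unique color.
    _, counts = np.unique(pixels, axis=0, return_counts=True)
    p = counts / counts.sum()
    # Equation (1): H = sum_i p_i * log2(1 / p_i).
    return float(np.sum(p * np.log2(1.0 / p)))
```

Note that permuting the pixels of an image leaves this value unchanged, since only the counts matter, which is exactly the point developed below.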
For example, Figure 1 shows a photograph that I took in Williamsburg,
Brooklyn, of a small red wall, together with another image generated by subdividing the original photograph into 100 equally-sized rectangular regions, and then swapping 15 randomly selected pairs of regions. Intuitively, the scrambled image on the right is more visually complex than the original image on the left, despite the fact that both images have the same entropy.⁵ Therefore, it is necessarily the case that we cannot measure this type of visual complexity using the Shannon entropy alone.
Figure 1: A photo of a wall, and that photo after being scrambled.
This intuition points to the more general fact that the Shannon entropy is not a measure of complexity. As originally stated, Equation (1) says that if a source generates signal i with probability p_i, then H is the minimum average number of bits per signal required to encode the source. Stated differently, since log(1/p_i) is the code length for signal i, and p_i is the probability of signal i, the entropy of a source is, therefore, the expected number of bits necessary to encode one signal generated by the source. Shannon showed that it is not possible to encode a source using fewer bits on average without losing information.⁶
An image is a static artifact, and is, therefore, not a source that generates signals over time. Nonetheless, the colors in an image have frequencies, which allows us to define a distribution on the colors in the image. We can then use Equation (1) to calculate the entropy of that distribution. Interpreted literally, the entropy of an image is the minimum average number of bits necessary to encode a single color in the image. If all colors have equal frequencies, then the entropy of the image will be maximized. If instead a single color has a high frequency, then the image will have a lower entropy. For any uniform distribution of colors, the greater the number of colors, the greater the entropy of the image.

⁵ Using Octave, the entropy for both images is 7.5552 bits.

⁶ If we consider more general opportunities for compression that take into account patterns generated by a source, and not just the probability distribution of the signals, then it is possible to encode a source using an average number of bits per signal that is less than H.
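To state the last point precisely, for a uniform distribution over n colors, each p_i = 1/n, and Equation (1) reduces to

H = \sum_{i=1}^{n} \frac{1}{n} \log(n) = \log(n),

which increases with the number of colors n.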
As a general matter, knowing the entropy of an image does not provide any
information about the visual complexity of the image. This is demonstrated by
the images in Figure 1, which, despite having very different visual complexities,
nonetheless produce the same measure of entropy. However, as the entropy of
an image approaches 0, there’s a practical limit to how complex the image can
be. If the entropy of an image is exactly 0, then it consists of a single color,
which implies no visual complexity at all (i.e., it’s just a single, solid color). If an
image consists almost entirely of a single color, then the relatively small number
of pixels left over probably won’t be noticeable, suggesting any near-zero entropy
image will probably not be very visually complex. As a result, the concepts of
entropy and complexity are not totally independent, but for entropies sufficiently
far from 0, knowing the entropy of an image tells us basically nothing about the
visual complexity of the image.
The distinction between entropy and complexity is not unique to images. For example, the string s = abcabcabc has a uniform distribution of characters, and therefore, the entropy of s is log(3), which is the maximum possible entropy for a string on a three-character alphabet. However, it’s obviously not complex or random in any reasonable sense, since it has a simple structure (i.e., s = (abc)^n, for n = 3). Therefore, our intuitive understanding of complexity and randomness must depend upon something other than the mere distribution of symbols in a string, or colors in an image.
The more appropriate measure of randomness is the Kolmogorov complexity of s, denoted K(s), which is the length of the shortest program (measured in bits) that generates s on a Turing Machine. A string s is Kolmogorov-random if K(s) = |s| + C, where C is a constant that does not depend upon s. We know that such a constant C must exist since every Turing Machine can run a program that simply prints its input on the output tape. As a result, C cannot be larger than the length of this “print” program. Note that if a string is Kolmogorov-random, then it cannot be compressed. That is, a string s is Kolmogorov-random if the shortest program that generates s is approximately the same length as s itself.⁷

⁷ For a more fulsome discussion of this topic, and its applications to physics, see my paper, A Computational Model of Time-Dilation.
We can use the Kolmogorov complexity to justify our intuition that as we scramble an image, we obviously increase its complexity, even though its entropy is unchanged. Specifically, the structures of the objects in an image present opportunities for compression, and when we scramble an image, we destroy its structure, suggesting that we probably also destroy opportunities for compression. For example, if an image is perfectly symmetrical about its vertical axis, then we need only half the image to reproduce the entire image. That is, we can write a program that mirrors the left or right half of an input image, and therefore, the Kolmogorov complexity of a perfectly symmetrical image cannot be greater than half the pixel information, plus a constant (i.e., the length of the “mirror” program).⁸ It turns out that the Kolmogorov complexity is not computable, so we cannot, as a general matter, confirm that scrambling a given image increases its Kolmogorov complexity. However, the concept is still useful, since it justifies our intuition that destroying structure increases complexity, which the Kolmogorov complexity formalizes by equating complexity with compression. Specifically, since scrambling an image destroys structure, and structure presents opportunities for compression, we can form the reasonable expectation that scrambling an image will increase the actual mathematical complexity of the image (i.e., its Kolmogorov complexity).

⁸ In fact, the Kolmogorov complexity of a perfectly symmetrical image cannot be greater than the Kolmogorov complexity of half the image plus a constant, since we can first run the shortest program that produces half of the image, and then apply the mirror program to that output. That is, we take half of the image and maximally compress it, knowing that we can unpack just that half and reproduce the entire original image.
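The “mirror” program itself is nearly trivial; a minimal Python sketch of the idea:

```python
import numpy as np

def mirror(left_half):
    """Reconstruct a vertically symmetric image from its left half alone."""
    # Flip the left half across the vertical axis and append it.
    return np.concatenate([left_half, left_half[:, ::-1]], axis=1)
```

Any program of this kind has a fixed length, so the cost of storing it is a constant that does not grow with the size of the image.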
Figure 2: The center region of the photo in Figure 1, before and after scrambling the entire
image.
Scrambling an image will probably also reduce the local color consistency of the image. That is, because objects in the real world are generally roughly consistent in color over small distances, it follows that if we permute the pixels in an image, we’re going to randomize the locations of the colors in the image, thereby likely destroying this local color consistency. This implies that as we scramble an image, we should expect to increase the local complexity of the image. For example, we can say that the wall in Figure 2 is “red”, even though it contains 41,542 unique colors. After scrambling the image, that same region can no longer be reasonably described using a single color, and the actual number of colors within the region increases to 47,471 unique colors. Therefore, scrambling the image increases the amount of information that is required to informally describe, and formally represent, the colors in the region. Moreover, if we consider just this region, and not the entire image, then our measure of entropy does change, since the distribution of colors within the region changes as a result of the scrambling, increasing from 7.0810 bits to 7.2911 bits.⁹

⁹ The image partition algorithm I presented in [1] begins by calculating the entropies of rectangular regions in the input image, resulting in a measure that is sensitive to both the position and distribution of the colors in an image. See Section 2 of [1] generally.
As a general matter, if an image is locally color consistent, then it is, therefore, reasonable to expect that scrambling the image will increase the amount of information necessary to describe or represent the colors in any given region in the image. As such, the local color consistency of an image should decrease as we scramble an image, and its Kolmogorov complexity should increase. As a result, if we can measure the local color consistency of an image, it should serve as a reasonable heuristic for the Kolmogorov complexity of the image. As noted, the Kolmogorov complexity is not computable, but as I show below, we can quickly measure local color consistency using vectorized processes. How accurately this measure actually approximates the Kolmogorov complexity is an academic question that I do not address any further, since in any case, the concept of local color consistency certainly allows us to find the boundaries of objects in an image.¹⁰

¹⁰ In a previous note, I showed that this measure of local color consistency is so powerful, that simply applying a brute-force algorithm that maximizes the measure can reassemble a scrambled image to its original state (or close to it), using only the scrambled image as input. See, “Reassembling a Scrambled Image with No Prior Information”.
3 Vectorized Image Partitioning
3.1 Generating Regions
The first step of this image partition algorithm is to call a function I introduced in Section 2.3 of [1] that subdivides an image into a set of equally-sized rectangular regions. These regions are sized to maximize the standard deviation of the entropies of the regions generated, which will cause each region to contain maximally different amounts of color information. This generally results in the grid imposed upon the image by the regions being positioned tightly around the actual macroscopic boundaries of the objects in the image.
Figure 3 shows the results of this function as applied to the photograph of
the red wall, with the average color of each region shown as a visual aid to
outline the regions generated by the function. The regions generated by this
function form the basis of all of the analysis that follows, and allow us to analyze
an image in terms of a small number of regions, as opposed to a large number
of pixels.
Figure 3: The original photo, together with the average color of each region generated by the
initial step of the algorithm.
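The following Python sketch conveys the idea of this first step. It is not the function from [1], and the set of candidate grid sizes searched here is an assumption, chosen only because the number of regions the algorithm produces is usually between 100 and 144 (see Section 3.5 below):

```python
import numpy as np

def region_entropies(image, rows, cols):
    """Entropy of each region in a rows-by-cols grid over the image."""
    h = image.shape[0] // rows
    w = image.shape[1] // cols
    ents = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            # image_entropy is the sketch from Section 2.
            ents[i, j] = image_entropy(image[i*h:(i+1)*h, j*w:(j+1)*w])
    return ents

def best_grid(image, candidates=(10, 11, 12)):
    """Pick the square grid whose region entropies have the largest
    standard deviation, i.e., maximally different amounts of color
    information per region."""
    return max(candidates,
               key=lambda n: region_entropies(image, n, n).std())
```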
3.2 δ-Intersection
At a high level, the partition algorithm operates by calculating the intersection
between the sets of colors contained in different regions generated during the
initial step described above. However, rather than make use of ordinary set
intersection, the algorithm uses a method I developed called δ-intersection,
which, rather than test for equality, tests whether the norm of the difference
between two vectors is less than some value δ. If so, then the two vectors are
treated as “the same” for purposes of calculating the intersection between the
two sets that contain the vectors in question. For example, if A = {10, 20, 30}, B = {11, 23, 50}, and δ = 4, then |A ∩δ B| = 2. That is, 10 and 11 constitute a match, since 11 - 10 < δ, and 23 and 20 also constitute a match, since 23 - 20 < δ, whereas 30 and 50 do not constitute a match, since 50 - 30 > δ.
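A minimal Python sketch of this operation on sets of numbers (for the color vectors actually used by the algorithm, the absolute difference below would be replaced by the norm of the vector difference):

```python
def delta_intersection_count(A, B, delta):
    """Count the elements of A that delta-match some element of B."""
    return sum(any(abs(a - b) < delta for b in B) for a in A)

# The example from the text: |A ∩δ B| = 2 when δ = 4.
assert delta_intersection_count({10, 20, 30}, {11, 23, 50}, 4) == 2
```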
This value of δ is optimized using the same methods that I introduced in Section 2.7 of [1], which make use of information theory, producing a context-dependent level of distinction that allows us to say whether any two given vectors are close enough to be considered the same. In this case, we’re using this method to compare color vectors and ask whether two color vectors are sufficiently similar to be considered the same in the context of the image as a whole. Though not addressed in this paper, this method also allows us to generate categorizations on sets of vectors.
3.3 Delimiting Boundaries
After generating the rectangular regions described above, the next step of the algorithm is to call a function that “reads” the regions from left to right, and top to bottom, calculating the δ-intersection between adjacent regions, and marking the likely positions of any boundaries in the image based upon the rate of change in the intersection count between adjacent regions. For example, beginning with the top left (1,1) region of the image, which is mostly white, we calculate the δ-intersection between that region and the region to its immediate right, which is also mostly white. As we transition from left to right, we’re going to eventually reach regions that have a darker tone, causing the intersection count to start to drop off. As you can see, regions (1,3) and (1,4) have significantly different coloring, which is going to cause the δ-intersection count to suddenly drop off, suggesting that there’s a boundary, which is actually the case.
Figure 4: The original photo, the average color of each region, and the boundaries identified
by the boundary detection function.
This process will produce an intersection count for each region in the original image, thereby generating a matrix of integers. Taking the sum over this matrix produces a measure of local color consistency. We can use this measure to express the visual complexity generated by scrambling an image, and Figure 5 plots this measure as a function of the number of pairs of regions swapped, using the original image of the red wall as its input, with the horizontal axis showing the number of pairs swapped, and the vertical axis showing the measure of local color consistency.¹¹ The graph reflects the fact that permuting the regions of an image destroys local color consistency, decreasing the intersection counts in the matrix, thereby reducing the sum over the entire matrix.
This matrix also serves as the basis for the boundary detection function, and the image partition algorithm itself, each of which searches for anomalously large changes in the intersection counts stored in the matrix, suggesting a boundary. Because the δ-intersection operator can be vectorized by representing sets as matrices, this matrix can be generated very quickly. To see how the operator is vectorized, see the functions, “generate-lefthand-righthand-matrices” and “calculate-delta-intersection” in my code bin.

¹¹ This is the measure maximized by the algorithm I presented in, “Reassembling a Scrambled Image with No Prior Information”.
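Those functions are in Octave; the following Python sketch is a reconstruction of the idea, not a translation, using matrix broadcasting to compute all pairwise differences at once:

```python
import numpy as np

def delta_intersection_vectorized(A, B, delta):
    """Delta-intersection count between two sets of color vectors.

    A is an (m, 3) matrix and B is a (k, 3) matrix of RGB vectors
    (cast to float first, to avoid unsigned-integer overflow).
    Broadcasting produces all m * k pairwise differences in one step,
    with no explicit loop over pairs.
    """
    diffs = A[:, None, :] - B[None, :, :]   # (m, k, 3)
    dists = np.linalg.norm(diffs, axis=2)   # (m, k) pairwise norms
    # Count the rows of A within delta of at least one row of B.
    return int(np.any(dists < delta, axis=1).sum())
```

Applying this to each pair of adjacent regions yields the matrix of intersection counts described above, and summing that matrix gives the measure of local color consistency plotted in Figure 5.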
Figure 5: A graph showing local color consistency as a function of the number of swaps.
3.4 Minimum Sufficient Difference
All of my work so far in artificial intelligence has made use of finite differences to construct categories, and more generally, distinguish between objects. For example, the value of δ above that we use to distinguish between color vectors is a fixed value that is intended to be used as a limit on the difference between two vectors, above which, we distinguish. I’ve coined the term minimum sufficient difference to describe this concept of δ generally. That is, in this case, δ is the minimum sufficient difference between two color vectors necessary to distinguish between the colors. In simple terms, if the difference between two colors is more than δ, then they’re not the same in the context of the image, and their difference exceeds the minimum sufficient difference for distinction in that context.
However, we need a second minimum sufficient difference for “reading” an image from left to right, since in that case, we’re comparing intersection counts, not color vectors. That is, first we calculate the δ-intersection between neighboring regions in an image, but afterwards, we need a second minimum sufficient difference to say what amount of change in intersection count justifies marking a boundary between two regions. Moreover, there might not be a single minimum sufficient difference between intersection counts capable of identifying all of the boundaries in a given image. As a simple example, consider the following sequence of integers:

1, 2, 5, 107, 210, 250.
Let’s pick a fixed value of δ = 6. Reading this sequence from left to right, and calculating the difference between adjacent entries, we would place delimiters as follows:

1, 2, 5 || 107 || 210 || 250.
If these numbers represent the intersection counts between neighboring regions in an image, then this partition is probably wrong, since the numbers 107, 210, and 250, probably all correspond to a single, new region that begins at 107. That is, the correct partition is probably the following:

1, 2, 5 || 107, 210, 250.
This partition cannot be produced using a fixed finite difference. Specifically, since 5 and 107 are categorized separately, it must be the case that 107 - 5 = 102 > δ. Because 107 and 210 are categorized together, it must be the case that 210 - 107 = 103 < δ. But obviously, it cannot be the case that δ < 102 and δ > 103. Nonetheless, we might need to produce this partition, so as a result, the boundary detection function makes use of a ratio test, rather than a finite difference test. Specifically, it tests the ratio between the intersection counts of neighboring regions.
Continuing with the sequence of integers above, we would calculate the ratio between 1 and 2 (.5), 2 and 5 (.4), 5 and 107 (.0467), 107 and 210 (.5095), and 210 and 250 (.84). Using this approach, we can fix a minimum ratio of ∆ = .4, which will cause us to draw a boundary at the right place, between 5 and 107. This value of ∆ is optimized by the boundary detection function using the same methods that I presented in Section 2.7 of [1], though because this function uses a ratio test, the equations are slightly different than the equations presented in [1].¹² Applying this approach to the original image of the red wall generates the boundaries in Figure 4.

¹² See the function, “delimit-image” in my code bin.
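A minimal Python sketch of the ratio test, with a fixed ∆ standing in for the optimized value:

```python
def delimit(counts, min_ratio):
    """Indices i such that a boundary falls between counts[i] and counts[i+1].

    A boundary is marked when the ratio of the smaller to the larger of
    two adjacent intersection counts falls below min_ratio. Assumes the
    counts are positive.
    """
    return [i for i in range(len(counts) - 1)
            if min(counts[i], counts[i+1]) / max(counts[i], counts[i+1])
            < min_ratio]

# The example from the text: a single boundary, between 5 and 107.
assert delimit([1, 2, 5, 107, 210, 250], min_ratio=0.4) == [2]
```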
3.5 Gathering Contiguous Features
The last step of the partition algorithm is to assemble the rectangular regions
generated in the first step into larger, contiguous features that should correspond
to macroscopic objects in the image. This is done by a “crawler” algorithm that
begins with a given region, and looks to each of its neighboring regions (up,
down, left, and right), testing the rate of change in the ratio of the intersection
counts for each region. If the rate of change doesn’t imply a boundary between
the original region and its neighbor, then the neighbor is added to a queue.
Every item in the queue is then similarly tested, with any neighbors of those
neighbors being added to the queue if they constitute a match with the original
region (i.e., the new region is added to the queue if, when compared to the
original region, the transition from the original region to the new region would
not imply that a boundary exists between them). This continues until the queue
is exhausted, at which point the algorithm selects a new initial region, repeating
this process until every region in the image has been assigned to a feature. That
is, the algorithm crawls the image from some initial region in every direction,
until it runs out of regions that constitute a match with the initial region, and
then selects a new initial region, repeating this process until all regions are
part of a feature. Because the δ-intersection calculation is vectorized, and the
number of regions is usually between 100 and 144, this can be done very quickly.
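A Python sketch of the crawler, implemented as a queue-based flood fill over the grid of regions. The predicate boundary_between is a stand-in for the ratio test described above, applied between the original region of the current feature and each candidate neighbor:

```python
from collections import deque

def gather_features(rows, cols, boundary_between):
    """Assign every region in a rows-by-cols grid to a contiguous feature.

    boundary_between(a, b) should return True if the transition between
    regions a and b implies a boundary; per the text, each candidate is
    compared against the original region of the current feature.
    """
    labels = [[None] * cols for _ in range(rows)]
    label = 0
    for si in range(rows):
        for sj in range(cols):
            if labels[si][sj] is not None:
                continue  # already part of a feature
            label += 1
            labels[si][sj] = label
            queue = deque([(si, sj)])
            while queue:
                i, j = queue.popleft()
                # Test the neighbors above, below, left, and right.
                for ni, nj in ((i-1, j), (i+1, j), (i, j-1), (i, j+1)):
                    if (0 <= ni < rows and 0 <= nj < cols
                            and labels[ni][nj] is None
                            and not boundary_between((si, sj), (ni, nj))):
                        labels[ni][nj] = label
                        queue.append((ni, nj))
    return labels  # the region matrix described below
```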
Figure 6: The original photo, together with contiguous features identified by the algorithm.
Figure 7: The original photo, together with contiguous features identified by the algorithm.
The actual product generated by this process is a matrix that contains numerical labels for each region in the image, which I call the region matrix. If two regions have the same label in the region matrix, then they are part of the same, larger contiguous feature.¹³ The region matrix makes it easy to extract the contiguous features identified by the partition algorithm using vectorized processes, which allows for further processing of the image. Figure 6 shows three features identified by the partition algorithm as applied to the photograph of the red wall, and Figure 7 shows three features identified by the partition algorithm as applied to a photograph I took in Copenhagen, Denmark.
¹³ Though the image partition algorithm I presented in [1] is distinct, it also produces a region matrix using a similar crawler algorithm, which I describe in some detail in Section 2.7 of [1].
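As an illustration of such further processing, a single feature can be pulled out of the image with a boolean mask built from the region matrix. A hypothetical sketch, assuming the region matrix is a NumPy array of integer labels over the same grid used in the first step:

```python
import numpy as np

def extract_feature(image, region_matrix, label):
    """Black out every pixel outside the feature with the given label."""
    rows, cols = region_matrix.shape
    h = image.shape[0] // rows
    w = image.shape[1] // cols
    # Crop any remainder pixels so the grid tiles the image exactly.
    cropped = image[:rows * h, :cols * w]
    # Expand each region's label to pixel resolution.
    mask = np.repeat(np.repeat(region_matrix == label, h, axis=0),
                     w, axis=1)
    out = np.zeros_like(cropped)
    out[mask] = cropped[mask]
    return out
```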