Content uploaded by Charles Davi

Author content

All content in this area was uploaded by Charles Davi on Jun 17, 2019

Content may be subject to copyright.


Vectorized Image Partitioning

Charles Davi

June 16, 2019

Abstract

In this article, I'm going to present a low-degree polynomial runtime image partition algorithm that can quickly and reliably partition an image into objectively distinct regions, using only the original image as input, without any training dataset or other exogenous information. All of the code necessary to run the algorithm is available on my researchgate homepage.^1

1 Introduction

In a previous paper [1], A New Model of Artificial Intelligence, I presented a series of low-degree polynomial runtime algorithms rooted in information theory that can quickly and reliably solve a wide variety of problems in artificial intelligence. The practical goal of the model is to allow extremely high-dimensional problems in AI to be solved on small, cheap, low-energy devices. The academic goal of the model is to show that these problems can be restated in the language of information theory. The result is a powerful set of core algorithms that are so efficient, they can probably be embedded on a chip, and can certainly run quickly on even the most low-end, modern consumer devices. In addition to the possibly unprecedented efficiency of these algorithms, they are also very accurate, consistently generating predictions and classifications with an accuracy of over 90%.^2

^1 I retain all rights, copyright and otherwise, to all of the algorithms, and other information presented in this paper. In particular, the information contained in this paper may not be used for any commercial purpose whatsoever without my prior written consent. All research notes, algorithms, and other materials referenced in this paper are available on my researchgate homepage, at https://www.researchgate.net/profile/Charles Davi, under the project heading, Information Theory.

^2 For a demonstration of the algorithms as applied to the UCI and MNIST datasets, see, A New Model of Artificial Intelligence: Application to Data I and A New Model of Artificial Intelligence: Application to Data II, respectively.


All of the algorithms presented in [1] operate using assumptions rooted in information theory. Specifically, I showed in [1] that information theory allows us to make useful assumptions, and form reasonable expectations, about images generally, and datasets generally, even in the absence of exogenous information about the particular image, or dataset, being analyzed. Though my model of AI is of course capable of making use of training data, and learning, the core algorithms are all designed using these assumptions to operate without any prior data. Specifically, the image partition algorithm I presented in [1] can autonomously subdivide an image into visually distinct regions, often identifying entire objects, and distinguish between background and foreground, all without any training data or other prior information (i.e., using only the original image as input). Similarly, the categorization algorithm I presented in [1] can take an arbitrary dataset of vectors, and autonomously produce categorizations that are consistent with hidden classification data.

The algorithm I'll present in this note is similar in purpose and design to the image partition algorithm I presented in [1]. However, the algorithm I'll present in this note generates partitions using set intersections, rather than Euclidean distance. That is, the algorithm presented in [1] calculates a Euclidean measure of difference between two regions in an image, whereas the algorithm I'll present in this paper calculates the intersection of the sets of colors contained in two regions. Additionally, the algorithm I'll present in this note makes use of a new measure of complexity that can be thought of as a geometric entropy that varies as a function of both the position and distribution of the colors in an image.^3

2 Entropy, Complexity, and Color

The entropy of an image is typically measured by counting the frequency of each color in the image, and then measuring the Shannon entropy of the resultant distribution, given by the following celebrated equation due to Claude Shannon:

$$H = \sum_{i=1}^{n} p_i \log\left(\frac{1}{p_i}\right), \tag{1}$$

where p_i is the frequency of color i, and n is the total number of colors in the image.^4 As a result, the entropy of an image is determined entirely by the distribution of colors in the image, without regard for the positions of the colors.
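As a concrete illustration of Equation (1), the following sketch computes the entropy of a flat list of pixel values. The function name is my own, and color strings stand in for actual RGB vectors:

```python
import math
from collections import Counter

def image_entropy(pixels):
    """Shannon entropy (in bits) of the color distribution of a flat
    list of pixel values; pixel positions are ignored entirely."""
    counts = Counter(pixels)
    total = len(pixels)
    # p * log2(1/p) for each color, with p = count / total.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# Four colors in equal proportion: the entropy is log2(4) = 2 bits.
uniform = ["red", "green", "blue", "white"] * 25
print(image_entropy(uniform))  # → 2.0
```

Note that the input is a flat list: the calculation would be unchanged if the same pixels were arranged in any other order.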

For example, Figure 1 shows a photograph that I took in Williamsburg, Brooklyn, of a small red wall, together with another image generated by subdividing the original photograph into 100 equally-sized rectangular regions, and then swapping 15 randomly selected pairs of regions. Intuitively, the scrambled image on the right is more visually complex than the original image on the left, despite the fact that both images have the same entropy.^5 Therefore, it is necessarily the case that we cannot measure this type of visual complexity using the Shannon entropy alone.

^3 Note that, ordinarily, the entropy of an image depends upon only the distribution of the colors in the image, and is therefore invariant with respect to the positions of the colors in the image.

^4 C. E. Shannon, A Mathematical Theory of Communication.
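This invariance is easy to check directly: shuffling the pixels of a toy image changes its appearance completely, but not its color distribution, so Equation (1) returns the same value. The names below are illustrative, not from my code bin:

```python
import math
import random
from collections import Counter

def entropy_bits(values):
    """Shannon entropy (bits) of the distribution of values."""
    n = len(values)
    return sum((c / n) * math.log2(n / c) for c in Counter(values).values())

# A toy "image": a solid block of one color beside a block of another.
pixels = [0] * 60 + [1] * 40
scrambled = pixels[:]
random.shuffle(scrambled)  # destroys all spatial structure

# The color distribution is untouched, so the entropy is identical.
assert abs(entropy_bits(pixels) - entropy_bits(scrambled)) < 1e-9
```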

Figure 1: A photo of a wall, and that photo after being scrambled.

This intuition points to the more general fact that the Shannon entropy is not a measure of complexity. As originally stated, Equation (1) says that if a source generates signal i with probability p_i, then H is the minimum average number of bits per signal required to encode the source. Stated differently, since log(1/p_i) is the code length for signal i, and p_i is the probability of signal i, the entropy of a source is, therefore, the expected number of bits necessary to encode one signal generated by the source. Shannon showed that it is not possible to encode a source using fewer bits on average without losing information.^6

An image is a static artifact, and is, therefore, not a source that generates signals over time. Nonetheless, the colors in an image have frequencies, which allows us to define a distribution on the colors in the image. We can then use Equation (1) to calculate the entropy of that distribution. Interpreted literally, the entropy of an image is the minimum average number of bits necessary to encode a single color in the image. If all colors have equal frequencies, then the entropy of the image will be maximized. If instead a single color has a high frequency, then the image will have a lower entropy. For any uniform distribution of colors, the greater the number of colors, the greater the entropy of the image.

^5 Using Octave, the entropy for both images is 7.5552 bits.

^6 If we consider more general opportunities for compression that take into account patterns generated by a source, and not just the probability distribution of the signals, then it is possible to encode a source using an average number of bits per signal that is less than H.

As a general matter, knowing the entropy of an image does not provide any information about the visual complexity of the image. This is demonstrated by the images in Figure 1, which, despite having very different visual complexities, nonetheless produce the same measure of entropy. However, as the entropy of an image approaches 0, there's a practical limit to how complex the image can be. If the entropy of an image is exactly 0, then it consists of a single color, which implies no visual complexity at all (i.e., it's just a single, solid color). If an image consists almost entirely of a single color, then the relatively small number of pixels left over probably won't be noticeable, suggesting that any near-zero entropy image will probably not be very visually complex. As a result, the concepts of entropy and complexity are not totally independent, but for entropies sufficiently far from 0, knowing the entropy of an image tells us basically nothing about the visual complexity of the image.

The distinction between entropy and complexity is not unique to images. For example, the string s = abcabcabc has a uniform distribution of characters, and therefore, the entropy of s is log(3), which is the maximum possible entropy for a string on a three-character alphabet. However, it's obviously not complex or random in any reasonable sense, since it has a simple structure (i.e., s = (abc)^n, for n = 3). Therefore, our intuitive understanding of complexity and randomness must depend upon something other than the mere distribution of symbols in a string, or colors in an image.
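We can check the claim for s = abcabcabc directly, with entropy measured in bits as elsewhere in this paper (the helper name is mine):

```python
import math
from collections import Counter

def string_entropy(s):
    """Shannon entropy (bits) of the character distribution of s."""
    n = len(s)
    return sum((c / n) * math.log2(n / c) for c in Counter(s).values())

# "abcabcabc" is maximally entropic for a three-character alphabet,
# log2(3) ≈ 1.585 bits, despite its obvious repeating structure.
assert abs(string_entropy("abc" * 3) - math.log2(3)) < 1e-9
```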

The more appropriate measure of randomness is the Kolmogorov complexity of s, denoted K(s), which is the length of the shortest program (measured in bits) that generates s on a Turing Machine. A string s is Kolmogorov-random if K(s) = |s| + C, where C is a constant that does not depend upon s. We know that such a constant C must exist since every Turing Machine can run a program that simply prints its input on the output tape. As a result, C cannot be larger than the length of this "print" program. Note that if a string is Kolmogorov-random, then it cannot be compressed. That is, a string s is Kolmogorov-random if the shortest program that generates s is approximately the same length as s itself.^7

We can use the Kolmogorov complexity to justify our intuition that as we scramble an image, we obviously increase its complexity, even though its entropy is unchanged. Specifically, the structures of the objects in an image present opportunities for compression, and when we scramble an image, we destroy its structure, suggesting that we probably also destroy opportunities for compression. For example, if an image is perfectly symmetrical about its vertical axis, then we need only half the image to reproduce the entire image. That is, we can write a program that mirrors the left or right half of an input image, and therefore, the Kolmogorov complexity of a perfectly symmetrical image cannot be greater than half the pixel information, plus a constant (i.e., the length of the "mirror" program).^8 It turns out that the Kolmogorov complexity is not computable, so we cannot, as a general matter, confirm that scrambling a given image increases its Kolmogorov complexity. However, the concept is still useful, since it justifies our intuition that destroying structure increases complexity, which the Kolmogorov complexity formalizes by equating complexity with compression. Specifically, since scrambling an image destroys structure, and structure presents opportunities for compression, we can form the reasonable expectation that scrambling an image will increase the actual mathematical complexity of the image (i.e., its Kolmogorov complexity).

^7 For a fuller discussion of this topic, and its applications to physics, see my paper, A Computational Model of Time-Dilation.
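The "mirror" program described above is short enough to write out. This is a toy sketch of the idea, with rows of pixel values standing in for a real image:

```python
def mirror(left_half):
    """Reconstruct a horizontally symmetric image from its left half:
    each row is the half followed by its reflection."""
    return [row + row[::-1] for row in left_half]

# A 2x4 symmetric "image" (rows of pixel values).
image = [[1, 2, 2, 1],
         [3, 4, 4, 3]]
left = [row[:len(row) // 2] for row in image]

# Half the pixels plus this short program regenerate the whole image,
# bounding the image's Kolmogorov complexity as described above.
assert mirror(left) == image
```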

Figure 2: The center region of the photo in Figure 1, before and after scrambling the entire image.

Scrambling an image will probably also reduce the local color consistency of the image. That is, because objects in the real world are generally roughly consistent in color over small distances, it follows that if we permute the pixels in an image, we're going to randomize the locations of the colors in the image, thereby likely destroying this local color consistency. This implies that as we scramble an image, we should expect to increase the local complexity of the image. For example, we can say that the wall in Figure 2 is "red", even though it contains 41,542 unique colors. After scrambling the image, that same region can no longer be reasonably described using a single color, and the actual number of colors within the region increases to 47,471 unique colors. Therefore, scrambling the image increases the amount of information that is required to informally describe, and formally represent, the colors in the region. Moreover, if we consider just this region, and not the entire image, then our measure of entropy does change, since the distribution of colors within the region changes as a result of the scrambling, increasing from 7.0810 bits to 7.2911 bits.^9

^8 In fact, the Kolmogorov complexity of a perfectly symmetrical image cannot be greater than the Kolmogorov complexity of half the image plus a constant, since we can first run the shortest program that produces half of the image, and then apply the mirror program to that output. That is, we take half of the image and maximally compress it, knowing that we can unpack just that half and reproduce the entire original image.

As a general matter, if an image is locally color consistent, then it is reasonable to expect that scrambling the image will increase the amount of information necessary to describe or represent the colors in any given region in the image. As such, the local color consistency of an image should decrease as we scramble an image, and its Kolmogorov complexity should increase. As a result, if we can measure the local color consistency of an image, it should serve as a reasonable heuristic for the Kolmogorov complexity of the image. As noted, the Kolmogorov complexity is not computable, but as I show below, we can quickly measure local color consistency using vectorized processes. How accurately this measure actually approximates the Kolmogorov complexity is an academic question that I do not address any further, since in any case, the concept of local color consistency certainly allows us to find the boundaries of objects in an image.^10

3 Vectorized Image Partitioning

3.1 Generating Regions

The first step of this image partition algorithm is to call a function I introduced in Section 2.3 of [1] that subdivides an image into a set of equally-sized rectangular regions. These regions are sized to maximize the standard deviation of the entropies of the regions generated, which will cause each region to contain maximally different amounts of color information. This generally results in the grid imposed upon the image by the regions being positioned tightly around the actual macroscopic boundaries of the objects in the image.

Figure 3 shows the results of this function as applied to the photograph of the red wall, with the average color of each region shown as a visual aid to outline the regions generated by the function. The regions generated by this function form the basis of all of the analysis that follows, and allow us to analyze an image in terms of a small number of regions, as opposed to a large number of pixels.
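The sizing criterion can be sketched in a few lines, under my own simplifications (square regions and a small set of candidate sizes); this is an illustration of the idea, not the Section 2.3 function from [1]:

```python
import math
from collections import Counter

def entropy(vals):
    """Shannon entropy (bits) of a list of color values."""
    n = len(vals)
    return sum((c / n) * math.log2(n / c) for c in Counter(vals).values())

def best_region_size(image, candidate_sizes):
    """Return the square region size whose grid maximizes the standard
    deviation of the per-region entropies."""
    h, w = len(image), len(image[0])
    best_size, best_std = None, -1.0
    for s in candidate_sizes:
        ents = []
        for i in range(0, h - s + 1, s):
            for j in range(0, w - s + 1, s):
                region = [image[i + di][j + dj]
                          for di in range(s) for dj in range(s)]
                ents.append(entropy(region))
        mean = sum(ents) / len(ents)
        std = math.sqrt(sum((e - mean) ** 2 for e in ents) / len(ents))
        if std > best_std:
            best_size, best_std = s, std
    return best_size

# A 4x4 toy image: a flat left half and a busy right half. A 2x2 grid
# separates flat regions from busy ones, maximizing the spread of
# region entropies, so it beats treating the image as one region.
image = [[0, 0, 1, 2],
         [0, 0, 2, 1],
         [0, 0, 1, 2],
         [0, 0, 2, 1]]
print(best_region_size(image, [2, 4]))  # → 2
```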

^9 The image partition algorithm I presented in [1] begins by calculating the entropies of rectangular regions in the input image, resulting in a measure that is sensitive to both the position and distribution of the colors in an image. See Section 2 of [1] generally.

^10 In a previous note, I showed that this measure of local color consistency is so powerful that simply applying a brute-force algorithm that maximizes the measure can reassemble a scrambled image to its original state (or close to it), using only the scrambled image as input. See, "Reassembling a Scrambled Image with No Prior Information".

Figure 3: The original photo, together with the average color of each region generated by the initial step of the algorithm.

3.2 δ-Intersection

At a high level, the partition algorithm operates by calculating the intersection between the sets of colors contained in different regions generated during the initial step described above. However, rather than make use of ordinary set intersection, the algorithm uses a method I developed called δ-intersection, which, rather than test for equality, tests whether the norm of the difference between two vectors is less than some value δ. If so, then the two vectors are treated as "the same" for purposes of calculating the intersection between the two sets that contain the vectors in question. For example, if A = {10, 20, 30}, B = {11, 23, 50}, and δ = 4, then |A ∩_δ B| = 2. That is, 10 and 11 constitute a match, since 11 − 10 < δ, and 23 and 20 also constitute a match, since 23 − 20 < δ, whereas 30 and 50 do not constitute a match, since 50 − 30 > δ.
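The worked example translates directly into code. The function name is mine, and scalars stand in for color vectors (for vectors, the absolute difference becomes the norm of the difference):

```python
def delta_intersection_count(A, B, delta):
    """Count the elements of A that lie within delta of some element
    of B -- the delta-intersection described above, for scalars."""
    return sum(1 for a in A if any(abs(a - b) < delta for b in B))

# The example from the text: 10~11 and 20~23 match; 30 matches nothing.
print(delta_intersection_count([10, 20, 30], [11, 23, 50], 4))  # → 2
```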

This value of δ is optimized using the same methods that I introduced in Section 2.7 of [1], which make use of information theory, producing a context-dependent level of distinction that allows us to say whether any two given vectors are close enough to be considered the same. In this case, we're using this method to compare color vectors and ask whether two color vectors are sufficiently similar to be considered the same in the context of the image as a whole. Though not addressed in this paper, this method also allows us to generate categorizations on sets of vectors.

3.3 Delimiting Boundaries

After generating the rectangular regions described above, the next step of the algorithm is to call a function that "reads" the regions from left to right, and top to bottom, calculating the δ-intersection between adjacent regions, and marking the likely positions of any boundaries in the image based upon the rate of change in the intersection count between adjacent regions. For example, beginning with the top left (1,1) region of the image, which is mostly white, we calculate the δ-intersection between that region and the region to its immediate right, which is also mostly white. As we transition from left to right, we're going to eventually reach regions that have a darker tone, causing the intersection count to start to drop off. As you can see, regions (1,3) and (1,4) have significantly different coloring, which is going to cause the δ-intersection count to suddenly drop off, suggesting that there's a boundary, which is actually the case.

Figure 4: The original photo, the average color of each region, and the boundaries identified by the boundary detection function.

This process will produce an intersection count for each region in the original image, thereby producing a matrix of integers. Taking the sum over this matrix produces a measure of local color consistency. We can use this measure to express the visual complexity generated by scrambling an image, and Figure 5 plots this measure as a function of the number of pairs of regions swapped, using the original image of the red wall as its input, with the horizontal axis showing the number of pairs swapped, and the vertical axis showing the measure of local color consistency.^11 The graph reflects the fact that permuting the regions of an image destroys local color consistency, decreasing the intersection counts in the matrix, thereby reducing the sum over the entire matrix.
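A toy sketch of this consistency measure, simplified to horizontal neighbors only (the actual function also reads top to bottom); the names are mine, and lists of scalar values stand in for sets of color vectors:

```python
def delta_count(A, B, delta):
    """Count elements of A within delta of some element of B."""
    return sum(1 for a in A if any(abs(a - b) < delta for b in B))

def local_color_consistency(regions, delta):
    """Sum of delta-intersection counts between horizontally adjacent
    regions, where `regions` is a grid of color-value lists."""
    total = 0
    for row in regions:
        for left, right in zip(row, row[1:]):
            total += delta_count(left, right, delta)
    return total

# Adjacent regions with similar colors score high; swapping in a
# dissimilar region destroys local color consistency and lowers the sum.
smooth = [[[10, 11], [12, 13], [11, 12]]]
swapped = [[[10, 11], [200, 201], [11, 12]]]
assert local_color_consistency(smooth, 4) > local_color_consistency(swapped, 4)
```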

This matrix also serves as the basis for the boundary detection function, and the image partition algorithm itself, each of which searches for anomalously large changes in the intersection counts stored in the matrix, suggesting a boundary. Because the δ-intersection operator can be vectorized by representing sets as matrices, this matrix can be generated very quickly. To see how the operator is vectorized, see the functions "generate-lefthand-righthand-matrices" and "calculate-delta-intersection" in my code bin.

^11 This is the measure maximized by the algorithm I presented in, "Reassembling a Scrambled Image with No Prior Information".
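To illustrate how the operator vectorizes (here sketched in NumPy rather than the Octave of my code bin), a single broadcasted subtraction builds the full table of pairwise distances, with no explicit loops over set elements:

```python
import numpy as np

def delta_intersection_count(A, B, delta):
    """Vectorized delta-intersection: A and B are (n, 3) and (m, 3)
    arrays of RGB color vectors. Broadcasting produces the (n, m)
    table of pairwise distances in one operation; a row of A counts
    toward the intersection if any distance in its row is below delta.
    (A sketch of the idea, not the code-bin functions themselves.)"""
    diffs = A[:, None, :] - B[None, :, :]          # shape (n, m, 3)
    dists = np.linalg.norm(diffs, axis=2)          # pairwise norms
    return int((dists < delta).any(axis=1).sum())  # rows of A with a match

A = np.array([[10, 10, 10], [200, 0, 0]])
B = np.array([[12, 11, 9], [90, 90, 90]])
print(delta_intersection_count(A, B, 5.0))  # → 1
```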

Figure 5: A graph showing local color consistency as a function of the number of swaps.

3.4 Minimum Sufficient Difference

All of my work so far in artificial intelligence has made use of finite differences to construct categories, and more generally, to distinguish between objects. For example, the value of δ above that we use to distinguish between color vectors is a fixed value intended to serve as a limit on the difference between two vectors, above which we distinguish them. I've coined the term minimum sufficient difference to describe this concept of δ generally. That is, in this case, δ is the minimum sufficient difference between two color vectors necessary to distinguish between the colors. In simple terms, if the difference between two colors is more than δ, then they're not the same in the context of the image, and their difference exceeds the minimum sufficient difference for distinction in that context.

However, we need a second minimum sufficient difference for "reading" an image from left to right, since in that case, we're comparing intersection counts, not color vectors. That is, first we calculate the δ-intersection between neighboring regions in an image, but afterwards, we need a second minimum sufficient difference to say what amount of change in intersection count justifies marking a boundary between two regions. Moreover, there might not be a single minimum sufficient difference between intersection counts capable of identifying all of the boundaries in a given image. As a simple example, consider the following sequence of integers:

1, 2, 5, 107, 210, 250.

Let's pick a fixed value of δ = 6. Reading this sequence from left to right, and calculating the difference between adjacent entries, we would place delimiters as follows:

1, 2, 5 || 107 || 210 || 250.

If these numbers represent the intersection counts between neighboring regions in an image, then this partition is probably wrong, since the numbers 107, 210, and 250 probably all correspond to a single, new region that begins at 107. That is, the correct partition is probably the following:

1, 2, 5 || 107, 210, 250.

This partition cannot be produced using a fixed finite difference. Specifically, since 5 and 107 are categorized separately, it must be the case that 107 − 5 = 102 > δ. Because 107 and 210 are categorized together, it must be the case that 210 − 107 = 103 < δ. But obviously, it cannot be the case that δ < 102 and δ > 103. Nonetheless, we might need to produce this partition, so as a result, the boundary detection function makes use of a ratio test, rather than a finite difference test. Specifically, it tests the ratio between the intersection counts of neighboring regions.

Continuing with the sequence of integers above, we would calculate the ratio between 1 and 2 (.5), 2 and 5 (.4), 5 and 107 (.0467), 107 and 210 (.5095), and 210 and 250 (.84). Using this approach, we can fix a minimum ratio of ∆ = .4, which will cause us to draw a boundary at the right place, between 5 and 107. This value of ∆ is optimized by the boundary detection function using the same methods that I presented in Section 2.7 of [1], though because this function uses a ratio test, the equations are slightly different than the equations presented in [1].^12 Applying this approach to the original image of the red wall generates the boundaries in Figure 4.
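The ratio test can be sketched as follows, reproducing the worked example above (the function name is mine):

```python
def boundaries_by_ratio(counts, min_ratio):
    """Mark a boundary between adjacent intersection counts whenever
    the ratio of the smaller to the larger falls below min_ratio."""
    cuts = []
    for i in range(len(counts) - 1):
        a, b = counts[i], counts[i + 1]
        if min(a, b) / max(a, b) < min_ratio:
            cuts.append(i)  # boundary between positions i and i+1
    return cuts

# Adjacent ratios: .5, .4, .0467, .5095, .84 -- only 5 -> 107 falls
# below the minimum ratio of .4, so the single boundary lands between
# 5 and 107, exactly where the fixed finite difference failed.
print(boundaries_by_ratio([1, 2, 5, 107, 210, 250], 0.4))  # → [2]
```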

3.5 Gathering Contiguous Features

The last step of the partition algorithm is to assemble the rectangular regions generated in the first step into larger, contiguous features that should correspond to macroscopic objects in the image. This is done by a "crawler" algorithm that begins with a given region, and looks to each of its neighboring regions (up, down, left, and right), testing the rate of change in the ratio of the intersection counts for each region. If the rate of change doesn't imply a boundary between the original region and its neighbor, then the neighbor is added to a queue. Every item in the queue is then similarly tested, with any neighbors of those neighbors being added to the queue if they constitute a match with the original region (i.e., the new region is added to the queue if, when compared to the original region, the transition from the original region to the new region would not imply that a boundary exists between them). This continues until the queue is exhausted, at which point the algorithm selects a new initial region, repeating this process until every region in the image has been assigned to a feature. That is, the algorithm crawls the image from some initial region in every direction, until it runs out of regions that constitute a match with the initial region, and then selects a new initial region, repeating this process until all regions are part of a feature. Because the δ-intersection calculation is vectorized, and the number of regions is usually between 100 and 144, this can be done very quickly.

^12 See the function, "delimit-image" in my code bin.
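The crawl described above can be sketched as a breadth-first search over the grid of regions. This toy version compares each neighbor to the initial region using the ratio test; the names and simplifications are mine, not the code-bin implementation:

```python
from collections import deque

def label_regions(counts, min_ratio):
    """Breadth-first 'crawler' over a grid of per-region intersection
    counts: a neighbor joins the current feature when the smaller-to-
    larger ratio of its count to the initial region's count stays at
    or above min_ratio (i.e., the transition implies no boundary).
    Returns the region matrix of feature labels."""
    rows, cols = len(counts), len(counts[0])
    labels = [[0] * cols for _ in range(rows)]
    label = 0
    for si in range(rows):
        for sj in range(cols):
            if labels[si][sj]:
                continue  # already assigned to a feature
            label += 1
            labels[si][sj] = label
            seed = counts[si][sj]  # the initial region for this crawl
            queue = deque([(si, sj)])
            while queue:
                i, j = queue.popleft()
                for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    if 0 <= ni < rows and 0 <= nj < cols and not labels[ni][nj]:
                        a, b = seed, counts[ni][nj]
                        if min(a, b) / max(a, b) >= min_ratio:
                            labels[ni][nj] = label
                            queue.append((ni, nj))
    return labels

# Two features: the high-count block on the left and the low-count
# pair on the right, split where the ratio collapses.
grid = [[100, 95, 5],
        [ 90, 92, 6]]
print(label_regions(grid, 0.4))  # → [[1, 1, 2], [1, 1, 2]]
```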

Figure 6: The original photo, together with contiguous features identiﬁed by the algorithm.


Figure 7: The original photo, together with contiguous features identiﬁed by the algorithm.

The actual product generated by this process is a matrix that contains numerical labels for each region in the image, which I call the region matrix. If two regions have the same label in the region matrix, then they are part of the same, larger contiguous feature.^13 The region matrix makes it easy to extract the contiguous features identified by the partition algorithm using vectorized processes, which allows for further processing of the image. Figure 6 shows three features identified by the partition algorithm as applied to the photograph of the red wall, and Figure 7 shows three features identified by the partition algorithm as applied to a photograph I took in Copenhagen, Denmark.

^13 Though the image partition algorithm I presented in [1] is distinct, it also produces a region matrix using a similar crawler algorithm, which I describe in some detail in Section 2.7 of [1].
