
A New Model of Artificial Intelligence:

Application to Data I

Charles Davi

February 9, 2019

Abstract

In this article, I apply the new, polynomial-time model of artificial intelligence that I've developed to four well-known datasets from the UCI Machine Learning Repository. For each of the four classification problems, the categorizations and predictions were generated on an unsupervised basis. Over the four classification problems, the categorization algorithm had an average success rate of 92.833%, where success is measured by the percentage of categories that are consistent with the hidden classification data. Over the four classification problems, the prediction algorithm had an average success rate of 93.497%, where success is measured by the percentage of predictions that are consistent with the hidden classification data. All of the code necessary to run these algorithms, and apply them to the training data, is available on my researchgate homepage.[1]

1 Introduction

In a previous working paper,[2] I introduced a new model of artificial intelligence rooted in information theory that can solve high-dimensional machine learning problems in polynomial time by making use of data compression and vectorized processes. Specifically, I introduced an image feature recognition algorithm, a categorization algorithm, and a prediction algorithm, each of which has a low-degree polynomial run time, allowing a wide class of problems in artificial intelligence to be solved quickly and accurately on an ordinary consumer device. In this article, I'm going to apply the categorization algorithm and prediction algorithm to four well-known datasets from the UCI Machine Learning Repository. In a second article, I will apply the full set of three algorithms to image classification problems.

[1] I retain all rights, copyright and otherwise, to all of the algorithms and other information presented in this paper. In particular, the information contained in this paper may not be used for any commercial purpose whatsoever without my prior written consent. All research notes, algorithms, and other materials referenced in this paper are available on my researchgate homepage, at https://www.researchgate.net/profile/Charles Davi, under the project heading, Information Theory.

[2] A New Model of Artificial Intelligence.

2 Unsupervised Data Classification

All of the classification problems that we'll analyze in this section will be solved on an unsupervised basis, with the classification labels hidden from the algorithms. This is accomplished by simply moving the classification labels to the N+1 entry of each vector, where N is the dimension of the dataset.[3] For each of the classification problems, we'll begin by categorizing the relevant dataset, and then measuring the performance of the categorization algorithm by analyzing the categories it generates. Specifically, we'll measure how well the categories generated correspond to the hidden classification labels. Then, we'll generate predictions using each of the datasets, and measure the performance of the prediction algorithm by counting the number of correct classification predictions.

All of the training datasets we'll analyze in this article are courtesy of the UCI Machine Learning Repository.[4]
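The label-hiding step can be sketched as follows. This is my own minimal illustration, not the paper's actual preprocessing code, and the synthetic data merely stands in for a formatted dataset:

```python
import numpy as np

# Synthetic stand-in for a formatted dataset: N = 4 feature dimensions,
# with the class label stored in the (N+1)-th entry of each vector.
rng = np.random.default_rng(0)
features = rng.normal(size=(150, 4))
labels = rng.integers(1, 4, size=150)          # classes 1, 2, and 3
data = np.column_stack([features, labels])     # label rides along as column 5

# Because the algorithms ignore any data above the dimension of the dataset,
# the label travels with each vector but is invisible during categorization.
visible = data[:, :4]   # what the categorization algorithm actually sees
hidden = data[:, 4]     # used only afterwards, to score the categories
```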

The Iris Dataset[5]

We'll begin with the well-known "Iris" dataset, which consists of 150 data points. Each data point consists of 4 values, which are intended to provide information regarding the specific type of flower the data point represents. There is a hidden fifth value which contains the label of the actual class of flower the data point represents. There are three classes of flowers, represented by the numbers 1, 2, and 3, respectively, and as noted above, the categorization algorithm is blind to this information.

We'll begin by having the categorization algorithm take the 150 data points and construct a set of categories. In this case, the categorization algorithm generated 40 categories. Of those 40 categories, only 3 contain data points from more than one class of data, which can be seen in the center of Figure 1 below. As a result, 37/40 = 92.500% of the categories generated are consistent with the hidden classification data. The three categories that contained mixed classes of data points contained data from classes 2 and 3.
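Scoring the categories against the hidden labels amounts to counting the categories whose members all share one class. A minimal sketch of that bookkeeping (the list-of-label-lists representation is my assumption, not the algorithm's actual output format):

```python
def category_success_rate(categories):
    """Fraction of categories whose members all share one hidden label.

    Each element of `categories` is the list of hidden labels of the data
    points assigned to that category.
    """
    pure = sum(1 for labels in categories if len(set(labels)) == 1)
    return pure / len(categories)

# Toy example: three single-class categories and one mixed category.
cats = [[1, 1, 1], [2, 2], [3], [2, 3]]
print(category_success_rate(cats))  # 0.75
```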

[3] The algorithms ignore any data above the dimension of the dataset.

[4] Note that I've made some formatting changes to the datasets so that they can work with the algorithms. The code necessary to format the datasets appropriately is available on my researchgate homepage, at the link provided above.

[5] https://archive.ics.uci.edu/ml/datasets/Iris.


Figure 1: The number of data points in each category for the Iris dataset, with the bars colored according to the hidden classification labels.

In order to generate predictions, we’ll rerun the categorization algorithm on

a randomly selected subset of the original dataset, using the remaining data

points as inputs to the prediction algorithm. This will allow the prediction

algorithm to make non-trivial predictions using new data that was not used to

generate the categories from which it will make predictions.
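The resampling step described above can be sketched as follows. This is my illustration, not the paper's code; the 10% holdout fraction is an inference from the 15 predictions per round reported below for the 150-point Iris dataset:

```python
import numpy as np

def holdout_split(data, holdout_fraction=0.1, seed=None):
    """Randomly reserve a subset of rows as prediction inputs; the rest is
    re-categorized and serves as the prior data."""
    rng = np.random.default_rng(seed)
    n = len(data)
    held = rng.choice(n, size=int(round(n * holdout_fraction)), replace=False)
    mask = np.zeros(n, dtype=bool)
    mask[held] = True
    return data[~mask], data[mask]   # (rows to categorize, rows to predict)

# 150 Iris-sized rows -> 135 for categorization, 15 for prediction.
prior, new = holdout_split(np.arange(150 * 5).reshape(150, 5), seed=1)
```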

The prediction algorithm is itself capable of running in two overall modes: a "rejection on" mode, where it rejects new data that is beyond the scope of its prior data, causing it to fail to make a prediction; and a "rejection off" mode, where it always makes a prediction. We'll begin by running the prediction algorithm in "rejection on" mode, which means there are three possible outcomes for each prediction: success (i.e., a correct classification); rejection (i.e., the new data is outside of the scope of the prior data); and failure (i.e., an incorrect classification).

I ran a single round of predictions, and this produced 13 successful classifications, 1 rejection, and 1 failure. If we rerun the same predictions in "rejection off" mode, this generates 14 successful classifications and 1 failure. This run of predictions demonstrates that the rejection of a new data point does not imply that the new data point would have resulted in an incorrect prediction, but rather that the new data point is simply not a sufficiently good fit for the prior data for the prediction algorithm to make a confident prediction.

The results of 25 rounds of predictions in "rejection on" mode, for a total of 375 predictions, consisted of 308 successful classifications, 53 rejections, and 14 fails. This implies an accuracy of either 308/375 = 82.133% or 308/322 = 95.652%, depending upon whether you do, or do not, include the rejections in the denominator, respectively.
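The two accuracy figures differ only in whether rejections count against the algorithm. The bookkeeping can be sketched as follows (my illustration of the arithmetic, not the paper's code):

```python
def accuracy(successes, rejections, failures, count_rejections=True):
    """Success rate over a round of predictions, with rejections either
    included in the denominator (strict) or excluded (lenient)."""
    total = successes + failures + (rejections if count_rejections else 0)
    return successes / total

# The 375 Iris predictions with rejection on: 308 successes, 53 rejections,
# and 14 fails.
print(round(100 * accuracy(308, 53, 14, count_rejections=True), 3))   # 82.133
print(round(100 * accuracy(308, 53, 14, count_rejections=False), 3))  # 95.652
```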

Figure 2: The number of successes, rejections, and fails for 375 classification predictions using the Iris dataset, with rejections turned on.

The results of another 25 rounds of predictions in "rejection off" mode, for another 375 predictions, consisted of 357 successful classifications and 18 fails. This implies an accuracy of 357/375 = 95.200%.


Figure 3: The number of successes, rejections, and fails for 375 classification predictions using the Iris dataset, with rejections turned off.

The Ionosphere Dataset[6]

This dataset consists of 351 data points, each with 34 dimensions of data, together with a classifier hidden in the 35th entry of each data point that marks the data point as either "good", represented by a "g" in the original dataset, or "bad", represented by a "b" in the original dataset.[7] The classification task is to identify which data points are good and which are bad based upon the 34 dimensions of each data point. Again, as above, we begin by having the categorization algorithm construct categories using the data, blind to the classifier of each data point. In this case, the categorization algorithm generated 141 categories, of which 96.454% consisted of a single class of data.
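The g-to-1, b-to-2 label recoding described in the footnote can be sketched as follows. This is a minimal in-memory illustration of mine; the sample row is hypothetical, and the real edit would be applied to the dataset file:

```python
# Recode the Ionosphere class labels: "g" (good) -> 1, "b" (bad) -> 2.
RECODE = {"g": "1", "b": "2"}

def recode_row(row):
    """Replace the trailing g/b label of one comma-separated row."""
    *features, label = row.split(",")
    return ",".join(features + [RECODE[label]])

# Hypothetical row in the dataset's CSV layout, label in the last field.
print(recode_row("1,0,0.99539,-0.05889,g"))  # 1,0,0.99539,-0.05889,1
```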

[6] https://archive.ics.uci.edu/ml/datasets/Ionosphere.

[7] Prior to running the algorithms, I edited the original dataset, replacing each g with a 1, and each b with a 2.


Figure 4: The number of data points in each category for the Ionosphere dataset, with the bars colored according to the hidden classification labels.

We begin by making predictions in "rejection on" mode, again by first rerunning the categorization algorithm on a randomly selected subset of the dataset, and using the remaining data points as inputs to the prediction algorithm. The first run of 35 predictions consisted of 19 successful classifications, 14 rejections, and 2 failed classifications. Running the same 35 predictions in "rejection off" mode produced 28 successful classifications and 7 failed classifications. In this case, a significant number of the rejected data points turned out to generate failed classifications.

The results of 25 rounds of predictions in "rejection on" mode, for a total of 875 predictions, consisted of 462 correct classifications, 383 rejections, and 30 failed classifications. This implies an accuracy of either 462/875 = 52.800% or 462/492 = 93.902%, depending upon whether you do, or do not, include the rejections in the denominator, respectively.


Figure 5: The number of successes, rejections, and fails for 875 classification predictions using the Ionosphere dataset, with rejections turned on.

The results of 25 rounds of predictions in "rejection off" mode, for a total of 875 predictions, consisted of 771 correct classifications and 104 failed classifications. This implies an accuracy of 771/875 = 88.114%. In this example, the ability to reject data significantly improved the accuracy of the predictions.

Figure 6: The number of successes, rejections, and fails for 875 classification predictions using the Ionosphere dataset, with rejections turned off.

The Parkinson's Dataset[8]

[8] https://archive.ics.uci.edu/ml/datasets/Parkinsons.


This dataset consists of 195 data points, each with 22 dimensions of data, together with a classifier hidden in the 23rd entry of each data point that marks the data point as corresponding to a healthy individual, represented by a 0, or an individual with Parkinson's disease, represented by a 1.[9] The classification task is to identify which data points correspond to individuals with Parkinson's disease. We begin by having the categorization algorithm construct categories using the data, blind to the classifier of each data point. In this case, the categorization algorithm generated 98 categories, of which 91.837% consisted of a single class of data.

Figure 7: The number of data points in each category for the Parkinson's dataset, with the bars colored according to the hidden classification labels.

We begin by making predictions in "rejection on" mode, again by first rerunning the categorization algorithm on a randomly selected subset of the dataset, and using the remaining data points as inputs to the prediction algorithm. The first run of 19 predictions consisted of 15 successful classifications, 4 rejections, and 0 failed classifications. Running the same 19 predictions in "rejection off" mode produced 18 successful classifications and 1 failed classification.

The results of 25 rounds of predictions in "rejection on" mode, for a total of 475 predictions, consisted of 285 correct classifications, 158 rejections, and 32 failed classifications. This implies an accuracy of either 285/475 = 60.000% or 285/317 = 89.905%, depending upon whether you do, or do not, include the rejections in the denominator, respectively.

[9] Although the original dataset contains 23 dimensions, the first column of the dataset is the patient's name, which I've removed. Also, after removing the column headers in the top row of the dataset, there are 195 rows of data remaining. As a result, I believe that the UCI website erroneously reports that the dataset contains 197 rows of data.


Figure 8: The number of successes, rejections, and fails for 475 classification predictions using the Parkinson's dataset, with rejections turned on.

The results of 25 rounds of predictions in "rejection off" mode, for a total of 475 predictions, consisted of 390 correct classifications and 85 failed classifications. This implies an accuracy of 390/475 = 82.105%.

Figure 9: The number of successes, rejections, and fails for 475 classification predictions using the Parkinson's dataset, with rejections turned off.

The Wine Dataset[10]

This dataset consists of 178 data points, each with 13 dimensions of data, together with a classifier hidden in the 14th entry of each data point that indicates the class of wine that the data point represents. There are three classes of wines, represented by the numbers 1, 2, and 3, respectively. The classification task is to identify the class of the wine given the first 13 dimensions of the data. We begin by having the categorization algorithm construct categories using the data, blind to the classifier of each data point. In this case, the categorization algorithm generated 74 categories, of which 90.541% consisted of a single class of data.

[10] https://archive.ics.uci.edu/ml/datasets/Wine.

Figure 10: The number of data points in each category for the Wine dataset, with the bars colored according to the hidden classification labels.

We begin by making predictions in "rejection on" mode, again by first rerunning the categorization algorithm on a randomly selected subset of the dataset, and using the remaining data points as inputs to the prediction algorithm. The first run of 17 predictions consisted of 12 successful classifications, 4 rejections, and 1 failed classification. Running the same 17 predictions in "rejection off" mode produced 16 successful classifications and 1 failed classification.

The results of 25 rounds of predictions in "rejection on" mode, for a total of 425 predictions, consisted of 311 correct classifications, 96 rejections, and 18 failed classifications. This implies an accuracy of either 311/425 = 73.176% or 311/329 = 94.529%, depending upon whether you do, or do not, include the rejections in the denominator, respectively.


Figure 11: The number of successes, rejections, and fails for 425 classification predictions using the Wine dataset, with rejections turned on.

The results of 25 rounds of predictions in "rejection off" mode, for a total of 425 predictions, consisted of 387 correct classifications and 38 failed classifications. This implies an accuracy of 387/425 = 91.059%.

Figure 12: The number of successes, rejections, and fails for 425 classification predictions using the Wine dataset, with rejections turned off.
