SECURITIES: This booklet is not an offer, or a solicitation for an offer, to enter into any transaction. It is solely for informational purposes, only to describe a set of algorithms that
implement machine learning and deep learning (the “algorithms”).
Vectorized Deep Learning
July 18, 2022
In a series of lemmas and corollaries, I proved that under certain reasonable assumptions, you can classify and cluster datasets with literally perfect accuracy. Of course, real-world datasets don’t perfectly conform to these assumptions, but my work nonetheless shows that worst-case polynomial-runtime algorithms can produce astonishingly high accuracies. The result is runtimes that are simply incomparable to those of any other approach to A.I. of which I’m aware, with classifications at times taking seconds over datasets comprising tens of millions of vectors, even when run on consumer devices. Below is a summary of the results of this model as applied to benchmark datasets, including UCI and MNIST datasets, as well as several novel datasets rooted in thermodynamics. All of the code necessary to follow along is available on my ResearchGate Homepage and at www.blacktreeautoml.com.
In a series of lemmas and corollaries (see “Analyzing Dataset Consistency”), I proved that given certain reasonable assumptions about a dataset, simple algorithms can classify and cluster with literally perfect accuracy (see, specifically, Lemmas 1.1 and 2.3 of that paper). Of course, real-world datasets don’t always conform to these assumptions, but my work nonetheless shows that, as a general matter, worst-case polynomial-runtime algorithms can produce astonishingly high accuracies. The result is runtimes that are simply incomparable to those of any other approach to deep learning of which I’m aware, with classifications at times taking seconds over datasets comprising tens of millions of vectors, even when run on consumer devices. Below is a summary of the results of this model as applied to UCI and MNIST datasets, as well as several novel datasets rooted in thermodynamics. All of the code necessary to follow along is available on my ResearchGate Homepage and at www.blacktreeautoml.com.
For a mathematically rigorous, theoretical explanation of why these algorithms work, see . For an in-depth, practical explanation of how these algorithms work, including applications to other datasets, see “A New Model of Artificial Intelligence”.
As a general matter, my work seeks to make maximum use of data compression and parallel computing, turning worst-case polynomial-runtime algorithms into algorithms that, at times, have best-case constant runtimes and, at times, run on only a small fraction of the input data. The net result is astonishingly accurate and efficient Deep Learning software that is simple and universal enough to run in a point-and-click GUI.
Figure 1: Runtime with 10 Columns
Figure 2: Runtime with 15 Columns
Even when running on consumer devices, Black Tree’s runtimes are simply incomparable to those of typical Deep Learning techniques, such as Neural Networks. Figures 1 and 2 above show the runtimes (in seconds) of Black Tree’s fully vectorized Delta Clustering algorithm, running on a MacBook Air with a 1.3 GHz Intel Core i5, as a function of the number of rows, for datasets with 10 columns (left) and 15 columns (right), respectively. In the worst case (i.e., with no parallel computing), all of Black Tree’s algorithms have runtimes that are polynomial in the number of rows and columns.
1.2 Spherical Clustering
Almost all of my classification algorithms first make use of clustering, so I’ll spend some time describing that process. My basic clustering method consists of two steps:
1. Fix a radius delta (its calculation is described in Section 1.3 below);
2. Then, for each row of the dataset (treated as a vector), find all other rows (again, treated as vectors) that are within delta of that row (i.e., contained in the sphere of radius delta centered at the vector in question).
This generates a spherical cluster for each row of the dataset and, therefore, a distribution of classes within each such cluster.
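The two steps above can be sketched in Python as follows. This is a minimal, iterative sketch, not the author’s implementation: it assumes the norm is the Euclidean norm (computed with `math.dist`), and the function names `spherical_clusters` and `class_distributions` are hypothetical. Because the origin is at distance zero from itself, every cluster includes its own origin row.

```python
import math
from collections import Counter


def spherical_clusters(rows, delta):
    """For each row (the origin), return the indices of every row within delta of it.

    Assumes the Euclidean norm. The origin is at distance 0 from itself, so it
    is always a member of its own cluster (for delta >= 0).
    """
    clusters = []
    for origin in rows:
        # Iterative version: compare the origin against every row, one at a time.
        members = [j for j, other in enumerate(rows) if math.dist(origin, other) <= delta]
        clusters.append(members)
    return clusters


def class_distributions(clusters, labels):
    """Count how many vectors of each class fall inside each cluster."""
    return [Counter(labels[j] for j in members) for members in clusters]


rows = [(0.0, 0.0), (0.5, 0.0), (3.0, 4.0), (3.0, 4.5)]
labels = ["a", "a", "b", "b"]
clusters = spherical_clusters(rows, 1.0)
print(clusters)
print(class_distributions(clusters, labels))
```

With `delta = 1.0`, each of these four rows captures only itself and its nearest neighbor, so the clusters are `[[0, 1], [0, 1], [2, 3], [2, 3]]`, and every cluster’s class distribution contains a single class.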
For a given row, the iterative version of this method has a runtime that is linear in $(M - 1) \times N$, where $M$ is the number of rows and $N$ is the number of columns (we simply take the norm of the difference between the given vector and each of the other $M - 1$ vectors in the dataset); clustering all $M$ rows this way therefore takes on the order of $M(M - 1) \times N$ operations. The fully vectorized version of this algorithm has a constant runtime, because all rows are independent of each other, and all columns are independent of each other, so you can take the norm of the difference between a given row and all other rows simultaneously. As a result, the parallel runtime is constant.
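The independence argument can be illustrated by computing every row’s cluster as a separate task. This hedged sketch uses Python’s `concurrent.futures.ThreadPoolExecutor` only to show that no row’s computation depends on another’s; in CPython, threads will not actually speed up pure-Python arithmetic (because of the global interpreter lock), so a genuinely vectorized version would instead rely on array operations (for example, NumPy broadcasting) or a GPU. The Euclidean norm and the names `cluster_for_row` and `parallel_clusters` are assumptions.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def cluster_for_row(rows, i, delta):
    """Indices of all rows within delta of row i (Euclidean norm assumed).

    Reads only row i and the shared, read-only dataset, so it never depends
    on the result computed for any other row.
    """
    origin = rows[i]
    return [j for j, other in enumerate(rows) if math.dist(origin, other) <= delta]


def parallel_clusters(rows, delta, max_workers=4):
    """Submit one independent task per row; executor.map returns results in row order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda i: cluster_for_row(rows, i, delta), range(len(rows))))


rows = [(0.0, 0.0), (0.5, 0.0), (3.0, 4.0), (3.0, 4.5)]
print(parallel_clusters(rows, 1.0))
```

Because the tasks share no mutable state, they can be executed in any order or all at once; that is the property a vectorized implementation exploits.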
1.3 Calculating Delta
My simplest clustering methods use a supervised calculation of delta: beginning at delta = 0, increase delta by a fixed increment, up to some fixed number of times, until you encounter your first error, defined as the cluster in question containing a vector whose class differs from that of the origin vector (see Section 2.2 below). This will, of course, produce clusters that each contain a single class of data, though a given row can end up with a cluster of one (i.e., the cluster contains only the origin).
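A minimal sketch of this supervised calculation, under stated assumptions: a single delta shared by all rows (consistent with fixing one radius in step 1 of Section 1.2), the Euclidean norm, and a returned value equal to the last tested delta before the first error. The names `has_error` and `supervised_delta` and the parameters `increment` and `max_steps` are hypothetical.

```python
import math


def has_error(rows, labels, delta):
    """True if some row's cluster of radius delta contains a vector of another class."""
    for i, origin in enumerate(rows):
        for j, other in enumerate(rows):
            if labels[j] != labels[i] and math.dist(origin, other) <= delta:
                return True
    return False


def supervised_delta(rows, labels, increment, max_steps):
    """Grow delta from 0 in fixed increments; return the last value before the first error.

    Assumes one global delta for all rows. If even the first increment causes
    an error, 0.0 is returned.
    """
    delta = 0.0
    for step in range(1, max_steps + 1):
        candidate = step * increment  # step * increment avoids accumulated rounding error
        if has_error(rows, labels, candidate):
            break
        delta = candidate
    return delta


rows = [(0.0, 0.0), (0.5, 0.0), (3.0, 4.0), (3.0, 4.5)]
labels = ["a", "a", "b", "b"]
print(supervised_delta(rows, labels, increment=0.5, max_steps=20))
```

For these four rows, the closest pair from different classes is about 4.72 apart, so with `increment = 0.5` the sketch stops at 5.0 and returns `4.5`, and every resulting cluster contains a single class.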