Hyper-Dimensional Computing Challenges and Opportunities for AI Applications

EMAN HASSAN (GRADUATE STUDENT MEMBER, IEEE), YASMIN HALAWANI (MEMBER, IEEE),
BAKER MOHAMMAD (SENIOR MEMBER, IEEE), AND HANI SALEH (SENIOR MEMBER, IEEE)
System-on-Chip Center, Department of Electrical and Computer Engineering, Khalifa University, Abu-Dhabi, UAE
Corresponding author: Baker Mohammad (e-mail: baker.mohammad@ku.ac.ae).
This publication is based upon work supported by the Khalifa University Competitive Internal Research Award (CIRA) under Award No.
[CIRA-2019-026] and Award No. [RC2-2018-020].
ABSTRACT Brain-inspired architectures are gaining increased attention, especially for edge devices that must perform cognitive tasks within a limited energy budget and limited computing resources. The hyperdimensional computing (HDC) paradigm is an emerging framework inspired by an abstract representation of the attributes of neuronal circuits in the human brain. These attributes include a fully holographic random representation, high-dimension vectors representing data, and robustness to uncertainty. The basic HDC pipeline consists of encoding, training, and comparison stages. The encoding algorithm maps different representations of inputs into a single class and stores them in the associative memory (AM) throughout the training stage. Later, during the inference stage, the similarity is computed between the query vector, which is encoded using the same encoding model, and the classes stored in the AM. HDC has shown promising results for 1D applications, using less power and achieving lower latency than state-of-the-art deep neural networks (DNN). In 2D applications, however, convolutional neural networks (CNN) still achieve higher classification accuracy at the expense of more computations.
In this paper, a comprehensive study of the HDC paradigm, its main algorithms, and its implementation is presented. Moreover, the main state-of-the-art HDC architectures for 1D and 2D applications are highlighted. The article also analyzes two competing paradigms, namely HDC and CNN, in terms of accuracy, complexity, and the number of operations. The paper concludes by highlighting challenges and recommendations for future directions on the HDC framework.
INDEX TERMS brain-inspired architectures, hyperdimensional computing, convolutional neural networks,
encoding, classification, associative memory
I. INTRODUCTION
ADVANCEMENTS in deep learning (DL) algorithms have allowed them to outperform conventional machine learning (ML) approaches in many applications, such as image classification, voice recognition [1], activity recognition [2], and object tracking [3]. Convolutional neural networks (CNN), such as AlexNet [4], provide excellent classification accuracy at the cost of large memory storage, frequent memory access, and high computing complexity. ML has traditionally been implemented in the cloud and in data centers such as those of Big Blue (IBM) and Google. The need to move the processing of ML algorithms to edge devices is gaining importance for several reasons [5]. First, some applications, such as autonomous vehicles, are sensitive to response time and cannot tolerate the network's latency. Second, security and privacy are at risk, because sensitive information stored in the cloud is vulnerable to hackers and viruses. Third, moving the data to and from the cloud is costly in terms of power and resources. At the same time, many critical-domain applications such as health care and autonomous vehicles have found the intensive ML algorithms impractical for real-time edge devices [6], [7]. Therefore, it is crucial to design efficient algorithms that perform the cognitive tasks, together with specialized hardware that provides high efficiency for edge devices.
Inspired by the brain's computational abilities and by advancements in memory technologies, hyper-dimensional (HD) computing has opened new avenues as a potential lightweight classifier that lets resource-constrained systems perform a diversity of cognitive tasks [8]. Hyper-dimensional computing focuses on dimensional expansion rather than reduction by emulating the neuron's activity. Because of the large size of the brain's circuits, neural activity patterns can be modeled as points in a high-dimensional space, that is, with hyper-vectors [9]. A vector with thousands of dimensions (e.g., >1000) is called an HD vector.
This article focuses on reviewing the state-of-the-art (SOTA) designs exploiting the HDC paradigm for classification tasks. It also highlights the main operations of HDC and compares it to CNN, the SOTA computing paradigm for classification tasks. The main goals of this study are to (1) provide an overview of several methodologies commonly used in HD computing to solve classical and new cognitive tasks, (2) describe existing HD architectures and highlight their implications for the system's overall performance, (3) analyze the impact of HD vector size and training set size on the system accuracy, and (4) compare HDC with the competing CNN paradigm in terms of efficiency and accuracy.
The outline of this paper is as follows. Section II highlights the data representation and the fundamental operations in hyper-dimensional space. Section III presents the core of the HD computing framework for cognitive tasks and some of the algorithms that have been designed to unlock its potential. Section IV describes HD hardware architectures and memory design details. Section V reviews major current applications that employ HDC frameworks. Section VI presents a comparison between the SOTA CNN model and the HD model for solving classification tasks. The paper concludes with the pending challenges and possible future work in Section VII.
II. BACKGROUND ON HYPERDIMENSIONAL
REPRESENTATION AND STRUCTURE
A. HD PROPERTIES AND BASIC OPERATIONS
In this section, the roots of HDC used to represent human memory, perception, and cognitive abilities are presented using the mathematical properties of HD space, which is built on rich linear algebra. Brain-inspired computing using HDC is carried out on vectors with a large number of components that are independent and identically distributed (i.i.d.) and that can be binary or non-binary. Hyper-dimensional (HD) vector and vector symbolic architecture are common names for representing data in a high-dimensional space. The main inspiration behind HDC comes from Kanerva's work in [10], which represents entities using vectors with several thousand dimensions and manipulates them in the traditional computing style.
Consider the case of binary vectors with 10,000 dimensions in HD space. Given the staggering number of unique vectors ($2^{10000}$), it is unlikely that any system will require this number of vectors to represent its variables [11]. HD vectors are generated independently and randomly with equal probability for zeroes and ones. Under this assumption, the Hamming distance between any two vectors in the space follows a binomial distribution with a mean of 5000 bits and a standard deviation (SD) of 50 bits, so most of the HD space lies close to the mean. The SD depends on the vector's size, which means that for large vectors the binomial distribution becomes thinner relative to the dimension, and the majority of the space is concentrated around a normalized Hamming distance $H_n$ of 0.5. Furthermore, it is important to note the inherent symbolic nature of HD computing. One of the first observed extraordinary properties of HDC is its ability to perform analogical and hierarchical reasoning [12], [13]. Capitalizing on that, HD can be used to build complex data structures such as sequences [10], images [11], and lists [12]. In the HD domain, three main operations, namely multiplication, addition, and permutation, referred to as MAP operations, are used for modeling with vectors:
Multiplication (Binding): is used to bind two HD vectors together, which is usually done using the bitwise XOR operation. The output HD vector is orthogonal (dissimilar) to the HD vectors being bound. The binding operation is invertible. Also, binding distributes over addition and preserves the distance between vectors.
Addition (Bundling): is used to combine different HD vectors into a single HD vector. The resulting HD vector is similar to each component used in the bundling. The final HD vector is binarized by applying a threshold to the bitwise sum of the $n$ vectors: each position yields 0 when more than $n/2$ of the bits at that position are zero, and 1 otherwise. This rule is unambiguous when the number of vectors $n$ in the set is odd. When $n$ is even, the number of zeros can equal the number of ones, so one more random HD vector is added to the set to break the ties [14]. The terms "threshold sum," "majority sum," and "consensus sum" are used interchangeably for the resultant vector, which is denoted as $S = [A + B + C]$, where $A$, $B$, and $C$ are HD vectors.
Permutation (Shifting): is used as an alternative way to bind an HD vector, by multiplying it with a special kind of matrix called a permutation matrix [10]. It is important for data or sequences where the order matters. For example, it is often convenient to use a fixed permutation (denoted as $\Pi$) to bind the item's position in a sequence to the HD vector representing the value of the item in that position. Because permutation is merely a re-ordering, it preserves the distance between vectors.
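The properties listed above can be checked numerically. The following is a minimal sketch, assuming dense binary vectors stored as Python lists, XOR for binding, a bitwise majority for bundling, and a cyclic rotation as the fixed permutation; the function names (`random_hv`, `bind`, `bundle`, `permute`, `hamming`) are illustrative and do not come from the cited works.

```python
import random

D = 10_000  # dimensionality; the text uses 10,000 as its running example


def random_hv(rng):
    """Dense binary HD vector with i.i.d. equiprobable 0/1 components."""
    return [rng.randint(0, 1) for _ in range(D)]


def bind(a, b):
    """Multiplication (binding): component-wise XOR."""
    return [x ^ y for x, y in zip(a, b)]


def bundle(vectors, rng):
    """Addition (bundling): bitwise majority; a random vector breaks ties when n is even."""
    vectors = list(vectors)
    if len(vectors) % 2 == 0:
        vectors.append(random_hv(rng))
    half = len(vectors) / 2
    return [1 if sum(bits) > half else 0 for bits in zip(*vectors)]


def permute(a, shift=1):
    """Permutation: cyclic rotation by `shift` positions (one possible fixed permutation)."""
    shift %= len(a)
    return a[-shift:] + a[:-shift]


def hamming(a, b):
    """Normalized Hamming distance in [0, 1]."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
A, B, C = (random_hv(rng) for _ in range(3))

print(hamming(A, B))                       # near 0.5: random vectors are quasi-orthogonal
print(hamming(bind(A, B), A))              # near 0.5: the bound vector is dissimilar to its inputs
print(bind(bind(A, B), B) == A)            # True: XOR binding is its own inverse
print(hamming(bundle([A, B, C], rng), A))  # near 0.25: the bundle stays similar to its members
print(hamming(permute(A), permute(B)) == hamming(A, B))  # True: permutation preserves distance
```

With three bundled vectors, a majority bit differs from the corresponding bit of one member only when the other two members both disagree with it, which happens with probability 1/4; this is why the bundle sits at a normalized distance of about 0.25 from each member rather than 0.5.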
B. HD PIPELINE FOR CLASSIFICATION TASKS
A versatile HD machine should handle a variety of input data types and scale with an increasing number of inputs without affecting its fidelity. This demands efficient encoding algorithms, which affect both the system performance and the memory optimization for hardware realization. Fig. 1 demonstrates the general HD pipeline for supervised classification tasks. It consists of three main components related to the essential HD pipeline stages.
[Fig. 1 diagram labels: IM/CIM holding Base_1 ... Base_n, each with components B0 ... BD; encoding module with shifters combining HDbase and HDpos vectors of the input features through a majority sum (features superposition); bundling of training samples; associative memory holding Class 1 ... Class k.]
FIGURE 1. HD general pipeline with its main stages. In the IM/CIM, the HD vectors of the main input features (integers/continuous) are randomly generated and stored. In the encoding module, all feature-value HD vectors and feature-position HD vectors are combined using MAP operations to form an abstracted HD representation of the given input. In the training stage, all HD vector samples are combined to generate a class prototype, which is stored in the associative memory for inference.
Item Memory (IM): mapping inputs to the high-dimensional space starts, as shown in Fig. 1, with assigning a unique HD seed representation to each of the bases defined for the given application. For example, Latin letters are considered the bases for language recognition [10]. Every basis in the set is assigned a random HD vector that is saved in an item memory (IM). The seed vectors can be chosen to be orthogonal (as for letters in language recognition). However, representing integers or continuous values, which follow a particular order, in the same way is not appropriate. Therefore, a continuous item memory (CIM) is used, in which two close numbers have a small distance between their corresponding HD vectors and are considered semi-orthogonal [15]. It is important to stress that this representation is kept constant during the training and inference phases of the intended classification task. As a consequence, the memory storage for seed vectors cannot be avoided, since the seeds must be retained during both the training and testing stages.
Encoding module: combines all encoded hyper-vectors in the training stage to form one HD vector representing each class. The same encoding model is used to map the query HD vector, which is compared later with all the classes in the inference stage.
Associative Memory (AM): stores all trained HD vectors to be used later for inference. The essential function of the AM is to compare the incoming encoded query HD vector with the stored classes and return the closest HD class vector using an appropriate similarity metric. The two similarity measurements adopted in current HD encoding algorithms are the Hamming distance, which uses the XOR operation, and the cosine similarity, which uses the inner product. A few works have used the overlap between coded vectors to measure the similarity between HD vectors [16].
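The associative search described above amounts to a nearest-neighbor lookup over the stored class vectors. The sketch below is an illustration only, assuming class prototypes stored in a plain dictionary and a query produced by flipping 30% of the bits of one prototype; the names `am_search_hamming` and `am_search_cosine` are hypothetical.

```python
import math
import random

D = 2_000  # kept small so the example runs quickly


def random_hv(rng):
    return [rng.randint(0, 1) for _ in range(D)]


def hamming_distance(a, b):
    """Number of differing positions, i.e. the population count of the bitwise XOR."""
    return sum(x ^ y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Inner product normalized by the vector lengths; also valid for non-binary vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def am_search_hamming(am, query):
    """Return the label of the stored class with the smallest Hamming distance."""
    return min(am, key=lambda label: hamming_distance(am[label], query))


def am_search_cosine(am, query):
    """Return the label of the stored class with the largest cosine similarity."""
    return max(am, key=lambda label: cosine_similarity(am[label], query))


rng = random.Random(1)
am = {f"class_{k}": random_hv(rng) for k in range(4)}  # stand-in for trained prototypes

# Noisy query: the class_2 prototype with roughly 30% of its bits flipped.
query = [bit ^ (rng.random() < 0.30) for bit in am["class_2"]]

print(am_search_hamming(am, query))  # class_2
print(am_search_cosine(am, query))   # class_2
```

Even with 30% of the bits corrupted, the query remains far closer to its own prototype (about 0.3 normalized distance) than to the unrelated prototypes (about 0.5), which illustrates the robustness that the high dimensionality provides.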
The next section will focus on the encoding stage roots, types
of encoders, and the main encoding designs available in the
literature.
III. HD ENCODING STAGE AND ARCHITECTURE
In HD, a universal encoder capable of mapping any arbitrary
data-type into HD space does not exist. Each type of encoder
should be able to map application-specific data, after a proper
pre-processing, to a suitable architecture able to accomplish
the cognitive/classification tasks. The first step towards a
general HD architecture is to demonstrate essential stages of
HD algorithms in an abstracted manner. Moreover, in this
section, we highlight the main encoding architectures that
have been discussed in the literature.
A. COMMON ROOTS FOR HD ENCODING ALGORITHMS
A few simple techniques have been used repeatedly for HD encoding. Table 1 summarizes these techniques together with the number of bundling stages used during encoding. Most of the HD applications, which will be analyzed in detail in Section V, use the following techniques:
Multi-set of n-sequence: n-grams are used for text classification and for modeling sequences. In the n-gram technique, the original data is re-modeled as long chunks of n-sequences [10]. After the seed set $S_i \triangleq \{s_1, s_2, \ldots, s_n\}$ is mapped into hyper-vectors $V_1, V_2, \ldots, V_n$, each n-sequence $(m_1, m_2, \ldots, m_n)$ with $m_i \in S_i$ is encoded as $V_{m_1} \oplus \Pi^{1}(V_{m_2}) \oplus \Pi^{2}(V_{m_3}) \oplus \cdots \oplus \Pi^{n-1}(V_{m_n})$. Likewise, all remaining n-sequences for a particular input are encoded and bundled to generate the HD representation of that sequence.
Features superposition: in this technique, feature vectors are extracted and mapped to hyper-vectors. Assume that we have a feature vector of dimension $d$, $\{f_1, f_2, \ldots, f_d\}$. For each position $i = 1, \ldots, d$ in the feature vector, a hyper-vector $W_i$ is formed. Similarly, each value $f_i$ in the feature vector is assigned a hyper-vector from $\{U_1, U_2, \ldots, U_d\}$. To correlate the value with the position, the binding operation is used, giving $\left[\sum_{i=1}^{d} U_i \oplus W_i\right]$. In the final step, all $n$ samples are encoded and superimposed to get the final representation of the feature, $\left[\sum_{j=1}^{n} \sum_{i=1}^{d} U_i \oplus W_i\right]$.

TABLE 1. HD Mode Based on Main Encoding Techniques and Number of Superposition
HD mode
Encoding roots   | n-gram       | feature superposition
Num. of bundling | single stage | multi-stages
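The two encoding roots described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the encoder of any cited work: binding is taken to be XOR, the permutation is a cyclic rotation applied k times, bundling is a bitwise majority, and the helper names (`encode_ngrams`, `encode_features`, `item_memory`, `value_memory`, `position_memory`) are hypothetical.

```python
import random
import string

D = 2_000  # kept small so the example runs quickly


def random_hv(rng):
    return [rng.randint(0, 1) for _ in range(D)]


def bind(a, b):
    return [x ^ y for x, y in zip(a, b)]


def permute(a, times):
    """Apply the fixed permutation (a one-step cyclic rotation) `times` times."""
    times %= len(a)
    return a[-times:] + a[:-times]


def majority(vectors, rng):
    """Bundle by bitwise majority; a random vector breaks ties for an even count."""
    vectors = list(vectors)
    if len(vectors) % 2 == 0:
        vectors.append(random_hv(rng))
    half = len(vectors) / 2
    return [1 if sum(bits) > half else 0 for bits in zip(*vectors)]


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b)) / len(a)


def encode_ngrams(text, item_memory, n, rng):
    """Bind each n-gram as V(m1) xor P^1(V(m2)) xor ... xor P^(n-1)(V(mn)), then bundle."""
    grams = []
    for start in range(len(text) - n + 1):
        gram = item_memory[text[start]]
        for k in range(1, n):
            gram = bind(gram, permute(item_memory[text[start + k]], k))
        grams.append(gram)
    return majority(grams, rng)


def encode_features(values, value_memory, position_memory, rng):
    """Bind each value vector U to its position vector W, then bundle over positions."""
    bound = [bind(value_memory[v], position_memory[i]) for i, v in enumerate(values)]
    return majority(bound, rng)


rng = random.Random(2)
item_memory = {ch: random_hv(rng) for ch in string.ascii_lowercase + " "}

h1 = encode_ngrams("the quick brown fox", item_memory, 3, rng)
h2 = encode_ngrams("jumps over lazy dog", item_memory, 3, rng)
print(hamming(h1, h2))  # near 0.5: the two texts share no trigram

value_memory = {level: random_hv(rng) for level in range(8)}  # one vector per quantized value
position_memory = [random_hv(rng) for _ in range(5)]          # one vector per feature position
sample = [3, 1, 4, 1, 5]
hf = encode_features(sample, value_memory, position_memory, rng)
print(hamming(hf, bind(value_memory[3], position_memory[0])))  # below 0.5: similar to its terms
```

Note that the value 1 appears twice in `sample`, yet the two bound terms differ because each is bound to a different position vector; this is what lets the superposition keep track of where a value occurred.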
For classification tasks, the final step in the HD encoding module is the bundling of all individual representations [17]. Building on that, HD algorithms can be divided into two types according to the number of bundling stages used during HD processing:
A single-stage algorithm, where the bundling operation is used only once. Each term in the bundled vector is formed by binding its inputs and/or their permutations. In other words,
$$I = \sum_{i=1}^{K} f_i(I) \qquad (1)$$
where $I$ represents the input stream of a finite number of HD vectors, $I \triangleq \{X = x_1, x_2, \ldots, x_n \in \mathrm{HDV}\}$, $K$ represents the number of terms in the class, and the $i$th term in Eq. 1 is represented by
$$f_i(I) = \left(X_{b_1} \oplus X_{b_2} \oplus \cdots \oplus X_{b_n}\right) \oplus \left(\Pi^{p_1}(X_{s_1}) \oplus \Pi^{p_2}(X_{s_2}) \oplus \cdots \oplus \Pi^{p_n}(X_{s_n})\right) \qquad (2)$$
Each term $f_i(I)$ depends on certain input values. Some inputs enter only through the binding operation, selected by $B_i \triangleq \{b_1, b_2, \ldots, b_n\}$ according to their position in the set $\{X\}$, and some require permutation ($\Pi$), where $\{p_1, p_2, \ldots, p_n\}$ are positive integers that give the number of permutations applied.
A multi-stage algorithm uses the output of a single-stage algorithm as an input to another single-stage HD algorithm. Similarly, any multi-stage HD architecture can be constructed by combining smaller single-stage algorithms in a hierarchical manner. The multi-stage algorithm captures complex non-linear relationships between variables such as time, position, and value. It is important to mention that only a few researchers [18], [19] use the multi-stage HD architecture for cognitive/classification tasks, while the majority focus on the single-stage algorithm for its simplicity.
B. THE MAIN ENCODING SCHEMES
There are several forms of mapping the data from its original space to HD space, which are classified as follows:
Binary: the HD elements take values in {0, 1}.
Ternary: the HD elements take values in {0, 1, -1}.
Non-binary HD vector: the elements of the vector are represented using fixed-point, complex, or floating-point numbers.
Dense HD vector: in the dense HD vector, all element values occur with equal probability.
Sparse HD vector: in the sparse model, the value 0 dominates the HD vector, with a low presence of 1 or -1.
The authors in [20] proposed a sparse random binary HD vector that uses permutation-based binding operating on a segmented vector, which resulted in an energy-efficient deployment. Introducing sparsity in the HD vector reduces the number of multiplications and additions required to encode a single HD vector. The Holographic Graph Neuron (HoloGN) for one-pass pattern learning was introduced in [21]. The sparse code representation of the pattern improves noise resistance in the architecture compared to the original HoloGN abstracted representation, leading to a tangible improvement in pattern accuracy. In [14], the author tested synthesized and real-world data using two types of mapping (projection): orthogonal and distance preserving. In the first type, each symbol is assigned a unique random HD vector that is kept fixed over the system's life. In the second type of mapping, the feature values are quantized to a fixed number of levels m [22]. Each unique feature is associated with the corresponding distributed HD vector, considering different distance-preserving mapping mechanisms: linear mapping, approximate linear mapping, and non-linear approximate mapping. The study showed that sparse and dense mappings using the different mapping mechanisms achieve nearly identical performance on classification tasks.
Likewise, SparseHD in [23] explored the possibility of sparsity in hyper-vectors to improve the HD computing efficiency and reduce the computations required for inference. SparseHD, which enforces sparsity either dimension-wise or class-wise, takes the conventionally trained HD model in non-binary representation and feeds it to a Model Sparser (MS). The MS drops the S% least significant features in each class's trained HD model. The works in [6], [24]–[26] proposed an encoder that quantizes a continuous range of inputs (e.g., the input range [−1, 1]) into M levels and then maps each level into an HD vector, using the linear mapping technique mentioned in [22], that is stored in the CIM. It was demonstrated that the quantization technique associated with the CIM improved the encoder accuracy, and that its implementation was hardware friendly when the range of scalar values is known. In addition, a retraining step was implemented in the training stage to update the AM and improve the classification accuracy for supervised learning by testing the classifier accuracy over the training set.
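One way to picture the quantize-then-map step is the sketch below. It is an assumption-laden illustration, not the encoder of the cited works: the level vectors are built by flipping a fresh, disjoint block of D/(2(M-1)) bit positions per level, which is one common way to realize a linear, distance-preserving mapping, and the names `quantize` and `build_cim` are hypothetical.

```python
import random

D = 2_000               # HD vector dimension (kept small for a quick run)
M = 8                   # number of quantization levels
LOW, HIGH = -1.0, 1.0   # known input range, as in the example range above


def quantize(x, low=LOW, high=HIGH, levels=M):
    """Map a scalar in [low, high] to a level index in 0..levels-1."""
    x = min(max(x, low), high)                        # clamp out-of-range inputs
    index = int((x - low) / (high - low) * levels)
    return min(index, levels - 1)                     # x == high falls in the top level


def build_cim(rng, dim=D, levels=M):
    """Continuous item memory: neighboring levels differ in few bits, extreme levels in ~dim/2."""
    base = [rng.randint(0, 1) for _ in range(dim)]
    order = list(range(dim))
    rng.shuffle(order)                                # random, non-repeating flip positions
    flips_per_level = dim // (2 * (levels - 1))
    cim = [base]
    for level in range(1, levels):
        vec = list(cim[-1])
        for pos in order[(level - 1) * flips_per_level: level * flips_per_level]:
            vec[pos] ^= 1                             # flip a fresh block of bits
        cim.append(vec)
    return cim


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(4)
cim = build_cim(rng)

print(quantize(-1.0), quantize(0.0), quantize(1.0))  # 0 4 7
print(hamming(cim[0], cim[1]))                       # 142: adjacent levels are close
print(hamming(cim[0], cim[7]))                       # 994: extreme levels are nearly orthogonal
print(hamming(cim[quantize(0.1)], cim[quantize(0.2)]))  # 0: both inputs fall in level 4
```

Because every level flips a disjoint block of positions, the Hamming distance between two level vectors grows linearly with the difference of their level indices, which is the property the CIM relies on.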
The encoding approach used in the brain-inspired classifier (BRIC) [25] is a locality-based sparse random projection, which is based on Locality Sensitive Hash algorithms. The generated HD vector for each feature value is represented by the sign of the dot product between the feature vector in the n dimension and the random projection vector in the D dimension, using the n-gram window. Also, a pre-determined index s in the projection matrix is selected to be non-zero, which creates a spatial locality pattern that the hardware can take advantage of. The AM is incrementally updated using the retraining approach. A generic hyper-vector manipulator (MAN) module was demonstrated in [27], which uses cheaper logical operations so that hyper-vectors can be re-materialized later. Furthermore, it allows the representational space to stay in a binary format by applying back-to-back bundling, which is fundamental for on-chip learning. The FACD encoding module was suggested in [28] for non-binary HD representation and includes three main steps: model training, model refinement, and inference. In the refinement stage, non-linear K-means clustering is applied to the trained class HD vectors to find the best centroids representing the distribution of the values in each HD class.
CompHD in [8] reduces the HD dimension intelligently in a way that does not sacrifice the system's accuracy. After the training HD vectors are mapped to the hyperspace, they are divided into s segments, each of length d = D/s. The positional information of each segment is preserved using the Hadamard matrix [29], and all segments are added up to form one compressed HV model for each class. This method reduces the cost of cosine similarity in the associative search by reducing the multiplication and addition operations. However, using many segments would increase the noise relative to the data and degrade the model accuracy.
SemiHD in [30] is an application-based model that trades accuracy for efficiency.
A fully binarized SearcHD algorithm was discussed in [24] to reduce the number of addition operations in the training stage by generating N binary HD vectors for each class. SearcHD stochastically shares the query HD vector elements with each HD class by exploiting bitwise substitution. Although bitwise substitution increases the memory access overhead, the algorithm accumulates the training data more intelligently.
In AdaptHD, the authors used the same concept of retraining the HD model as in [25]. However, a learning rate factor $\alpha$ was introduced to speed up the model convergence during the training phase. Two methods were proposed for adaptive retraining: iteration-dependent and data-dependent. However, the model's sensitivity to a minor change in the learning rate can result in a misleading accuracy, especially for noisy data.
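The text does not spell out the retraining update, so the following sketch assumes the commonly used perceptron-style rule: when a training sample is misclassified, its encoded vector scaled by $\alpha$ is added to the correct class vector and subtracted from the wrongly predicted one. The synthetic bipolar data, the deliberately swapped initial model, and all function names are assumptions made for illustration; a fixed $\alpha$ is used instead of the adaptive schedules mentioned above.

```python
import math
import random

D = 1_000  # kept small so the example runs quickly


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def predict(model, hv):
    """Return the label of the class vector most similar to the encoded sample."""
    return max(model, key=lambda label: cosine(model[label], hv))


def retrain_epoch(model, samples, alpha):
    """One pass over the training set; returns the number of misclassified samples."""
    errors = 0
    for hv, label in samples:
        guess = predict(model, hv)
        if guess != label:
            errors += 1
            model[label] = [c + alpha * h for c, h in zip(model[label], hv)]
            model[guess] = [c - alpha * h for c, h in zip(model[guess], hv)]
    return errors


def train_until_converged(model, samples, alpha, max_epochs=50):
    """Return the first epoch with zero training errors, or None if not reached."""
    for epoch in range(1, max_epochs + 1):
        if retrain_epoch(model, samples, alpha) == 0:
            return epoch
    return None


def noisy_copy(prototype, flip_prob, rng):
    return [-p if rng.random() < flip_prob else p for p in prototype]


rng = random.Random(3)
prototypes = {label: [rng.choice((-1, 1)) for _ in range(D)] for label in ("a", "b")}
samples = [(noisy_copy(prototypes[label], 0.2, rng), label)
           for _ in range(10) for label in ("a", "b")]


def swapped_model():
    """Deliberately poor starting point: each class holds the bundle of the other class."""
    sums = {label: [0] * D for label in prototypes}
    for hv, label in samples:
        sums[label] = [s + h for s, h in zip(sums[label], hv)]
    return {"a": sums["b"], "b": sums["a"]}


epochs_slow = train_until_converged(swapped_model(), samples, alpha=0.1)
epochs_fast = train_until_converged(swapped_model(), samples, alpha=1.0)
print(epochs_slow, epochs_fast)  # the larger learning rate needs no more epochs here
```

On this clean synthetic data a larger $\alpha$ only helps; the sensitivity noted above shows up on noisy data, where large steps can overshoot, which is not modeled in this sketch.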
Likewise, BinHD was proposed in [31] which enables the
HD model to be trained and tested in binary fashion using
binary accumulators [32].
Recently, a new technique named QubitHD was proposed in [33] to eliminate the overhead of floating-point representation and reduce the gap between the classification accuracy of binarized and non-binarized HD classifiers. It exploits the principle that information needs to be stored in a quantum bit (qubit) before its measurement. The algorithm is based on QuantHD in [34]. However, it enables an efficient binarization of the HD model during the retraining stage using quantum measurement techniques.
An unsupervised learning algorithm, HDCluster in [35]
was investigated for clustering input data in HD space by
fully mapping and processing clusters in memory.
IV. GENERAL HD PROCESSOR
From the above abstraction, it is clear that the general HD processor has two main characteristics: 1) the encoder is the only component that needs to be programmed for a particular application, although the AM design can be optimized for a more efficient search implementation; and 2) the data flow is one-directional, where all input HD vectors usually flow from the item memory to the encoder and end in the associative memory. This data flow is also referred to as one-pass learning in [14]. An example of a complete HD processor framework with all its elements is illustrated in Fig. 2 for a handwritten digit classification task.
Data-parallel architectures are potential candidates for HD systems since parallelism exists in HD operations. In both the multiplication and addition operations, each element of the resulting vector depends only on the corresponding elements of the operands. For the permutation operation, each element of the result depends on a nearby operand element. It is clear that any fixed-width architecture narrower than the D = 1000 dimension will be inefficient at processing the large intermediate HD vectors stored in the costly internal memory. In addition, redundant computation is needed in the permutation operation due to intra-word dependencies. To that end, the fundamental operations and inherent robustness of HDC make it a good candidate for a data-flow based array architecture [17]. Moreover, the manipulation of large patterns stored in memory makes HDC a potential candidate for the emerging in-memory computing paradigm, or computational memory, based on nanoscale resistive memory or memristor devices [36].
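The element-wise independence described above can be illustrated in software by packing a whole binary HD vector into one wide word. In the sketch below, a Python arbitrary-precision integer stands in for a D-bit datapath; this is an assumption for illustration only and does not model any of the cited hardware designs.

```python
import random

D = 10_000
MASK = (1 << D) - 1  # keeps results within D bits


def random_packed_hv(rng):
    """One HD vector packed into a single D-bit integer."""
    return rng.getrandbits(D)


def bind(a, b):
    """All D XORs are issued as one wide operation; no bit depends on its neighbors."""
    return a ^ b


def rotate(a, shift=1):
    """Cyclic rotation: each result bit comes from a nearby operand bit, so bits cross
    word boundaries whenever the vector is split across narrower words."""
    shift %= D
    return ((a << shift) | (a >> (D - shift))) & MASK


def hamming(a, b):
    """XOR followed by a population count."""
    return bin(a ^ b).count("1")


rng = random.Random(5)
a, b = random_packed_hv(rng), random_packed_hv(rng)

print(bind(bind(a, b), b) == a)                          # True: binding is invertible
print(hamming(rotate(a, 7), rotate(b, 7)) == hamming(a, b))  # True: rotation preserves distance
print(hamming(a, b))                                     # near 5000, i.e. about D/2
```

On a real fixed-width machine the same vector would be spread over D/w words of width w; the XOR still proceeds word by word with no interaction, while the rotation has to carry bits between adjacent words, which is the intra-word dependency mentioned above.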
Furthermore, the characteristics of HD computing match well with the inherent abilities of FPGAs. Therefore, efficient FPGA hardware implementations have been proposed to speed up the HD model during the training or inference stage, or to reduce the computational cost.
V. BENCHMARK APPLICATIONS
To date, HD computing is still evolving within the research domain, and the number of applications using HD computing to solve real-time problems is limited. However, the outlook is positive, as the number of research efforts advancing new applications for HD is increasing. In this section, the main works utilizing HD for cognitive tasks are reviewed, grouped according to the data structure. We start with applications on 1D data structures (sequences or vectors), such as voice recognition, text classification, bio-signal, and time-series applications. Then we highlight some tasks that pertain to 2D data structures and show how the HD computing approach provides a means of encoding visual scenes, such as frame-based images, as well as the output of neuromorphic sensors.
A. ONE-DIMENSION (1D) HD APPLICATIONS
1) Random indexing, text classification, and language
recognition
Random Indexing (RI) is one of the oldest and best-known applications used to study the relationship between the words in a language [37]. The RI method introduced the term "index vector": it assigns a sparse ternary index vector to each document and a semantic vector to every word. In the original RI, only two HD components were used: the random representation and the bundling operation. In [38], the permutation operation was added to the encoding stage for semantic representation. For more works on RI approaches, processing, and performance, the reader can refer to [39], [40].
A promising result was reported in [41] for identifying the language of text documents. The statistics of the letter n-grams in a sentence are encoded using the HD permutation, binding, and bundling operations. This method was tested on 21 languages using letter tri-grams, utilizing the Project Gutenberg Hart and Wortschatz Corpora. The accuracy of this approach surpassed the baseline ML methods used for the same application.
In [42], sparse HD vectors were used to store the symbols' occurrence statistics in a Willshaw associative memory using stochastic counters. Such a framework is useful for on-line learning applications, where the system keeps learning from incoming data. The work in [43] proposed a bag-of-letters mapping scheme that helps identify a valid word in the dictionary given the permuted letters of the word, using the Hamming distance as the similarity metric. The permuted text of the "Cambridge test" was used as input. The method showed good potential in reconstructing the original words. However, depending on the letter frequency in the word, a kind of bias is revealed when measuring similarity. A novel method for extracting the common sub-string of two strings with lengths $L_1$ and $L_2$ using binary HD vectors was proposed in [44], utilizing the same n-gram technique. For sequence prediction, the work in [45] described a prediction model based on HDC that exploits a sparse distributed memory (SDM) system. The model used k consecutive points to predict the entire sequence. The prediction rate was limited by the memory capacity and the dimension of the HD vector.
2) Sensory inputs recognition
The work in [46] uses HD principles for modeling the dependencies between parallel multivariate inputs and proposes an HD-based predictor (HDCP). The objective of the approach is to predict the future state of a sequence in the stream given its previous states. HDCP was tested on activity recognition tasks using data from different body sensors in the Palantir Context Data Library. The system outperformed the state-of-the-art ML results and showed its ability to account for the differing reliability of different sensor inputs. However, in designing HDCP, a special memory architecture should be considered, such as sparse distributed memory (SDM), rather than the linear additive memory [10].
A similar approach was used in [47] for sequence prediction. Instead of dense bipolar HD vectors, this model used sparse distributed HD coding to represent time-dependent structures using ternary values. The approach, which is called Sparse Distributed Predictor (SDP), was tested on real-time data from mobile phone users, which was used to predict the next application launched, the next music playback logs, and the next GPS location of the user.
[Fig. 2 diagram labels: Input Features; Encoder (XOR logic gates on pixel value and pixel position, XOR outputs, counters, binarization); HD Model (class HVs, Class 1 ... Class k, each with components h1 ... hD); Associative Search (similarity logic, distance detector, comparator; non-binary path).]
FIGURE 2. General HD processor for MNIST dataset classification. It consists of an IM for storing seeds, an encoder to map data to HD space, an AM for storing HD classes, and similarity measurements for the inference stage.
VoiceHD was proposed in [6] for speech recognition. The approach focused on transforming a collection of voice recordings, the ISOLET dataset, into the frequency domain using Mel-frequency cepstral coefficients (MFCCs). The proposed encoder is applied to N frequency bins, which are all bundled into a single class hyper-vector representing the intended voice. To overcome the capacity issue in HDC, the design suggested retraining the AM by incremental updates to improve the classification accuracy. VoiceHD was five times faster in execution than the conventional DNN during training and testing.
3) Biomedical (body-sensing) applications
The fast growth of efficient electronics has enabled major enhancements in wearable and portable health care systems. Building on that, utilizing the HD framework in the biomedical domain is an active research area. The works in [18], [48] developed a full set of HD templates that comprehensively encode different types of bio-signals, such as EMG, EEG, and ECoG, for multi-class learning and classification. The authors in [18], [48] extended the encoder in [19] to process simultaneous analog bio-signal inputs using the continuous mapping method used in [49]. The proposed approach used a dense bipolar HD vector representation for dense sensors [48]. The HD classifier learns 3 times faster than the SVM method, with an accuracy higher than that of the SVM trained on the full training set.
In [50], the authors provided an efficient binarized algorithm for fast classification of human epileptic seizures using EEG signals [51], and for identifying the region of the brain that generates them, using brain-inspired HD computing. Although the performance of the algorithms mentioned above surpasses that of traditional ML methods, they are limited to short-term signal recordings. They also require data processing, which would increase hardware complexity. The authors of [52] proposed the Laelaps algorithm, which addresses long-term signal recordings and operates with end-to-end binary operations to avoid expensive fixed-point or floating-point arithmetic.
FIGURE 3. LeNet 5 architecture. It consists of 7 layers: the input layer, two convolution layers, each followed by a pooling layer, and two fully connected layers.
HD Computing-based Multimodality Emotion Recognition (HDC-MER) is explored in [53] using physiological signals. The real-valued features of GSR, ECG, and EEG were extracted and mapped to dense binary HD vectors using a random non-linear mapping function. The accuracy of the HDC-MER classifier surpassed that of the extreme gradient boosting (XGB) method while using only one quarter of the training data.
The heart rate response during deep breathing (DB) was analyzed in [54] using the principles of HDC. Both heart rate and respiratory signals were synthesized and recorded from real healthy subjects and modeled using Fourier series analysis to extract the desired features. These features are mapped to HD vectors using non-linear approximate mapping [14]. The proposed HDC method was able to identify signals with low cardio-respiratory synchronization during DB caused by arrhythmias, as well as cases in which evaluating autonomic function with the DB test [55] is problematic.
HDC is relatively new in DNA profiling and has great potential to play a significant role in taxonomic identification. In [56], the hyper-dimensional concept is applied to represent the DNA nucleotides adenine (A), guanine (G), cytosine (C), and thymine (T), which occur in DNA molecules in a particular order, for sequence classification. The HD algorithm was tested on different datasets and showed outstanding performance compared with conventional ML algorithms such as KNN and SVM.
B. TWO-DIMENSION (2D) HD APPLICATIONS
Processing visual data in HD space appears to be the newest and least examined application domain. In [57], HDC was used to represent the structured combination of features detected by perceptual circuitry from a visual scene. The system structure is demonstrated through a functional imitation of concept learning in honey bees.
A visual question answering (VQA) system was proposed in [58], in which the machine must infer an answer about a provided image. The architecture consists of two parts. The first part maps each image in the dataset into a bipolar 1000-dimensional HD vector using a two-layer feed-forward fully connected (FFW) neural network. The second part contains the seed HD vectors stored in the IM and the five related questions, also described in HD format. The system was queried with the five questions on unseen images; the accuracy was limited to 60%-72% for new images that were not available during training.
In [59], a holographic reduced representations method was used to convert an input image into an HD vector. An interesting property of this model is its ability to perform a continuous mapping from positions to HD vectors that preserves the distance between the HD vectors. The experiments demonstrated good performance on a simple visuospatial inference task, with an accuracy of 95%, as well as on a 2D navigation task.
Cellular automata (CA) based HDC was used for analogical reasoning in [60]. The proposed method extracts features from an image using a neural network; these features are then expanded into binary non-random HD vectors using CA rule 90 [61]. A similar approach was used in [62] for medical image classification. The proposed classifier was assessed using the ImageCLEF 2012 dataset and collections from the web. Even though the classifier's performance was competitive with conventional CNN and BOVW classifiers, it varied because of the random permutation of the data fed into the CA grid in each iteration.
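As a rough illustration of the expansion step (not the implementation used in [60] or [62]), the sketch below evolves a short binary seed under elementary CA rule 90, in which every cell becomes the XOR of its two neighbours, and concatenates the successive states into one longer binary vector. The periodic boundary, the seed, and the names `rule90_step` and `expand_rule90` are assumptions made for this example.

```python
import numpy as np

def rule90_step(state: np.ndarray) -> np.ndarray:
    """One step of elementary CA rule 90 with a periodic (wrap-around) boundary."""
    # New cell value = left neighbour XOR right neighbour.
    return np.roll(state, 1) ^ np.roll(state, -1)

def expand_rule90(seed: np.ndarray, steps: int) -> np.ndarray:
    """Concatenate the seed and `steps` successive CA states into one vector."""
    states = [seed]
    for _ in range(steps):
        states.append(rule90_step(states[-1]))
    return np.concatenate(states)

# Hypothetical 8-bit feature vector standing in for extracted image features.
seed = np.array([0, 0, 0, 1, 0, 0, 0, 0], dtype=np.uint8)
hv = expand_rule90(seed, steps=3)
print(hv.reshape(4, -1))  # one CA state per row
```

Because the rule is deterministic, the same seed always expands to the same vector, which is why such HD vectors are described as non-random.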
Recent work in [11] bridges the gap between perception and action in robotic applications using HD binary vectors (HBVs) as a currency to produce "memories" of previous events. The goal was to find the velocity associated with a given time image using only memories. The time image is mapped to an HBV by first constructing an intensity space containing values in [0, 255], each encoded as a binary HD vector of length 8k. The time-image HBV and the velocity HBV are then bound into a single HD space to create the action-perception space. The proposed method was tested on ego-motion estimation for autonomous vehicles using the MVSEC dataset. The results showed performance comparable to the traditional CNN approach while using less training data. However, the density of the data to be represented using HBVs could limit HD performance. A concise summary of the main works that exploit HDC for various applications, their encoding modules, and their main hardware implementations is given in Table 2.
The next section will compare HDC and CNN in terms
of the number of operations and accuracy for 1D and 2D
applications.
TABLE 2. Summary of the main works on HDC, including main encoding algorithms, HD vector representation, experimental results, and hardware implementations.
Ref. | DESC. (1) | Class-Rep. (2,3) | APP-DIM. (4) | HD-ACC. (5) | ACC-Impv. (6) | ML-ACC. (7) | HW-IMP. (8) | Eff/Speed (9)
[6] | VoiceHD | B | 1D (speech recognition) | 88.1% | Ret., 93.8% | DNN, 95.9% [63] | CPU | 11.9x/4.6x
[14], [64] | HD for pattern recognition | B, D, S | 2D (synthesised) | 90% | - | | CPU | -
[13], [65] | HD for language identification | B, T, D | 1D (text classification) | 96.7% | - | PCA, 97.9% [41] | ASIC | 2x/-
[24] | SearcHD | B, D | 1D (IoT botnet detection) | 94% | Quant., 99.9% | KNN, 99.9% [66] | ReRAM/CAM | 31.1x/12.8x
[46] | HDCP | NB, D | 1D (activity recognition) | 82.2% | - | DT, 70.4% [67] | - | -
[8] | CompHD | NB, D | 1D (gesture recognition) | 91.04% | - | - | FPGA | 8.04x/4.1x
[15], [27] | HDC for bio-signal processing | NB, B, D | 1D (hand gesture recognition (EMG)) | 90.8% | n-gram tuning, 97.8% | SVM, 89.7% [68] | FPGA | 2.39x/986x
[18] | HD for bio-signal processing | NB, D | 1D (EEG-ERPs) | 74.5% | - | Gaussian, 70.5% [68] | - | -/3x
[23] | SparseHD | NB, S | 1D (activity recognition) | 92% | Ret., 95% | - | FPGA | 48.5x/15x
[25] | BRIC | B, D | 1D (activity recognition) | 95% | Ret. | - | FPGA | 64.1x/8.9
[26] | HDC | NB, B, D | 1D (activity recognition) | 92.5% | Ret., 97% | BNN, 97.2% | FPGA | -/56x
[28], [69] | FACH | B, D | 1D (speech recognition) | - | Ret., Ref., 95% | - | FPGA | 5.9x/5.1x
[30] | Semi-HD for self training | B, D | 1D (liver disorder detection (bupa)) | - | Ret., 74.1% | SVM, 83.8% | FPGA/Raspberry Pi 3 | 12.6x/7.11x
[31] | BinHD | B, D | 1D (cardiotocograms) | 97.6% | Ret./non-binary, 99.5% | Sisporto, 99% [70] | ASIC | 12.4x/6.3x
[34] | QuantHD | NB, B, D | 1D (speech recognition) | 95% | Ret., Shuff., 95.7% | BNN, 96.1% [71] | FPGA | 34.1x/4.1x
[35] | HDCluster | B, D | 1D (breast cancer detection) | 96.2% | - | DT, 94.1% [72] | - | -
[41], [73] | HD nano-system | B, NB, D | 1D (text recognition) | 96.7% | - | SVM, 97.9% | 3D VRRAM/CMOS | 50x/-
[50], [52], [74] | Laelaps | B, D | 1D (epileptic seizure detection) | 94.84% | - | DL, 94.77% [75] | CPU | 1.4x/1.7x
[76] | Hierarchical MHD | NB, D | 1D (speech recognition) | 93.3% | decider confidence, 95.9% | - | CPU | 6.6x/6.1x
[77] | AdaptHD | B, D | 1D (activity recognition) | - | Ret., 96.15% | Gradient Boosting, 96.05% | Raspberry Pi 3 | 6.3x/6.9x
[78] | TP-HDC | B, D | 2D (multi-task learning) | 94.1% | - | - | - | -
(1) Description (encoding algorithm).
(2) HD class representation: binary (B), non-binary (NB), dense (D), sparse (S).
(3) Non-binary refers to integer, floating-point, bipolar {-1, 1}, or ternary {-1, 0, 1} values.
(4) Data dimension (1D/2D) for the specified application.
(5) Maximum accuracy of the HD model.
(6) Accuracy improved via retraining (Ret.), quantization (Quant.), refinement (Ref.), or shuffling (Shuff.).
(7) State-of-the-art ML method accuracy.
(8) HD hardware implementation.
(9) Hardware energy efficiency/speedup compared to an ML/DNN/HD-based algorithm.
VI. CASE STUDIES ON HDC AND CNN
A. HDC ACCURACY AS A FUNCTION OF TRAINING
DATASET SIZE
This section studies both HDC and CNN on a popular dataset and compares the two approaches in terms of accuracy, computing complexity, and overall performance. We implemented the HDC and CNN approaches independently and then analyzed the results to guide the selection based on target needs. The MNIST dataset was used in this work for HDC-based classification. MNIST is a well-known dataset with a wide variety of digit representations. For the HDC model, each pixel position in the (28×28) image is assigned a unique HD vector. Therefore, 784 randomly generated vectors act as seeds that are stored in the IM and fixed over the system's life. D is assumed to be equal to 10k. To preserve the pixel's location in the image, each pixel's HD vector is bound with its intensity. The gray-scale images are converted to black and white (BW) for simple representation, so that 0 denotes a black pixel and 1 denotes a white pixel. This procedure is called orthogonal mapping; it was inspired by [14] and modified according to the targeted task. The mapping procedure for the MNIST dataset consists of the following steps, highlighted in Fig. 2:
• Initialize the HD mapper:
  - Set the dimensionality D of the HD vectors.
  - Flatten the input images, convert them to binary, and set the number of features. The digit image consists of 784 pixels, where each pixel represents a feature.
  - Initialize the seeds randomly. For every feature i, a dense binary random HD vector is generated; 784 dissimilar vectors are stored in the IM.
• Encoding mechanism: for each pixel position, the corresponding HD vector is bound with the value of that pixel (the random HD vector for the pixel position is multiplied by a special matrix, called a permutation matrix, which performs a 1-bit shift for black and a 0-bit shift for white).
• Generate the HD distributed representation of the digit by applying Eq. (1).
• Training stage: combine the HD vectors of all similar samples into a single binary pattern representation $H_b$ using the majority sum operation: $H_b = \left[\sum_{i=1}^{n} I_i\right]$, where $[\cdot]$ refers to the majority sum operation.
• Store the binary representation of each digit (class prototype) in the AM.
During the testing phase, the same mapping procedure is used
to generate the query HD vector. Later, this encoded binary
HD vector is compared to the stored HD representations
(classes) in the AM through the Hamming distance similarity
measurement.
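The steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the cyclic 1-bit shift stands in for the permutation matrix, ties in the majority sum resolve to 0, and two synthetic noisy templates replace the MNIST digits so that the example is self-contained. All function and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000        # HD vector dimensionality
N_PIXELS = 784    # 28 x 28 image, one seed vector per pixel position

# Item memory (IM): one dense random binary seed per pixel position.
item_memory = rng.integers(0, 2, size=(N_PIXELS, D), dtype=np.uint8)
# 1-bit cyclic shift of every seed, used when the pixel is black.
shifted_memory = np.roll(item_memory, 1, axis=1)

def majority(vectors: np.ndarray) -> np.ndarray:
    """Component-wise majority sum of binary vectors (ties go to 0, an assumption)."""
    return (2 * vectors.sum(axis=0, dtype=np.int64) > vectors.shape[0]).astype(np.uint8)

def encode(image_bw: np.ndarray) -> np.ndarray:
    """Map a flattened BW image (0 = black, 1 = white) to one binary HD vector."""
    bound = np.where(image_bw[:, None] == 0, shifted_memory, item_memory)
    return majority(bound)

def train(images: np.ndarray, labels: np.ndarray, n_classes: int) -> np.ndarray:
    """Associative memory (AM): one majority-sum prototype per class."""
    encoded = np.stack([encode(img) for img in images])
    return np.stack([majority(encoded[labels == c]) for c in range(n_classes)])

def classify(am: np.ndarray, image_bw: np.ndarray) -> int:
    """Class whose prototype has the smallest Hamming distance to the query."""
    query = encode(image_bw)
    return int(np.argmin((am != query).sum(axis=1)))

def noisy(template: np.ndarray, flip: float = 0.1) -> np.ndarray:
    """Flip a fraction of pixels; a synthetic stand-in for handwriting variation."""
    mask = rng.random(template.shape) < flip
    return np.where(mask, 1 - template, template)

templates = rng.integers(0, 2, size=(2, N_PIXELS))
train_x = np.stack([noisy(templates[c]) for c in (0, 1) for _ in range(10)])
train_y = np.repeat([0, 1], 10)
am = train(train_x, train_y, n_classes=2)
predictions = [classify(am, noisy(templates[c])) for c in (0, 1)]
print(predictions)
```

Training here is a single pass of bundling with no gradient computation, and inference reduces to XOR-style comparisons and counting.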
Here, we study the effect of the training set size for the MNIST dataset on classification accuracy. In the first experiment, we used a series of 50 randomly selected images for each digit. All the HD vectors of the images presented for a particular digit were combined to form a single HD representation of that digit (class). Thus, by the end of the training phase, the associative memory contains 10 HD vectors, each jointly representing all variations of one digit. We repeated the experiment for 100, 500, 1000, 2000, 3000, 4400, and 5200 samples for each digit. During the testing phase, 1000 new images for all digits were used as input. The overall accuracy was measured as the percentage of correctly classified digits over the test set.
FIGURE 4. The average accuracy (%) of the HD classifier for the MNIST dataset as a function of the training dataset size. The system was examined using 50, 100, 500, 1000, 2200, 4400, and 5200 samples from each class. The accuracy saturates at 89% between 1000 and 2200 samples.
Fig. 4 shows that the accuracy obtained for small training sets starts low and then increases in an approximately linear fashion until it hits an upper limit slightly below 90%. After that, the classification accuracy starts to decrease. This is due to the majority sum operation, which generates the binary class representations of the observed patterns. It imposes a limit on the number of HD vectors included in the sum, above which robust decoding of the individual operands becomes very difficult [14]. This explains the accuracy behavior when the number of patterns increases above a specific limit. Most HDC research focuses on overall accuracy to measure the system's performance. In this work, we added other performance metrics, such as precision and recall. Precision is the classifier's ability to identify only the relevant samples in the dataset, while recall refers to the system's ability to identify all pertinent instances in the dataset [79]. The results show a recall of 86.5% and a precision of 85% for the method based on HDC distributed representations. We found that the accuracy, precision, and recall values are in the same range. Further analysis of other metrics, such as the F1-score (86%), the Matthews correlation coefficient (MCC, 0.85), and Cohen's kappa (0.81), indicates the robustness of the classifier and that the system uses a balanced dataset.
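For reference, all of the metrics named above can be derived from a confusion matrix. The sketch below uses the standard definitions (macro-averaged precision, recall and F1, Cohen's kappa, and the multi-class generalisation of MCC); the function name, the row/column convention, and the 2×2 example matrix are illustrative assumptions and are not the paper's results. It also assumes every class occurs at least once as both a true and a predicted label.

```python
import numpy as np

def classification_metrics(cm: np.ndarray) -> dict:
    """Metrics from a confusion matrix with cm[i, j] = true class i predicted as j."""
    cm = cm.astype(np.float64)
    total = cm.sum()
    tp = np.diag(cm)
    true_counts = cm.sum(axis=1)   # samples per true class
    pred_counts = cm.sum(axis=0)   # samples per predicted class

    recall = tp / true_counts
    precision = tp / pred_counts
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / total

    # Cohen's kappa: agreement beyond what the marginals give by chance.
    chance = (true_counts * pred_counts).sum() / total**2
    kappa = (accuracy - chance) / (1 - chance)

    # Multi-class Matthews correlation coefficient.
    mcc_num = tp.sum() * total - (true_counts * pred_counts).sum()
    mcc_den = np.sqrt((total**2 - (pred_counts**2).sum())
                      * (total**2 - (true_counts**2).sum()))

    return {
        "accuracy": accuracy,
        "precision": precision.mean(),   # macro average
        "recall": recall.mean(),         # macro average
        "f1": f1.mean(),                 # macro average
        "kappa": kappa,
        "mcc": mcc_num / mcc_den,
    }

# Hypothetical two-class confusion matrix, for illustration only.
example_cm = np.array([[8, 2],
                       [1, 9]])
metrics = classification_metrics(example_cm)
print({name: round(float(value), 3) for name, value in metrics.items()})
```

On a balanced dataset these quantities tend to stay close to the overall accuracy, which is the pattern reported above.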
B. HDC ACCURACY AS A FUNCTION OF DIMENSION
SIZE
In our HDC simulations, we have assumed a vector length of 10k. However, some studies show that an HD vector with fewer than 1000 elements is adequate to represent the system [80]. This reduction in vector size would reduce the execution time and the chip area of the implementation. Moreover, energy and search time are direct functions of the dimension of the HD vector and the number of classes considered in the search operation [80]. Results show that a 1k representation can achieve 90.4% classification accuracy, compared to 97.8% when using a 10k HD vector, for a language recognition dataset. For the MNIST dataset employed in this work, we examine the accuracy for different HD vector sizes. Fig. 5 shows that a dimension of 6k is adequate, as the accuracy exceeds the 89% level. Beyond that dimension, the accuracy improves slightly at the cost of time and area. Nevertheless, a cost reduction is obtained when using a 2k HD vector, where the accuracy drops by only 0.2%. Hence, trading accuracy for efficiency is application dependent and would greatly impact system performance, especially for IoT edge devices.
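One way to build intuition for this trade-off is to look at how the dimension affects the separation of random binary vectors. The sketch below is a generic property of random HD vectors and does not reproduce Fig. 5: it estimates the spread of the normalized Hamming distance between unrelated vectors, which shrinks roughly as 0.5/sqrt(D), and prints the number of bit comparisons one AM search would need for 10 classes. The function name and the sample size of 200 pairs are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N_CLASSES = 10

def distance_spread(dim: int, pairs: int = 200) -> float:
    """Std. deviation of the normalized Hamming distance between random vector pairs."""
    a = rng.integers(0, 2, size=(pairs, dim), dtype=np.uint8)
    b = rng.integers(0, 2, size=(pairs, dim), dtype=np.uint8)
    return float((a != b).mean(axis=1).std())

spread = {}
for dim in (1_000, 2_000, 4_000, 6_000, 8_000, 10_000):
    spread[dim] = distance_spread(dim)
    print(f"D={dim:>6}  measured spread={spread[dim]:.4f}  "
          f"0.5/sqrt(D)={0.5 / np.sqrt(dim):.4f}  "
          f"AM search bit comparisons={N_CLASSES * dim}")
```

A smaller spread leaves more margin between unrelated and related patterns, while the search cost grows linearly with D, which is the trade-off discussed above.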
FIGURE 5. MNIST classification average accuracy (%) as a function of the HD vector dimension. HD vector sizes of 1k, 2k, 4k, 6k, 8k, and 10k were selected to examine the accuracy level. The results confirm that, with 8% of the training set, an 8k vector size is needed to reach the maximum accuracy.
C. HDC VS CNN
HDC is a promising model for edge devices as it does not include the computationally demanding training step found in the widely used CNN [17], [80]. A significant difference between the two computing paradigms is that HDC departs from the dimensionality reduction concept found in machine learning methods such as neural networks and instead focuses on dimensionality expansion by emulating the neuron's activity. Nonetheless, HDC comes with its own challenges: encoding alone takes about 80% of the training execution time [25], and some encoding algorithms might even increase the size of the encoded data by 20× [76]. This section compares the two computing paradigms for 1D and 2D applications in terms of accuracy, computational complexity, and number of utilized parameters.
TABLE 3. Accuracy, energy consumption, and execution time of VoiceHD, VoiceHD+NN, and TinyDNN [6].
Metric | VoiceHD | VoiceHD+NN | TinyDNN
Accuracy | 93.8% | 95.3% | 93.6%
Partial training accuracy (40%) | 91.7% | 92.9% | 85%
Energy consumption (mJ) | 38 | 53 | 454
Training execution time (min) | 3.7 | 5.9 | 17
Testing execution time (ms) | 0.87 | 1.1 | 4.6
It has been shown that HDC outperforms digital neural networks (DNN) in 1D applications such as speech recognition [6]. The authors of [6] proposed two HDC-based designs: a single-stage HDC architecture, VoiceHD, and one that is followed by a NN, VoiceHD+NN, to increase the accuracy. Their designs were compared to a 3-layer NN implemented using TinyDNN with around 3k neurons. Table 3 compares the three designs in terms of full and partial training accuracy, training and testing times, and testing energy consumption. As can be seen, HDC-based designs maintain high classification accuracy even with only 40% of the training set. Although the number of parameters in this example is higher for HDC than for the DNN, the HDC designs achieve a much faster runtime with lower energy consumption.
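The size of these gains follows directly from the values in Table 3. The short calculation below divides the TinyDNN figures by those of each HDC design; the dictionary layout and the `improvement` helper are just a convenient, hypothetical way to organise the numbers.

```python
# Values copied from Table 3 (energy in mJ, training time in min, testing time in ms).
table3 = {
    "VoiceHD":    {"energy_mJ": 38,  "train_min": 3.7, "test_ms": 0.87},
    "VoiceHD+NN": {"energy_mJ": 53,  "train_min": 5.9, "test_ms": 1.1},
    "TinyDNN":    {"energy_mJ": 454, "train_min": 17,  "test_ms": 4.6},
}

def improvement(design: str, baseline: str = "TinyDNN") -> dict:
    """Ratio baseline/design for every cost metric (larger means the design is better)."""
    return {key: table3[baseline][key] / table3[design][key] for key in table3[design]}

for design in ("VoiceHD", "VoiceHD+NN"):
    ratios = improvement(design)
    print(design, {key: round(value, 1) for key, value in ratios.items()})
```

For VoiceHD, the energy and training-time ratios round to 11.9 and 4.6, which matches the 11.9x/4.6x entry listed for [6] in Table 2.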
HDC has been widely implemented for 1D signals, but the complexity increases once it is extended to 2D. In this section, we quantify the number of operations and parameters required by each computing paradigm. MNIST classification was performed using LeNet 5 on the Caffe framework. LeNet 5 is a simple CNN architecture that consists of 7 layers, as shown in Fig. 3 [81]. Caffe, a deep learning framework, was used to simulate and quantify LeNet for MNIST digit classification [82]. Moreover, we imported the LeNet design into Netscope, a CNN analyzer tool [83]. MNIST digit recognition based on the LeNet 5 CNN achieves 99% classification accuracy using a learning rate of 0.01 for 10,000 iterations, whereas HDC achieved 86% classification accuracy on the same MNIST dataset with the full training dataset.
Table 4 below presents the accuracy comparison between the HDC model and the reference CNN approach for the MNIST dataset. The analysis shows that the performance of HDC is lower than that of the CNN approach in terms of overall accuracy. Moreover, the recall of the HDC model is lower for certain digits than for others. For example, as illustrated in the confusion matrix in Fig. 6, the recognition accuracy for the digits “4”, “5”, and “8” is persistently lower than for other digits. This is due to their similarity to several other characters. In particular, when recalling “4”, 19.1% of the samples are classified as “9” and 4.04% as “6”. When recalling “5”, 30% are classified as “3” and 3.2% as “2”.
FIGURE 6. The confusion matrix (CM) displays the total number of observations in each cell. The rows of the CM correspond to the true class, and the columns correspond to the predicted class. The diagonal corresponds to the correctly classified samples. The row at the bottom of the plot shows the percentages of all samples belonging to each class that are correctly and incorrectly classified. These metrics are often called the recall. The column on the far right of the plot shows the percentages of all samples predicted to belong to each class that are correctly and incorrectly classified. These metrics are often called the precision.
It is essential to highlight that, for the HD architecture, the Hamming distance acts as a quantitative similarity metric through direct comparison of distributed representations, without decoding those representations. Results in [7] show that a larger number of common elements leads to greater similarity between the resulting vectors. In the MNIST HD representation, the strong resemblance between some digits, for example “4” and “9”, makes the number of overlapping elements between their HD vectors relatively high, so that the patterns are sometimes indistinguishable and prone to error. This imposes a limit on the number of overlapping elements between two patterns that can be robustly detected. To reduce the error, one needs to explore another method for mapping the inputs to HD space, such as linear mapping instead of orthogonal mapping [14]. Another suggestion is to extract the image's main features using CNN techniques and then map those features to HD space. Also, retraining techniques would enhance the system accuracy and reduce the error [77].
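The effect of shared elements on the Hamming distance can be reproduced with a small experiment. In the sketch below, two bundles of 101 random binary vectors are formed by majority sum, and the number of component vectors they share is varied; the more components two bundles have in common, the closer they are. The sizes, the seed, and the helper names are assumptions chosen for illustration (an odd bundle size avoids ties).

```python
import numpy as np

rng = np.random.default_rng(2)
D, N = 10_000, 101   # vector dimension, components per bundle (odd, so no ties)

def majority(vectors: np.ndarray) -> np.ndarray:
    """Component-wise majority sum of binary vectors."""
    return (2 * vectors.sum(axis=0, dtype=np.int64) > vectors.shape[0]).astype(np.uint8)

def hamming(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized Hamming distance between two binary vectors."""
    return float((a != b).mean())

pool = rng.integers(0, 2, size=(2 * N, D), dtype=np.uint8)
base = majority(pool[:N])

distances = {}
for shared in (0, 25, 50, 75, 101):
    # `shared` components come from the base bundle, the rest are new random vectors.
    other = majority(np.concatenate([pool[:shared], pool[N:2 * N - shared]]))
    distances[shared] = hamming(base, other)
    print(f"shared components: {shared:>3}  Hamming distance: {distances[shared]:.3f}")
```

With no shared components the distance stays near 0.5, the value expected for unrelated vectors, and it falls to 0 when the bundles are identical. Digits such as “4” and “9” behave like bundles with many shared components.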
To compute the number of parameters required by the HDC-based design during the testing phase, we assume: (a) an array of size 784×10k is generated for the random seeds and stored in the IM; (b) the encoded classes stored in the associative memory have a size of 10k×10. For the computation operations, we assume: (a) the number of shifts is (0.5×784), assuming that 50% of the seed array needs to be shifted (an HD-vector-level operation); (b) the shifting effect is removed by performing the shifting operation again (0.5×784); (c) the column-wise addition requires (784 - 1)×10k operations; (d) the XOR between the query and the encoded patterns in the AM requires (10×10k) operations; (e) the addition following the XOR requires 10×(10k - 1) operations; (f) 9 comparison operations are needed.
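The totals implied by these assumptions can be checked with a few lines of arithmetic. The variable names below are illustrative; the counts themselves are taken directly from items (a) and (b) for the parameters and (a) through (f) for the operations.

```python
D = 10_000        # HD vector dimension
N_PIXELS = 784    # seeds in the item memory
N_CLASSES = 10    # prototypes in the associative memory

# Parameters: seed array in the IM plus class prototypes in the AM.
n_parameters = N_PIXELS * D + D * N_CLASSES

# Operations for classifying one query image.
operations = {
    "shift": N_PIXELS // 2,                # (a) 50% of the seeds are shifted
    "unshift": N_PIXELS // 2,              # (b) shifting effect removed afterwards
    "bundle_add": (N_PIXELS - 1) * D,      # (c) column-wise addition
    "xor": N_CLASSES * D,                  # (d) query XOR each class prototype
    "distance_add": N_CLASSES * (D - 1),   # (e) additions after the XOR
    "compare": N_CLASSES - 1,              # (f) comparisons to find the minimum
}
n_operations = sum(operations.values())

print(f"parameters: {n_parameters / 1e3:.0f}k")     # 7940k
print(f"operations: {n_operations / 1e3:.2f}k")     # 8030.78k
```

The column-wise addition dominates the count, so the encoding stage rather than the associative search accounts for most of the HDC computation in this setting.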
TABLE 4. Computing complexity of 2D (MNIST) classification using the HDC and CNN computing paradigms.
Metric | CNN | HDC
Full training accuracy | 99.04% | 86%
Partial training accuracy (16%) | 98.3% | (87.5-88.5)%
Partial training accuracy (8%) | 97.7% | (85-86)%
F1-score | 99% | 86%
No. of parameters | 431.08k | 7940k
MAC operations | 146.75M | -
No. of computations | 974.08k | 8030.78k
As can be deduced from Table 4, CNN is still superior in accuracy when using the full training set and D = 10k for the HDC architecture. To further study the impact of the training set size on the CNN accuracy, LeNet was trained with 10k and 5k samples. The reported accuracy was 98.29% (compared to 89.5% for HDC) and 97.70% (86%), respectively. CNN still performs better for 2D applications, as its accuracy drops only slightly, by 1.4%, which is consistent with the findings in [81]. Nonetheless, the HDC design requires no MAC operations, which benefits area, energy, and execution time.
VII. CONCLUSIONS AND RECOMMENDATIONS FOR
FUTURE WORK
AI at the edge is important to enable intelligent machines. Efficient hardware architectures with low computing complexity and memory requirements, which allow low power and a small form factor, are of paramount importance.
HDC is still considered a new paradigm and faces challenges that require further analysis. One challenge that needs to be addressed is the sensitivity of the HDC model, which depends on several factors, such as the dimension of the vector used to represent the input data, the form of mapping (binary, ternary, integer) to high-dimensional space, and the majority sum (bundling sensitivity). Moreover, once the model is implemented in hardware, there will be unavoidable sources of variation, such as device and temperature variations, which would lead to imprecision.
Directions for future work may include, but are not limited to: (1) exploiting the intrinsic characteristics of HDC for more classification and cognitive tasks in different domains, such as security, image processing, and real-time applications; (2) developing an efficient encoding algorithm that handles the HDC capacity limitation and improves data representation for 2D applications; (3) developing more hardware-friendly metrics for similarity measurement that enhance system accuracy; (4) designing a unified HD processor that handles diverse data types and can trade off accuracy for efficiency based on the application requirements; (5) investigating the dimension of the HD vectors that store and manipulate different data representations to reduce the memory footprint and enhance system efficiency; (6) studying various methods for integrating DL/ML techniques with HDC and analyzing their effects on system performance.
In addition, two computing paradigms were studied and compared. The first, CNN, builds decisions based on small features of the target input. The second, HDC, uses HD vectors to encode all possible input views to match the holographic distributed representation. Selecting either one has implications for accuracy, computing complexity, and power consumption.
This paper showed that, for 1D data, HDC is more efficient and can provide superior overall performance, whereas in 2D applications CNN still achieves higher classification accuracy at the expense of more computations. 2D-HDC reduced the number of required MAC operations by >140 M, which has a direct impact on area and power, with a 10% accuracy loss. Besides, to further optimize the architecture and implementation, both CNN and HDC can benefit from many new approaches such as in-memory computing, data reuse, and efficient data flow.
VIII. APPENDIX
The MATLAB code is available at the following GitHub link: https://github.com/emfhasan/HDCvsCNN_StudyCase. Two computing approaches, HDC and CNN, are studied on the popular MNIST dataset and compared in terms of accuracy, computing complexity, and overall performance.
REFERENCES
[1] G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury, “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82–97, 2012.
[2] F. J. Ordóñez and D. Roggen, “Deep convolutional and lstm recurrent neural networks for multimodal wearable activity recognition,” Sensors, vol. 16, no. 1, p. 115, 2016.
[3] N. Wojke, A. Bewley, and D. Paulus, “Simple online
and realtime tracking with a deep association metric,”
2017.
[4] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, “XNOR-net: Imagenet classification using binary convolutional neural networks,” Springer, Cham, pp. 525–542, 2016.
[5] R. Morabito, “Virtualization on internet of things edge devices with container technologies: A performance evaluation,” IEEE Access, vol. 5, pp. 8835–8850, 2017.
[6] M. Imani, D. Kong, A. Rahimi, and T. Rosing,
“Voicehd: Hyperdimensional computing for efficient
speech recognition,” in 2017 IEEE International Con-
ference on Rebooting Computing (ICRC). IEEE, 2017,
pp. 1–8.
[7] D. Kleyko, “Vector Symbolic Architectures and their
Applications Computing with Random Vectors in a Hy-
perdimensional Space,” Ph.D. dissertation, Luleå Uni-
versity of Technology Luleå, Luleå, Sweden, 2018.
[8] J. Morris, M. Imani, S. Bosch, A. Thomas, H. Shu,
and T. Rosing, “CompHD: Efficient Hyperdimensional
Computing Using Model Compression,” Proceedings of
the International Symposium on Low Power Electronics
and Design, vol. 2019-July, 2019.
[9] P. Kanerva, “Computing with High-Dimensional Vec-
tors,” IEEE Design and Test, vol. 36, no. 3, pp. 7–14,
2019.
[10] P. Kanerva, “Hyperdimensional Computing: An Intro-
duction to Computing in Distributed Representation
with High-Dimensional Random Vectors,” Cognitive
Computation, vol. 1, no. 2, pp. 139–159, 2009.
[11] A. Mitrokhin, P. Sutor, C. Fermüller, and Y. Aloi-
monos, “Learning sensorimotor control with neuromor-
phic sensors: Toward hyperdimensional active percep-
tion,” Science Robotics, vol. 4, no. 30, 2019.
[12] T. A. Plate, “Holographic reduced representation: Distributed representation for cognitive structures,” Stanford: CSLI, 2003.
[13] P. Kanerva, J. Kristoferson, and A. Holst, “Random indexing of text samples for latent semantic analysis,” in Proceedings of the Annual Meeting of the Cognitive Science Society, vol. 22, no. 22, 2000.
[14] D. Kleyko, A. Rahimi, D. A. Rachkovskij, E. Osipov,
and J. M. Rabaey, “Classification and recall with bi-
nary hyperdimensional computing: Tradeoffs in choice
of density and mapping characteristics,” IEEE Trans-
actions on Neural Networks and Learning Systems,
vol. 29, pp. 5880–5898, 2018.
[15] A. Rahimi, S. Benatti, P. Kanerva, L. Benini, and J. M.
Rabaey, “Hyperdimensional biosignal processing: A
case study for EMG-based hand gesture recognition,”
in 2016 IEEE International Conference on Rebooting
Computing (ICRC). IEEE, 2016, pp. 1–8.
[16] D. A. Rachkovskij and E. M. Kussul, “Binding and normalization of binary sparse distributed representations by context-dependent thinning,” Neural Computation, vol. 13, no. 2, pp. 411–452, 2001.
[17] S. Datta, R. A. Antonio, A. R. Ison, and J. M. Rabaey, “A programmable hyper-dimensional processor architecture for human-centric iot,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 9, no. 3, pp. 439–452, 2019.
[18] A. Rahimi, P. Kanerva, L. Benini, and J. M. Rabaey,
“Efficient biosignal processing using hyperdimensional
computing: Network templates for combined learning
and classification of ExG signals,” Proceedings of the
IEEE, vol. 107, no. 1, pp. 123–143, 2018.
[19] A. Rahimi, S. Datta, D. Kleyko, E. P. Frady, B. Olshausen, P. Kanerva, and J. M. Rabaey, “High-Dimensional Computing as a Nanoscalable Paradigm,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 64, no. 9, pp. 2508–2521, 2017.
[20] M. Laiho, J. H. Poikonen, P. Kanerva, and E. Lehtonen, “High-dimensional computing with sparse vectors,” in IEEE Biomedical Circuits and Systems Conference (BioCAS). IEEE, 2015, pp. 1–4.
[21] D. Kleyko, E. Osipov, A. Senior, A. I. Khan, and Y. A. Şekercioğlu, “Holographic graph neuron: A bioinspired architecture for pattern processing,” IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 6, pp. 1250–1262, 2016.
[22] M. Hersche, J. d. R. Millán, L. Benini, and A. Rahimi, “Exploring embedding methods in binary hyperdimensional computing: A case study for motor-imagery based brain-computer interfaces,” arXiv preprint arXiv:1812.05705, 2018.
[23] M. Imani, S. Salamat, B. Khaleghi, M. Sam-
ragh, F. Koushanfar, and T. Rosing, “Sparsehd:
Algorithm-hardware co-optimization for efficient high-
dimensional computing,” in IEEE 27th Annual Inter-
national Symposium on Field-Programmable Custom
Computing Machines (FCCM). IEEE, 2019, pp. 190–
198.
[24] M. Imani, X. Yin, J. Messerly, S. Gupta, and S. Member, “SearcHD: A Memory-Centric Hyperdimensional Computing with Stochastic Training,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, pp. 1–12, 2019.
[25] M. Imani, J. Morris, J. Messerly, H. Shu, Y. Deng, and T. Rosing, “Bric: Locality-based encoding for energy-efficient brain-inspired hyperdimensional computing,” in Proceedings of the 56th Annual Design Automation Conference, 2019, pp. 1–6.
[26] Y. Kim, M. Imani, and T. S. Rosing, “Efficient human
activity recognition using hyperdimensional comput-
ing,” in Proceedings of the 8th International Conference
on the Internet of Things, 2018, pp. 1–6.
[27] M. Schmuck, L. Benini, and A. Rahimi, “Hardware Op-
timizations of Dense Binary Hyperdimensional Com-
puting: Rematerialization of Hypervectors, Binarized
Bundling, and Combinational Associative Memory,”
ACM Journal on Emerging Technologies in Computing
Systems, vol. 15, no. 4, pp. 1–25, 2019.
[28] M. Imani, S. Salamat, S. Gupta, J. Huang, and T. Ros-
ing, “Fach: FPGA-based Acceleration of Hyperdimen-
sional Computing by Reducing Computational Com-
plexity,” 2019, pp. 532–537.
[29] H. Kharaghani and B. Tayfeh-Rezaie, “A hadamard
matrix of order 428,” Journal of Combinatorial Designs,
vol. 13, no. 6, pp. 435–440, 2005.
[30] M. Imani, S. Bosch, M. Javaheripi, B. Rouhani,
X. Wu, F. Koushanfar, and T. Rosing, “SemiHD: Semi-
supervised learning using hyperdimensional comput-
ing,” in IEEE/ACM International Conference On Com-
puter Aided Design (ICCAD), 2019, pp. 1–8.
[31] M. Imani, J. Messerly, F. Wu, W. Pi, and T. Rosing, “A binary learning framework for hyperdimensional computing,” in Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 2019, pp. 126–131.
[32] Y. Wu, G. Wayne, A. Graves, and T. Lillicrap, “The kanerva machine: A generative distributed memory,” 2018.
[33] S. Bosch, A. S. de la Cerda, M. Imani, T. S. Rosing,
and G. De Micheli, “Qubithd: A stochastic accelera-
tion method for hd computing-based machine learning,”
arXiv preprint arXiv:1911.12446, 2019.
[34] M. Imani, S. Bosch, S. Datta, S. Ramakrishna, S. Salamat,
J. M. Rabaey, and T. Rosing, “QuantHD: A Quantization
Framework for Hyperdimensional Computing,”
IEEE Transactions on Computer-Aided Design of Inte-
grated Circuits and Systems.
[35] M. Imani, Y. Kim, T. Worley, S. Gupta, and T. Rosing,
“Hdcluster: An accurate clustering using brain-inspired
high-dimensional computing,” in Design, Automation
& Test in Europe Conference & Exhibition (DATE).
IEEE, 2019, pp. 1591–1594.
[36] G. Karunaratne, M. Le Gallo, G. Cherubini, L. Benini,
A. Rahimi, and A. Sebastian, “In-memory hyperdimen-
sional computing,” Nature Electronics, pp. 1–11, 2020.
[37] D. Kleyko, “Vector Symbolic Architectures and their
Applications: Computing with Random Vectors in a
Hyperdimensional Space,” Ph.D. dissertation, Luleå
University of Technology, Luleå, Sweden, 2018.
[38] G. Recchia, M. Jones, M. Sahlgren, and P. Kanerva,
“Encoding sequential information in vector space mod-
els of semantics: Comparing holographic reduced rep-
resentation and random permutation,” in Proceedings of
the Annual Meeting of the Cognitive Science Society,
vol. 32, no. 32, 2010.
[39] F. Sandin, B. Emruli, and M. Sahlgren, “Random indexing
of multidimensional data,” Knowledge and Information
Systems, vol. 52, no. 1, pp. 267–290, 2017.
[40] G. Recchia, M. Sahlgren, P. Kanerva, and M. N. Jones,
“Encoding sequential information in semantic space
models: Comparing holographic reduced representation
and random permutation,” Computational intelligence
and neuroscience, vol. 2015, pp. 1–18, 2015.
[41] A. Joshi, J. T. Halseth, and P. Kanerva, “Language geometry
using random indexing,” in International Symposium
on Quantum Interaction. Springer, 2016, pp.
265–274.
[42] M. Laiho, E. Lehtonen, J. H. Poikonen, and P. Kanerva,
“Associative memory with occurrence statistics,”
in IEEE International Symposium on Circuits and Sys-
tems (ISCAS). IEEE, 2016, pp. 2278–2281.
[43] D. Kleyko, E. Osipov, and R. W. Gayler, “Recognizing
permuted words with vector symbolic architectures:
A cambridge test for machines,” Procedia Computer
Science, vol. 88, no. November, pp. 169–175, 2016.
[44] D. Kleyko and E. Osipov, “On bidirectional transi-
tions between localist and distributed representations:
The case of common substrings search using Vector
Symbolic Architecture,” Procedia Computer Science,
vol. 41, no. December, pp. 104–113, 2014.
[45] J. Mercado, R. Fernández, and M. Salinas, “Se-
quence prediction with hyperdimensional computing,”
Research in Computing Science, vol. 138, pp. 117–126,
Dec. 2017.
[46] O. Räsänen and S. Kakouros, “Modeling dependencies
in multiple parallel data streams with hyperdimensional
computing,” IEEE Signal Processing Letters, vol. 21,
no. 7, pp. 899–903, 2014.
[47] O. J. Räsänen and J. P. Saarinen, “Sequence predic-
tion with sparse distributed hyperdimensional coding
applied to the analysis of mobile phone use patterns,”
IEEE transactions on neural networks and learning sys-
tems, vol. 27, no. 9, pp. 1878–1889, 2015.
[48] A. Moin, A. Zhou, A. Rahimi, S. Benatti, A. Menon,
S. Tamakloe, J. Ting, N. Yamamoto, Y. Khan,
F. Burghardt, L. Benini, A. C. Arias, and J. M. Rabaey,
“An EMG gesture recognition system with flexible high-density
sensors and brain-inspired high-dimensional
classifier,” in 2018 IEEE International Symposium on
Circuits and Systems (ISCAS), May 2018, pp. 1–5.
[49] D. Widdows and T. Cohen, “Reasoning with vectors:
A continuous model for fast robust inference,” Logic
Journal of the IGPL, vol. 23, no. 2, pp. 141–173, 2015.
[50] A. Burrello, K. Schindler, L. Benini, and A. Rahimi,
“Hyperdimensional computing with local binary pat-
terns: One-shot learning of seizure onset and identifi-
cation of ictogenic brain regions using short-time ieeg
recordings,” IEEE Transactions on Biomedical Engi-
neering, vol. 67, no. 2, pp. 601–613, Feb 2020.
[51] A. Burco, “Exploring Neural-symbolic Integration Architectures
for Computer Vision,” Ph.D. dissertation,
Politecnico Di Torino, 2018.
[52] A. Burrello, L. Cavigelli, K. Schindler, L. Benini, and
A. Rahimi, “Laelaps: An energy-efficient seizure detec-
tion algorithm from long-term human ieeg recordings
without false alarms,” in Design, Automation Test in
Europe Conference Exhibition (DATE), March 2019,
pp. 752–757.
[53] E.-J. Chang, A. Rahimi, L. Benini, and A.-Y. A.
Wu, “Hyperdimensional computing-based multimodality
emotion recognition with physiological signals,”
in IEEE International Conference on Artificial Intelli-
gence Circuits and Systems (AICAS). IEEE, 2019,
pp. 137–141.
[54] D. Kleyko, E. Osipov, and U. Wiklund, “A Hyper-
dimensional Computing Framework for Analysis of
Cardiorespiratory Synchronization during Paced Deep
Breathing,” IEEE Access, vol. 7, no. March, pp.
34 403–34 415, 2019.
[55] P. A. Low, T. Opfer-Gehrking, I. Zimmerman, and
P. O’Brien, “Evaluation of heart rate changes: electrocardiographic
versus photoplethysmographic methods,”
Clinical Autonomic Research, vol. 7, no. 2, pp.
65–68, 1997.
[56] M. Imani, T. Nassar, J. Morris, and T. Rosing, “DNA se-
quencing using brain-inspired hyperdimensional com-
puting,” Tech. Rep., March, 25 2019.
[57] D. Kleyko, E. Osipov, R. W. Gayler, A. I. Khan, and
A. G. Dyer, “Imitation of honey bees’ concept learning
processes using vector symbolic architectures,” Biologically
Inspired Cognitive Architectures, vol. 14, pp. 57–
72, 2015.
[58] G. Montone, J. K. O’Regan, and A. V. Terekhov,
“Hyper-dimensional computing for a visual question-
answering system that is trainable end-to-end,” arXiv
preprint arXiv:1711.10185, 2017.
[59] E. Weiss, B. Cheung, and B. Olshausen, “A neural
architecture for representing and reasoning about spatial
relationships,” in 4th International Conference on
Learning Representations (ICLR), 2016, pp. 1–4.
[60] O. Yilmaz, “Analogy making and logical inference
on images using cellular automata based hyperdimensional
computing,” in Proceedings of the 2015 International
Conference on Cognitive Computation: Integrating
Neural and Symbolic Approaches, no. 9, 2015, pp.
19–27.
[61] O. Yilmaz, “Machine learning using cellular automata
based feature expansion and reservoir computing,”
Journal of Cellular Automata, vol. 10, pp. 435–472,
2015.
[62] D. Kleyko, S. Khan, E. Osipov, and S.-P. Yong, “Modal-
ity classification of medical images with distributed
representations based on cellular automata reservoir
computing,” in IEEE 14th International Symposium on
Biomedical Imaging (ISBI). IEEE, 2017, pp. 1053–
1056.
[63] G. Chechik, U. Shalit, V. Sharma, and S. Bengio, “An
online algorithm for large scale image similarity learn-
ing,” in Advances in Neural Information Processing
Systems, 2009, pp. 306–314.
[64] D. Kleyko, E. Osipov, A. Senior, A. I. Khan, and
Y. A. Şekercioğlu, “Holographic Graph Neuron: A
Bioinspired Architecture for Pattern Processing,” IEEE
Transactions on Neural Networks and Learning Sys-
tems, vol. 28, no. 6, pp. 1250–1252, 2017.
[65] A. Rahimi, P. Kanerva, and J. M. Rabaey, “A Robust
and Energy-Efficient Classifier Using Brain-Inspired
Hyperdimensional Computing,” in Proceedings of the
International Symposium on Low Power Electronics
and Design, 2016, pp. 64–69.
[66] Y. Mirsky, T. Doitshman, Y. Elovici, and A. Shabtai,
“Kitsune: an ensemble of autoencoders for online
network intrusion detection,” arXiv preprint
arXiv:1802.09089, 2018.
[67] M. Ermes, J. Pärkkä, J. Mäntyjärvi, and I. Korho-
nen, “Detection of daily activities and sports with
wearable sensors in controlled and uncontrolled condi-
tions,” IEEE transactions on information technology in
biomedicine, vol. 12, no. 1, pp. 20–26, 2008.
[68] M. A. Oskoei and H. Hu, “Support vector machine-
based classification scheme for myoelectric control ap-
plied to upper limb,” IEEE transactions on biomedical
engineering, vol. 55, no. 8, pp. 1956–1965, 2008.
[69] S. Salamat, M. Imani, and T. Rosing, “Accelerating
hyperdimensional computing on FPGAs by exploiting
computational reuse,” IEEE Transactions on Comput-
ers, vol. 69, no. 8, pp. 1159–1171, 2020.
[70] D. Ayres-de Campos, J. Bernardes, A. Garrido,
J. Marques-de Sa, and L. Pereira-Leite, “Sisporto 2.0:
a program for automated analysis of cardiotocograms,”
Journal of Maternal-Fetal Medicine, vol. 9, no. 5, pp.
311–318, 2000.
[71] Y. Umuroglu, N. J. Fraser, G. Gambardella, M. Blott,
P. Leong, M. Jahre, and K. Vissers, “Finn: A frame-
work for fast, scalable binarized neural network infer-
ence,” in Proceedings of the 2017 ACM/SIGDA In-
ternational Symposium on Field-Programmable Gate
Arrays, 2017, pp. 65–74.
[72] R. S. Michalski, I. Mozetic, J. Hong, and N. Lavrac,
“The multi-purpose incremental learning system aq15
and its testing application to three medical domains,”
Proc. AAAI 1986, pp. 1–041, 1986.
[73] H. Li, T. F. Wu, A. Rahimi, K.-S. Li, M. Rusch,
C.-H. Lin, J.-L. Hsu, M. M. Sabry, S. B. Eryilmaz,
J. Sohn et al., “Hyperdimensional computing with
3D VRRAM in-memory kernels: Device-architecture
co-design for energy-efficient, error-resilient language
recognition,” in IEEE International Electron Devices
Meeting (IEDM). IEEE, 2016, pp. 16–1.
[74] A. Burrello, K. Schindler, L. Benini, and A. Rahimi,
“One-shot learning for ieeg seizure detection using end-
to-end binary operations: Local binary patterns with hy-
perdimensional computing,” in IEEE Biomedical Cir-
cuits and Systems Conference (BioCAS), Oct 2018, pp.
1–4.
[75] R. Hussein, H. Palangi, Z. J. Wang, and R. Ward, “Ro-
bust detection of epileptic seizures using deep neural
networks,” in IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP). IEEE,
2018, pp. 2546–2550.
[76] M. Imani, C. Huang, D. Kong, and T. Rosing, “Hierar-
chical hyperdimensional computing for energy efficient
classification,” in 55th ACM/ESDA/IEEE Design Au-
tomation Conference (DAC). IEEE, 2018, pp. 1–6.
[77] M. Imani, J. Morris, S. Bosch, H. Shu, G. De Micheli,
and T. Rosing, “AdaptHD: Adaptive Efficient Train-
ing for Brain-Inspired Hyperdimensional Computing,”
in IEEE Biomedical Circuits and Systems (BioCAS),
Nara, Japan, 2019.
[78] C.-Y. Chang, Y.-C. Chuang, and A.-Y. A. Wu, “Task-
projected hyperdimensional computing for multi-task
learning,” in IFIP International Conference on Artificial
Intelligence Applications and Innovations. Springer,
2020, pp. 241–251.
[79] T. Kurbiel. Gaining an intuitive understanding of preci-
sion, recall and area under curve. [Online]. Available:
https://towardsdatascience.com (Accessed: 2020-11-12)
[80] M. Imani, A. Rahimi, D. Kong, T. Rosing, and
J. M. Rabaey, “Exploring hyperdimensional associa-
tive memory,” IEEE International Symposium on High
Performance Computer Architecture (HPCA), pp. 445–
456, 2017.
[81] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner,
“Gradient-based learning applied to document recog-
nition,” Proceedings of the IEEE, vol. 86, no. 11, pp.
2278–2324, 1998.
[82] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long,
R. Girshick, S. Guadarrama, and T. Darrell, “Caffe:
Convolutional architecture for fast feature embedding,”
arXiv preprint arXiv:1408.5093, 2014.
[83] D. Gschwend. Netscope CNN analyzer. [Online]. Available:
https://github.com/cwlacewe/netscope (Accessed:
2020-06-17)
... Machine learning methods, and specifically DL architectures, represent flexible, trainable operations with unprecedented power, and often exorbitant computational demands! Hyperdimensional computing is seen as a time- and energy-efficient form of machine learning because processing is extremely fast despite the large size of the HVs, with speed improvements from 5 to 50 times compared to traditional methods reported [71]. For example, the review by [71] reports speedups ranging from a factor 2 to a factor 50 for various applications, often with a minor performance cost. The reason is that encoding, training, and inference usually require only simple component-wise operations. ...
... In general, obtaining a performance that is as good as that of conventional machine learning algorithms can be tricky for some applications. For example, in [71], it is observed that while HDC can perform state-of-the-art for 1D data, such as text, sound and biosignal classification, its performance on 2D data, such as images, is still inferior. The large dimensionality of the HVs also incurs a large memory footprint, for which clever implementation or hardware accelerations are needed to attain high speeds. ...
Advances in bioinformatics are primarily due to new algorithms for processing diverse biological data sources. While sophisticated alignment algorithms have been pivotal in analyzing biological sequences, deep learning has substantially transformed bioinformatics, addressing sequence, structure, and functional analyses. However, these methods are incredibly data-hungry, compute-intensive, and hard to interpret. Hyperdimensional computing (HDC) has recently emerged as an exciting alternative. The key idea is that random vectors of high dimensionality can represent concepts such as sequence identity or phylogeny. These vectors can then be combined using simple operators for learning, reasoning, or querying by exploiting the peculiar properties of high-dimensional spaces. Our work reviews and explores HDC’s potential for bioinformatics, emphasizing its efficiency, interpretability, and adeptness in handling multimodal and structured data. HDC holds great potential for various omics data searching, biosignal analysis, and health applications.
... The table includes a short description of the type of encoding, and the formula to obtain the encoded HV of the image. The symbols used in this table are listed in Supplementary Table S1 (Kleyko et al., 2016, 2017b; Manabat et al., 2019; Hassan et al., 2022). To encode the 2D binarized image, two unique permutations ρX and ρY are assigned to represent the x- and y-axis of the image, respectively. ...
... This includes the methods applying the permutation operation to encode the position of pixels in the flattened image [Section 2.3.1.1.1(a)], i.e., Manabat et al. (2019) and Hassan et al. (2022) report an accuracy of 79.87 and 86%, respectively. Our obtained result for MNIST is also better compared to several studies using the binding operation for position encoding in the flattened image [Section 2.3.1. ...
... a: Kleyko et al. (2016, 2017b), Manabat et al. (2019), and Hassan et al. (2022). b: Kussul et al. (2006), Mitrokhin et al. (2019), Kleyko et al. (2020), and Rachkovskij (2022). c ...
Introduction: Hyperdimensional Computing (HDC) is a brain-inspired and lightweight machine learning method. It has received significant attention in the literature as a candidate to be applied in the wearable Internet of Things, near-sensor artificial intelligence applications, and on-device processing. HDC is computationally less complex than traditional deep learning algorithms and typically achieves moderate to good classification performance. A key aspect that determines the performance of HDC is encoding the input data to the hyperdimensional (HD) space. Methods: This article proposes a novel lightweight approach relying only on native HD arithmetic vector operations to encode binarized images that preserves the similarity of patterns at nearby locations by using point of interest selection and local linear mapping. Results: The method reaches an accuracy of 97.92% on the test set for the MNIST data set and 84.62% for the Fashion-MNIST data set. Discussion: These results outperform other studies using native HDC with different encoding approaches and are on par with more complex hybrid HDC models and lightweight binarized neural networks. The proposed encoding approach also demonstrates higher robustness to noise and blur compared to the baseline encoding.
... Despite the breadth of interesting results presented in the HDC literature on the aforementioned topics, there are still many open questions. We refer the reader to the works by Hassan et al. [31] and Kleyko et al. [56] for a discussion on open problems and challenges related to HDC. ...
... In real applications, permutation has been used to represent time series and n-grams [4,42,73,89]. A significant effort in HDC has been devoted to developing new and better encoding strategies [23,31,44,74,83,102]. Encoding functions are generally application specific and are critical in successfully applying HDC to a problem. ...
Hyperdimensional Computing (HDC), also known as Vector Symbolic Architectures (VSA), is a neuro-inspired computing framework that exploits high-dimensional random vector spaces. HDC uses extremely parallelizable arithmetic to provide computational solutions that balance accuracy, efficiency and robustness. The majority of current HDC research focuses on the learning capabilities of these high-dimensional spaces. However, a tangential research direction investigates the properties of these high-dimensional spaces more generally as a probabilistic model for computation. In this manuscript, we provide an approachable, yet thorough, survey of the components of HDC. To highlight the dual use of HDC, we provide an in-depth analysis of two vastly different applications. The first uses HDC in a learning setting to classify graphs. Graphs are among the most important forms of information representation, and graph learning in IoT and sensor networks introduces challenges because of the limited compute capabilities. Compared to the state-of-the-art Graph Neural Networks, our proposed method achieves comparable accuracy, while training and inference times are on average 14.6× and 2.0× faster, respectively. Secondly, we analyse a dynamic hash table that uses a novel hypervector type called circular-hypervectors to map requests to a dynamic set of resources. The proposed hyperdimensional hashing method has the efficiency to be deployed in large systems. Moreover, our approach remains unaffected by a realistic level of memory errors which causes significant mismatches for existing methods.
... This often leads to a CAM array size 2 to 5 times larger than its SRAM counterpart [8], substantially increasing power consumption and chip area [9], [10]. Emerging applications, such as hyperdimensional computing (HDC) associative memory (AM), demand CAMs with higher efficiency and lower power consumption [11], [12], particularly for edge device implementation. For instance, rather than undergoing multiple cycles to access and compare AM entries, CAMs can achieve this in 1-2 cycles. ...
This paper presents a novel re-configurable SRAM-based array that supports SRAM, Binary-CAM (BCAM), Ternary-CAM (TCAM), and similarity index functions. The proposed design improves traditional memory architectures employing a compact 6T SRAM-based structure for in-memory computing, offering a low power, area-efficient solution. This makes it particularly suited for advanced AI applications such as Binary Neural Networks (BNN) and Hyperdimensional Computing (HDC). The design enables efficient data manipulation, content-based searches with bit-wise masking capabilities, and high-precision search operations ≃ 0.03 V/bit. In addition, the design supports in-memory logical operations, including bitwise AND, NOR, OR, XNOR, and NOT, further enhancing energy efficiency by reducing data movement. Including a similarity index feature enables accurate quantification of mismatches between search and stored words, making it ideal for Hamming distance (HD) computation, which is applicable to many applications. Comprehensive post-layout simulations with parasitic extraction are performed to validate functionality, robustness, and performance. Detailed SPICE simulations with industry-standard tools for 22nm FDSOI foundry process technology confirm energy consumption of 0.44 femtojoules (fJ)/bit for the search operation. The design maintains expected functionality across all PVT (process, voltage, temperature) conditions and significantly improves over traditional CAM designs.
... Hardware implementation of vector symbolic architectures to operate on large hypervectors can be notably challenging. Comparing and permuting HD vectors can quickly become bottlenecked by slow memory access [139]. Therefore, in-memory computing is widely studied as an energy-efficient approach for hyperdimensional computing [135,[140][141][142]. Furthermore, several literature reviews have studied the recent advances in this research field from a theoretical and practical perspective [133,137,143]. ...
Brain-inspired computing is a growing and interdisciplinary area of research that investigates how the computational principles of the biological brain can be translated into hardware design to achieve improved energy efficiency. Brain-inspired computing encompasses various subfields, including neuromorphic and in-memory computing, that have been shown to outperform traditional digital hardware in executing specific tasks. With the rising demand for more powerful yet energy-efficient hardware for large-scale artificial neural networks, brain-inspired computing is emerging as a promising solution for enabling energy-efficient computing and expanding AI to the edge. However, the vast scope of the field has made it challenging to compare and assess the effectiveness of the solutions compared to state-of-the-art digital counterparts. This systematic literature review provides a comprehensive overview of the latest advances in brain-inspired computing hardware. To ensure accessibility for researchers from diverse backgrounds, we begin by introducing key concepts and pointing out respective in-depth topical reviews. We continue with categorizing the dominant hardware platforms. We highlight various studies and potential applications that could greatly benefit from brain-inspired computing systems and compare their reported computational accuracy. Finally, to have a fair comparison of the performance of different approaches, we employ a standardized normalization approach for energy efficiency reports in the literature.
In-memory processing offers a promising solution for enhancing the performance of data-intensive applications. While analog in-memory computing demonstrates remarkable efficiency, its limited precision is suitable only for approximate computing tasks. In contrast, digital in-memory computing delivers the deterministic precision necessary to accelerate high-assurance applications. Current digital in-memory computing methods typically involve manually breaking down arithmetic operations into in-memory compute kernels. In contrast, traditional digital circuits are synthesized through intricate and automated design workflows. In this paper, we introduce a logic synthesis framework called LOGIC, which facilitates the translation of high-level applications into digital in-memory compute kernels that can be executed using non-volatile memory. We propose techniques for decomposing element-wise arithmetic operations into in-memory kernels while minimizing the number of in-memory operations. Additionally, we optimize the sequence of in-memory operations to reduce non-volatile memory utilization. To address the NP-hard execution sequencing optimization problem, we have developed two look-ahead algorithms that offer practical solutions. Additionally, we leverage data layout re-organization to efficiently accelerate applications that heavily rely on sparse matrix-vector multiplication operations. Our experimental evaluations demonstrate that our proposed synthesis approach improves the area and latency of fixed-point multiplication by 84% and 20% compared to the state-of-the-art, respectively. Moreover, when applied to scientific computing applications sourced from the SuiteSparse Matrix Collection, our design achieves remarkable improvements in area, latency, and energy efficiency by factors of 4.8 ×, 2.6 ×, and 11 ×, respectively.
Data encoding is a fundamental step in emerging computing paradigms, particularly in stochastic computing (SC) and hyperdimensional computing (HDC), where it plays a crucial role in determining the overall system performance and hardware cost efficiency. This study presents an advanced encoding strategy that leverages a hardware-friendly class of low-discrepancy (LD) sequences, specifically powers-of-2 bases of Van der Corput (VDC) sequences (VDC-2^n), as sources for random number generation. Our approach significantly enhances the accuracy and efficiency of SC and HDC systems by addressing challenges associated with randomness. By employing LD sequences, we improve correlation properties and reduce hardware complexity. Experimental results demonstrate significant improvements in accuracy and energy savings for SC and HDC systems. Our solution provides a robust framework for integrating SC and HDC in resource-constrained environments, paving the way for efficient and scalable AI implementations.
Hyperdimensional computing is an emerging computational framework that takes inspiration from attributes of neuronal circuits including hyperdimensionality, fully distributed holographic representation and (pseudo)randomness. When employed for machine learning tasks, such as learning and classification, the framework involves manipulation and comparison of large patterns within memory. A key attribute of hyperdimensional computing is its robustness to the imperfections associated with the computational substrates on which it is implemented. It is therefore particularly amenable to emerging non-von Neumann approaches such as in-memory computing, where the physical attributes of nanoscale memristive devices are exploited to perform computation. Here, we report a complete in-memory hyperdimensional computing system in which all operations are implemented on two memristive crossbar engines together with peripheral digital complementary metal–oxide–semiconductor (CMOS) circuits. Our approach can achieve a near-optimum trade-off between design complexity and classification accuracy based on three prototypical hyperdimensional computing-related learning tasks: language classification, news classification and hand gesture recognition from electromyography signals. Experiments using 760,000 phase-change memory devices performing analog in-memory computing achieve comparable accuracies to software implementations.
Brain-inspired hyperdimensional (HD) computing emulates cognition by computing with long vectors. HD computing consists of two main modules: encoder and associative search. The encoder module maps inputs into high-dimensional vectors, called hypervectors. The associative search finds the closest match between the trained model (set of hypervectors) and a query hypervector by calculating a similarity metric. To perform the reasoning task for practical classification problems, HD needs to store a non-binary model and use costly similarity metrics such as cosine. In this paper we propose an FPGA-based acceleration of HD exploiting Computational Reuse (HD-Core) which significantly improves the computation efficiency of both encoding and associative search modules. We observed that consecutive inputs have high similarity, which can be used to reduce the complexity of the encoding step. HD-Core additionally eliminates the majority of multiplication operations by clustering the class hypervector values and sharing the values among all the class hypervectors. Our evaluations on several classification problems show that HD-Core can provide 4.4x energy efficiency improvement and 4.8x speedup over the optimized GPU implementation.
Brain-inspired Hyperdimensional (HD) computing is a promising solution for energy-efficient classification. HD emulates cognition tasks by exploiting long vectors instead of working with numeric values used in contemporary processors. However, the existing HD computing algorithms lack controllability over the training iterations, which often results in slow training or divergence. In this work, we propose AdaptHD, an adaptive learning approach based on HD computing to address the HD training issues. AdaptHD introduces the definition of learning rate in HD computing and proposes two approaches for adaptive training: iteration-dependent and data-dependent. In the iteration-dependent approach, AdaptHD uses a large learning rate to speed up the training procedure in the first iterations, and then adaptively reduces the learning rate depending on the slope of the error rate. In the data-dependent approach, AdaptHD changes the learning rate for each data point depending on how far off the data was misclassified. Our evaluations on a wide range of classification applications show that AdaptHD achieves 6.9× speedup and 6.3× energy efficiency improvement during training as compared to the state-of-the-art HD computing algorithm.
Brain-inspired Hyperdimensional (HD) computing models cognition by exploiting properties of high-dimensional statistics, using high-dimensional vectors instead of working with numeric values used in contemporary processors. A fundamental weakness of existing HD computing algorithms is that they require floating-point models in order to provide acceptable accuracy on realistic classification problems. However, working with floating-point values significantly increases the HD computation cost. To address this issue, we proposed QuantHD, a novel framework for quantization of the HD computing model during training. QuantHD enables HD computing to work with a low-cost quantized model (binary or ternary model) while providing a similar accuracy as the floating-point model. We accordingly propose an FPGA implementation which accelerates HD computing in both training and inference phases. We evaluate QuantHD accuracy and efficiency on various real-world applications, and observe that QuantHD can achieve on average 17.2% accuracy improvement as compared to the existing binarized HD computing algorithms which provide a similar computation cost. In terms of efficiency, QuantHD FPGA implementation can achieve on average 42.3× and 4.7× (34.1× and 4.1×) energy efficiency improvement and speedup during inference (training) as compared to the state-of-the-art HD computing algorithms.
Conference Paper
In the Internet of Things (IoT), the large volume of data generated by sensors poses significant computational challenges in resource-constrained environments. Most existing machine learning algorithms are unable to train a proper model using the very small amount of labeled data available in practice. In this paper, we propose SemiHD, a novel semi-supervised algorithm based on brain-inspired HyperDimensional (HD) computing. SemiHD performs the cognitive task by emulating neuronal activity in high-dimensional space. SemiHD maps data points into high-dimensional space and trains a model based on the available labeled data. To improve the quality of the model, SemiHD iteratively expands the training data by labeling data points that the current model can classify with high confidence. We also propose a framework which enables users to trade accuracy for efficiency and to select the desired reliability of the model in detecting out-of-scope data. We have evaluated SemiHD's accuracy and efficiency on a wide range of classification applications and two types of embedded devices: Raspberry Pi 3 and Kintex-7 FPGA. Our evaluation shows that SemiHD can improve the classification accuracy of supervised HD by 10.2% on average (up to 27.3%). In addition, we observe that the SemiHD FPGA implementation achieves 7.11× speedup and 12.6× higher energy efficiency as compared to the CPU implementation.
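The iterative expansion of the training set described in this abstract follows the general self-training pattern, sketched below. The confidence test used here (the gap between the two best dot-product scores must reach a `margin`) and all function names are assumptions for illustration; the abstract does not state how SemiHD measures confidence.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def train(samples, dim):
    """Build one class hypervector per label by summing its encoded samples."""
    model = {}
    for hv, label in samples:
        acc = model.setdefault(label, [0] * dim)
        for i, x in enumerate(hv):
            acc[i] += x
    return model

def confident_label(model, hv, margin):
    """Return the best label if it beats the runner-up by `margin`, else None.

    Assumes the model holds at least two classes.
    """
    scores = sorted(((dot(c, hv), label) for label, c in model.items()), reverse=True)
    if scores[0][0] - scores[1][0] >= margin:
        return scores[0][1]
    return None

def self_train(labeled, unlabeled, dim, margin, max_rounds=5):
    """Repeatedly retrain and absorb confidently classified unlabeled points."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled, dim)
        remaining = []
        for hv in pool:
            label = confident_label(model, hv, margin)
            if label is None:
                remaining.append(hv)
            else:
                labeled.append((hv, label))
        if len(remaining) == len(pool):
            break  # nothing new was labeled, so further rounds change nothing
        pool = remaining
    return train(labeled, dim), pool
```

Raising `margin` makes the procedure more conservative: fewer unlabeled points are absorbed, and the ones left in the returned pool are those the model could not place confidently.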
Chapter
Brain-inspired Hyperdimensional (HD) computing is an emerging technique for cognitive tasks in the field of low-power design. As an energy-efficient and fast-learning computational paradigm, HD computing has shown great success in many real-world applications. However, an HD model incrementally trained on multiple tasks suffers from catastrophic forgetting: the model forgets the knowledge learned from previous tasks and focuses only on the current one. To the best of our knowledge, no study has investigated the feasibility of applying multi-task learning to HD computing. In this paper, we propose Task-Projected Hyperdimensional Computing (TP-HDC), which makes the HD model support multiple tasks simultaneously by exploiting the redundant dimensionality in the hyperspace. To mitigate interference between different tasks, we project each task into a separate subspace for learning. Compared with the baseline method, our approach efficiently utilizes the unused capacity in the hyperspace and shows a 12.8% improvement in average accuracy with negligible memory overhead.
Article
Brain-inspired HyperDimensional (HD) computing emulates cognitive tasks by computing with long binary vectors (also known as hypervectors) as opposed to computing with numbers. However, we observed that in order to provide acceptable classification accuracy on practical applications, HD algorithms need to be trained and tested on non-binary hypervectors. In this paper, we propose SearcHD, a fully binarized HD computing algorithm with fully binary training. SearcHD maps every data point to a high-dimensional space with binary elements. Instead of training an HD model with non-binary elements, SearcHD implements a fully binary training method which generates multiple binary hypervectors for each class. We also use the analog characteristics of non-volatile memories (NVMs) to perform all encoding, training, and inference computations in memory. We evaluate the efficiency and accuracy of SearcHD on a wide range of classification applications. Our evaluation shows that SearcHD can provide on average 31.1× higher energy efficiency and 12.8× faster training as compared to the state-of-the-art HD computing algorithms.
Article
Brain-inspired hyperdimensional (HD) computing models neural activity patterns of the very size of the brain’s circuits with points of a hyperdimensional space, that is, with hypervectors. Hypervectors are D-dimensional (pseudo)random vectors with independent and identically distributed (i.i.d.) components constituting ultra-wide holographic words: D=10,000 bits, for instance. At its very core, HD computing manipulates a set of seed hypervectors to build composite hypervectors representing objects of interest. It demands memory optimizations with simple operations for an efficient hardware realization. In this article, we propose hardware techniques for optimizations of HD computing, in a synthesizable open-source VHDL library, to enable co-located implementation of both learning and classification tasks on only a small portion of Xilinx UltraScale FPGAs: (1) We propose simple logical operations to rematerialize the hypervectors on the fly rather than loading them from memory. These operations massively reduce the memory footprint by directly computing the composite hypervectors, whose individual seed hypervectors do not need to be stored in memory. (2) Bundling a series of hypervectors over time requires a multibit counter for every hypervector component. We instead propose a binarized back-to-back bundling that requires no counters. This truly enables on-chip learning with minimal resources, as every hypervector component remains binary over the course of training, avoiding multibit components. (3) For every classification event, an associative memory is in charge of finding the closest match between a set of learned hypervectors and a query hypervector by using a distance metric. The cost of this operation is proportional to the hypervector dimension (D), and hence it may take O(D) cycles per classification event. Accordingly, we significantly improve the throughput of classification by proposing associative memories that steadily reduce the latency of classification to the extreme of a single cycle. (4) We perform a design space exploration incorporating the proposed techniques on FPGAs for a wearable biosignal processing application as a case study. Our techniques achieve up to 2.39× area saving, or 2,337× throughput improvement. The Pareto optimal HD architecture is mapped on only 18,340 configurable logic blocks (CLBs) to learn and classify five hand gestures using four electromyography sensors.
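For orientation, the sketch below shows the conventional baseline that points (2) and (3) of this abstract set out to improve: majority-vote bundling with one counter per component, and an associative-memory search whose cost grows linearly with D. It is a software illustration only, not the counter-free bundling or the single-cycle associative memory the authors propose; the dimension and all function names are assumptions.

```python
import random

D = 1000  # assumption: kept smaller than the abstract's D=10,000 for speed

def random_hv(rng, d=D):
    """Generate an i.i.d. random binary hypervector."""
    return [rng.randint(0, 1) for _ in range(d)]

def bundle(hvs):
    """Component-wise majority vote; needs one multibit counter per component."""
    counts = [sum(bits) for bits in zip(*hvs)]
    half = len(hvs) / 2
    return [1 if c > half else 0 for c in counts]

def hamming(a, b):
    """Hamming distance; this loop is the O(D) cost per comparison."""
    return sum(x != y for x, y in zip(a, b))

def classify(memory, query):
    """Associative-memory lookup: return the label of the nearest stored hypervector."""
    return min(memory, key=lambda label: hamming(memory[label], query))
```

Because random hypervectors of this size sit roughly D/2 bits apart, a query that differs from its stored prototype in a modest fraction of bits is still matched to the right class, which is the robustness property HD classifiers rely on.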