A Hand-Drawn Barcode
Dipl.-Inf., BTU Cottbus (2009)
Submitted to the Department of Computer Science
in partial fulfillment of the requirements for the degree of
Master in Artificial Intelligence and Deep Learning
UNIVERSITY OF ALCALÁ
©Daniel Klöck, MMXX. All rights reserved.
The author hereby grants to UAH permission to reproduce and to
distribute publicly paper and electronic copies of this thesis document
in whole or in part in any medium now known or hereafter created.
Department of Computer Science
October 18, 2020
José Ignacio Olmeda Martos
Chairman, Department Committee on Graduate Theses
I hereby confirm that this thesis was written independently by myself without the
use of any sources beyond those cited, and all passages and ideas taken from other
sources are cited accordingly.
By studying how characters from different alphabets are written, an adequate set of
substructures that can be drawn swiftly and effortlessly is identified. Next, a
way to compose a hand-drawn barcode is presented that optimizes information density
to increase the amount of contained data while remaining easy and fast to draw. A
recognition procedure is defined, and different models for barcode detection and
substructure classification are presented and evaluated. Possible sources of value
encoding errors are examined, and the recognition procedure is reviewed and tested
for its accuracy. Finally, the probability of a successful recognition is studied and
improved by choosing a suitable forward error correction method.
I am extremely grateful to my beloved wife Aleksandra Kucharczuk-Klöck for her
care and support.
My sincere thanks also go to all members of the UAH’s Master in Artificial
Intelligence and Deep Learning course of 2019-2020 for all the help and discussions,
especially to Ming Lei, Fabrice Aubert, Gianni Santinelli, Genís Virgili Sánchez,
Micheline Pollock, Irene van den Broek, Jesús Chávez and Brian Naranja.
Finally, I thank everyone who helped generate the hand-drawn barcode dataset,
especially Sandie Klöck, Maja Kucharczuk and Jorge Gangoso Klöck.
1 Introduction
2 Identifying Symbols and Structure
2.1 Defining Drawing Complexity
2.1.1 Simplicity by Similarity
2.1.2 Simplicity by Speed
2.2 Exploring Symbols
2.3 A Barcode Proposal
3 Detecting the Barcode and Extracting its Value
3.1 Detecting the Barcode and its Parts
3.1.1 Faster R-CNN ResNet50 V1 640x640
3.1.2 CenterNet HourGlass104 512x512
3.1.3 EfficientDet D2 768x768
3.1.4 Evaluation
3.2 Calculate the Rotation and Extract the Bars
3.3 Classifying the Bars
4 Value Encoding and Decoding
4.1 Bit Order
4.2 Error Sources
4.3 Error Detection and Correction
4.3.1 Error Detection and Correction using Linear Codes
4.3.2 Evaluation of Error Correction by Exploiting Model Confidence
5 Future Lines of Research
A Additional Information
List of Figures
2-1 Omniglot characters with their Omniglot ids sorted by median time
2-2 Median speed compared to median number of strokes of characters.
2-3 All possible single substructure bars and their value while representing
a decimal number between 0 and $2^{20} - 1$.
2-4 A bar that contains all possible substructures.
3-1 Faster R-CNN structure. Image from .
3-2 Example results of the barcode and parts detection model using “Faster
R-CNN ResNet50 V1 640x640” on validation images.
3-3 Architecture of CenterNet. Image from .
3-4 Architecture of EfficientDet. Image from .
3-5 Result examples of extracted and rotated bars.
3-6 Bar with flipped counterparts for data augmentation.
3-7 Image of a Bar with a corrected substructure.
4-1 Order of the bits in a bar.
List of Tables
3.1 Evaluation of experimented barcode detection models.
3.2 Reached accuracy with lowest validation loss on substructure classification
3.3 Confusion values of classified substructures using EfficientNetB2†.
4.1 Error rates of popular 1D-barcodes (with 95% confidence)
4.2 Linear code choice recommendations for different number of bars.
1 Introduction
Many researchers have worked on improving the recognition accuracy of mathematical
symbols, digits [6, 7], sketches, text, patterns and other symbols [11,
12]. However, little research has been published on the impact that different
patterns or sets of symbols have on recognition difficulty. This makes the
task of choosing the right set of symbols to sketch for later detection very
complicated, even more so if the person does not have in-depth knowledge of the
system that will be used for recognition. Currently, if hand-drawn symbols need to be
recognized, there is no guide as to which symbols to use.
In Section 2.1, the identification of symbols with low drawing complexity will be
made possible by defining what it means for a symbol to be easy to draw. Then,
in Section 2.2, the writing style used for characters of the Omniglot dataset will
be examined. Subsequently, by using the definitions from Section 2.1, symbols that
are quick and easy to draw will be identified. In Section 2.3, those symbols will be
organized in a structure, creating a novel barcode that can be hand-drawn. That
structure will optimize information density and minimize hand-drawing complexity.
In Chapter 3, a recognition procedure for this kind of barcode will be presented.
This procedure will need an object detection model that can detect the barcode and its
parts. That model will be chosen, trained and evaluated in Section 3.1. The procedure
will also require a multi-label bar classification model, which will be selected from a
list of modern classification models and evaluated in Section 3.3.
In Chapter 4, encoding, decoding, sources of error as well as expected accuracies
will be discussed. Section 4.3 will recommend adequate forward error correction as
well as error detection mechanisms to improve the accuracy by using redundancy bits.
Note that Appendix A contains information about how to acquire source code,
configuration files and images related to this thesis.
2 Identifying Symbols and Structure
2.1 Defining Drawing Complexity
As a first step in the search for a hand-drawn barcode, drawing complexity was
explored to make sure that every part of the barcode can be easily and accurately
drawn by anyone. Unfortunately, to my knowledge, there is no published research
about finding or comparing the complexity of drawing symbols. To overcome this
lack of previous work, two axioms will be formulated that will make it possible to
discern symbols that are easy to draw from those that are more complex.
2.1.1 Simplicity by Similarity
Axiom 1 (Axiom of simplicity by similarity). A symbol is easier to draw if it is
usually replicated with more accuracy.
Axiom 1 means that if the same symbol is drawn several times by one or multiple
persons, the similarity of the resulting images will be higher for an easy symbol than
for a complex symbol. To use this approach, an image similarity measure must be
selected. There are several possibilities, but the assumptions made by each technique
are critical and may lead to erroneous results when calculating drawing complexity.
For example, we cannot assume that we are analysing variations of the same image.
Some options for defining the image similarity would be the Image Euclidean Distance
(IMED), the Structural Similarity Index (SSIM) or the Modified Hausdorff
Distance. Another option would be using the entropy of the image, which
may be based on histogram values. However, in this case, pixel positions may
play an important role and should not be ignored.
2.1.2 Simplicity by Speed
The second method is based on the following premise:
Axiom 2 (Axiom of simplicity by speed). A symbol is easier to draw if it is usually
drawn in a shorter time.
Using this method would not only give more reliable results, since no other
assumptions or definitions are needed, but it would also yield symbols that reduce the
time needed to draw the barcode, which is also a desirable property. Due to
its simplicity and relation to drawing speed, Axiom 2 was used to search for a set
of symbols that would be easy enough for everyone.
2.2 Exploring Symbols
To explore the writing speed of symbols from different alphabets, the Omniglot
dataset was used. Its GitHub page describes its content as follows:
“It contains 1623 different handwritten characters from 50 different alphabets. Each
of the 1623 characters was drawn online via Amazon’s Mechanical Turk by 20 different
people. Each image is paired with stroke data, a sequences of [x,y,t] coordinates
with time (t) in milliseconds.”
By sorting the characters of all alphabets by median writing speed, the symbols
that are easiest and fastest to draw could be found. Note that, taking Axiom 1 into
consideration, these will also be the simplest symbols to replicate accurately, thus
increasing the probability of a correct recognition.
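The ranking described above can be sketched in a few lines of Python. This is only a minimal sketch, not the code used in this thesis. It assumes that the stroke data has already been parsed into, for each character, a list of drawings, where each drawing is a list of strokes and each stroke is a list of (x, y, t) tuples with t in milliseconds. The function names and the toy data are hypothetical.

```python
from statistics import median

def drawing_time_ms(drawing):
    """Elapsed time of one drawing: last timestamp minus first timestamp."""
    times = [t for stroke in drawing for (_x, _y, t) in stroke]
    return max(times) - min(times)

def rank_by_median_time(characters):
    """Return character ids sorted by ascending median drawing time.

    `characters` maps a character id to the list of its drawings.
    """
    medians = {
        char_id: median(drawing_time_ms(d) for d in drawings)
        for char_id, drawings in characters.items()
    }
    return sorted(medians, key=medians.get)

# Toy data (hypothetical): a one-stroke line and a two-stroke cross,
# each drawn by two people.
toy = {
    "cross": [
        [[(0, 5, 0), (9, 5, 140)], [(5, 0, 400), (5, 9, 560)]],
        [[(0, 5, 0), (9, 5, 130)], [(5, 0, 380), (5, 9, 500)]],
    ],
    "line": [
        [[(0, 0, 0), (0, 9, 120)]],
        [[(0, 0, 0), (0, 9, 150)]],
    ],
}
ranking = rank_by_median_time(toy)  # fastest character first
```

In this toy example the elapsed time includes the pause between strokes, which is one possible reading of "time spent"; measuring only pen-down time would be an equally valid assumption.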
In Figure 2-1, the 100 symbols that were drawn the fastest are shown. As expected,
the top symbols are geometric primitives, i.e. dot, line (with different rotations),
different curves and circle. They are followed by shapes that could be seen as
combinations of those primitives, such as ‘∧’ (two lines), ‘:’ (two dots), ‘!’ (line and dot) and ‘⊥’
Figure 2-1: Omniglot characters with their Omniglot ids sorted by median time spend
If we also take into account that we tend to misperceive curvature, direction
and length due to the nature of our eye movements, curves and structures where
direction and length are important (as would be the case when drawing a barcode
based on modules, similar to a Code 128 based symbology) should be discarded as
By comparing the median speed to the median number of strokes of the characters
(see Figure 2-2), a clear tendency can be observed: the more strokes
a symbol needs, the more time it takes to draw.
Figure 2-2: Median speed compared to median number of strokes of characters.
Given these findings, a set of symbols that are easy and fast to
draw should consist of symbols based on circles, dots and lines, with as few occurrences
of them as possible.
Further, a symbol should consist of substructures, since the number of objects it can
represent then grows exponentially with additional strokes instead of linearly
with each new symbol. For example, a symbol that may or may not contain each of 15
strokes can represent $2^{15}$ = 32,768 different objects; finding that many symbols
that are easy and fast to draw would be much more complicated.
2.3 A Barcode Proposal
I propose a hand-drawn barcode whose primary goal is the optimization of drawing
speed, accuracy and information density. The barcode is drawn on a straight
horizontal line that starts and ends with upward-facing lines. This is enough to
determine the direction of the barcode, since it is always read left to right. It
contains a number of vertical lines (called “bars”) that serve as the
ground structure for sketching the symbols that represent the data. Each bar is made
of at most 10 additional lines.
Each bar may contain any subset of 20 different substructures, which
amounts to 20 bits per bar. This means that a single bar can represent $2^{20}$ =
1,048,576 different objects when neither error correction nor error detection is
included. Figure 2-3 shows these 20 possible substructures with their values when the
barcode is used to represent a decimal number between 0 and 1,048,575.
Figure 2-3: All possible single substructure bars and their value while representing a
decimal number between 0 and $2^{20} - 1$.
These substructures are combined to create new values; for example, Figure 2-4
shows the symbol when all substructures are present. Note how lines on opposite
sides of the bar can be drawn with a single stroke, so that fewer geometric
primitives are needed. Further, since substructure lines are either at one end, touching the
horizontal line or at the middle of the vertical lines, misperceived length should not
be an issue.
When more objects are needed than a single bar can represent,
more bars can be attached, resulting in a barcode that can represent up to
$2^{\mathit{bars} \cdot 20}$ different objects. Note that a dynamic barcode that can
contain between 1 and $n_b$ bars can represent up to $\sum_{x=1}^{n_b} 2^{x \cdot 20}$
different objects if you distinguish between numbers
with different numbers of leading zeroes, i.e. a barcode that contains a specific value
with one bar is different from a barcode that contains the same value with 2 bars.
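As a quick numerical check of the two capacity formulas, they can be evaluated directly. This is a minimal sketch; the function names are hypothetical and only the constant of 20 bits per bar comes from the proposal.

```python
BITS_PER_BAR = 20  # one bit per substructure

def fixed_capacity(bars):
    """Number of distinct values of a barcode with exactly `bars` bars."""
    return 2 ** (bars * BITS_PER_BAR)

def dynamic_capacity(max_bars):
    """Number of distinct barcodes that contain between 1 and `max_bars` bars."""
    return sum(fixed_capacity(x) for x in range(1, max_bars + 1))

one_bar = fixed_capacity(1)          # 2**20
up_to_two_bars = dynamic_capacity(2)  # 2**20 + 2**40
```

The dynamic capacity is only slightly larger than the capacity of the longest allowed barcode, because the term for the largest number of bars dominates the sum.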
Tests with grid structures (2-dimensional bar positioning) have also been made,
but discarded because the user had to draw the grid with the correct size before adding
the substructures, while with this structure, line sizes can be corrected a posteriori.
Using dot substructures was also discarded due to user feedback. They were perceived
as difficult to draw, since the proper position was not understood. It has been shown
that we read connected structures faster than unconnected ones, which may suggest
that, for us humans, it is harder to recognize the relation between unconnected
structures, making it more complicated to find the correct location for dots within
a symbol. Another disadvantage of dots is that it is not possible to draw several of
them with one stroke, which is a feature of lines that this barcode exploits.
Figure 2-4: A bar that contains all possible substructures.
3 Detecting the Barcode and Extracting its Value
To be able to train an object detection model, a dataset of hand-drawn barcodes
with sufficient samples had to be generated first. To collect the data, a “Hand-drawn
Barcode User Study” webpage was implemented, using Python, Flask,
Skeleton and Cloudinary as main technologies, and hosted on Heroku.
On this webpage, any user willing to help with the data generation was shown a
random barcode, which had to be copied with pen and paper.
As a result, between August 27th and 31st, 149 hand-drawn barcodes were uploaded
from 23 unique IPs. After removing images where the full barcode was not visible,
144 images remained, containing exactly one barcode each. The website also named
each uploaded file after the code that the depicted bars describe (e.g. a barcode that
contains the numbers 5023 and 101 in 2 bars would be named “5023_101”) to make
the classification of the bars easier. Finally, all images were manually annotated with
bounding boxes for the barcode start symbol, the barcode end symbol, the bars and
the complete barcode using “RectLabel for object detection”.
Once a barcode has been generated and hand-drawn by a user, the next task
is to recognize its encoded value. Even though a one-stage architecture could
probably yield good results if more samples were available, a multi-stage
architecture was chosen because it made it possible to use sample augmentation
without distortion and to give more homogeneous input images to the classifier of the
bars. The chosen architecture needs to implement the following execution steps:
1. Predict the bounding boxes of the full barcode, the single bars, as well as the
starting and ending symbols.
2. Calculate the rotation of the barcode by analyzing the relative position of the
starting and ending symbols.
3. Extract the bars sorted by distance to the starting symbol and rotate them to
a vertical position where the starting symbol would be to the left.
4. Classify the bars with a multi-label system where each of the bar’s bits
corresponds to one class.
5. Calculate the value represented by the barcode using the decoding function.
The sections of this chapter will describe and evaluate the models and algorithms
that execute steps 1 through 4, as well as the methods followed to ﬁnd the right
candidates. The next chapter will propose an encoding and decoding technique.
3.1 Detecting the Barcode and its Parts
As seen in the introduction of this chapter, the first step to determine the value that
is encoded in a barcode is to detect the barcode and its parts. To make the prediction
as accurate as possible, 3 object detection models were trained and evaluated using
the annotated barcode images to predict the position of the full barcode, the barcode
start symbol, the barcode end symbol and the bars. The chosen models
were “Faster R-CNN ResNet50 V1 640x640”, “CenterNet HourGlass104 512x512”
and “EfficientDet D2 768x768” from the TensorFlow 2 Object Detection Model Zoo.
All models were pre-trained on the COCO dataset.
The configurations of these models were changed to search for 4 label classes, the
TPU capability was deactivated and the data augmentation options were set to
‘random_rgb_to_gray’, ‘random_adjust_brightness’, ‘random_adjust_contrast’,
‘random_adjust_hue’, ‘random_adjust_saturation’ and ‘random_distort_color’. Finally,
the ‘fine_tune_checkpoint_type’ property was set to ‘detection’ (used when loading
models pre-trained on other detection tasks) for all models except “CenterNet
HourGlass104 512x512”, which needs the special setting ‘fine_tune’ (used when loading
the entire CenterNet feature extractor pre-trained on other tasks).
All models were trained locally on a GeForce GTX 1070 (Compute Capability
6.1) using the TensorFlow 2 Object Detection API with CUDA 10.1 for Windows 10
(cuda_10.1.243_426.00_win10), Protoc 3.13.0-win64 and cuDNN 7.6.5. The
evaluation was executed in parallel using the CPU.
3.1.1 Faster R-CNN ResNet50 V1 640x640
The “Faster R-CNN ResNet50 V1 640x640” model is based on the popular Faster
R-CNN architecture, which is an object detection system composed of a deep fully
convolutional network that proposes regions and a Fast R-CNN detector, which
in turn is based on R-CNN. The Fast R-CNN architecture produces a convolutional
feature map by processing the input image with convolutional and max-pooling layers.
The region proposal network uses the feature maps to generate
an output of rectangular object proposals with an objectness
score. For each region proposal, a feature vector is extracted
from each feature map by a region of interest pooling layer.
These feature vectors are then passed to fully connected layers
to produce two outputs, the position of the bounding box
and the softmax probability that the region of interest
contains one of the target classes. This concrete implementation
of the model uses an input image with a resolution of 640 by
640 RGB pixels. As the backbone, a 50 layer deep residual network has been used
as opposed to the original ZF-NET and VGG-NET (which used to be
pre-trained on ImageNet). The advantage of ResNet over VGG is that it is deeper,
which means that it has more capacity to learn what is needed. Further, ResNet uses
residual connections and batch normalization, which had not been invented when VGG
was first released. According to the TensorFlow documentation, this model reached a
COCO mAP¹ of 29.3 and a mean inference time of 53 milliseconds.
¹ Mean Average Precision, an accuracy metric for object detectors.
Figure 3-1: Faster R-CNN structure. Image from .
Figure 3-2: Example results of the barcode and parts detection model using “Faster
R-CNN ResNet50 V1 640x640” on validation images.
3.1.2 CenterNet HourGlass104 512x512
One property that distinguishes Faster R-CNN from CenterNet is that the latter
is a one-stage object detector. Another difference is that it detects each object as a
triplet, instead of a pair, using a keypoint estimator to find center points and regress to
all other object properties. The center point based approach is said to be end-to-end
differentiable, simpler, faster, and more accurate than corresponding bounding box
based detectors. The idea behind this model is that if a predicted bounding box
has a high IoU² with the ground truth, it is also highly probable for
the center point to be in the center region of the bounding box, and vice versa, which
enables a more efficient way of searching for objects by using their center points.
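Since IoU is used both in this argument and in the evaluation thresholds later on, a short sketch of how it is computed may help. It assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max) tuples; this box format and the function name are assumptions made for illustration, not the representation used by the detection API.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Each box is a tuple (x_min, y_min, x_max, y_max).
    """
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b

    # Width and height of the overlapping rectangle (0 if there is no overlap).
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    intersection = inter_w * inter_h

    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - intersection
    return intersection / union if union > 0 else 0.0

# Two 2x2 boxes that overlap in a 1x1 square: IoU = 1 / (4 + 4 - 1).
example = iou((0, 0, 2, 2), (1, 1, 3, 3))
```

A prediction is usually counted as correct at a threshold of 0.5 when this value is at least 0.5 for a ground truth box of the same class.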
As seen in Figure 3-3, the main part of this architecture consists of two modules
named cascade corner pooling and center pooling, which play the roles of enriching
information collected by the top-left and bottom-right corners and providing more
recognizable information at the central regions.
Figure 3-3: Architecture of CenterNet. Image from .
As the backbone, HourGlass-104 was used, which yielded the best keypoint
estimation performance in the evaluation done in . This specific implementation
uses an input image with a resolution of 512 by 512 RGB pixels. According to the
TensorFlow documentation, this model reached a COCO mAP of 41.9 and a mean
inference time of 70 milliseconds.
3.1.3 EfficientDet D2 768x768
EfficientDet D2 employs an EfficientNet B2 network as the backbone, a Weighted
Bi-directional Feature Pyramid Network (also known as BiFPN) with 112 channels
and 5 layers, 3 box/class layers and an expected input size of 768 by 768 RGB
pixels. EfficientDet’s BiFPN incorporates multi-level feature fusion allowing data to
flow in both directions, top-down and bottom-up, while using regular and efficient
connections. EfficientNet is a convolutional network, developed by Google’s Brain
team, that seeks to optimize downstream performance given free range over depth,
width and resolution while staying within the constraints of target memory and target
FLOPS.
According to the TensorFlow documentation, this model reached a COCO mAP
of 41.8 and a mean inference time of 67 milliseconds.
² Intersection over Union, an evaluation metric used to measure the accuracy of an
object detector, calculated as the area of overlap divided by the area of union of the
predicted and the ground truth bounding boxes.
Figure 3-4: Architecture of EfficientDet. Image from .
3.1.4 Evaluation
The models were trained until a clear overfitting pattern emerged. For each model,
the step with the lowest total loss on the evaluation dataset was taken, since it
should generalize better than any other. Table 3.1 shows that the Faster R-CNN model
reached the best mean Average Precision for all explored IoU thresholds on that step.
For the first version of a barcode detector, accuracy seemed more important than
speed and thus, speed was ignored as long as it was within the limit of usable software,
which was the case for all of these models.
Model          Total Loss   mAP@.50:.95 IoU   mAP@.50 IoU   mAP@.75 IoU
Faster R-CNN   0.5845       0.68              0.98          0.7619
CenterNet      1.714        0.573             0.9504        0.6086
EfficientDet   0.3475       0.631             0.9681        0.7066
Table 3.1: Evaluation of experimented barcode detection models.
Note that the total losses cannot be compared between models, since the loss
functions differ from each other and thus have different meanings.
Some models that were part of the initial proposal of this thesis did not become
candidates for different reasons, e.g. SSD models use predefined aspect ratios, which
would probably not work very well with barcodes that have a dynamic width but a rather
static height, and YOLO trades accuracy for speed.
3.2 Calculate the Rotation and Extract the Bars
After the model detects the barcode and its parts, the angle of the barcode is
calculated as
$$\mathit{angle} = \operatorname{atan2}(y - y', x - x')$$
where
$y$ = the y coordinate of the center of the ending symbol’s bounding box,
$y'$ = the y coordinate of the center of the starting symbol’s bounding box,
$x$ = the x coordinate of the center of the ending symbol’s bounding box,
$x'$ = the x coordinate of the center of the starting symbol’s bounding box.
In other words, the angle of the barcode is the angle of the line that connects
the center points of the starting and ending symbols. Note that an angle of 0
corresponds to a perfectly aligned image where no rotation is needed, which means
that the start symbol can be found to the left of the barcode and the ending symbol
to the right at the same height.
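The angle calculation translates directly into code. The following is a minimal sketch that assumes bounding boxes are given as (x_min, y_min, x_max, y_max) tuples in image coordinates; the box format and the function names are assumptions made for illustration.

```python
import math

def box_center(box):
    """Center point (x, y) of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)

def barcode_angle(start_box, end_box):
    """Angle in radians of the line from the start symbol to the end symbol."""
    x_start, y_start = box_center(start_box)
    x_end, y_end = box_center(end_box)
    return math.atan2(y_end - y_start, x_end - x_start)

# Start symbol on the left, end symbol on the right at the same height.
aligned = barcode_angle((0, 0, 10, 40), (200, 0, 210, 40))

# End symbol directly below the start symbol in image coordinates.
quarter_turn = barcode_angle((0, 0, 10, 40), (0, 200, 10, 240))
```

Note that in image coordinates the y axis usually points downwards, so whether a positive angle means a clockwise or counter-clockwise rotation depends on the convention of the library that later rotates the bars.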
Then the bars are extracted, meaning that new images are created for the bounding
box areas that correspond to bars of the input image. Finally,
the extracted images are rotated by the barcode’s angle
and ordered by distance to the starting symbol. This
ensures that the classification model receives homogeneous
images, namely bars with the same rotation and only the part of
the image that is needed for the classification, which should
increase the accuracy. Additionally, the images can be flipped
vertically, horizontally and both to produce new bar images.
These new images contain different bits than the original image and can be
used for training or evaluation. However, the original and the augmented images were
kept together either in the training or in the validation dataset to reduce correlation
between these datasets.
Figure 3-5: Result examples of extracted and rotated bars.
Figure 3-6: Bar with flipped counterparts for data augmentation.
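The flipping and the grouped split can be illustrated with a small sketch. It represents an image as a plain list of pixel rows to stay free of dependencies; the function names are hypothetical. The labels of the flipped images would also have to be remapped according to where each substructure ends up after the flip, which depends on the bit layout and is left out here.

```python
import random

def flip_vertical(image):
    """Mirror an image (a list of pixel rows) top to bottom."""
    return image[::-1]

def flip_horizontal(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]

def with_flips(image):
    """Return the original image followed by its three flipped counterparts."""
    return [
        image,
        flip_vertical(image),
        flip_horizontal(image),
        flip_vertical(flip_horizontal(image)),
    ]

def grouped_split(groups, n_validation_groups, seed=0):
    """Split whole groups into training and validation lists.

    A group (an original bar plus its flips) is never divided between
    the two lists.
    """
    shuffled = list(groups)
    random.Random(seed).shuffle(shuffled)
    validation = [img for g in shuffled[:n_validation_groups] for img in g]
    training = [img for g in shuffled[n_validation_groups:] for img in g]
    return training, validation

# Toy example: 5 tiny 2x2 "bars", each expanded to a group of 4 images.
bars = [[[i, i + 1], [i + 2, i + 3]] for i in range(5)]
groups = [with_flips(bar) for bar in bars]
training, validation = grouped_split(groups, n_validation_groups=1)
```

Splitting by group rather than by image is what keeps a bar and its mirrored versions from appearing on both sides of the split.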
3.3 Classifying the Bars
Once the bars are ordered and rotated, they can be classified to find which bits are
active. The basic idea is to give each substructure a class label and use a classifier to
predict which of those classes are represented in the bar image. This means that the
classifier model has to solve a multi-label classification problem.
Model Training Accuracy Validation Accuracy
VGG16 0.9758 0.8683
VGG16* 0.9718 0.9420
VGG19 0.9663 0.8555
VGG19* 0.9719 0.9322
EﬃcientNetB1 0.9264 0.8842
EﬃcientNetB1* 0.9759 0.9528
EﬃcientNetB1** 0.9637 0.9255
EﬃcientNetB2 0.9281 0.8703
EﬃcientNetB2* 0.9771 0.9463
EﬃcientNetB3 0.9000 0.8430
ResNet50 0.9224 0.8050
ResNet50* 0.9798 0.9345
ResNet101 0.8606 0.8045
ResNet101* 0.9041 0.8805
DenseNet121 0.7657 0.7115
* Removing the last block of layers
** Removing the last 2 blocks of layers
†Training the last block of layers
Table 3.2: Reached accuracy with lowest validation loss on substructure classiﬁcation
For experimentation, multiple classifier models based on the pre-trained models
from TensorFlow’s Keras module, specifically keras.applications, were used. In all
cases, the input shape was changed to 450 by 100 RGB pixels to make it more similar
to the output shape of the detected bars. Additionally, the top³ was dropped
since a new output format is needed. To make up for the removed top, a flattening
layer and a fully connected layer with a sigmoid activation function ending in 20
output nodes (one for each substructure in a bar) were appended to the top.
³ The top refers to the flattening and fully connected layers stacked on top of the models.
Some of the experiments only trained the top dense layer of the model. Others
additionally allowed training the last block of layers of the pre-trained model to train
the bigger features for barcodes. A third kind of experiment removed the last block
of layers altogether and appended the new top to the previous block of layers to make
sure that the dense network was not influenced by the bigger features of the image
dataset used to pre-train the network, since they would probably be very different.
All models were compiled using binary cross-entropy as the loss function and Adam
as the optimizer.
From the initial dataset of 144 images, 495 bars were automatically extracted,
augmented to 1980 bars⁴ and labelled with their corresponding value of active bits.
Those images were split into 1780 training images and 200 validation images.
Additionally, the training images were augmented with an ImageDataGenerator,
which applied a subset of the following transformations: channel shift, brightness shift,
shear angle, rotation, zoom.
Table 3.2 shows the accuracy reached on the training and
validation images in the step with the lowest loss on the
validation data, since it should generalize best. As shown, the
pre-trained EfficientNetB2 model with a trainable last
block of layers had the best accuracy on the validation data,
reaching an accuracy of 97.23%.
Table 3.3 shows the confusion values of the single
substructures as well as the summed results over the 20
substructures for the 200 validation images. It can be seen that all bits have similar
confusion percentages and are behaving as expected.
Further, it could be observed that the model adapted to unforeseen user behaviour.
For example, the model learned how users fix incorrectly drawn lines, such as the one
a user corrected in the bar seen in Figure 3-7.
Figure 3-7: Image of a Bar with a corrected substructure.
⁴ Using the technique described in Section 3.2.
True Positive False Positive True Negative False Negative
Bit 1 117 4 79 0
Bit 2 112 6 77 5
Bit 3 101 6 91 2
Bit 4 101 3 94 2
Bit 5 89 7 104 0
Bit 6 87 4 107 2
Bit 7 91 1 102 6
Bit 8 95 3 100 2
Bit 9 100 3 96 1
Bit 10 99 7 92 2
Bit 11 98 3 96 3
Bit 12 101 5 94 0
Bit 13 94 0 103 3
Bit 14 95 3 100 2
Bit 15 86 3 108 3
Bit 16 88 3 108 1
Bit 17 102 2 95 1
Bit 18 101 2 95 2
Bit 19 115 3 80 2
Bit 20 114 1 82 3
Summed 1986 69 1903 42
Table 3.3: Confusion values of classiﬁed substructures using EﬃcientNetB2†.
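The summed row of Table 3.3 can be used to cross-check the reported validation accuracy. The sketch below uses only the four summed values from the table; the function name and the additional precision and recall figures are illustrative additions.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Accuracy, precision and recall from binary confusion counts."""
    total = tp + fp + tn + fn
    return {
        "total": total,
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }

# Summed confusion values over 20 substructures and 200 validation images.
summed = confusion_metrics(tp=1986, fp=69, tn=1903, fn=42)
```

The total of 4,000 decisions matches 200 images times 20 bits, and the resulting accuracy of 3889/4000 is in line with the 97.23% quoted above.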
4 Value Encoding and Decoding
As mentioned in earlier chapters, each substructure of a bar represents a bit. If the
substructure is present in the bar, the bit is 1; if not, it is 0.
Due to the low data density compared to digital barcodes and the variable
classification error, I suggest developing customized encoding and decoding mechanisms
depending on the application and the quality of the models.
In this chapter, some possible encoding, decoding and error correction techniques
will be described.
4.1 Bit Order
One definition for the bit order of the substructures can be seen in Figure 4-1. The
number next to the substructure represents the index
of the bit (using little-endian format). For example, if
only substructure 19 is active, the bar will represent the
binary number 0b1000 0000 0000 0000 0000. If a second
bar exists to the right, the first bar’s value is
shifted left by 20 places. For example, if a second bar
with all of its bits active is added to the right of the bar
from the previous example, the new
value is 0b1000 0000 0000 0000 0000 1111 1111
1111 1111 1111. This mechanism allows us to convert
barcodes into noisy bit streams with blocks of 20 bits. The stream is noisy because of
possible erroneous substructure classifications.
Figure 4-1: Order of the bits in a bar.
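The bit order and the shifting of bars can be sketched as follows. This is a minimal illustration of the mechanism described above, with hypothetical function names; each bar is given as the collection of indices (0 to 19) of its active substructures, and bars are listed from left to right.

```python
BITS_PER_BAR = 20

def bar_value(active_bits):
    """Value of one bar given the indices (0..19) of its active substructures."""
    value = 0
    for index in active_bits:
        if not 0 <= index < BITS_PER_BAR:
            raise ValueError(f"bit index out of range: {index}")
        value |= 1 << index
    return value

def barcode_value(bars):
    """Combine bars read left to right; every further bar shifts the
    value collected so far left by 20 places."""
    value = 0
    for active_bits in bars:
        value = (value << BITS_PER_BAR) | bar_value(active_bits)
    return value

def split_value(value, n_bars):
    """Inverse of barcode_value: per-bar values, leftmost bar first."""
    mask = (1 << BITS_PER_BAR) - 1
    return [(value >> (BITS_PER_BAR * i)) & mask for i in reversed(range(n_bars))]

single = barcode_value([[19]])                # only substructure 19 active
double = barcode_value([[19], range(20)])     # second bar with all bits active
```

Here `single` and `double` reproduce the two binary numbers from the examples in the text, and `split_value` recovers the per-bar blocks of 20 bits.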
4.2 Error Sources
Since the barcode can be transformed into a noisy bit stream with blocks of 20 bits,
typical error detection and correction techniques can be used. However, we do have
some additional knowledge about the channel that may help us make better decisions
about where an error may have originated and how to correct or detect it, not only
by using adequate coding, but also by implementing an appropriate usability flow. We
can also calculate the needed error detection and correction capabilities by studying
the error sources.
There are 3 sources of errors: the first is a possible human error while drawing
the barcode, the second a wrong detection of the barcode parts by the object
detection model (for example by missing a bar) and the third a wrong classification
of a substructure.
To solve the problem of human error, I would suggest adding a validation feature,
that is, a detection that users run on their own to make sure the barcode was drawn
correctly whenever possible. However, take into account that in some situations this
may not be possible (e.g. the user may not have a camera). Therefore, I would
recommend always adding error correction or detection capabilities based on the
application’s need for accuracy. Section 4.3 will give more information on how to
construct such a mechanism.
The second possible source of error, the barcode detection model, could only be
trained with very few samples (144 barcodes) in this first version. This led to an error
rate of about 2%, even when choosing a threshold of 0.5 IoU (as seen in Subsection
3.1.4). This error rate is much higher than that of typical digital barcodes, which
reach accuracies of 1 error in 394 thousand, even in the worst case scenario of the
simplest barcode types (see Table 4.1). However, by choosing the right angle and
position of the camera, the rate can probably be improved. Therefore, I suggest,
at least until more training data is available, using a fixed number of bars and only
accepting the barcode when all needed pieces have been recognized.
Barcode Type    Worst Case Accuracy        Best Case Accuracy
Code 128        1 error in 2.8 million     1 error in 37 million
Code 39         1 error in 2.5 million     1 error in 34 million
UPC or more     1 error in 394 thousand    1 error in 800 thousand
Table 4.1: Error rates of popular 1D-barcodes (with 95% confidence).
The last source of error is a wrong classification of a substructure. The substructure
classification model yielded a probability of correct classification of 0.9723, which
means that the probability of error is 0.0277. This implies that the probability of at
least one error occurring in a bar is approximately 42.98%¹. This number is far too
high for a useful barcode. Therefore, using a coding mechanism to detect and
correct errors is crucial.
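The figure above follows from treating the 20 substructure classifications of a bar as independent events, which is an assumption of this sketch. The helper name `prob_at_least_one_error` is hypothetical.

```python
P_ERROR = 0.0277   # probability of a wrong substructure classification (1 - 0.9723)
BITS_PER_BAR = 20


def prob_at_least_one_error(num_bits, p_error=P_ERROR):
    """Complement of the probability that all bits are classified correctly."""
    return 1.0 - (1.0 - p_error) ** num_bits


if __name__ == "__main__":
    for bars in (1, 2, 6):
        probability = prob_at_least_one_error(bars * BITS_PER_BAR)
        print(f"{bars} bar(s): {probability:.4f}")
```

For a single bar this evaluates to roughly 0.43, and the value grows quickly with the number of bars, which is why uncoded barcodes are not practical with the current model.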
4.3 Error Detection and Correction
Improving the angle and position of the camera and using a fixed number of bars can
help reduce the errors originating from the object detection model. Human errors
and wrong substructure classifications can be reduced to a selected minimal
accuracy by using forward error correction techniques. Two techniques are
explored here: error correction through linear codes with a minimal Hamming distance,
and error correction by flipping the least-confident bits.
The basic idea of error correcting codes is to reduce the number of accepted code
words, maximizing the distance between words. A standard distance between words
is the Hamming distance, which in the case of binary words is defined as the
number of bits that have to be flipped to get from one word to another. For example,
you could get from word 001 to 100 by flipping 2 bits (the first and the last). Thus,
the two words have a Hamming distance of 2.
¹ The complement of the probability of no errors occurring in 20 bits: 1 − 0.9723^20 ≈ 0.4298
A property of a set of allowed words with a minimal Hamming distance $d_H$ (that
is, no two words in the set have a Hamming distance smaller than $d_H$) is that
$d_H - 1$ errors can be detected and $\lfloor \frac{d_H - 1}{2} \rfloor$ errors can be corrected.
For example, if we only allow the recognition of the words 001 and 100, since the
minimal Hamming distance is 2, we are able to detect an error of $2 - 1 = 1$ bit
flip and to fix $\lfloor \frac{2 - 1}{2} \rfloor = 0$ errors. Imagine that the word
101 were recognized instead. We would be able to detect an error, since 101 is not
in the set of allowed words, but we would not be able to fix it: the Hamming distance
to 001 and to 100 is the same, so both are equally likely to be the intended word.
However, if we change the allowed words to 000 and 111, the minimal Hamming
distance increases to 3 and we are able to detect $3 - 1 = 2$ errors and even fix
$\lfloor \frac{3 - 1}{2} \rfloor = 1$ error. If we now recognized the word 101, we would
know there was an error, since the word is not part of our set of allowed words, and
we could fix it by transforming it into the allowed word with the lowest Hamming
distance to the received word, i.e. 111.
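The two small examples above can be reproduced with a brute-force nearest-word decoder. This is a minimal sketch for illustration; words are represented as strings of "0" and "1", and the names `hamming_distance` and `decode_nearest` are hypothetical.

```python
def hamming_distance(a, b):
    """Number of positions in which two equally long words differ."""
    if len(a) != len(b):
        raise ValueError("words must have the same length")
    return sum(x != y for x, y in zip(a, b))


def decode_nearest(received, allowed_words):
    """Return (word, status) for a received word.

    status is "valid" if the word is allowed, "corrected" if exactly one
    allowed word is closest, and "detected" if the closest word is ambiguous.
    """
    distances = {word: hamming_distance(received, word) for word in allowed_words}
    best = min(distances.values())
    candidates = [word for word, d in distances.items() if d == best]
    if best == 0:
        return received, "valid"
    if len(candidates) == 1:
        return candidates[0], "corrected"
    return None, "detected"


if __name__ == "__main__":
    print(hamming_distance("001", "100"))
    print(decode_nearest("101", ["001", "100"]))  # error detected, not fixable
    print(decode_nearest("101", ["000", "111"]))  # error fixed
```

Comparing against every allowed word is only feasible for tiny sets; it is used here because it mirrors the definition directly.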
4.3.1 Error Detection and Correction using Linear Codes
If such a set of allowed words is generated by multiplying the data bits with
a generative matrix, we say that we are using a linear code. Mathematically, linear
codes can be constructed as a subspace of a vector space with any number of
elements. When generating linear codes for a binary system, usually the Galois field
of 2 elements is used. This field, usually written as GF(2), has the properties
of closure, commutativity, associativity, identity, inverse and distributivity, and
defines the sum as the logical XOR operation and the multiplication as the logical
AND operation, which makes the implementation on hardware efficient.
The allowed word set can be constructed by calculating:
$$\vec{b} = \vec{x} \cdot \bar{G} \qquad (4.1)$$
where $\bar{G}$ is the generative matrix, $\vec{x}$ the data words and $\vec{b}$ the generated allowed
code words. Note that finding an optimal $\bar{G}$ for a given code length and data
length is not always trivial, but there are published collections of best known matrices
for given lengths (such as those by Grassl). Grassl's online tables also show the
maximum Hamming distance that can be achieved for given code and data lengths.
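To make the construction concrete, the following sketch enumerates all code words produced by a small generative matrix over GF(2) and measures their minimal Hamming distance. The matrix `G` is the well-known (7,4) Hamming code and was chosen only for illustration; it is not one of the codes recommended in this thesis, and the function names are hypothetical.

```python
from itertools import combinations, product

# Generative matrix of the (7,4) Hamming code: 4 data bits, 7 code bits.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def encode(data_word, generator):
    """Multiply a data word with the generative matrix over GF(2).

    Multiplication is a logical AND, addition is a logical XOR (sum modulo 2).
    """
    code_length = len(generator[0])
    return tuple(
        sum(bit & row[j] for bit, row in zip(data_word, generator)) % 2
        for j in range(code_length)
    )


def all_code_words(generator):
    """Encode every possible data word."""
    data_length = len(generator)
    return [encode(x, generator) for x in product((0, 1), repeat=data_length)]


def minimal_distance(words):
    """Smallest Hamming distance between any two distinct words."""
    return min(
        sum(a != b for a, b in zip(u, v)) for u, v in combinations(words, 2)
    )


if __name__ == "__main__":
    words = all_code_words(G)
    print(len(words), "code words, minimal Hamming distance", minimal_distance(words))
```

With a minimal distance of 3, this example code can detect 2 errors and correct 1 error per word.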
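Table 4.2 shows recommendations for different number of bars and expected
accuracy. The probability of wrong correction has been based on the probability mass
function and calculated as:
$$P_{\text{wrong correction}} = \sum_{x = \lfloor \frac{d_H - 1}{2} \rfloor + 1}^{n} \binom{n}{x} \cdot p^x \cdot (1 - p)^{n - x}, \quad p = 0.0277 \qquad (4.2)$$
where $d_H$ is the Hamming distance, $n$ the amount of bits (20 · number of bars)
and $p$ the probability of incorrect classification of a substructure. The probability of
an undetected error has been calculated with the formula:
$$P_{\text{undetected error}} = \sum_{x = d_H}^{n} \binom{n}{x} \cdot p^x \cdot (1 - p)^{n - x}, \quad p = 0.0277 \qquad (4.3)$$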
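Both probabilities are tails of a binomial distribution: a wrong correction can happen once more errors occur than the code can correct, and an error can go undetected once at least as many errors occur as the Hamming distance. The sketch below evaluates these tails under the assumption of independent bit errors; the function names are hypothetical.

```python
from math import comb

P_ERROR = 0.0277  # probability of a wrong substructure classification


def binomial_tail(n, first, p):
    """Probability of at least `first` errors among n independent bits."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(first, n + 1))


def wrong_correction_probability(n, d_h, p=P_ERROR):
    """More errors occurred than the code can correct, i.e. floor((d_h - 1) / 2)."""
    return binomial_tail(n, (d_h - 1) // 2 + 1, p)


def undetected_error_probability(n, d_h, p=P_ERROR):
    """At least d_h errors occurred, so another allowed word may have been reached."""
    return binomial_tail(n, d_h, p)


if __name__ == "__main__":
    n, d_h = 20, 5  # one bar with a code of Hamming distance 5
    print(f"wrong correction: {wrong_correction_probability(n, d_h):.3e}")
    print(f"undetected error: {undetected_error_probability(n, d_h):.3e}")
```

For one bar and a Hamming distance of 5, the results are close to 1.703·10^-2 and 1.784·10^-4.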
Note that a set of words may have different Hamming distances depending on
the region of the vector subspace where the decoded value was placed and thus have
a different probability of correct error fixing or detection. Therefore, some of the items
of Table 4.2 show a range of Hamming distances and probabilities instead of single
values.
Obviously, increases in the accuracy of correct decoding, such as from the initial
approximate 57% to 98.3% for one bar or from 3.43% to 99.95% for 6 bars, come at a price.
Instead of using all bits for data transfer, we now have to use some of them as
redundancy bits to increase the Hamming distance and be able to correct or detect
errors. The rightmost column of Table 4.2 shows how many bits are left for
data transfer. For example, if you need 6 bars for your application and want to use
error correction to achieve a maximal decoding error of about 1 out of 2000, you would
have 57 bits for data transfer (you would use the row with the Hamming distance 21-28,
since its wrong correction probability is at most 5.348 · 10^−4). This would reduce the
number of objects you can distinguish from 2^120 to 2^57.
Even if an error rate of 1 out of 2000 cannot compare to the error rates digital
barcodes can achieve, it still surpasses the accuracy of human data entry operators,
which is about 1 error every 300 keystrokes (note that the value of a barcode would
usually need several keystrokes).
Number    Hamming    Wrong Correction              Undetected Error                Data
of Bars   Distance   Probability                   Probability                     Bits
1         5          1.703·10^-2                   1.784·10^-4                     11
          6          1.703·10^-2                   1.252·10^-5                     10
2         7-8        2.436·10^-2                   1.044·10^-4 to 1.205·10^-5      24
          8          2.436·10^-2                   1.205·10^-5                     23
          9-10       4.786·10^-3                   1.204·10^-6 to 1.051·10^-7      20
3         9-11       2.53·10^-2 to 6.297·10^-3     3.952·10^-5 to 7.202·10^-7      24
          10-12      2.53·10^-2 to 6.297·10^-3     5.642·10^-6 to 8.276·10^-8      23
          11-12      6.297·10^-3                   7.201·10^-7 to 8.276·10^-8      20
4         11-15      2.38·10^-2 to 1.707·10^-3     1.324·10^-5 to 5.236·10^-9      47
          13-18      6.821·10^-3 to 3.775·10^-4    3.134·10^-7 to 6.318·10^-12     42
          16-19      1.71·10^-3 to 7.453·10^-5     6.004·10^-10 to 5.801·10^-13    40
          17-21      1.707·10^-3 to 1.324·10^-5    6.386·10^-11 to 3.331·10^-16    36
5         17-22      1.892·10^-3 to 1.079·10^-4    2.471·10^-9 to 4.741·10^-14     51
          19-24      4.753·10^-4 to 2.232·10^-5    3.921·10^-11 to 4.299·10^-16    49
6         19-25      1.939·10^-3 to 3.116·10^-5    1.018·10^-9 to 3.725·10^-15     64
          21-28      5.348·10^-4 to 6.653·10^-6    1.951·10^-11 to 3.585·10^-18    57
7         21-30      1.894·10^-3 to 9.062·10^-6    3.862·10^-10 to 3.041·10^-18    72
          23-32      5.613·10^-4 to 1.981·10^-6    8.543·10^-12 to 2.958·10^-20    69
8         24-36      1.792·10^-3 to 5.903·10^-7    2.239·10^-11 to 2.426·10^-22    80
          25-38      5.623·10^-4 to 1.239·10^-7    3.436·10^-12 to 2.122·10^-24    75
Table 4.2: Linear code choice recommendations for different numbers of bars.
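The standard form of the generative matrix is:
$$\bar{G} = (I_k \mid \bar{P}) \qquad (4.4)$$
where $I_k$ is the identity matrix of size $k \times k$ and $\bar{P}$ is of size $k \times (n - k)$, with $k$
being the length of a data word and $n$ the length of an encoded word. Equation 4.4
can be used to calculate a so called parity check matrix $\bar{H}$, which fulfils
$$\bar{H} \cdot \vec{b}^T = \vec{0} \qquad (4.5)$$
for all $\vec{b}^T$ produced with Equation 4.1 and is different to 0 for all other words.
This also implies that $\bar{G} \cdot \bar{H}^T = 0$, which means that $\bar{H}$ must be of the form
$$\bar{H} = (\bar{P}^T \mid I_{n-k}) \qquad (4.6)$$
Note that the formula actually should be $(-\bar{P}^T \mid I_{n-k})$, but thanks to the
logical XOR properties, the minus sign can be ignored. Once we have $\bar{H}$, a recognized
$\vec{b}^*$ can be tested for errors with:
$$\vec{s}^T = \bar{H} \cdot (\vec{b}^*)^T = [(b^*_1, b^*_2, ..., b^*_k) \cdot \bar{P} + (b^*_{k+1}, b^*_{k+2}, ..., b^*_n)]^T \qquad (4.7)$$
The vector $\vec{s}$ is called a syndrome and is zero if a correct word was recognized
(due to Equation 4.5). Vector $\vec{b}^*$ is composed of the word that should be recognized
$\vec{b}$ summing an error vector $\vec{e}$ (which is $\vec{0}$ if the recognition was successful):
$$\vec{b}^* = \vec{b} + \vec{e} \qquad (4.8)$$
Putting everything together, we get that
$$\vec{s}^T = \bar{H} \cdot (\vec{b} + \vec{e})^T = \bar{H} \cdot \vec{b}^T + \bar{H} \cdot \vec{e}^T = \bar{H} \cdot \vec{e}^T \qquad (4.9)$$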
which can be used to find the source of error and correct it [45, 49]. One simple
method would be to create a table with all possible errors and their syndromes and
check it when an erroneous word has been received. Using Equation 4.9 and knowing
the error that generates such a syndrome, all erroneous bits of the recognized word
can just be flipped to get the intended word.
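The table-based method can be sketched as follows. The example uses the small (7,4) Hamming code in standard form, chosen only for illustration, so the table covers single-bit errors only; the codes recommended in Table 4.2 would need tables for more error patterns. All names in the code are hypothetical.

```python
# Parity part P of a generative matrix in standard form (I_k | P),
# here for the (7,4) Hamming code: k = 4 data bits, n = 7 code bits.
P = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]
K = len(P)      # length of a data word
R = len(P[0])   # number of redundancy bits (n - k)
N = K + R       # length of an encoded word


def encode(data_word):
    """Append the parity bits data_word * P (over GF(2)) to the data bits."""
    parity = [sum(data_word[i] & P[i][j] for i in range(K)) % 2 for j in range(R)]
    return list(data_word) + parity


def syndrome(word):
    """Multiply the first k bits with P and add the last n - k bits (XOR)."""
    return tuple(
        (sum(word[i] & P[i][j] for i in range(K)) + word[K + j]) % 2
        for j in range(R)
    )


# Syndrome table for all single-bit error vectors.
SYNDROME_TABLE = {}
for position in range(N):
    error = [0] * N
    error[position] = 1
    SYNDROME_TABLE[syndrome(error)] = error


def correct(word):
    """Return the corrected word, or raise if the syndrome is not in the table."""
    s = syndrome(word)
    if not any(s):
        return list(word)  # zero syndrome: the word is an allowed code word
    error = SYNDROME_TABLE.get(s)
    if error is None:
        raise ValueError("error detected but not correctable with this table")
    return [bit ^ e for bit, e in zip(word, error)]


if __name__ == "__main__":
    sent = encode([1, 0, 1, 1])
    received = list(sent)
    received[2] ^= 1  # simulate one wrong substructure classification
    print(sent, received, correct(received))
```

The syndrome depends only on the error vector, so the table can be built once per code and reused for every recognized word.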
4.3.2 Evaluation of Error Correction by Exploiting Model Confidence
In some cases, it might be suitable to use the model's confidence to try to correct
errors. The idea would be to use an error detection technique, for example a linear
code, to detect the number of errors that have occurred and then flip that many
bits, selecting the ones where the substructure classification model has the lowest
confidence.
This technique could drastically increase the amount of data bits that can be
used. Note that a Hamming distance of 𝑛+ 1 would be needed to ﬁx 𝑛errors instead
of 2·𝑛+ 1. However, using the validation dataset, I have found that if one error
occurs, the probability of ﬁxing it correctly is only 61%. If a second error were
to occur, the probability of ﬁxing them would decrease even further, reaching an
approximate probability of successful error correction of 5.5%. Therefore, with the
current substructure classiﬁcation model, this technique is not advised.
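The flipping step itself is simple, as the following sketch shows. It assumes that the classifier outputs, for each substructure, the probability that the substructure is active, and it uses the distance of that probability from 0.5 as the confidence measure; both are assumptions of this example rather than a description of the thesis models. The function name `flip_least_confident` is hypothetical, and the number of errors is assumed to come from a separate error detection step.

```python
def flip_least_confident(bits, probabilities, num_errors):
    """Flip the num_errors bits whose predicted probability is closest to 0.5."""
    if len(bits) != len(probabilities):
        raise ValueError("one probability per bit is required")
    confidence = [abs(p - 0.5) for p in probabilities]
    # Indices sorted from least to most confident classification.
    order = sorted(range(len(bits)), key=lambda i: confidence[i])
    corrected = list(bits)
    for index in order[:num_errors]:
        corrected[index] ^= 1
    return corrected


if __name__ == "__main__":
    probabilities = [0.99, 0.45, 0.97, 0.60]
    bits = [1 if p >= 0.5 else 0 for p in probabilities]
    print(bits, flip_least_confident(bits, probabilities, 1))
```

After flipping, the result should be checked again with the error detection code, since the least confident bits are not necessarily the erroneous ones.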
Future Lines of Research
In this thesis, a novel hand-drawn barcode has been presented. To develop the
proposed structure, the Omniglot dataset was explored using the simplicity by speed
axiom. The outcome of the exploration, together with numerical and perceptual
reasons, led to a barcode subdivided into bars of 20 substructures each. Further
research or user studies about human drawing capabilities could help to create an
improved hand-drawn barcode structure with higher data density, for example by
creating alternative substructures, thus changing the base of the represented
number. Another possibility would be to build a model that predicts the complexity of a
barcode and use it to refine the current barcode structure.
In Chapter 3, a procedure to detect and classify the barcode’s substructures has
been demonstrated and evaluated. The training of the object detection and classiﬁ-
cation models has been limited to a relatively small number of samples. Once more
training samples have been collected, an improvement of the precision of the models
is to be expected. This would lead to a reduction of the necessary redundant bits
which have been recommended in Chapter 4.
The presented hand-drawn barcode, reading mechanism and coding techniques
should not be seen as a final version, but as a first step with great improvement
potential.
A repository that contains images, source code and conﬁguration ﬁles related to this
thesis can be found at:
[1] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards
real-time object detection with region proposal networks, 2015.
[2] Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang, and
Qi Tian. Centernet: Keypoint triplets for object detection, 2019.
[3] Mingxing Tan, Ruoming Pang, and Quoc V. Le. Efficientdet: Scalable and
efficient object detection, 2019.
[4] Fritz J. and Dolores H. Russ. Executive summary: Code 16k and code
49 data integrity test.
https://www.idautomation.com/Assets/pdf-links/OSU-Data-Integrity-Linear.pdf
(Accessed 2020-09-26).
[5] Hongyu Wang and Guangcun Shan. Recognizing handwritten mathematical
expressions as latex sequences using a multiscale robust neural network, 2020.
[6] Adam Byerly, Tatiana Kalganova, and Ian Dear. A branching and merging
convolutional network with homogeneous filter capsules, 2020.
[7] Abdul Mueed Hafiz and Ghulam Mohiuddin Bhat. Reinforcement learning based
handwritten digit recognition with two-state q-learning, 2020.
[8] Peng Xu, Yongye Huang, Tongtong Yuan, Tao Xiang, Timothy M. Hospedales,
Yi-Zhe Song, and Liang Wang. On learning semantic representations for
million-scale free-hand sketches, 2020.
[9] Dat Quoc Nguyen, Thanh Vu, and Anh Tuan Nguyen. Bertweet: A pre-trained
language model for english tweets, 2020.
[10] Ikuro Sato, Hiroki Nishimura, and Kensuke Yokoi. Apac: Augmented pattern
classification with neural networks, 2015.
[11] Alireza Rezvanifar, Melissa Cote, and Alexandra Branzan Albu. Symbol spotting
on digital architectural floor plans using a deep learning-based framework, 2020.
[12] William Adorno III, Angela Yi, Marcel Durieux, and Donald Brown. Hand-drawn
symbol recognition of surgical flowsheet graphs with deep image segmentation,
[13] Brenden M. Lake, Ruslan Salakhutdinov, and Joshua B. Tenenbaum.
Human-level concept learning through probabilistic program induction. Science,
[14] Liwei Wang, Yan Zhang, and Jufu Feng. On the euclidean distance of images.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8):1334–
[15] Zhou Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality
assessment: from error visibility to structural similarity. IEEE Transactions on
Image Processing, 13(4):600–612, 2004.
[16] M.-P. Dubuisson and A. K. Jain. A modified hausdorff distance for object
matching. In Proceedings of 12th International Conference on Pattern Recognition,
volume 1, pages 566–568 vol.1, 1994.
[17] C. E. Shannon. A mathematical theory of communication. The Bell System
Technical Journal, 27(3):379–423, 1948.
[18] Mohammed Aljanabi, Zahir Hussain, and Songfeng Lu. An entropy-histogram
approach for image similarity and face recognition. Mathematical Problems in
Engineering, 2018:1–18, 07 2018.
[19] Brenden M. Lake, Ruslan Salakhutdinov, and Joshua B. Tenenbaum. Omniglot
data set for one-shot learning. https://github.com/brendenlake/omniglot
[20] Veijo Virsu. Tendencies to eye movement, and misperception of curvature,
direction, and length. Perception and Psychophysics, 9:65–72, 01 1971.
[21] Barcode Island. Code 128 symbology.
http://www.barcodeisland.com/code128.phtml (Accessed 2020-08-02).
[22] Deia Ganayim. Visual processing of connected and unconnected letters and words
in arabic. Cognitive Linguistic Studies, 2:205–238, 01 2015.
[23] Daniel Klöck. Hand-drawn barcode user study.
https://barcode-dataset-generator.herokuapp.com/ (Accessed 2020-09-18).
[24] Python.org. https://www.python.org/ (Accessed 2020-09-18).
[25] Flask: A lightweight wsgi web application framework.
https://palletsprojects.com/p/flask/ (Accessed 2020-09-18).
[26] Skeleton: Responsive css boilerplate. http://getskeleton.com/ (Accessed
[27] Cloudinary: Image and video upload, storage, optimization and cdn.
https://cloudinary.com/ (Accessed 2020-09-18).
[28] Heroku: Cloud application platform. https://www.heroku.com/ (Accessed
[29] Rectlabel for object detection. https://rectlabel.com (Accessed 2020-09-18).
[30] Tensorflow 2 detection model zoo. https://github.com/tensorflow/models/
[31] Tsung-Yi Lin, Michael Maire, Serge J. Belongie, Lubomir D. Bourdev, Ross B.
Girshick, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and
C. Lawrence Zitnick. Microsoft COCO: common objects in context. CoRR,
[32] Ross Girshick. Fast r-cnn, 2015.
[33] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature
hierarchies for accurate object detection and semantic segmentation, 2013.
[34] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual
learning for image recognition, 2015.
[35] Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional
[36] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for
large-scale image recognition, 2014.
[37] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet:
A large-scale hierarchical image database. In 2009 IEEE conference on computer
vision and pattern recognition, pages 248–255. Ieee, 2009.
[38] Faster r-cnn: Down the rabbit hole of modern object detection.
https://tryolabs.com/blog/2018/01/18/
[39] Xingyi Zhou, Dequan Wang, and Philipp Krähenbühl. Objects as points, 2019.
[40] Mingxing Tan and Quoc V. Le. Efficientnet: Rethinking model scaling for
convolutional neural networks, 2019.
[41] A thorough breakdown of efficientdet for object detection.
https://towardsdatascience.com/
[42] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimiza-
[43] R. W. Hamming. Error detecting and error correcting codes. The Bell System
Technical Journal, 29(2):147–160, 1950.
[44] Suayb S. Arslan. Finite fields and linear codes.
http://www.suaybarslan.com/classnotes2.pdf (Accessed 2020-10-12).
[45] Prof. Dr.-Ing. Gerald Oberschmidt. Grundlagen der Übertragungstechnik,
kapitel 5: Datensicherung und kodierung.
http://dualplus.de/ueb_19/uebertragung.pdf (Accessed 2020-10-12).
[46] Markus Grassl. Searching for linear codes with large minimum distance. In Wieb
Bosma and John Cannon, editors, Discovering Mathematics with Magma:
Reducing the Abstract to the Concrete, volume 19 of Algorithms and Computation
in Mathematics, pages 287–313. Springer, Heidelberg, 2006.
[47] Markus Grassl. Bounds on the minimum distance of linear codes and quantum
codes. Online available at http://www.codetables.de, 2007. Accessed on 2020-
[48] Barcode reading and accuracy.
https://www.labce.com/spg650115_barcode_reading_and_accuracy. (Accessed 2020-10-12).
[49] Richard E. Blahut. Linear Block Codes, page 49–66. Cambridge University Press,